The YAZ Toolkit • Sebastian Hammer • Index Data - Information Retrieval consultants
Why Toolkits? • Z39.50 is a machine-to-machine protocol • Good for software • Not friendly for people • Toolkits give you a programming interface • They hide encoding, network layer, error handling • Leave you to concentrate on your application
YAZ • Z39.50 toolkit for Unix, Windows, etc. • ANSI-C, Portable, compact, easy to use • Widely used - constant feedback • Unlimited use license • Optional commercial support available
Implementing Z39.50 • Building a server • Testing using third party clients • Building your own client
The Z39.50 Server • Low-level YAZ API. Full access to Z39.50 protocol • Generic front-end server • Z39.50 server application • Simple high-level API to your "back-end" • Standard daemon under Unix - multithreaded NT Service
Generic Front-end Server API • #include <backend.h> • bend_initialize • bend_search • bend_fetch • bend_close • (bend_scan, etc.)
bend_initialize • Verify authentication • Initialize instance of database back-end • Create session handle for private use • Return association accept/reject
bend_close • Close database back-end instance (if relevant) • Release memory
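The initialize/close pair above can be sketched roughly as follows. This is a minimal illustration, not the YAZ API: every `demo_` type and function is a hypothetical stand-in for what `backend.h` provides, and the password check is a toy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in YAZ's backend.h. */
typedef struct {
    const char *user;      /* credentials from the Init request */
    const char *password;
} demo_initrequest;

typedef struct {
    int accepted;          /* 1 = accept association, 0 = reject */
    void *handle;          /* private session handle for later calls */
} demo_initresult;

/* Per-session state owned by the back-end. */
typedef struct {
    char user[64];
    int db_open;
} demo_session;

/* Verify authentication, open the back-end, hand back a session handle. */
demo_initresult demo_initialize(const demo_initrequest *req)
{
    demo_initresult res = { 0, NULL };
    assert(req != NULL);

    /* Toy check only: a real server would consult its own user database. */
    if (!req->user || !req->password || strcmp(req->password, "secret") != 0)
        return res;                       /* reject the association */

    demo_session *s = calloc(1, sizeof *s);
    if (!s)
        return res;                       /* out of memory: reject */
    strncpy(s->user, req->user, sizeof s->user - 1);
    s->db_open = 1;                       /* pretend the database is open */

    res.accepted = 1;
    res.handle = s;
    return res;
}

/* Close the back-end instance and release the session memory. */
void demo_close(void *handle)
{
    demo_session *s = handle;
    if (!s)
        return;
    s->db_open = 0;
    free(s);
}
```

The handle returned on accept is the same pointer the later calls receive, which is what lets the back-end keep per-session state without globals.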
bend_search
• Parameters
  • Database name
  • Result set name
  • Query
• Result
  • Status
  • Number of hits
bend_search
    bend_searchresult *bend_search(void *handle, bend_searchrequest *r);
bend_searchrequest
    char *setname;
    int replace_set;
    int num_bases;
    char **basenames;
    Z_Query *query;
    ODR stream;
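A search handler in this style might look like the sketch below. It is an assumption-laden illustration: the `demo_` structures only mirror some of the fields listed above, a plain string stands in for `Z_Query`, the error codes are placeholders rather than real diagnostics, and the "database" is a three-element array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder status codes; a real server would return proper diagnostics. */
enum { DEMO_OK = 0, DEMO_ERR_DATABASE = 1, DEMO_ERR_SET_EXISTS = 2 };

/* Hypothetical, simplified mirror of the request structure. */
typedef struct {
    const char *setname;      /* result set name */
    int replace_set;          /* may an existing set be overwritten? */
    int num_bases;            /* number of database names */
    const char **basenames;   /* database names */
    const char *term;         /* stand-in for Z_Query: one search term */
} demo_searchrequest;

typedef struct {
    int errcode;              /* status */
    int hits;                 /* number of hits */
} demo_searchresult;

/* Session state: this toy keeps a single named result set. */
typedef struct {
    char setname[32];
    int hits;
    int in_use;
} demo_session;

static const char *const demo_records[] = {
    "computer science", "information retrieval", "computer networks"
};

demo_searchresult demo_search(void *handle, const demo_searchrequest *r)
{
    demo_session *s = handle;
    demo_searchresult res = { DEMO_OK, 0 };
    assert(s != NULL && r != NULL);

    /* This toy back-end serves exactly one database, "Default". */
    if (r->num_bases != 1 || strcmp(r->basenames[0], "Default") != 0) {
        res.errcode = DEMO_ERR_DATABASE;
        return res;
    }
    /* Refuse to overwrite an existing set unless the client allows it. */
    if (s->in_use && strcmp(s->setname, r->setname) == 0 && !r->replace_set) {
        res.errcode = DEMO_ERR_SET_EXISTS;
        return res;
    }
    /* Count the records that contain the term. */
    for (size_t i = 0; i < sizeof demo_records / sizeof demo_records[0]; i++)
        if (strstr(demo_records[i], r->term))
            res.hits++;

    /* Remember the result set under its name for later fetches. */
    strncpy(s->setname, r->setname, sizeof s->setname - 1);
    s->setname[sizeof s->setname - 1] = '\0';
    s->hits = res.hits;
    s->in_use = 1;
    return res;
}
```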
Z_Query
• Representation of Type-1 (RPN) query in C datatypes
• Dangers:
  • Processing recursive data structure - "easy"
  • Interpreting the query honestly - HARD!
RPN Processing Guidelines
• Check for unsupported queries carefully
  • Attribute types/values
  • Operators
  • Operand types
  • etc.
• Process attributes strictly
• Provide good defaults for missing attributes
Query Processing Thoughts • It is better to fail a query than to return incorrect results • Look at all attribute types/values. Reject unknown attributes • Set up clients so only good attributes are sent
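The "fail rather than guess" rule can be sketched as a recursive check over the query tree. The tree type here is a hypothetical simplification (YAZ's `Z_Query` is richer), and the set of supported Use attributes (4 = title, 1003 = author, 1016 = any) is just an example policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified RPN tree; not the YAZ Z_Query layout. */
typedef enum {
    NODE_OPERAND, NODE_AND, NODE_OR, NODE_AND_NOT, NODE_PROX
} node_kind;

typedef struct { int type; int value; } demo_attr;

typedef struct demo_node {
    node_kind kind;
    struct demo_node *left, *right;   /* operator nodes only */
    const demo_attr *attrs;           /* operand nodes only */
    int num_attrs;
    const char *term;                 /* operand nodes only */
} demo_node;

#define ATTR_USE   1      /* Bib-1 attribute type 1: Use */
#define USE_TITLE  4
#define USE_AUTHOR 1003
#define USE_ANY    1016

static int use_supported(int v)
{
    return v == USE_TITLE || v == USE_AUTHOR || v == USE_ANY;
}

/* Returns 1 only if every node in the tree is something we fully support. */
int demo_query_ok(const demo_node *n)
{
    if (!n)
        return 0;
    switch (n->kind) {
    case NODE_AND:
    case NODE_OR:
    case NODE_AND_NOT:
        return demo_query_ok(n->left) && demo_query_ok(n->right);
    case NODE_OPERAND:
        if (!n->term)
            return 0;
        for (int i = 0; i < n->num_attrs; i++) {
            if (n->attrs[i].type != ATTR_USE)
                return 0;             /* unknown attribute type: reject */
            if (!use_supported(n->attrs[i].value))
                return 0;             /* unknown attribute value: reject */
        }
        return 1;
    default:
        return 0;                     /* unsupported operator, e.g. proximity */
    }
}

/* Default for a missing Use attribute: search "any" field. */
int demo_effective_use(const demo_node *n)
{
    assert(n != NULL && n->kind == NODE_OPERAND);
    for (int i = 0; i < n->num_attrs; i++)
        if (n->attrs[i].type == ATTR_USE)
            return n->attrs[i].value;
    return USE_ANY;
}
```

Rejecting is the default branch everywhere, so anything the code was not written to understand fails the query instead of being silently ignored.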
Memory Management
Built-in memory pool manager
• Allocation of protocol package elements
• Temporary memory for request processing
• Memory recycled automatically
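To make the pool idea concrete, here is a minimal arena allocator sketch. It is not YAZ's implementation; the `demo_pool` names, block size, and alignment are assumptions chosen for illustration. The point is only that many small allocations made during one request are released together in a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BLOCK_SIZE 4096   /* assumed default block size */
#define POOL_ALIGN      16     /* assumed alignment for every allocation */

typedef struct pool_block {
    struct pool_block *next;
    size_t used, size;
    unsigned char *data;
} pool_block;

typedef struct {
    pool_block *head;          /* most recently added block */
} demo_pool;

/* Allocate n bytes from the pool; individual allocations are never freed. */
void *demo_pool_alloc(demo_pool *p, size_t n)
{
    assert(p != NULL);
    n = (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);   /* round up */

    pool_block *b = p->head;
    if (!b || b->size - b->used < n) {
        /* Current block is missing or full: chain a new one in front. */
        size_t size = n > POOL_BLOCK_SIZE ? n : POOL_BLOCK_SIZE;
        b = malloc(sizeof *b);
        if (!b)
            return NULL;
        b->data = malloc(size);
        if (!b->data) {
            free(b);
            return NULL;
        }
        b->size = size;
        b->used = 0;
        b->next = p->head;
        p->head = b;
    }
    void *out = b->data + b->used;
    b->used += n;
    return out;
}

/* Number of blocks currently held by the pool. */
size_t demo_pool_blocks(const demo_pool *p)
{
    size_t count = 0;
    for (const pool_block *b = p->head; b; b = b->next)
        count++;
    return count;
}

/* Release everything at once, e.g. when a request has been answered. */
void demo_pool_release(demo_pool *p)
{
    pool_block *b = p->head;
    while (b) {
        pool_block *next = b->next;
        free(b->data);
        free(b);
        b = next;
    }
    p->head = NULL;
}
```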
bend_fetch
• Parameters
  • Result set name
  • Offset
• Result
  • Status
  • Record
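A fetch handler along these lines could be sketched as below. The `demo_` types, the placeholder error codes, the `last_in_set` flag, and the choice of 1-based record positions are assumptions for illustration, not the YAZ definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder status codes; a real server would return proper diagnostics. */
enum { DEMO_OK = 0, DEMO_ERR_NO_SUCH_SET = 1, DEMO_ERR_OUT_OF_RANGE = 2 };

/* Hypothetical result set kept by the back-end after a search. */
typedef struct {
    const char *setname;
    int num_records;
    const char **records;
} demo_resultset;

typedef struct {
    int errcode;           /* status */
    const char *record;    /* record data, NULL on error */
    size_t len;            /* length of the record in bytes */
    int last_in_set;       /* 1 if this was the final record of the set */
} demo_fetchresult;

/* Fetch record number `number` (assumed 1-based) from the named set. */
demo_fetchresult demo_fetch(const demo_resultset *set,
                            const char *setname, int number)
{
    demo_fetchresult res = { DEMO_OK, NULL, 0, 0 };
    assert(set != NULL && setname != NULL);

    if (strcmp(set->setname, setname) != 0) {
        res.errcode = DEMO_ERR_NO_SUCH_SET;
        return res;
    }
    if (number < 1 || number > set->num_records) {
        res.errcode = DEMO_ERR_OUT_OF_RANGE;
        return res;
    }
    res.record = set->records[number - 1];
    res.len = strlen(res.record);
    res.last_in_set = (number == set->num_records);
    return res;
}
```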
Retrieval Records • What record syntaxes do you need to support? • MARC = ISO2709 • SUTRS - Simple text format • GRS-1 - structured records • SGML/XML?
The Z39.50 Client • Challenge: Integration with existing User interface / event paradigm • YAZ: C representation of Z39.50 protocol packages • Comstack API for exchanging protocol packages
Z39.50 ASN.1
    AttributesPlusTerm ::= SEQUENCE {
        attributeList  SEQUENCE OF AttributeElement,
        term           Term
    }
Z39.50 ASN.1 in C
    typedef struct Z_AttributesPlusTerm {
        int num_attributes;
        Z_AttributeElement **attributeList;
        Z_Term *term;
    } Z_AttributesPlusTerm;
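The mapping shows the general pattern: an ASN.1 SEQUENCE OF becomes a count plus an array of pointers. A sketch of filling in such a structure by hand follows; the `demo_` types only imitate the shape above (the element and term types are reduced to the bare minimum), and plain `malloc` is used where a toolkit would normally allocate from its own memory pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of the element and term types. */
typedef struct { int attributeType; int value; } demo_AttributeElement;
typedef struct { const char *general; } demo_Term;

/* Same shape as the struct above: count + pointer array + term. */
typedef struct {
    int num_attributes;
    demo_AttributeElement **attributeList;
    demo_Term *term;
} demo_AttributesPlusTerm;

/* Build a term carrying a single Use attribute (attribute type 1). */
demo_AttributesPlusTerm *demo_make_apt(int use_value, const char *text)
{
    assert(text != NULL);
    demo_AttributesPlusTerm *apt = malloc(sizeof *apt);
    demo_AttributeElement *el = malloc(sizeof *el);
    demo_AttributeElement **list = malloc(sizeof *list);
    demo_Term *term = malloc(sizeof *term);
    if (!apt || !el || !list || !term) {
        free(apt);
        free(el);
        free(list);
        free(term);
        return NULL;
    }
    el->attributeType = 1;      /* Use attribute */
    el->value = use_value;
    list[0] = el;
    term->general = text;       /* borrowed pointer, not copied */

    apt->num_attributes = 1;
    apt->attributeList = list;
    apt->term = term;
    return apt;
}

/* Free everything demo_make_apt allocated. */
void demo_free_apt(demo_AttributesPlusTerm *apt)
{
    if (!apt)
        return;
    for (int i = 0; i < apt->num_attributes; i++)
        free(apt->attributeList[i]);
    free(apt->attributeList);
    free(apt->term);
    free(apt);
}
```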
Comstack • Abstraction over transport service layer • Simplifies exchange of BER-encoded packages • Allows portability over transport stacks • You supply event-handling ("select")
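"You supply event-handling" means the application decides when to wait and for how long. A minimal sketch of that part using POSIX `select` follows. It works on a plain file descriptor; in a real client the descriptor would belong to the connection, and the `demo_` function names are hypothetical. Interrupted calls (EINTR) are not retried here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int demo_wait_readable(int fd, int timeout_ms)
{
    fd_set rd;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &rd, NULL, NULL, &tv);
}

/*
 * Wait for data, then read up to cap bytes.
 * Returns the number of bytes read, 0 on timeout or end of file, -1 on error.
 */
long demo_read_when_ready(int fd, char *buf, size_t cap, int timeout_ms)
{
    int ready = demo_wait_readable(fd, timeout_ms);
    if (ready <= 0)
        return ready;
    return (long)read(fd, buf, cap);
}
```

A GUI or event-driven client would typically register the descriptor with its own event loop instead of blocking in `select`, which is the integration challenge the client slide refers to.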
ZAP - WWW/Z39.50 Gateway Environment • User interface - HTML forms • Results - based on templates containing bits of HTML • Quick prototyping • High performance under Apache webserver
Other Options - scripting • IrTcl - Z39.50 package for Tcl/Tk • High-level scripting environment • Incremental prototyping • Build platform-independent GUI clients or WWW gateways
News? • Protocol encoders now ASN.1 compiler-generated • Easy to add or switch to new ASN.1 (e.g. ILL) • Protocol package pretty-printing • Thread-safe. Solid Windows port • Optional commercial support
Where is it? www.indexdata.dk