Subject access in Czechia Marie.Balikova@nkp.cz Cyfrowość bibliotek i archiwów Warszawa, 2009
Outline
• Knowledge Organization Systems
• Czech Subject Authority File (CZENAS)
• Conspectus Categorization Scheme (CCS)
• Uniform and Subject Information Gateways
• Topic Map of Library Collection
• CZENAS in Digital Collections at the NL CR
• DL in European context
Traditional KOS
• are lists of words and phrases, or notation symbols, organized according to explicit rules with varying levels of hierarchical structure (from very limited to highly sophisticated)
• are used to tag units of information so that they can be retrieved more easily by a search
• solve the problems of homographs, synonyms and polysemes
• ensure consistency when the same concept can be given different names
• consistency of terms is one of the most important aspects of the organization and management of information
• controlled vocabularies and classification schemes are meant for human users
Non-traditional KOS
Ontologies, as a form of knowledge representation, are defined as a systematic account of existence, a specification of a conceptualization
• describe concepts and relationships in programmatic ways and enable arbitrary relationships
• are Knowledge Organization Systems that try to capture and describe real-world entities and relationships in a machine-readable and machine-understandable manner
Other searching systems and techniques
The user wants not whole documents but brief answers to specific questions: How old is the President? When did Jan Hus die? What is the anthem of the European Union?
• answering short questions becomes a problem of finding the best combination of
  • word-level information retrieval (IR) and
  • syntactic/semantic-level natural language processing (NLP) techniques
Participants in the project
TRUST: Multilingual Semantic and Cognitive Search Engine for Text Retrieval Using Semantic Technologies, IST-1999-56416
• Question-answering method is used
• Monolingual, multilingual
• You could search in: French, Italian, Polish, Portuguese
M-CAST: plus two languages, English and Czech
• Question-answering method
• M-CAST answer: block of answers
• Exact, direct answer
• Snippet
• Visualization of resource page
• Prototype of the system
What is the anthem of the European Union?
Question in English, answer in Polish: „Ody do radości“
Quel est le drapeau de l'Union Européenne? (What is the flag of the European Union?)
Question in French, answer in Portuguese: „círculo de doze estrelas douradas sobre fundo azul“
Czech National Subject Authority File
• a structured controlled vocabulary that identifies the basic semantic relationships (equivalence, hierarchical and associative) between terms in natural language and is designed for both post-coordination and pre-coordination
• an integrated indexing and retrieval tool in which verbal (controlled) terms are linked to equivalent UDC notations and English terms
• a standardized system of controlled terms that can serve the needs of professionals (cataloguers, indexers) as well as non-professionals, e.g. web content creators
• it offers them an organizing tool not only for retrieving material but also for tagging it
• non-professionals want a standardized indexing and retrieval tool that is simple in structure (like the Dublin Core format) and in syntax (a descriptor-type system), with up-to-date terminology
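The record structure described on this slide could be sketched roughly as follows. This is only an illustration in Python: the class name `AuthorityRecord`, its field names, and the helper `build_lookup` are assumptions made for the example and do not reflect the actual CZENAS record format, and the sample values are illustrative rather than taken from the authority file.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Hypothetical, simplified shape of one subject authority record."""
    preferred: str   # preferred (controlled) Czech term
    english: str     # English equivalent of the preferred term
    udc: str         # linked UDC notation
    variants: list = field(default_factory=list)  # equivalence relationship
    broader: list = field(default_factory=list)   # hierarchical, one level up
    narrower: list = field(default_factory=list)  # hierarchical, one level down
    related: list = field(default_factory=list)   # associative relationship


def build_lookup(records):
    """Index records by every verbal form: preferred, English and variants."""
    lookup = {}
    for rec in records:
        for form in [rec.preferred, rec.english, *rec.variants]:
            lookup[form.casefold()] = rec
    return lookup


# Illustrative sample values only, not an actual CZENAS record.
physics = AuthorityRecord(
    preferred="fyzika",
    english="physics",
    udc="53",
    broader=["přírodní vědy"],
    narrower=["optika"],
)

lookup = build_lookup([physics])
print(lookup["physics"].udc)        # 53
print(lookup["fyzika"].narrower)    # ['optika']
```

Keeping the Czech term, the English equivalent and the UDC notation in one record is what lets a single lookup serve both tagging and retrieval.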
UDC: a complementary tool, mapping process
Universal Decimal Classification
• covers all subjects
• provides context to search terms
• supports interoperability between information systems
• enables
  • browsing and navigation
  • broadening and narrowing searches
  • multilingual access to collections
  • language-independent coding
The mapping between Czech verbal expressions and UDC numbers is done intellectually
• candidate controlled terms are chosen with the document in hand (from the bottom up)
• in order to suggest terms as specific as needed (not as specific as possible)
• single or complex (pre-combined) UDC numbers are linked
• English equivalents of preferred terms are added
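Broadening and narrowing a search through UDC notation can be pictured with a small Python sketch. It assumes simple decimal UDC numbers only, where a shorter number denotes a broader class; complex or pre-combined numbers (with colons or auxiliaries) would need real parsing. The function names `broaden` and `search` and the document identifiers are hypothetical.

```python
def digits(notation):
    """Strip the dots that UDC inserts for readability (e.g. 539.1 -> 5391)."""
    return notation.replace(".", "")


def broaden(notation):
    """Move one level up by dropping the last digit of a simple UDC number."""
    d = digits(notation)
    return d[:-1] if len(d) > 1 else d


def search(index, notation):
    """Return ids of documents whose UDC number falls under `notation`."""
    prefix = digits(notation)
    return sorted(doc_id for doc_id, udc in index.items()
                  if digits(udc).startswith(prefix))


# Hypothetical mini-index: document id -> simple UDC number.
index = {"doc-a": "51", "doc-b": "53", "doc-c": "539.1"}

print(search(index, "53"))           # ['doc-b', 'doc-c']
print(search(index, broaden("53")))  # ['doc-a', 'doc-b', 'doc-c']
print(search(index, "539.1"))        # ['doc-c']
```

Because the notation is language independent, the same prefix logic applies whichever language the user's verbal search term came from.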
Structure of authority record
Individual entities
Link to Wikipedia provides additional information
Example of the application of geographic coordinates in authority record for places
Example of UDC index of formal descriptors in both Czech and English languages
Conspectus Categorization Scheme: concordance tables between UDC and DDC; three hierarchical levels
1. 24 Conspectus divisions
2. 584 Conspectus categories
3. topical authority terms
Uniform Information Gateway (UIG): nationwide portal which unifies access to on-line library services in Czechia
Topic map of library collections: a user-friendly form of subject access for inexperienced library users and for those who prefer to get information on document location directly
Digital library Kramerius: more than 6 million scanned pages. The goal: to digitise and make accessible the periodicals and monographs comprising the national cultural heritage
The metadata of digitised documents in the catalogue of the NL CR contain subject access points integrated in the Czech Subject Authority File; the metadata which form part of the Kramerius digital library contain UDC codes only
Digital archive of Czech web resources which are collected with the aim of their long-term preservation. Conspectus categories scheme (in Czech version only)
Example of subject access data in a full-level record of the WebArchiv digital collection with a hyperlink to the original web page. This requires a special agreement
Manuscriptorium: system for collecting and making accessible on the internet information on historical book resources, linked to a virtual library of digitised documents. Searches in the Manuscriptorium database are supported by a variety of access points, such as country, settlement, repository, author, place of origin, etc.
The European Library is a free service that offers access to the resources of the 48 national libraries of Europe in 35 languages
Resources: digital (books, posters, maps, sound recordings, videos, etc.) and bibliographical
Quality and reliability are guaranteed by the collaborating national libraries of Europe
Query expansion by synonyms (variant form)
Search term: lesní moudrost
Preferred form: woodcraft
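The expansion shown on this slide could be sketched in Python roughly as below. The only data is the pair from the slide (variant "lesní moudrost", preferred form "woodcraft"); the names `VARIANTS`, `normalize` and `expand_query` are hypothetical, and this is an illustration under those assumptions rather than the portal's actual implementation.

```python
import unicodedata


def normalize(term):
    """Normalize Unicode composition, whitespace and case before matching."""
    return unicodedata.normalize("NFC", term).strip().casefold()


# Preferred form -> variant forms. The single entry is the slide's example.
VARIANTS = {"woodcraft": ["lesní moudrost"]}

# Reverse index: any known form (preferred or variant) -> preferred form.
TO_PREFERRED = {}
for preferred, variants in VARIANTS.items():
    TO_PREFERRED[normalize(preferred)] = preferred
    for variant in variants:
        TO_PREFERRED[normalize(variant)] = preferred


def expand_query(term):
    """Return the preferred form plus its variants, or the term unchanged."""
    preferred = TO_PREFERRED.get(normalize(term))
    if preferred is None:
        return [term]  # unknown term: search it as typed
    return [preferred, *VARIANTS[preferred]]


print(expand_query("Lesní moudrost"))  # ['woodcraft', 'lesní moudrost']
print(expand_query("woodcraft"))       # ['woodcraft', 'lesní moudrost']
```

Searching with every returned form lets a query typed as a variant also reach records indexed under the preferred form.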
Europeana: portal of the collections of European memory institutions. Descriptive and subject access points (author, place, date of creation, etc.) are added by cooperating institutions.
Czech Digital Library: concept covering digitisation, long-term preservation of and access to the entire national cultural heritage in digital form. The National Digital Library covers an important part of the national cultural heritage and operates in the broader context of the Czech Digital Library
Conclusion
• Traditional Knowledge Organization Systems used in Czechia can be applied in the organization and management of both traditional and digital collections of memory institutions.
• Controlled indexing languages (thesauri, subject heading systems) and classification systems are still very useful, even necessary.
• They serve both professionals (specialists in knowledge organisation systems, librarians, archivists and curators) and end-users.
• They are important for the development of KOS based on semantic technologies and ontologies, and they support the standardisation and harmonisation of terminology in specific domains.
Thank you!