
Building a large-scale digital library for education

Carl Lagoze, CS502, April 30, 2003. What is the NSDL? A library of exemplary collections and services with practical educational value, and a center of innovation in digital libraries applied to education.


Presentation Transcript


  1. Building a large-scale digital library for education • Carl Lagoze • CS502 • April 30, 2003

  2. What is the NSDL? • A library of exemplary collections and services with practical educational value • A center of innovation in digital libraries applied to education • A community center, focused on digital-library-enabled science education • A network of NSDL-funded projects

  3. Building service, collaboration, and knowledge layers over a variety of resources for a variety of users (diagram) • Services: browsing, annotating, searching, filtering, quality rating, curriculum building • Resources: Open Access Web, NSF-funded Collections, Publishers

  4. Short History of the NSDL • 1996: Vision articulated by NSF's Division of Undergraduate Education • 1997: National Research Council workshop • 1998: Preliminary grants through Digital Libraries Initiative 2 • 1998: SMETE-Lib workshop • 1999: NSDL Solicitation • 2000: 6 Core Integration demonstration projects + 23 others funded • 2001: 1 large Core Integration System project funded • More than 80 independent projects funded • Core Integration funding fixed until 2006

  5. NSF Grant Structure (http://www.nsf.gov/pubs/2002/nsf02054/nsf02054.html) • Collections: develop and maintain content • Services: for users, collection providers, core integration • Targeted research • Core Integration: organizational, economic, technical; US$5M of the US$25M total budget

  6. NSDL CI Technical Organization • A collaborative project: University Corporation for Atmospheric Research (Dave Fulker), Cornell University (William Arms), Columbia University (Kate Wittenberg) • With additional partners: Eastern Michigan University, Syracuse University, U Mass-Amherst, UC-Santa Barbara, San Diego Supercomputer Center • Director of Technology: Carl Lagoze

  7. Core Integration Philosophy • It is possible to build a very large digital library with a small staff. But ... • Every aspect of the library must be planned with scalability in mind. • Some compromises will be made. • Automation is key.

  8. Perspective on the Budget

  9. Resources for Core Integration • Core Integration budget: $4-6 million • Staff: 25-30 • Management: diffuse • How can a small team, without direct management control, create a very large-scale digital library?

  10. NSDL technical mantras • Aggregation rather than collection • Core integration team will not manage any collections • Spectrum of interoperability • Accommodate diversity of participation models • Open interfaces and standards permitting plug in of array of value-added services • One library many portals • Accommodate multiple quality and selection metrics • Tailor presentation of content and nature of services to audience needs • Open toolkit of software and services for library building

  11. Spectrum of interoperability (Level: Agreements; Example) • Federation: strict use of standards (syntax, semantic, and business); e.g., AACR, MARC, Z 39.50 • Harvesting: digital libraries expose metadata, with a simple protocol and registry; e.g., Open Archives metadata harvesting • Gathering: digital libraries do not cooperate, so services must seek out information; e.g., Web crawlers and search engines

  12. Translating to first release goals • This is a big task that no one has done before! • Work on the priorities • Focus on one point on the spectrum of interoperability: metadata harvesting • Incorporate NSF-funded collections and selected other collections • Leverage existing (or at least emerging) technologies and protocols: OAI, uPortal, Shibboleth, SDLIP, InQuery • Provide reliable base-level services: Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence • Plant some seeds for the future: machine-assisted metadata generation, automated collection aggregation, Web gathering strategies

  13. Metadata Repository • Central storage of all metadata about all resources in the NSDL • Defines the extent of the NSDL collection • Metadata includes collections, items, annotations, etc. • MR main functions: aggregation, normalization, redistribution • Ingest of metadata by various means: harvesting, manual, automatic, cross-walking • Open access to MR contents for service builders via OAI-PMH
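
As a rough illustration of how a service builder might read MR contents over OAI-PMH, the sketch below builds a ListRecords request and pulls titles out of a response. The verb, the `metadataPrefix` and `from` arguments, and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the sample record, and the function names are assumptions made for this example and are not the NSDL's actual endpoint or code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# A made-up response fragment, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Plate tectonics primer</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", from_date="2003-04-01")
titles = extract_titles(SAMPLE)
```

A real harvester would also follow `resumptionToken` values to page through large result sets; that step is left out here to keep the sketch short.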

  14. Metadata Strategy • Collect and redistribute any native (XML) metadata format • Provide crosswalks to Dublin Core from eight standard formats: Dublin Core, DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD • Concentrate on collection-level metadata • Use automatic generation to augment item-level metadata
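
A crosswalk of the kind described above can be pictured as a field-by-field mapping into Dublin Core elements. The following is a minimal sketch only: the native field names on the left of the table are hypothetical, and a production crosswalk for formats such as MARC or FGDC would need far richer rules than a flat lookup.

```python
# Hypothetical native field names mapped to Dublin Core element names.
CROSSWALK = {
    "resource_title": "title",
    "author": "creator",
    "keywords": "subject",
    "summary": "description",
    "url": "identifier",
}


def to_dublin_core(native_record):
    """Map a native record to Dublin Core.

    Returns (dc_record, unmapped_fields) so that fields the crosswalk
    cannot place are reported rather than silently dropped.
    """
    dc_record = {}
    unmapped = []
    for field, value in native_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so values are kept in lists.
        values = value if isinstance(value, list) else [value]
        dc_record.setdefault(element, []).extend(values)
    return dc_record, unmapped


record = {
    "resource_title": "Intro to Cells",
    "keywords": ["biology", "cells"],
    "grade_band": "K-4",
}
dc, leftover = to_dublin_core(record)
```

Returning the unmapped fields alongside the result keeps the native record's extra information visible, which fits the strategy of also storing and redistributing the native format.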

  15. Importing metadata into the MR (diagram): Collections, Harvest, Staging area, Cleanup and crosswalks, Database load, Metadata Repository

  16. Exporting metadata from the MR

  17. Metadata Triage

  18. Searching • What to Index? When possible, full text indexing is excellent, but it is not possible for all materials (non-textual, no access for indexing). Comprehensive metadata is an alternative, but it is available for very few of the materials. • What Architecture to Use? Few collections support an established search protocol (e.g., Z39.50).

  19. Search system general features • Implement a query language that includes most features that are common in commercial and Web search engines. • Periodically harvest the MR (via OAI-PMH) to incorporate the latest changes in the library. • Allow search on resources’ metadata as well as textual content, when available. • Communication with portals is done via the Simple Digital Library Interoperability Protocol (SDLIP).
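
To make the "metadata plus textual content, when available" idea concrete, here is a small inverted-index sketch. It is not InQuery or the NSDL search service; the class, its methods, and the sample records are assumptions for illustration, and the only query feature shown is an implicit AND over terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of record ids

    def add(self, record_id, metadata, full_text=None):
        """Index metadata values, plus full text when it is available."""
        parts = list(metadata.values())
        if full_text:
            parts.append(full_text)
        for part in parts:
            for term in tokenize(part):
                self.postings[term].add(record_id)

    def search(self, query):
        """Return sorted ids of records containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = SearchIndex()
# r1 has harvested content; r2 is known only through its metadata.
index.add("r1", {"title": "Volcano basics"}, full_text="Magma rises through the crust")
index.add("r2", {"title": "Volcano images"})
```

Records without accessible content remain findable through their metadata alone, which is the fallback the slide describes.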

  20. Search Architecture (diagram) • Portals, reached via SDLIP • Search and Discovery Server: SDLIP Wrapper, Search Engine, “Document” generator, OAI Harvester, http/ftp Harvester • Metadata Repository, harvested via OAI • Content, harvested via http/ftp

  21. Persistent Archive for the NSDL • Provide a persistent copy of the resources identified in the NSDL repository • Provide a mechanism to retrieve prior versions of resources • Verify availability of on-line digital resources that have presence in MR

  22. Persistent Archive Approach • Use data grid technology to: • Implement a persistent logical name space for registering resources • Manage archiving of modules on distributed storage systems • Use OAI harvesting to extract metadata from the NSDL repository • Crawl the web to retrieve resources • Provide OAI interface for reporting validation results • Manage the persistent archive through a separate information repository
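
The two ideas above that lend themselves to a sketch are the persistent logical name space and retrieval of prior versions. The toy registry below keeps a version history per logical name and uses a SHA-256 digest to detect change. It is an in-memory stand-in, not data grid software; the class, the `nsdl:res/1` name, and the ISO-date timestamps (assumed to be registered in chronological order) are all hypothetical.

```python
import hashlib


class PersistentArchive:
    def __init__(self):
        # logical name -> list of (timestamp, sha256 hex digest, content)
        self.versions = {}

    def register(self, logical_name, content, timestamp):
        """Store a new version only if the content changed since the last one."""
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(logical_name, [])
        if history and history[-1][1] == digest:
            return False
        history.append((timestamp, digest, content))
        return True

    def retrieve(self, logical_name, as_of=None):
        """Return the latest content, or the latest version at or before as_of."""
        history = self.versions.get(logical_name, [])
        candidates = [v for v in history if as_of is None or v[0] <= as_of]
        return candidates[-1][2] if candidates else None


archive = PersistentArchive()
first = archive.register("nsdl:res/1", b"v1", "2003-01-01")
unchanged = archive.register("nsdl:res/1", b"v1", "2003-02-01")
second = archive.register("nsdl:res/1", b"v2", "2003-03-01")
```

Because the logical name stays fixed while the stored versions accumulate, a crawler can re-fetch a resource on any schedule and only changed content adds to the history.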

  23. Access Management • Authentication: user identity established by origin servers at home institution—NSDL central will run an origin if no other home available • Authorization: access classes of users, collections, & services established by NSDL community • anonymous and pseudo-anonymous access available • Internet2 “Shibboleth” framework satisfies these requirements

  24. Access Management Flow (diagram; actors are the browser, the collection, and the institution's authentication and authorization service, e.g., Kerberos & LDAP, inside the organizational boundary) • 1. Attempt to access collection • 2. Redirected back to local login • 3. Login to local jurisdiction • 4. Attempt access again • 5. Confirm request valid
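
The five-step flow can be simulated with two small classes. This is a toy model only: it does not use or imitate the Shibboleth API, and the class names, the opaque handle format, and the credentials are assumptions invented for the example.

```python
class Institution:
    """Stands in for a home institution's authentication service."""

    def __init__(self, users):
        self.users = users      # username -> password
        self.sessions = set()   # opaque handles issued after login

    def login(self, username, password):
        """Step 3: log in locally and receive an opaque handle."""
        if self.users.get(username) != password:
            return None
        # The handle carries no identity, so access can stay pseudo-anonymous.
        handle = f"handle-{len(self.sessions) + 1}"
        self.sessions.add(handle)
        return handle

    def confirm(self, handle):
        """Step 5: tell a collection whether the handle is valid."""
        return handle in self.sessions


class Collection:
    def __init__(self, institution):
        self.institution = institution

    def access(self, handle=None):
        if handle is None:
            return "redirect-to-login"            # steps 1 and 2
        if self.institution.confirm(handle):      # steps 4 and 5
            return "granted"
        return "denied"


institution = Institution({"alice": "s3cret"})
collection = Collection(institution)
first_try = collection.access()
handle = institution.login("alice", "s3cret")
second_try = collection.access(handle)
```

The point of the design is that the collection never sees the password: it only asks the home institution whether the handle it was shown is valid.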

  25. User Interfaces • The Problem: cannot handcraft every web page; must be usable on a very wide range of equipment and with a very diverse group of users • The Solution: data-driven portals using channels (components that encapsulate a library function). Current NSDL portal technology is uPortal, a free, shareable portal being developed by a college and university consortium. Initial NSDL channels will include simple and advanced Search, Browse, News, Exhibits, Help, and Login/Registration.

  26. Demonstration http://nsdl.org

  27. We have only just begun… • Funding through 2006 • Provide infrastructure that both: • Advances state-of-the-art of digital libraries • Reliably delivers services and resources to targeted users • Making this possible through • Integration of work of partners (NSDL and external) • Co-development with partners • Internal development

  28. Long-term technical capabilities: Facilities for Collaboration • All users can contribute resources to the library • Collections (favorites), value added enhancements (curricula), original contributions • Community formation, long and short term • Persistence of results of community formation

  29. Long-term technical capabilities: Management of Entities • Resources • Services • Relationships • Users

  30. Long-term technical capabilities: Discovery of Entities • Capabilities for humans and agents • Searching through structured queries • Browsing of indexes, vocabularies, classifications

  31. Long-term technical capabilities: Relationship Management • Relationships are first-class objects: annotations, collections, equivalence, inclusion • Facilities: identification, discovery, persistence, evolution • Relationships of relationships
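
One way to read "relationships are first-class objects" is that each relationship gets its own identifier, so it can be discovered, stored, and itself be the target of another relationship. The sketch below shows that idea under stated assumptions: the identifier scheme (`rel:1`, `resource:42`), the relationship kinds, and both classes are hypothetical, not an NSDL data model.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Relationship:
    rel_id: str   # identification: every relationship has its own id
    kind: str     # e.g. "annotates", "includes", "equivalent-to"
    source: str   # id of a resource, a user, or another relationship
    target: str   # id of a resource or another relationship


class RelationshipStore:
    def __init__(self):
        self._by_id = {}
        self._ids = count(1)

    def add(self, kind, source, target):
        """Create, store, and return a new relationship."""
        rel = Relationship(f"rel:{next(self._ids)}", kind, source, target)
        self._by_id[rel.rel_id] = rel
        return rel

    def about(self, entity_id):
        """Discovery: every relationship whose target is the given entity."""
        return [r for r in self._by_id.values() if r.target == entity_id]


store = RelationshipStore()
inclusion = store.add("includes", "collection:volcanoes", "resource:42")
# A relationship of a relationship: a user annotates the inclusion itself.
comment = store.add("annotates", "user:teacher7", inclusion.rel_id)
```

Because targets are plain identifiers, the same `about` lookup works for resources and for relationships, which is what makes relationships of relationships fall out naturally.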

  32. NSDL MetaWeb

  33. Long-term technical capabilities: Knowledge layered on data • Ontologies, classification schemes, taxonomies, standards, and authority lists • Organize resources within concept spaces • Cross-walk and establish relationships among concept spaces

  34. Long-term technical capabilities: Control of entities • Access management for controlling the dissemination of intellectual property. • Mechanisms controlling disclosure of information with the goal of protecting privacy (e.g., COPPA) • Mechanisms for limiting inappropriate actions and entities

  35. Long-term technical capabilities: Customization and Personalization • Portals that provide specialized user interfaces and aggregation of collections and services in the library. • Mechanisms for users and communities to specialize their library experience. • Mechanisms to automatically adapt library behavior to user needs and abilities.

  36. Long-term technical capabilities: Accessibility • Platform • Connectivity • Physical Ability • Language

  37. Long-term technical capabilities: Measurement • Usage of the main NSDL portal and supported portals. • Performance of core services and network connections. • Popularity of various resources. • Reliability of access to various resources. • Data and metadata quality. • User demographics (where possible)

  38. Realizing Goals and Capabilities: Building & supporting infrastructure • Maintain and evolve the metadata repository • Maintain and evolve the main portal • Define, disseminate and support a service integration architecture • Develop, integrate, support core services: search and discovery, persistence, metadata and data normalization & enhancement, authentication, annotation, resource access

  39. Realizing Goals and Capabilities: Defining and building exemplars • General theme: collaborative spaces for specialized communities, disciplines, resources • Motivations: develop real products meeting needs of real audiences; extrapolate from special cases to general infrastructure; build essential partnerships

  40. Realizing Goals and Capabilities: Defining and building exemplars • Primary life science education: Eisenhower National Clearinghouse • Undergraduate math education: Math Forum • Secondary geospatial education: Alexandria Digital Library

  41. How do we do this: • Constructing targeted portals/libraries: primary life science education, undergraduate mathematics education, secondary geospatial education • To build generalized architecture: collaborative spaces, knowledge management, automatic data and metadata management

  42. Some Closing Thoughts • Difficulty of building stability on shifting sands • What is low-barrier infrastructure? • Barriers to ‘simple’ OAI and Dublin Core have been relatively high • Multiple problems with metadata from distributed sources: correctness, trust, information content, resource granularity and identity • Automation is the key to success
