
Digital Object Storage and Retrieval (DOSR) Vision

Digital Object Storage and Retrieval (DOSR) Vision. Josh Alspector. Disclaimer. This presentation discusses areas of technology investigation and interest. It does not relate to any existing DARPA program, nor should it be inferred to anticipate a future DARPA program.


Presentation Transcript


  1. Digital Object Storage and Retrieval (DOSR) Vision. Josh Alspector. Approved for Public Release, Distribution Unlimited.

  2. Disclaimer: This presentation discusses areas of technology investigation and interest. It does not relate to any existing DARPA program, nor should it be inferred to anticipate a future DARPA program.

  3. The Mundaneum

     In 1910, Belgians Paul Otlet and future Nobel Peace Prize laureate Henri La Fontaine opened the Palais Mondial, later renamed the Mundaneum. The Mundaneum’s mission was to collect metadata on every book, journal, and periodical ever published and record it in a card file system that embodied what we would call a faceted classification scheme. By 1934 it contained over 15 million entries. Unique identifiers included embedded links to related documents. Staff responded to search requests received by post and telegraph and returned hand-copied cards by post.

     In 1934, Otlet conceived a global network of “electric telescopes” that would allow people to search and browse through interlinked documents, images, audio and motion picture recordings. He wrote that, “from his armchair, everyone will hear, see, participate, will even be able to applaud, give ovations, sing in the chorus, add his cries of participation to those of all the others.”

     [Diagram: Mundaneum Infrastructure. Labels: “Social Network” Feedback; Telegraph and postal “network”; Human Search Engine; “Hyper-linked” Card Catalog; Documents, Images, Recordings.]

     Fatal Flaw: Scalability

  4. DOSR Vision

     [Figure labels: Spreadsheets, Text files, Videos, Web pages, E-mail, Images; User and Data Models; Automated Metadata Generation]

     • Create a resilient, distributed, scalable, and secure network of information that does not require a completely trusted or stable network of processing nodes [employ network overlays and advanced cryptographic techniques]
     • Advance the state of the art in automated metadata generation and interoperability [apply machine learning techniques]
     • Automatically get information where it is needed, or may be needed, using less bandwidth and processing [integrate user models, compact information retrieval encodings, and distributed content delivery]
     • Reliably track where information goes and where it came from [encapsulate provenance and audit information in network-maintained virtual objects]
     • Enable secure, resilient information storage, characterization, retrieval, and collaboration across barriers of time, geography, community of interest, technology, and administrative domain

     What we can find defines what we can do.

     Photos courtesy of U.S. Army, U.S. Navy
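
The slide's idea of encapsulating provenance and audit information inside an object can be illustrated with a hash-chained event log. This is only a minimal sketch under stated assumptions: the class `ProvenanceObject`, its field names, and the use of SHA-256 over JSON are hypothetical choices made for illustration, not anything the presentation specifies.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    """Hash an event body in a stable (sorted-key) JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class ProvenanceObject:
    """Hypothetical virtual object that carries its own audit trail."""
    payload: bytes
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str) -> None:
        """Append an event that commits to the previous event's digest."""
        body = {
            "actor": actor,
            "action": action,
            "prev": self.events[-1]["digest"] if self.events else "",
            "payload_sha256": hashlib.sha256(self.payload).hexdigest(),
        }
        self.events.append({**body, "digest": _digest(body)})

    def verify(self) -> bool:
        """Recompute every digest and check that the chain links are intact."""
        prev = ""
        for event in self.events:
            body = {k: v for k, v in event.items() if k != "digest"}
            if event["prev"] != prev or event["digest"] != _digest(body):
                return False
            prev = event["digest"]
        return True
```

Because each event commits to the digest of the one before it, editing or removing an earlier event makes `verify()` fail. A real system would also need signatures so that a party holding the object cannot simply rebuild the whole chain.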

  5. Hard Problems

     • Automated metadata extraction and generation
         • DoD has many stovepipe systems with limited metadata
         • Automatic extraction of metadata, especially from non-textual information, is an unsolved problem requiring some form of artificial intelligence
         • Email, papers, presentations, forms, and databases do not possess a community-maintained mesh of reciprocal references, so Google-like search, relevance, and ranking algorithms do not work
     • Scalable security for sharable objects
         • Decentralized (for scalability) key distribution systems present security challenges
         • Protection from known cryptographic and corruption attacks is hard; protection from unknown attacks is harder
         • Usable secure sharing (as convenient as email) is needed, or the system won’t be used
         • Scalable, revocable group access to synchronized, encrypted, versioned documents is essential
     • Scalable replicated storage and parallel data distribution
         • Globally unique identifiers (GUIDs) for retrieval and update are essential, and must be unbreakable, verifiable, and afford scalable resolution of a retrievable, trackable object
         • How to track fragmented and replicated objects for persistence and provenance
         • Object replication for secure, scalable, high-bandwidth distribution (secure BitTorrent-style)
         • Enhance resiliency and service in network-poor areas
         • Respond adaptively to service degradation for high-demand data and large-scale disruptions
     • Personalization, intelligent agents and user models
         • Intelligent agents needed to locate content near likely users, based on user models
         • User models based on authorization, active input and passive tracking
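
One common way to make an identifier verifiable, as the GUID bullet above asks, is to derive it from the object's content, so that anyone who retrieves the object can recheck it. The sketch below assumes that approach; the `sha256:` prefix, the function names, and the in-memory `ObjectStore` are hypothetical and are not a design taken from the presentation.

```python
import hashlib


def make_guid(content: bytes) -> str:
    """Derive an identifier from the content itself (assumed scheme)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify_guid(guid: str, content: bytes) -> bool:
    """Check that retrieved bytes really belong to the requested identifier."""
    return guid == make_guid(content)


class ObjectStore:
    """Hypothetical in-memory stand-in for an untrusted storage node."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, content: bytes) -> str:
        guid = make_guid(content)
        self._objects[guid] = content
        return guid

    def get(self, guid: str) -> bytes:
        content = self._objects[guid]
        if not verify_guid(guid, content):
            raise ValueError("stored bytes do not match the requested GUID")
        return content
```

A content-derived identifier changes whenever the content changes, so a scheme like this would still need a separate, stable name that points at the latest version of an updatable object.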

  6. Key Capabilities

     • Architecture and protocols
         • Protocols for exchanging objects, metadata, and security controls
         • Mobile agents and federated requests for information
     • Persistence of digital objects
         • Distribute replicas and coded fragments
         • Global, persistent, verifiable, unique identifiers (GUIDs)
         • Version-controlled, collaborative updates
     • Trust, security and provenance
         • Authorized, authenticated access
         • Decentralized encryption for scalability
         • Verifiable provenance and tracking of all objects
         • Resilience to attacks
     • Scalability
         • “Scale-free” architecture
         • Decentralized, peer-to-peer techniques
         • Manage latency, consistency and security as scale grows
     • Metadata and search
         • Extract metadata from video, maps, images
         • Relevance feedback
         • Efficient federated search
     • Accessibility and User Models
         • User models include authorization, preferences, location, need-to-know
         • Content finds you without search
         • Information locally available is personally relevant

     [Diagram labels: Object 1 Version 1; Replicas and fragments; Retrieve latest version from closest fragments or replica; Object 1 Version 2 update; Decentralized, scalable key distribution; Scalable resources, storage and participant networks; Needed objects migrate to local server for user]
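
The "coded fragments" item above can be illustrated with the simplest possible erasure code: split an object into k fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the rest. This is a toy sketch under that assumption; the function names are hypothetical, and a deployed system would more likely use a stronger code that tolerates several losses.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for frag in fragments:
        parity = xor_bytes(parity, frag)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is None."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild at most one fragment")
    restored = list(fragments)
    if missing:
        present = [frag for frag in fragments if frag is not None]
        rebuilt = bytes(len(present[0]))
        for frag in present:
            rebuilt = xor_bytes(rebuilt, frag)
        restored[missing[0]] = rebuilt
    # Drop the parity fragment and the zero padding added during the split.
    return b"".join(restored[:-1])[:length]
```

The caller has to remember the original length, since the split pads the data with zero bytes. In this sketch the fragments could be placed on different nodes, and a reader could fetch whichever k of the k+1 pieces are closest.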

  7. Interesting Research Ongoing in…

     • Automated metadata extraction
     • Decentralized, self-configuring location and routing
     • Federated search
     • Information retrieval
     • Personalization and user models
     • Proxy re-encryption
     • Scalable security and PKI
     • Search over encrypted indexes
     • Securing resilient peer-to-peer networks

     The DOSR Workshop will address these areas.
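
To give a feel for "search over encrypted indexes" from the list above, the sketch below builds an index whose keys are keyed hashes of words, so the node holding the index never sees the words themselves. It is a deliberately simplified illustration: the function names are hypothetical, and deterministic tokens like these still reveal which queries repeat and which documents share a word, which the research schemes in this area aim to limit.

```python
import hashlib
import hmac
from collections import defaultdict


def token(key: bytes, word: str) -> str:
    """Keyed hash of a word; only holders of the key can produce it."""
    return hmac.new(key, word.lower().encode(), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict) -> dict:
    """Map each word token to the set of document ids containing that word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[token(key, word)].add(doc_id)
    return dict(index)


def search(index: dict, query_token: str) -> set:
    """Server-side lookup: needs only the token, never the plaintext word."""
    return index.get(query_token, set())
```

In this arrangement the client keeps `key`, computes `token(key, "convoy")` locally, and sends only the token; the server returns matching document ids without learning the query word.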

  8. Preliminary Schedule

     July 15 Talks
     • 8:30 am Opening remarks - DARPA
     Architecture
     • 8:45 am Dr. Robert Kahn - keynote address
     • 9:15 am Dr. Peter Lucas - MAYA
     • 9:35 am Dr. Daniel Crichton - NASA
     • 9:55 am Break
     Metadata
     • 10:15 am Dr. Ajay Divakaran - Sarnoff Corp.
     • 10:35 am Dr. Randal Burns - JHU
     • 10:55 am Dr. Shmuel Peleg - HU-J
     • 11:15 am Mr. Jason Byassee - Northrop Grumman
     Security
     • 11:35 am Dr. James Allan - U. Mass-Amherst
     • 11:55 am Dr. Rafail Ostrovsky - UCLA
     • 12:15 pm Lunch
     • 1:40 pm Dr. Urs Muller - Net-Scale Tech.
     • 2:00 pm Dr. Matt Staker - IBM Research
     • 2:20 pm Dr. Angelos Stavrou - Global InfoTek Inc.
     • 2:40 pm Break
     User Models
     • 3:00 pm Dr. Peter Brusilovsky - U. Pittsburgh
     • 3:20 pm Dr. Michael Walfish - UT-Austin
     • 3:40 pm Dr. Rafael Alonso - SET Corp.
     • 4:00 pm Mr. Peter Haglich - Lockheed Martin

     July 15 Posters
     • 4:20 pm Break
     • 4:40 pm Poster Session 1
     • 5:20 pm Poster Session 2
     • 6:00 pm Adjourn

     July 16 Breakouts
     • 9:00 am Dr. Josh Alspector - DOSR vision and breakout group instructions
     • 9:30 am Breakout group discussions
     • Noon Lunch
     • 1:30 pm Brief out Group 1
     • 2:00 pm Brief out Group 2
     • 2:30 pm Break
     • 2:50 pm Brief out Group 3
     • 3:20 pm Brief out Group 4
     • 3:45 pm Plenary Session
     • 4:15 pm Adjourn

  9. Levels of Success

     • DoD adopts the system internally
     • Portions of the system are made available for open-source use by Apache
     • Legal, medical, and financial records management firms adopt GUIDs, protocols, and system components
     • ISPs and media companies adopt GUIDs, protocols, and system components for subscription services
     • Amazon, Google and iTunes use GUIDs and protocols

  10. Prior Art

     • Coda (CMU)
     • Cooperative File System (MIT)
     • FARSITE (Microsoft)
     • Grid (Argonne National Laboratory)
     • Lustre (now owned by Sun Microsystems)
     • OceanStore (UC Berkeley)
     • PASIS (CMU)
     • Universal Database (Maya Design)
