
Full-Text Support in a Database Semantic File System

Kristen LeFevre & Kevin Roundy, Computer Sciences 736. Leveraging DBs in File Systems: what do databases have to offer? Transactions, concurrency control, crash recovery, query power (metadata), and extensibility (adding new objects/modules).


Presentation Transcript


  1. Full-Text Support in a Database Semantic File System
     Kristen LeFevre & Kevin Roundy
     Computer Sciences 736

  2. Leveraging DBs in File Systems
     What do databases have to offer?
     • Transactions
     • Concurrency control
     • Crash recovery
     • Query power (metadata)
     • Extensibility – add new objects/modules
     • Efficient Search!

  3. Re-thinking Directories
     Current state of directories: LAME!
     • User remembers what, not where
     Our System:
     • Search tools for grouping related files
     • Semantically meaningful directories [Semantic FS]
     • Files are stored in tables
     • Directories are just for looks

  4. Related Work
     • Semantic Filesystems
     • Use a DB [Inversion Filesystem]
     • NFS Meets Databases [Halverson]
       • NFS for portability, transparency, existing code support, familiar semantics
       • Server-side caching for performance
     Bringing ideas together:
     • Use [Halverson]’s infrastructure to implement semantic filesystem ideas

  5. Roadmap
     • Overview of System Design and Implementation
     • Virtual Directories and Full-Text Queries
     • Live Demonstration
     • Conclusions & Future Work

  6. System Architecture
     [Diagram, top to bottom: standard NFS clients; NFS server (NFS front end, custom backend); object-relational database (boxes labeled M and TS2); storage]

  7. Postgres Capabilities
     An object-relational DB such as Postgres lets you define and add modules.
     Case in point: Tsearch2
     • New type: tsvector
     • Related function: to_tsvector
       to_tsvector('a b a c') returns 'a':1,3 'b':2 'c':4
     • Related index: idxFTI
     • Set triggers to do updates
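
The to_tsvector example on this slide maps each word to the positions where it occurs. A minimal Python sketch of that mapping follows. It is not Tsearch2 itself: the names toy_tsvector and format_tsvector are hypothetical, the tokenizer only splits on whitespace, and no stemming or stopword removal is applied.

```python
from collections import defaultdict


def toy_tsvector(text):
    """Map each whitespace-separated word to its 1-based positions.

    Hypothetical helper for illustration only; it does no stemming
    and removes no stopwords.
    """
    positions = defaultdict(list)
    for pos, word in enumerate(text.split(), start=1):
        positions[word].append(pos)
    return dict(positions)


def format_tsvector(vec):
    """Render the mapping in the 'word':pos,pos style shown on the slide."""
    parts = []
    for word in sorted(vec):
        joined = ",".join(str(p) for p in vec[word])
        parts.append(f"'{word}':{joined}")
    return " ".join(parts)


print(format_tsvector(toy_tsvector("a b a c")))  # 'a':1,3 'b':2 'c':4
```

Keeping positions, rather than only word counts, is what lets a ranking function later ask whether two search terms occur near one another.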

  8. Mapping FS data to DB Schema

  9. [Halverson] Schema
     [Diagram: tables fileatt, naming, and allfiles, connected by 1-to-N relationships]

  10. Database Schema
      [Diagram: tables fileatt, naming, and allfiles, connected by 1-to-N relationships; annotated with strstr(a, ".txt")]

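  11. Database Schema
      [Diagram: tables fileatt, naming, and allfiles, connected by 1-to-N relationships; annotated with strstr(a, ".txt"); adds table allfiles_txt with a tsearch2 index, connected by 1-to-1 relationships]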

  12. Roadmap
      • Overview of System Design and Implementation
      • Virtual Directories and Full-Text Queries
      • Live Demonstration
      • Conclusions & Future Work

  13. Virtual Directories and Text Search
      • Want to handle 2 types of text queries
        • Boolean keyword queries
          e.g. (‘Kristen’ | ‘Kevin’ | ‘Remzi’) & ‘file’ & ‘system’
        • IR rank queries
          e.g. Rank files with respect to (‘computer’ & ‘architecture’)
      • More powerful than grep!
      • Virtual directories proposed for Semantic File Systems
      • Incorporate full-text queries without “breaking” NFS interface for existing applications
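
The Boolean keyword query on this slide can be read as set algebra over an inverted index: "|" is union and "&" is intersection. The sketch below illustrates that reading in Python. The file names and contents are made up, build_index and lookup are hypothetical helpers, and this is not how Tsearch2 evaluates queries internally.

```python
def build_index(docs):
    """Build an inverted index: lowercase word -> set of file names."""
    index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


def lookup(index, word):
    """Return the set of files containing the word (empty set if none)."""
    return index.get(word.lower(), set())


# Hypothetical file contents, invented for this example.
docs = {
    "notes.txt": "Kevin wrote a file system report",
    "todo.txt": "Remzi asked about the file format",
    "paper.txt": "Kristen and Kevin study a database file system",
    "misc.txt": "a system with no authors named",
}
index = build_index(docs)

# ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
matches = (
    (lookup(index, "Kristen") | lookup(index, "Kevin") | lookup(index, "Remzi"))
    & lookup(index, "file")
    & lookup(index, "system")
)
print(sorted(matches))  # ['notes.txt', 'paper.txt']
```

Because the index is consulted instead of scanning every file, the cost depends on the sizes of the matching sets rather than on the total amount of text, which is one sense in which this is more powerful than grep.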

  14. DBMS Full-Text Support
      • Keyword Search
        • Text indices support search over keywords
        • Words extracted from document, stemmed, “stopwords” removed
      • Rank
        • Used existing rank() function as a black-box
        • rank() counts number of times each word appears in document, and whether search terms are near one another
        • Optionally, normalize by document length
        • Other notions of IR rank could easily be substituted
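
To make the ranking idea concrete, here is a toy Python scorer that counts search-term occurrences after crude stemming and stopword removal, with optional normalization by document length. It is a stand-in, not the rank() function the slide treats as a black box: the stopword list, toy_stem, and toy_rank are all assumptions, and the proximity component mentioned on the slide is deliberately left out.

```python
STOPWORDS = {"a", "an", "and", "the", "of", "in"}  # tiny illustrative list


def toy_stem(word):
    """Very crude stemmer: strip one trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokenize(text):
    """Lowercase, strip punctuation, drop stopwords, then stem."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [toy_stem(w) for w in words if w and w not in STOPWORDS]


def toy_rank(text, terms, normalize=False):
    """Count occurrences of the search terms; optionally divide by length."""
    tokens = tokenize(text)
    wanted = {toy_stem(t.lower()) for t in terms}
    hits = sum(1 for t in tokens if t in wanted)
    if normalize and tokens:
        return hits / len(tokens)
    return float(hits)


terms = ["computer", "architecture"]
short_doc = "Computer architecture notes"
long_doc = (
    "The architecture of computers and the architecture of networks "
    "in a long survey of systems"
)

print(toy_rank(short_doc, terms), toy_rank(long_doc, terms))  # 2.0 3.0
print(toy_rank(short_doc, terms, normalize=True)
      > toy_rank(long_doc, terms, normalize=True))  # True
```

The example shows why normalization is offered as an option: the longer document wins on raw counts, while the shorter, more focused one wins once scores are divided by document length.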

  15. Semantics of Virtual Directories
      • Encountered some tradeoffs
      • What we did:
        • Static virtual directories (search once on mkdir)
        • Directory contents as a snapshot at one point in time
        • Hard links
      [Diagram: directory tree rooted at /CS736; labels as extracted: project papers reading questions %nfs% writeup talk outline NFS Thread ideas NFS vs AFS]
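
The static semantics described on this slide (search once on mkdir, contents frozen as a snapshot, entries behaving like hard links) can be sketched with a small in-memory model. Everything here is hypothetical: the ToyFS class, its method names, and passing the keyword as an explicit argument are assumptions made for illustration, not the authors' NFS proxy.

```python
class ToyFS:
    """In-memory model of static virtual directories (illustrative only)."""

    def __init__(self):
        self.contents = {}   # file id -> text
        self.names = {}      # file id -> original file name
        self.virtual = {}    # directory name -> {link name: file id}
        self._next_id = 1

    def create(self, name, text):
        """Create a file and return its id (a stand-in for an inode)."""
        fid = self._next_id
        self._next_id += 1
        self.contents[fid] = text
        self.names[fid] = name
        return fid

    def mkdir_virtual(self, dirname, keyword):
        """Run the keyword search once and freeze the result."""
        key = keyword.lower()
        self.virtual[dirname] = {
            self.names[fid]: fid
            for fid, text in self.contents.items()
            if key in text.lower().split()
        }

    def readdir(self, dirname):
        """List the snapshot; no query is re-run here."""
        return sorted(self.virtual[dirname])

    def read(self, dirname, link_name):
        """Follow a link by file id, as a hard link would."""
        return self.contents[self.virtual[dirname][link_name]]


fs = ToyFS()
fs.create("ideas.txt", "NFS thread ideas")
fs.create("afs.txt", "AFS notes")
fs.mkdir_virtual("nfs-files", "nfs")
fs.create("compare.txt", "NFS vs AFS")  # created after the snapshot

print(fs.readdir("nfs-files"))  # ['ideas.txt']
```

The file created after mkdir does not appear, which is the snapshot behavior. Storing file ids rather than paths is what avoids the dangling-link problem that path-based symbolic links would have.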

  16. Semantics of Virtual Directories
      • Encountered some tradeoffs
      • Alternatives (all also valid):
        • Static virtual directory creation with symbolic links
          • leads to dangling (broken) links
        • Process query lazily on readdir command
          • Semantics used in Semantic File System paper
        • Dynamically update contents of virtual directories on file creation, deletion, or write
          • Can be implemented using database triggers
          • More expensive, heavier back-end load
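
The dynamic alternative on this slide can be pictured as a hook that fires on every file creation, write, or deletion and re-checks that file against each stored query, much as a database trigger would. The Python sketch below is a hypothetical model of that idea only: DynamicDirs and its methods are invented names, and no real trigger mechanism is involved.

```python
class DynamicDirs:
    """Virtual directories kept current by a trigger-like hook (a sketch)."""

    def __init__(self):
        self.files = {}     # file name -> text
        self.queries = {}   # directory name -> keyword
        self.members = {}   # directory name -> set of file names

    def mkdir_virtual(self, dirname, keyword):
        """Register a query and populate it from the existing files."""
        self.queries[dirname] = keyword.lower()
        self.members[dirname] = set()
        for name in list(self.files):
            self._on_change(name)

    def write(self, name, text):
        """Create or overwrite a file, then fire the hook."""
        self.files[name] = text
        self._on_change(name)

    def delete(self, name):
        """Remove a file, then fire the hook."""
        self.files.pop(name, None)
        self._on_change(name)

    def _on_change(self, name):
        """Re-check one file against every stored query."""
        words = self.files.get(name, "").lower().split()
        for dirname, keyword in self.queries.items():
            if keyword in words:
                self.members[dirname].add(name)
            else:
                self.members[dirname].discard(name)

    def readdir(self, dirname):
        return sorted(self.members[dirname])


d = DynamicDirs()
d.write("a.txt", "nfs ideas")
d.mkdir_virtual("nfs", "nfs")
d.write("b.txt", "nfs vs afs")   # appears without re-running mkdir
d.write("a.txt", "afs only")     # drops out after the rewrite
print(d.readdir("nfs"))  # ['b.txt']
```

Every write touches every registered query in this model, which illustrates the heavier back-end load the slide attributes to the dynamic approach.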

  17. Roadmap
      • Overview of System Design and Implementation
      • Virtual Directories and Full-Text Queries
      • Live Demonstration
      • Conclusions & Future Work

  18. Roadmap
      • Overview of System Design and Implementation
      • Virtual Directories and Full-Text Queries
      • Live Demonstration
      • Conclusions & Future Work

  19. Conclusions
      • Benefits of our proxy architecture:
        • Standard NFS clients
        • Postgres as black box
        • Simple to expose functionality of DB
        • Use & add DB objects at will

  20. Future Work
      • Performance evaluation to understand the overhead of new functionality
      • Dynamic index maintenance (file creation & modification)
      • Virtual directory creation and text querying
      • Block-level text writes and caching
      • Query support for other file types
      • Mechanisms for extracting and indexing meta-data from additional file types (e.g., image files)
      • Performance monitoring, adaptive indexing, and storage format within the NFS proxy

  21. Thanks! Questions?
      Special Thanks: Remzi Arpaci-Dusseau, Alan Halverson, David DeWitt
