File-Metadata Management System For The LHCb Experiment

CHEP04, Interlaken, 27 September 2004. Carmine Cioffi, Department of Physics, University of Oxford.


Presentation Transcript


  1. CHEP04, Interlaken, 27 September 2004. File-Metadata Management System For The LHCb Experiment. Carmine Cioffi, Department of Physics, University of Oxford.

  2. Outline
     • What are metadata and why we need them in the LHCb experiment
     • The File-Metadata Management System
     • The two schema strategy
     • XML and the warehousing database
     • Services and specialised views
     • Relationship between the warehousing database and views
     • Web Services
     • ARDA and future planning
     File-Metadata Management system

  3. Metadata
     • Generally speaking, metadata are data which characterise data-files
     • The two facets of metadata
        • Job provenance: Everything you ever wanted to know about how a data-file was created
        • Bookkeeping: How do I identify the datasets I am interested in for my analysis?
     • Metadata are needed to get straight to the files of interest, avoiding unnecessary access to the data storage.

  4. The two schema strategy
     • The two schema strategy consists of having a Database (Warehousing DB) and a View of it, both with their own schema.
     • The Warehousing DataBase (WDB) is meant to store data in a simple way but be flexible enough to accept new data.
     • The View is designed to be efficient for the service it is made for.

  5. Entity-Relationship model for WDB

  6. XML and the insertion of data
     • Due to the key-value strategy the WDB is liable to be corrupted:
        • Data with any semantics can be inserted.
        • Partial information can be inserted.
     • To prevent this, the data must be presented in XML format. In this way, using a predefined DTD or XML Schema, it is possible to verify the correctness of the data before insertion.

  7. The DTD for the insertion of job-related metadata
     <!ELEMENT Job ((JobOption|TypedParameter|InputFile|OutputFile)*)>
     <!ELEMENT JobOption EMPTY>
     <!ELEMENT TypedParameter EMPTY>
     <!ELEMENT InputFile EMPTY>
     <!ELEMENT OutputFile ((Parameter|Quality)*)>
     <!ELEMENT Parameter EMPTY>
     <!ELEMENT Quality (Parameter*)>
     <!ATTLIST Job ConfigName    CDATA #REQUIRED
                   ConfigVersion CDATA #REQUIRED
                   Date          CDATA #REQUIRED>
     <!ATTLIST JobOption Recipient CDATA #REQUIRED
                         Name      CDATA #REQUIRED
                         Value     CDATA #REQUIRED>
     <!ATTLIST TypedParameter Name  CDATA #REQUIRED
                              Value CDATA #REQUIRED
                              Type  (Info|Environment_Variable) #REQUIRED>
     <!ATTLIST InputFile Name CDATA #REQUIRED>
     <!ATTLIST OutputFile Name        CDATA #REQUIRED
                          TypeName    CDATA #REQUIRED
                          TypeVersion CDATA #REQUIRED>
     <!ATTLIST Parameter Name  CDATA #REQUIRED
                         Value CDATA #REQUIRED>
     <!ATTLIST Quality Group CDATA #REQUIRED
                       Flag  CDATA #REQUIRED>
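To illustrate how such a DTD guards the WDB against partial or ill-formed records, here is a minimal Python sketch that re-expresses the DTD's element and attribute rules as plain data and checks a document against them. It is not a validating parser (the standard-library `xml.etree` module does not validate against DTDs), and it is not the LHCb implementation. The function names, the requirement that the root be `<Job>`, and all sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Required attributes and allowed children per element, transcribed from the DTD.
RULES = {
    "Job": (("ConfigName", "ConfigVersion", "Date"),
            {"JobOption", "TypedParameter", "InputFile", "OutputFile"}),
    "JobOption": (("Recipient", "Name", "Value"), set()),
    "TypedParameter": (("Name", "Value", "Type"), set()),
    "InputFile": (("Name",), set()),
    "OutputFile": (("Name", "TypeName", "TypeVersion"), {"Parameter", "Quality"}),
    "Parameter": (("Name", "Value"), set()),
    "Quality": (("Group", "Flag"), {"Parameter"}),
}
TYPED_PARAMETER_TYPES = {"Info", "Environment_Variable"}


def check_element(element, errors):
    """Append a message to `errors` for every rule the element breaks."""
    required, allowed_children = RULES[element.tag]
    for name in required:
        if name not in element.attrib:
            errors.append(f"<{element.tag}> is missing attribute {name}")
    for name in element.attrib:
        if name not in required:
            errors.append(f"<{element.tag}> has undeclared attribute {name}")
    if element.tag == "TypedParameter":
        type_value = element.get("Type")
        if type_value is not None and type_value not in TYPED_PARAMETER_TYPES:
            errors.append(f"<TypedParameter> has invalid Type {type_value!r}")
    for child in element:
        if child.tag in allowed_children:
            check_element(child, errors)
        else:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")


def check_job_document(xml_text):
    """Return a list of rule violations; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    if root.tag != "Job":  # assumption: a job document has <Job> as its root
        return [f"root element must be <Job>, found <{root.tag}>"]
    errors = []
    check_element(root, errors)
    return errors


# Hypothetical documents; none of these values come from the LHCb database.
GOOD_JOB = """
<Job ConfigName="DC04" ConfigVersion="v1" Date="2004-09-27">
  <JobOption Recipient="App" Name="EvtMax" Value="500"/>
  <TypedParameter Name="Host" Value="node01" Type="Info"/>
  <InputFile Name="in.sim"/>
  <OutputFile Name="out.dst" TypeName="DST" TypeVersion="1">
    <Parameter Name="NbEvents" Value="500"/>
    <Quality Group="Production" Flag="OK"/>
  </OutputFile>
</Job>
"""

BAD_JOB = """
<Job ConfigName="DC04" ConfigVersion="v1">
  <TypedParameter Name="Host" Value="node01" Type="Other"/>
  <Parameter Name="Stray" Value="1"/>
</Job>
"""
```

Calling `check_job_document(GOOD_JOB)` returns an empty list, while `check_job_document(BAD_JOB)` reports the missing `Date`, the invalid `Type` and the misplaced `<Parameter>`. In practice a validating parser run against the real DTD would replace this hand-written check.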

  8. Services and the specialised views
     • Complex SQL queries against the WDB do not always perform well for bulk lookups.
     • But the WDB contains all the information about the files, which can be used to generate specialised views for a specific service.
     • Knowing the service, the views can be optimised to give the best performance.

  9. Example of view with service and applications
     [Figure: specialised view schema with the XMLRPC and SERVLETS services, the Jython Web Server, the Web Browser and the GANGA application]
     • Replica: FILE_ID, REPLICA, LOCATION
     • DT_JobSummary: JOB_ID, CONFIG, DBVERSION, EVENTTYPE, JOBDATE, LABORATORY, PROGRAM0, INPUTFILE0, PROGRAM1, INPUTFILE1, PROGRAM2, INPUTFILE2
     • DT_FileSummary: FILE_ID, JOB_ID, EVENTTYPE, EVENTDESCRIPTION, NBEVENTS, FILETYPE, FILENAME, FILESIZE
     This example shows the specialised view that sits behind the XMLRPC and SERVLETS services. These services are used by GANGA and the Web Browser.

  10. Generation of the specialised View
     [Figure: an SQL script maps the Warehouse DB tables onto the Specialised View tables]
     • Warehouse DB: Jobs (ConfigName, ConfigVersion, Date), Files (LogName), and the parameter tables JobParams, FileParams, TypeParams and QualityParams (Name, Value, and a Type field)
     • Specialised View: Replica (FILE_ID, REPLICA, LOCATION), DT_JobSummary (JOB_ID, CONFIG, DBVERSION, EVENTTYPE, JOBDATE, LABORATORY, PROGRAM0, INPUTFILE0, PROGRAM1, INPUTFILE1, PROGRAM2, INPUTFILE2), DT_FileSummary (FILE_ID, JOB_ID, EVENTTYPE, EVENTDESCRIPTION, NBEVENTS, FILETYPE, FILENAME, FILESIZE)
     The view is regenerated periodically or on demand, based on the needs of the experiment (every night for LHCb). This is fast even though the WDB contains many GB.
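The generation step can be sketched as an SQL script that pivots key-value rows into one flat row per file. The Python example below uses the standard-library `sqlite3` module in place of Oracle. The column layout of `Files` and `FileParams`, the parameter names `EventType` and `NbEvents`, the reduced set of `DT_FileSummary` columns, and all row values are assumptions for illustration; the slides do not show the actual script.

```python
import sqlite3

# Assumed, simplified warehouse schema: entity tables plus a key-value table.
WAREHOUSE_SCHEMA = """
CREATE TABLE Jobs (job_id INTEGER PRIMARY KEY, ConfigName TEXT,
                   ConfigVersion TEXT, Date TEXT);
CREATE TABLE Files (file_id INTEGER PRIMARY KEY, job_id INTEGER, LogName TEXT);
CREATE TABLE FileParams (file_id INTEGER, Name TEXT, Value TEXT);
"""

# Pivot the key-value rows of FileParams into one flat row per file.
# MAX(CASE ...) picks the value of the named parameter, or NULL if absent.
BUILD_FILE_SUMMARY = """
DROP TABLE IF EXISTS DT_FileSummary;
CREATE TABLE DT_FileSummary AS
SELECT f.file_id AS FILE_ID,
       f.job_id  AS JOB_ID,
       MAX(CASE WHEN p.Name = 'EventType' THEN p.Value END) AS EVENTTYPE,
       CAST(MAX(CASE WHEN p.Name = 'NbEvents' THEN p.Value END) AS INTEGER)
           AS NBEVENTS,
       f.LogName AS FILENAME
FROM Files f
LEFT JOIN FileParams p ON p.file_id = f.file_id
GROUP BY f.file_id, f.job_id, f.LogName;
CREATE INDEX idx_filesummary_eventtype ON DT_FileSummary (EVENTTYPE);
"""


def make_demo_warehouse():
    """Create an in-memory warehouse with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(WAREHOUSE_SCHEMA)
    conn.execute("INSERT INTO Jobs VALUES (1, 'DC04', 'v1', '2004-09-27')")
    conn.executemany("INSERT INTO Files VALUES (?, ?, ?)",
                     [(1, 1, "a.dst"), (2, 1, "b.dst")])
    conn.executemany("INSERT INTO FileParams VALUES (?, ?, ?)",
                     [(1, "EventType", "10000000"),
                      (1, "NbEvents", "500"),
                      (2, "EventType", "10000000")])
    return conn


def rebuild_file_summary(conn):
    """Drop and regenerate the flat view table; safe to run repeatedly."""
    conn.executescript(BUILD_FILE_SUMMARY)


def files_for_event_type(conn, event_type):
    """Bulk lookup against the flat table: a single indexed SELECT, no joins."""
    rows = conn.execute(
        "SELECT FILENAME FROM DT_FileSummary WHERE EVENTTYPE = ? ORDER BY FILE_ID",
        (event_type,))
    return [name for (name,) in rows]
```

Rebuilding the table from scratch, rather than updating it in place, keeps the script simple and matches the periodic regeneration described on the slide; new parameter names in the warehouse require changes only to the script, not to the warehouse schema.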

  11. Some Numbers
     • LHCb is using ORACLE 9i technology for its DB
     • It is hosted on a cluster of two ‘Sun Fire 280R’ machines, each with:
        • two 750 MHz processors
        • 2 GB RAM
        • 600 GB HD
     • The DB contains ~20 GB of data
        • Shared between real data and indexing tables
        • ~2M job rows
        • ~5.5M file rows
        • ~57M parameter rows

  12. LHCb services
     • LHCb currently uses two services to access the information in the databases:
        • Servlet service:
           • allows the selection of datasets based on their history (job provenance) from a web browser
        • XML-RPC service:
           • provides access to, and modification of, the WDB data
           • allows GANGA to access Bookkeeping data
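To give a flavour of how a client such as GANGA might query bookkeeping data over XML-RPC, here is a small Python sketch using the standard-library `xmlrpc` package. The method name `files_for_event_type`, the in-memory rows and the `call` helper are hypothetical; the slides do not list the methods the real LHCb service exposes. To stay self-contained, the request is encoded and dispatched in-process instead of being sent over HTTP.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

# Stand-in for rows of a specialised view; the values are made up.
FILE_SUMMARY = [
    {"FILE_ID": 1, "EVENTTYPE": "10000000", "FILENAME": "a.dst"},
    {"FILE_ID": 2, "EVENTTYPE": "10000000", "FILENAME": "b.dst"},
    {"FILE_ID": 3, "EVENTTYPE": "20000000", "FILENAME": "c.dst"},
]


def files_for_event_type(event_type):
    """Hypothetical bookkeeping query: file names for one event type."""
    return [row["FILENAME"] for row in FILE_SUMMARY
            if row["EVENTTYPE"] == event_type]


# The dispatcher maps XML-RPC method names to Python functions.
dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(files_for_event_type)


def call(method, *params):
    """Encode a call as XML-RPC, dispatch it in-process, decode the reply."""
    request = xmlrpc.client.dumps(params, methodname=method)
    # _marshaled_dispatch is the dispatcher's internal entry point; it is used
    # here only to avoid opening a network socket in the example.
    response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
    (result,), _ = xmlrpc.client.loads(response)
    return result
```

With this in place, `call("files_for_event_type", "10000000")` returns the two matching file names. In a deployed service the same function would be registered on an `xmlrpc.server.SimpleXMLRPCServer` and called through `xmlrpc.client.ServerProxy`, so the wire format stays the same while the transport becomes HTTP.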

  13. Collaboration with ARDA
     • LHCb has started a collaboration with ARDA:
        • Definition of metadata and understanding of LHCb requirements
        • Elaboration of a new interface for the manipulation of file-metadata
           • Possible technology (WSDL)
           • See how this will fit with the already existing LHCb system
        • Stress-test the Bookkeeping services, analysing various behaviours:
           • Different numbers of clients
           • Different queries
           • Comparison with direct RPC calls
        • Implement the newly defined interface
           • Using the current LHCb File-Metadata DB as back-end
           • Using the technology developed with ARDA

  14. CONCLUSIONS
     • The two schema strategy works well for LHCb. Its flexibility was proven during DC04: no changes to the WDB were required, although new data were stored.
     • Because of the key-value nature of the WDB, it can easily be adapted to warehouse any data, including that of other experiments.
