
FSR – frustrating summary records



Presentation Transcript


  1. FSR – frustrating summary records [insertion mark on the slide: “^ file”]. R. Lambert, Core Soft, 26th September 2012

  2. Idea [Diagram: book-keeping across a chain of files. Each data file carries its event data together with file metadata and provenance information, and each processing step also produces a log file.]

  3. Gaaah! … and these are just the Savannah bugs/tasks

  4. Current Structure [Diagram: the TES holds /Event, with one entry per event, and /FileRecords, with one /GUID branch per file; each /GUID branch holds FSRs plus further nested /GUID branches, each with their own FSRs.]

  5. Current Structure [Same diagram, annotated: the event data are written by custom algorithms; the nested /GUID structure is provenance information, created automatically; the FSRs are file metadata, written by custom algorithms.]

  6. Current Procedure [Diagram: a job goes through Open File, Execute and Finalize. Events from the input file(s) pass through the TES and /Event is packed into the output file; the input FSRs and their provenance are added (“+”) to the job's own FSRs and written to the output file.]

  7. Current Streaming [Diagram: as in the current procedure, but with several output files; each output stream gets its own packed /Event/…, and the FSRs and provenance are added to every output file.]

  8. Observations
     • FSRs are sparse, events are chunky
     • FSRs have many levels in the tree, events have few levels
     • FSRs encode their information in the tree structure, and there is only one “event”

  10–14. Physical Problems
     • Many levels = many baskets
     • Basket size is waaaaaaay too large: a basket is ~262144 bits, an FSR ~128 bits
     • Only one “event” is written, so the basket size is never optimized
       [Slide caption: “I’m sure this is fine. Don’t worry about it. Ask me again in 9 events time.”]
     • Slow to navigate
     • Huge memory footprint (gigabytes)
     • Massive file size increase (hundreds of megabytes)
     • Crazy computing time (30 minutes in finalize)
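A back-of-envelope check of the basket numbers above. It assumes, purely for illustration, that each FSR ends up alone in a full-size basket buffer; real on-disk sizes also depend on ROOT's compression and on how many branches each FSR object is split into, so read the result as an order-of-magnitude estimate of buffer space, not a measurement. The 4681 FSRs are the count quoted for the RootCnv test on slide 19.

```python
# Rough estimate, assuming every FSR sits alone in its own full-size basket
# buffer (one "event" written, baskets never optimized).
BASKET_BITS = 262_144  # ~32 KiB basket, as quoted on the slide
FSR_BITS = 128         # typical FSR payload, as quoted on the slide


def basket_overhead(n_fsrs: int,
                    basket_bits: int = BASKET_BITS,
                    fsr_bits: int = FSR_BITS) -> tuple[float, float]:
    """Return (fraction of each basket holding FSR data,
    total basket buffer size in MB for n_fsrs one-FSR baskets)."""
    occupancy = fsr_bits / basket_bits
    total_bytes = n_fsrs * basket_bits // 8
    return occupancy, total_bytes / 1e6


occupancy, megabytes = basket_overhead(4681)  # FSR count from the RootCnv test
print(f"basket occupancy: {occupancy:.6f} "
      f"(baskets {BASKET_BITS // FSR_BITS}x larger than an FSR)")
print(f"buffer space for 4681 one-FSR baskets: {megabytes:.1f} MB")
```

With these inputs only 1/2048 of each basket holds data, and 4681 one-FSR baskets need about 153 MB of buffer space, the same order as the file-size increase quoted above.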

  15. Merging in Production [Diagram: as in the current procedure, but at Finalize the FSRs go through a “Merge” step before being written to the output file, together with the provenance.]

  16. Best Case Scenario
     • Normal production
     [Diagram: the production chain]
       – Reconstruction (Brunel), RAW → SDST: ~5×1 FSRs, 0 daughters, 1 level
       – Stripping+Streaming (DaVinci), SDST → DST, uDST: ~5×1+1 FSRs, 1 daughter, 2 levels
       – Merging (DaVinci), DST, uDST → DST, uDST: ~5×1+1 FSRs, 0 daughters, 1 level

  17. Worst Case Scenario
     • MC filtering
     • Nominally A = 1, B ~ 10, C ~ 100 … nominally 5,000 FSRs!
     [Diagram: the MC filtering chain]
       – Gauss → SIM: “A” FSRs, 0 daughters, 1 level
       – Boole → DIGI: A+2 FSRs, 1 daughter, 2 levels
       – Moore: A+3 FSRs, 2 daughters, 3 levels
       – Brunel → DST: A+4 FSRs, 3 daughters, 4 levels
       – DaVinci, B input files → DST: (A+4)×B+1 FSRs, 3·B daughters, 5 levels
       – LHCb, C input files → DST: ((A+4)×B+1)×C+1 FSRs, 3·B·C daughters, 6 levels
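The growth in the worst case is easy to tabulate. The function below simply transcribes the per-step formulas from the slide (reading each “+1” as the one FSR the merging job adds itself, which is an interpretation), so the nominal values can be plugged in.

```python
def mc_filtering_fsrs(a: int, b: int, c: int) -> list[tuple[str, int, int, int]]:
    """(application, FSRs, daughters, levels) for each step of the MC filtering
    chain, transcribing the formulas from the "Worst Case Scenario" slide."""
    brunel = a + 4
    davinci = brunel * b + 1   # B inputs of A+4 FSRs each, plus one new FSR
    lhcb = davinci * c + 1     # C inputs of that size, plus one new FSR
    return [
        ("Gauss", a, 0, 1),
        ("Boole", a + 2, 1, 2),
        ("Moore", a + 3, 2, 3),
        ("Brunel", brunel, 3, 4),
        ("DaVinci", davinci, 3 * b, 5),
        ("LHCb", lhcb, 3 * b * c, 6),
    ]


for app, fsrs, daughters, levels in mc_filtering_fsrs(a=1, b=10, c=100):
    print(f"{app:8s}{fsrs:6d} FSRs{daughters:6d} daughters{levels:3d} levels")
```

With A = 1, B = 10, C = 100 the last step carries 5101 FSRs and 3000 daughters, which is the “nominally 5,000 FSRs” on the slide.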

  18. Immediate prospects
     • Current workaround:
       – Delete and clean up FSRs (we might as well not write them)
       – Parse and merge the XMLSummaries instead
     • Fixes required:
       – Resurrect provenance information
       – Write different FSRs out depending on the stream
       – New, smarter EventCountFSR
       – IOFSR (prototype written):
           · Similar I/O information to the XMLSummary
           · Stores input GUIDs with the number of events read
           · Stores vectors of information for daughter files
           · IOFSR is created by a new FSR writer (prototype t.b.d.)
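The slide lists what the IOFSR stores, not how. As a minimal sketch of one possible shape, the record below keeps the input GUIDs with the events read from each, and keeps the inputs' own IOFSRs whole as daughters so provenance survives. The field names and the helpers `make_iofsr`, `total_events_read` and `ancestry` are hypothetical, not the prototype's interface.

```python
from dataclasses import dataclass, field


@dataclass
class IOFSR:
    """Hypothetical IOFSR layout: I/O book-keeping for one output file."""
    guid: str                                              # GUID of the file described
    events_written: int = 0
    inputs: dict[str, int] = field(default_factory=dict)   # input GUID -> events read
    daughters: list["IOFSR"] = field(default_factory=list) # IOFSRs of the input files


def make_iofsr(output_guid: str, events_written: int,
               events_read: dict[str, int], input_iofsrs: list[IOFSR]) -> IOFSR:
    """Build the IOFSR for a new output file at finalize: record what was read
    from each input and keep the inputs' own IOFSRs whole as daughters."""
    return IOFSR(output_guid, events_written, dict(events_read), list(input_iofsrs))


def total_events_read(record: IOFSR) -> int:
    return sum(record.inputs.values())


def ancestry(record: IOFSR) -> list[str]:
    """GUIDs of all ancestor files, depth-first through the daughters."""
    guids = []
    for daughter in record.daughters:
        guids.append(daughter.guid)
        guids.extend(ancestry(daughter))
    return guids


a = IOFSR("guid-a", events_written=100)
b = IOFSR("guid-b", events_written=50)
merged = make_iofsr("guid-m", 150, {"guid-a": 100, "guid-b": 50}, [a, b])
print(total_events_read(merged), ancestry(merged))
```

Because each file carries a single IOFSR object, the provenance lives inside one record rather than in extra levels of the /FileRecords tree.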

  19. With new RootCnv
     • Heard a nice talk last week on basket optimization
     • Repeated my older tests
     • Well done Markus and Ivan! But it’s still not great for FSRs:

         SetupProject DaVinci v32r2p1
         time gaudirun.py ~rlambert/public/forMarkus/fsrMemLeak/options.py
         # 20 files with one event each, but 4681 FSRs in total

  20. Conceptual problems
     • Provenance information requires complicated navigation
       – … and then we throw it away anyway. Great.
     • Requires a second writer and a separate service
     • Sequencing!
       – Keep the output file open
       – Create the FSR object and register it on the TES during finalize
       – Write the FSR object to the file
     • Very complicated once streaming is involved
       – All FSRs in the output streams are identical
       – Output files don’t necessarily have the same metadata
     • A different structure suggests a different optimal working point

  21. The Future?
     • Fixing any of these will fix FSRs for good:
       [Diagram: chain of causes: 10,000 FSRs → sparse trees in the TES → sparse trees persisted → ROOT tree used → ROOT can’t handle sparse trees]

  22. Deconstruction
     • Merging (what we do right now)
       – Currently throws away provenance information
       – Requires very, very careful sequencing of the finalization order
       – Requires each FSR type to have an associated merger
       – OK for a small number of FSR types and a lot of manpower
     • … there are several other options, though …

  23. Deconstruction
     • Merging (automatic, and IOFSR)
       – Provenance information will be kept
       – Have merging done by a service or tool, on the FSR base class
       – Do this somehow automatically, and always clean up correctly
       – OK, but needs a lot of thought about how best to implement
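One way to read “merging done by a service, on the FSR base class” is sketched below: every FSR type implements `merge()` itself, and a single generic service groups records by type and folds them, so no per-type merger has to be sequenced separately. `FSR`, `EventCounter` and `merge_all` are hypothetical Python stand-ins, not Gaudi classes.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from functools import reduce


class FSR(ABC):
    """Hypothetical FSR base class: each record type knows how to merge itself."""

    def __init__(self, sources: list[str]):
        self.sources = list(sources)  # GUIDs of the files this record summarizes

    @abstractmethod
    def merge(self, other: "FSR") -> "FSR":
        """Return a new record combining self and other (same concrete type)."""


class EventCounter(FSR):
    """Hypothetical example type: counts events, merges by summing."""

    def __init__(self, sources: list[str], count: int):
        super().__init__(sources)
        self.count = count

    def merge(self, other: "EventCounter") -> "EventCounter":
        # Concatenating the sources is what keeps the provenance.
        return EventCounter(self.sources + other.sources, self.count + other.count)


def merge_all(records: list[FSR]) -> list[FSR]:
    """The merging 'service': group records by concrete type and fold each
    group with that type's own merge(), so no separate merger is registered."""
    groups = defaultdict(list)
    for record in records:
        groups[type(record)].append(record)
    return [reduce(lambda x, y: x.merge(y), group) for group in groups.values()]


(out,) = merge_all([EventCounter(["g1"], 10), EventCounter(["g2"], 5)])
print(out.count, out.sources)
```

Adding a new FSR type then means writing one `merge()` method, rather than a merger plus its place in the finalize sequence.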

  24. Deconstruction
     • Event-like FSRs
       – Treat each FSR as an event, with the file GUID instead of the event number
       – Write to the same location several times: a mini event container
       – Completely changes the FSR mechanics of Gaudi
       – Good for any purpose
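A toy version of the “mini event container” idea, assuming a plain in-memory store: each location is written once per file, with the GUID playing the role of the event number, so the number of locations (branches) stays fixed however many files are summarized. `MiniEventStore` and the `/FileRecords/EventCount` location are hypothetical.

```python
from collections import defaultdict


class MiniEventStore:
    """Hypothetical 'event-like' FSR store: each location is written once per
    file, with the file GUID playing the role of the event number."""

    def __init__(self):
        self._entries = defaultdict(list)  # location -> [(guid, payload), ...]

    def write(self, location: str, guid: str, payload) -> None:
        # Writing the same location again adds an entry instead of a new
        # branch, so the number of locations stays small however many files.
        self._entries[location].append((guid, payload))

    def read(self, location: str, guid: str):
        for entry_guid, payload in self._entries.get(location, []):
            if entry_guid == guid:
                return payload
        raise KeyError((location, guid))

    def locations(self) -> list[str]:
        return sorted(self._entries)

    def n_entries(self, location: str) -> int:
        return len(self._entries.get(location, []))


store = MiniEventStore()
for i in range(1000):
    store.write("/FileRecords/EventCount", f"guid-{i}", {"events": i})
print(store.locations(), store.n_entries("/FileRecords/EventCount"))
print(store.read("/FileRecords/EventCount", "guid-42"))
```

A thousand files give one location with a thousand entries, the shape ROOT already handles well for events, instead of a thousand sparse branches.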

  25. Deconstruction
     • Flattening (LHCb “Packed” FSR)
       – Write a master class which holds the information at one top level
       – Let there be only one “FSR”, which holds everything
       – Frequent and invasive schema migration for new FSRs
       – Good solution for up to thousands of FSRs

  26. Deconstruction
     • Flattening (LHCb Packed FSR)
       – Each writer creates a /FileRecords/Packed location
       – A vector of data objects; only this location is written out
       – Complicated to work around the existing Gaudi FSR system
       – Good solution for up to thousands of FSRs

  27. Deconstruction
     • Flattening (Gaudi Packed FSRs)
       – Let the TES and persistent classes be different
       – Have the persistent class be a flattened tree (auto-unpack)
       – Requires invasive re-coding of parts of Gaudi
       – A good solution for up to thousands of FSRs
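The slide does not say how the flattening would be done. As a hypothetical illustration only, a nested transient tree can be stored as a flat list of (path, value) pairs and rebuilt on read, which is the “flattened tree (auto-unpack)” idea in miniature; nested dicts stand in for the TES here.

```python
def flatten(tree: dict, prefix: str = "") -> list[tuple[str, object]]:
    """Flatten a nested {name: subtree-or-leaf} dict into (path, leaf) pairs.
    Assumes names contain no '/' and leaves are not dicts; empty subtrees
    are dropped."""
    flat = []
    for name, node in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(node, dict):
            flat.extend(flatten(node, path))
        else:
            flat.append((path, node))
    return flat


def unflatten(flat: list[tuple[str, object]]) -> dict:
    """Rebuild the nested transient tree from (path, leaf) pairs (auto-unpack)."""
    tree: dict = {}
    for path, leaf in flat:
        *parents, name = path.strip("/").split("/")
        node = tree
        for parent in parents:
            node = node.setdefault(parent, {})
        node[name] = leaf
    return tree


tes = {"FileRecords": {"guid-1": {"EventCount": 10,
                                  "guid-0": {"EventCount": 10}}}}
packed = flatten(tes)
print(packed)
print(unflatten(packed) == tes)
```

Only the flat list would be persisted, as one object at one level, while code reading the TES still sees the nested structure.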

  28. Deconstruction
     • Abandon ROOT trees completely
       – Don’t use data objects at all, just append an ntuple to the file
       – Avoids all baskets and the other problems
       – Completely changes the FSR mechanics of Gaudi
       – The only solution if 10,000 FSRs are needed
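To make “just append an ntuple” concrete, here is a pure-Python stand-in (not ROOT's TNtuple): fixed-size binary rows appended to one growing buffer, with no data objects and no tree levels. The row layout (file index, counter id, value) is a hypothetical choice that happens to be 128 bits per row.

```python
import struct

# Hypothetical row layout: (file index, counter id, value) = 4 + 4 + 8 bytes,
# i.e. 128 bits per row, matching the FSR size quoted earlier in the talk.
ROW = struct.Struct("<iid")


class FlatNtuple:
    """Append-only table of fixed-size rows: no data objects, no tree levels,
    just one contiguous block of bytes that grows by one row per record."""

    def __init__(self):
        self.buffer = bytearray()
        self.guids: list[str] = []          # file index -> GUID
        self._index: dict[str, int] = {}    # GUID -> file index

    def append(self, guid: str, counter_id: int, value: float) -> None:
        index = self._index.setdefault(guid, len(self.guids))
        if index == len(self.guids):        # first time this GUID is seen
            self.guids.append(guid)
        self.buffer += ROW.pack(index, counter_id, value)

    def rows(self):
        for index, counter_id, value in ROW.iter_unpack(self.buffer):
            yield self.guids[index], counter_id, value


nt = FlatNtuple()
for i in range(3):
    nt.append(f"guid-{i}", counter_id=0, value=10.0 * i)
print(ROW.size, len(nt.buffer), list(nt.rows()))
```

Storage grows by exactly 16 bytes per record, so 10,000 records cost 160 kB plus the GUID list, with nothing to tune.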

  29. Deconstruction
     • Get sparse trees in ROOT to work properly
       – Make basket size and optimization tunable tree-by-tree
       – Needs work from the ROOT team
       – Only possible on the new stack, on a ~6-month time scale
       – Perfect solution, no new LHCb/Gaudi code required

  30. The Future?
     • There is a possible solution for each problem …
     • What do we pursue?
       – Fixing the whole idea of FSRs is very desirable
       – Not biting the bullet now means more manpower later …

  31. Summary
     • FSRs are a new addition to Gaudi, not yet robust
       – MC filtering is being addressed
       – IOFSR to keep provenance information
     • Currently, adding a new FSR type is very complicated
     • We can work around the problems, but shouldn’t we fix this underperforming part of our software?
       – Automatic merging
       – Event-like FSRs
       – Flattening (three possible directions)
       – Abandon trees altogether
       – Fix sparse trees

  32. End
     • Backups are often required

  33. Discussion
