Future of Analysis Environments Personal views

Presentation Transcript


  1. Future of Analysis Environments: Personal views. Rene Brun, CERN

  2. [Diagram of open questions]
     • Data: Type of data? Any type? PAW-like ntuple? No restrictions
     • Analysis: Restricted to histogramming & visualisation?
     • Structure: What is modularity? Abstract interfaces? Languages? Parallelism?
     • Coherent framework of cooperating systems: I/O + UI, object bus, packages

  3. Type of Data in the past
     • Event data managed by data structure (bank) managers (ZEBRA, BOS, ...)
       • a bank is like an object
     • Final physics data in ntuple format (PAW)
       • an ntuple is like a table in an RDBMS
     • Run/file catalog with ad hoc tools (FATMEN)
     • Calibrations, geometry, etc. with ad hoc tools (HEPDB)

  4. Type of data: trends-1
     • Put everything in an object database
       • like Objectivity
       • Choice of the RD45 project
     • Many experiments initially followed this line
     • Abandoned by most experiments recently
     • Interesting experience with BaBar
     • Solution not suited for PAW-like analysis

  5. Type of data: trends-2
     • Put write-once data in an object store
       • like ROOT in Streamer mode
     • Use an RDBMS for:
       • Run/Event catalogs
       • Geometry, calibrations
       • e.g. with the ROOT <-> Oracle interface
         • http://www.phenix.bnl.gov/WWW/publish/onuchin/RDBC/
       • or with the ROOT <-> Objectivity interface
         • http://www.phenix.bnl.gov/WWW/publish/onuchin/rooObjy/
     • Use ROOT split/no-split mode for physics analysis

  6. Framework basic requirements
     • Dynamic linking AND unlinking of user shared libs
     • User can define new classes interactively
     • Interpreted code can call compiled code
     • Compiled code can call interpreted code
     • Scripts can be dynamically compiled/linked
     Slide annotations:
     • This is the normal operation mode
     • Interesting feature for GUIs & event displays
     • Script compiler: root > .x file.C++

  7. Fundamental features of an Object-Oriented Framework
     [Diagram contrasting the procedural world with the OO world. Labels: persistency services, data, DDL, RTTI, functions, KUIP CDF, user interface, C++, ROOT, Java.]

  10. Automatic Code generation
      [Diagram. Labels: algorithms, hand-written code; meta information, automatically generated code (40 per cent in ROOT).]
      • Used by I/O, GUI, inspectors, browsers, interpreter, HTML, etc.
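
As an illustration of why meta information is useful to I/O, inspectors and browsers, here is a minimal C++ sketch. It is not the code that ROOT generates: the `Track` class, the `ClassInfo` and `MemberInfo` structures and the `inspect` function are all hypothetical names invented for this example, and the description of `Track` is written by hand where a real framework would generate it with a tool. The point is only that a generic service can handle any class once a description of its members exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical user class (assumption: not a ROOT class).
struct Track {
    int    charge;
    double pt;
};

// Meta information for one data member: name, type tag and byte offset.
enum class MemberType { Int, Double };

struct MemberInfo {
    std::string name;
    MemberType  type;
    std::size_t offset;
};

struct ClassInfo {
    std::string             name;
    std::vector<MemberInfo> members;
};

// Hand-written stand-in for what a code generator would emit for Track.
ClassInfo trackInfo() {
    ClassInfo info;
    info.name = "Track";
    info.members.push_back({"charge", MemberType::Int, offsetof(Track, charge)});
    info.members.push_back({"pt", MemberType::Double, offsetof(Track, pt)});
    return info;
}

// Copy one member out of an object, given only its byte offset.
template <typename T>
T readMember(const void* object, std::size_t offset) {
    T value;
    std::memcpy(&value, static_cast<const char*>(object) + offset, sizeof(T));
    return value;
}

// Generic inspector: it knows nothing about Track, only about ClassInfo.
std::string inspect(const void* object, const ClassInfo& info) {
    assert(object != nullptr);
    std::ostringstream out;
    out << info.name << " {";
    for (const MemberInfo& m : info.members) {
        out << " " << m.name << "=";
        if (m.type == MemberType::Int) {
            out << readMember<int>(object, m.offset);
        } else {
            out << readMember<double>(object, m.offset);
        }
    }
    out << " }";
    return out.str();
}
```

Calling `inspect(&t, trackInfo())` on a `Track t{-1, 2.5}` walks the member list and formats each value. An I/O layer or a browser could be written against the same `ClassInfo` without any class-specific code.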

  11. Java - ROOT interface(s)
      • Read ROOT files from a Java program
        • see Tony Johnson
        • will be simpler with the new ROOT 2.26 supporting automatic schema evolution
      • Call ROOT classes from a Java program
        • work by Subir Sarkar (hand-coded JNI interface)
        • could use JACO (see Tony Johnson)
        • or better, use a variant of rootcint (rootjava)
      • Generate ROOT-Java data classes
        • TTree::MakeJava, like TTree::MakeClass

  12. Java - ROOT interface(s)
      import root.*;
      TROOT troot = new TROOT("simple", "Simple Java to root interface");
      TApplication app = new TApplication("ROOT Application");
      System.out.println("TApplication .....");
      TBenchmark bench = new TBenchmark();
      bench.Start("Hsum");
      TRandom random = new TRandom();
      TH1F total = new TH1F("total", "total distribution", 100, -4.0F, 4.0F);
      TH1F main = new TH1F("main", "Main contributor", 100, -4.0F, 4.0F);
      TH1F s1 = new TH1F("s1", "first signal", 100, -4.0F, 4.0F);
      TH1F s2 = new TH1F("s2", "second signal", 100, -4.0F, 4.0F);
      // this makes sure that the sum of squares of weights will be stored
      total.Sumw2();
      total.SetMarkerStyle(21);
      total.SetMarkerSize(0.7F);
      main.SetFillColor(16);
      s1.SetFillColor(42);
      s2.SetFillColor(46);
      TCanvas canvas = new TCanvas("c1", "The HSUM example", 200, 10, 600, 400);
      canvas.SetGrid();
      and so on.

  13. Java - ROOT interface(s)
      • It is important to cooperate to:
        • facilitate the Java/C++ integration
      • Could be interesting for applications where performance is not an issue (event display)
      • However, I do not believe in a solution where the bulk of the data is stored as C++ objects and analyzed with a Java-based system.
        • It must be fun, but very inefficient
        • What do you gain?

  14. Languages for data analysis
      • Data analysis requires efficient access to objects (both data and functions).
      • It requires a powerful programming language:
        • in interpreted mode
        • in compiled mode
      • The transition from interpreted mode to compiled mode must be smooth and transparent.
      • A scripting language is not the solution
      • Python is not a solution

  15. [Diagram. Labels: GUI, compiled scripts, interpreted scripts, commands.]

  16. A role for commercial components?
      • Databases
        • Oracle very likely, others NO
      • Graphics/UI
        • NO
        • but YES for interfaces to commercial systems
      • Special algorithms like fitting
        • strong doubts
      • I strongly believe in the advantages of
        • Open Source systems
        • Large news/discussion groups

  17. Our current work
      • Continuous consolidation of the system
      • Automatic schema evolution
      • Common GUI between Unix and Windows
      • Upgrade UI to new style GUI
      • Tree query processor reimplemented using the new TSelector facility
      • PROOF (Parallel ROOT Facility) (see next)
      • Interface with other systems, e.g. G3, G4
      • Support thousands of users

  18. The OODBMS dreams
      [Diagram. Labels: selection parameters, CPU, OODB federation, local databases DB1 to DB3, remote databases DB4 to DB6.]

  19. ROOT/PROOF and GRIDs
      [Diagram. Labels: selection parameters, procedure, TagDB, RDB, PROOF, local and remote databases DB1 to DB6, each with its own CPU running Proc.C.]
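
The diagram on this slide shows the same procedure (Proc.C) running on a CPU close to each database, driven by common selection parameters. A minimal C++ sketch of that pattern follows. It does not use the PROOF API: `Hist`, `SelectionParams`, `processPartition` and `processAll` are hypothetical names, the partitions are plain vectors standing in for DB1 to DB6, and the per-partition calls run in an ordinary loop where a real system would dispatch them to separate CPUs. What the sketch does show is the requirement that makes such a scheme work: partial results must be mergeable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-range histogram whose partial results can be merged.
struct Hist {
    double lo;
    double hi;
    std::vector<long> bins;

    Hist(std::size_t nbins, double low, double high)
        : lo(low), hi(high), bins(nbins, 0L) {
        assert(nbins > 0 && high > low);
    }

    void fill(double x) {
        if (x < lo || x >= hi) return;  // ignore underflow and overflow
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        ++bins[i];
    }

    // Merge another partial result with identical binning into this one.
    void add(const Hist& other) {
        assert(other.bins.size() == bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }

    long entries() const {
        long n = 0;
        for (long b : bins) n += b;
        return n;
    }
};

// Selection parameters shared by all workers (assumption: a single cut).
struct SelectionParams {
    double minValue;
};

using Partition = std::vector<double>;  // stand-in for one database

// All workers book the same histogram so that results can be merged.
Hist bookHist() { return Hist(4, 0.0, 4.0); }

// The procedure applied to one partition; it only sees local data.
Hist processPartition(const Partition& data, const SelectionParams& cut) {
    Hist h = bookHist();
    for (double x : data) {
        if (x >= cut.minValue) h.fill(x);
    }
    return h;
}

// Run the procedure on every partition and merge the partial histograms.
// The calls are independent, so they could run on separate CPUs.
Hist processAll(const std::vector<Partition>& partitions, const SelectionParams& cut) {
    Hist total = bookHist();
    for (const Partition& p : partitions) {
        total.add(processPartition(p, cut));
    }
    return total;
}
```

Because `processPartition` touches only its own partition and returns a self-contained `Hist`, the order in which partitions are processed does not affect the merged result.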

  20. What is a modular system?
      • Modularity is a nice word.
      • Everybody claims to be modular.
      • a system with many small and independent modules?
        • where is the object bus?
        • what is the cost of assembling all the pieces in a real application?
      • a hierarchical system with easily replaceable components?
        • but with many internal dependencies

  21. What is a modular system?
      • a system with well defined interfaces?
        • where is the object bus?
        • passing data by reference or value? Collections/Folders?
      • a system easy to understand (user view)?
        • end users like monolithic systems doing everything
      • a system easy to maintain (developer view)?
      • a system that can easily be integrated into other systems?
      • a theoretical system and no implementation?
      Modularity is difficult to achieve in a growing system.

  22. Modularity and Dependencies in ROOT
      By dependency, we mean binary dependency: one module (shared library) forces the loading of another library. In the past this was a weak point of the system. For example, if you wanted to produce some histograms in a batch program, you were required to link your application with all the ROOT graphics libraries up to X11, as with PAW. Many users rightly pointed this out as something to be fixed, and we fixed it. In the current system only a small set of base libraries is needed when creating, e.g., histograms in batch mode. Besides the decoupling of the graphics system, many more abstract layers were introduced to decouple other parts of the system: the histogram from its painter, the tree storage system from its query mechanism (treeplayer), fitting from Minuit, etc. Following this reorganization, none of the lower-level libraries depends on higher-level libraries any more. Besides modularity, these changes also improved overall system performance.
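
The decoupling of a histogram from its painter can be sketched in C++ as follows. This is an illustration under stated assumptions, not ROOT's actual code: `Histogram`, `VirtualPainter`, `TextPainter` and `makeTextPainter` are hypothetical names, and both "libraries" live in one file here. The base part contains only the histogram and an abstract painter interface; the graphics part registers a factory at run time, so a batch program that never draws needs nothing beyond the base part.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class Histogram;

// Base library: abstract interface only, no graphics code.
class VirtualPainter {
public:
    virtual ~VirtualPainter() = default;
    virtual std::string paint(const Histogram& h) const = 0;
};

// Base library: the histogram depends on the interface, not on a painter.
class Histogram {
public:
    using PainterFactory = std::unique_ptr<VirtualPainter> (*)();

    explicit Histogram(std::size_t nbins) : counts_(nbins, 0) {
        assert(nbins > 0);
    }

    void fill(std::size_t bin) {
        if (bin < counts_.size()) ++counts_[bin];
    }

    const std::vector<int>& counts() const { return counts_; }

    // A graphics library would call this once when it is loaded.
    static void setPainterFactory(PainterFactory f) { factory() = f; }

    // Drawing works only if a painter has been registered.
    std::string draw() const {
        if (factory() == nullptr) return "(no painter loaded)";
        return factory()()->paint(*this);
    }

private:
    static PainterFactory& factory() {
        static PainterFactory f = nullptr;
        return f;
    }

    std::vector<int> counts_;
};

// "Graphics library": a concrete painter, unknown to the base library.
class TextPainter : public VirtualPainter {
public:
    std::string paint(const Histogram& h) const override {
        std::string out;
        for (int c : h.counts()) {
            out += std::string(static_cast<std::size_t>(c), '*');
            out += '\n';
        }
        return out;
    }
};

std::unique_ptr<VirtualPainter> makeTextPainter() {
    return std::unique_ptr<VirtualPainter>(new TextPainter());
}
```

Filling a `Histogram` works with no painter registered, which corresponds to the batch case. After `Histogram::setPainterFactory(&makeTextPainter)`, `draw()` delegates to the registered painter. The dependency points from the graphics code to the base code and never the other way round.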

  26. ROOT Quality assurance

  27. A growing user base

  28. Summary
      • We are implementing a powerful system designed for large scale data analysis with parallel architectures in a GRID context.
      • The ROOT system is a framework providing a coherent object bus in DAQs, simulation, reconstruction and analysis phases.
      • We have learnt a lot in the past 5 years, also following our 10 years of experience with PAW.
      • Developing the system and at the same time supporting a rapidly growing user base is a demanding but also rewarding job.
