Data Analysis Tools
G. Wormser, LAL Orsay
• The topics
• End-user data (statistical) analysis tools
• Event displays
• (Data quality control)
• The inputs
• Feedback from LHC/HEP experiments
• The various analysis packages
• HEPVis99
• Personal experience from BABAR
• The key issues
• Conclusions
G. Wormser, LAL Orsay, 3rd LHC Computing Workshop
Historical perspective: PAW
• Very large ‘productivity boost’ in the physics community with the introduction of a universal analysis tool, PAW
• Very easy to use, available everywhere
• Ntuples, MINUIT, presentation package
• Fortran interpreter
• Macros/scripts (KUIP, .kumac)
• No integration within the experiments' frameworks
• No overhead!
• But not possible to benefit from the infrastructure (no access to code, constants, data not in ntuples, event display)
The new environment
• OO data structures (ROOT, Objectivity, etc.)
• Analysis codes and tools in OO languages
• We want ‘PAW_OO’!
• Very large datasets
• Want better integration within the framework
• Very powerful CPUs
• Better interactivity
User Basic Requirements
• Histograms and ‘tuples’
• Knowledge of the experiment data structure
• Interpreted OO language
• Fitting package
• Scripts/macros
• Presentation package
Example: Detailed requirements from ATLAS
• AnT design should be modular and reusable, and allow modules to be added and deleted without major changes to the program.
• AnT should save and restart analysis procedures in the same state as at exit time.
• AnT should provide a standard mechanism to store information and operations executed in each analysis procedure (i.e. information about a dataset, selection cuts, calibration data used, and whether attributes were re-calculated in an analysis job) to allow their recalculation with identical results.
• AnT should provide a standard mechanism to store information on any errors encountered in any data manipulation (i.e. fitting, mathematical manipulations, display). The information should be stored in an object generated by the data operations.
• AnT should provide a standard mechanism to append information on the data related to an analysis (for example, criteria used to select data and conditions used to collect data) to the analysis results.
• AnT should provide a standard mechanism to store and view results of the preliminary, intermediate, and final stages of analysis.
• AnT should allow results to be viewed interactively and saved, if needed, in a standard format for possible inclusion in informal and formal publications.
• AnT should display one or more events simultaneously.
• AnT should make it possible to plot, graph and represent graphically in other ways results from simple and multiple data sets.
• AnT should be easy enough that its basic functionality can be learned in a short time (about a few hours).
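The requirement to store the dataset, selection cuts, calibration data and executed operations so that a result can be recomputed identically can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `AnalysisRecord` class, its field names, the JSON encoding and the `select` helper are hypothetical and are not part of any ATLAS or AnT design.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AnalysisRecord:
    """Hypothetical provenance record; all field names are illustrative."""
    dataset: str
    calibration_tag: str
    cuts: Dict[str, float] = field(default_factory=dict)   # variable -> minimum value
    operations: List[str] = field(default_factory=list)    # ordered log of executed steps
    errors: List[str] = field(default_factory=list)        # problems met during manipulation

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the results."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "AnalysisRecord":
        """Rebuild a record from its stored form."""
        return cls(**json.loads(text))


def select(events: List[Dict[str, float]], record: AnalysisRecord) -> List[Dict[str, float]]:
    """Keep events passing every minimum-value cut and log the operation."""
    kept = [e for e in events
            if all(e[var] >= cut for var, cut in record.cuts.items())]
    record.operations.append(f"select: kept {len(kept)} of {len(events)} events")
    return kept


if __name__ == "__main__":
    events = [{"pt": 1.2}, {"pt": 5.0}, {"pt": 7.5}]
    record = AnalysisRecord(dataset="testbeam-run-001",
                            calibration_tag="calib-v1",
                            cuts={"pt": 2.0})
    first = select(events, record)
    # Later session: restore the record and replay the same selection.
    restored = AnalysisRecord.from_json(record.to_json())
    print(select(events, restored) == first)
```

The point of the sketch is only that the cuts and calibration tag travel with the result, so a replay in a later session starts from the same inputs.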
Technical Requirements
• Lifetime of the experiments > lifetime of the packages
• Coexistence of several packages in one experiment
• Collaborative development of the packages
• Modularity
• Interoperability
• Evolvability
• Portability
• Maintenance
• Documentation
• User support
• User extension
The various products
• ROOT (Statistical + Event Display)
• JAS (Statistical + Event Display)
• LHC++ (Statistical)
• OpenScientist (Statistical + Event Display)
• WIRED (Event Display)
• HippoDraw (Statistical)
• Colt (Statistical)
• No purely commercial products!
What is ROOT
• Ambitious replacement for PAW, written in C++ by PAW's main author, R. Brun, and his group
• Covers all aspects of data analysis:
• Data storage (ROOT I/O)
• Statistical analysis
• C++ interpreter CINT
• Event display
• Initially built as an all-in-one package, now evolving towards more modularity
• ‘Open source’ approach
• Large and growing user base
ROOT user base
• ALICE
• LHCb test beam (Outer tracking)
• CDF, D0
• BABAR (see later)
• JLC
• STAR and many other nuclear physics projects
Root class structure
Some ROOT examples from various expts
An Online ROOT application from ALICE
Fermilab Review Committee evaluation of ROOT (’98)
• 1) ROOT is a complete, full-featured package that meets the functional requirements.
• 2) There are some trivial unacceptable features (use of CMZ, lack of build scripts) which should not be a stumbling block, but will require a formal collaboration with the ROOT team.
• 3) There is a large, world-wide user base, but so far limited use for serious HEP analysis.
• 4) ROOT can cope with the CDF and D0 data models.
• 5) ROOT has an effective internal data format well matched to HEP needs.
• 6) The present version of CINT is a potentially serious drawback (buggy, undocumented, limited C++ features, hard to support, poorly engineered). This will require a decision to enhance/upgrade/replace, which would require significant work.
• 7) The user interface is not very friendly.
• 8) The interconnectedness of the various modules is substantial. External modules must conform to (ROOT-specific, non-standard) ROOT protocols to be functional.
• 9) The package is not highly engineered (i.e., it has grown organically rather than been designed). The current implementation reflects this evolution; for example, it has not kept up with the C++ language standard (it has its own container classes, etc.). Even beyond CINT, the product has many bugs.
• 10) It will require some relatively straightforward customization to support casual users.
• 11) There is an active and responsive support team with good archives and an active mailing list.
Fermilab Review Committee recommendations
• RECOMMENDATIONS FOR RUN II:
• We recommend that ROOT be adopted as the standard physics analysis package for Run II, contingent on a collaborative agreement with the ROOT team. It should be recognized that this recommendation depends critically on timing and on sharing development with outside collaborators, and the steering committee should assess the validity of these assumptions in evaluating the recommendation. In particular, if the requirement for an immediate choice is being driven by on-line needs (which may not require the full functionality of an off-line analysis package immediately), it needs to be determined whether the components of NIRVANA that already exist are adequate for the immediate needs.
• LONG-TERM RECOMMENDATIONS:
• It is highly likely that by the end of Run II (or by the time of the LHC) commercial components will be heavily used for analysis tasks. Commercial offerings should continue to be investigated and made available (perhaps on limited platforms). The Computing Division should also initiate formal collaboration with the LHC++ project so as to have some influence on the choices made and direction taken. These two initiatives, while lower priority than the immediate ROOT support and development needs, should position us to take full advantage of the expected evolution of these products.
What is JAS
• Analysis framework based on Java
• Developed at SLAC by T. Johnson
• See the presentation by M. Ronan after this talk
• Aims at the same complete functionality as ROOT
• Smaller user community (NLC, BABAR online)
Java Libraries and APIs
• Standard libraries and APIs
• 2D + 3D graphics + GUI (Swing) + Imaging + Printing
• Database connectivity (JDBC) + ODMG
• Collections, IO (Serialization), Data Compression
• Networking, Sockets, SSL, CORBA, RMI
• Java Beans (components), Help
• Multimedia, Sound, Speech
• Security, Code Signing, Cryptography
• Math, Arbitrary Precision Math
• Shared Data (Collaborative Applications)
• Huge “Community-Ware” software archive
• IBM alone has hundreds of Java resources on its alphaWorks site
Remote Data Analysis
[Diagram labels: user's Java code; Java compiler + debugger; GUI; TCP/IP network; data analysis engine; padded cell; experiment extensions (event display); experiment interface; C++ code; data formats: Zebra, Jazelle, PAW, ROOT, Objectivity]
Plot Display Package
• 1-d/2-d histogram and scatter plot display
• Multiple axes, direct user interaction, overlays, fitting
JAS Availability
• 1.0 (Beta) currently available
• Windows (NT, 95, 98) + Unix (Solaris + Linux)
• Installed on Solaris at SLAC (/usr/local/bin/jas)
• Limitations
• Detailed documentation still under development
• May still be some changes to user API
• Download from: http://www-sldnt.slac.stanford.edu/jas
• 2.0 pre-release by July 1
• More plot types
• More flexible control of histograms
• Ability to easily compare multiple datasets
• More n-tuple handling tools (cf. HippoDraw)
• Greatly improved printing
More Info
• Java Analysis Studio: http://www-sldnt.slac.stanford.edu/jas
• Please give us feedback: jas-feedback@sld-mail.slac.stanford.edu
• Mailing list: http://www.slac.stanford.edu/cgi-bin/lwgate/JAS-L/
• Also a general mailing list for Java in HEP: http://www.slac.stanford.edu/cgi-bin/lwgate/HEP-JAVA/
Some comments on JAS from D. Ferrero Merlino
• Pro
• Portability, remote execution, GUI
• Cons
• Interoperability with C++
• Performance
• Scripting
• LCB recommendations
• Look for IRIS Explorer alternatives
• Investigate Java solutions
• A technical student joined the DAT section in July
• Try to integrate HTL and Tags in JAS
• Evaluate C++ interoperability
The Colt Distribution - Open Source Libraries for High Performance Scientific and Technical Computing in Java
Wolfgang Hoschek, CERN IT/PDP
Colt
• Efficient high-level data structures & algorithms for
• Off-line data analysis
• Histogramming
• Monte Carlo simulation
• NTuple-like manipulations
• Approach
• Summon some of the best concepts, designs and implementations thought up over time by the community
• Port or improve them
• Introduce new approaches where the need arises
• Results so far
• In overlapping areas, competitive or superior to toolkits such as STL, Root, HTL, CLHEP, TNT, GSL, C-RAND / WIN-RAND (all C/C++) as well as IBM Array, JDK 1.2 Collections framework, JGL (all Java)
• In terms of performance (!), functionality and (re)usability
Colt Conclusions
• Technology tracking
• Java may soon be a major player in performance-sensitive scientific and technical computing
• Look at the LHC time-scale and be prepared for that
• Colt distribution
• Users need libraries to get their job done
• Java lacks the foundation toolkits that are broadly available and conveniently accessible in C/C++ and Fortran
• Build an infrastructure for scalable scientific and technical computing in Java
• Don’t reinvent the wheel: share resources in Open Source efforts
• Document, package and distribute a loosely coupled set of libraries under one single uniform umbrella
• Visit http://nicewww.cern.ch/~hoschek/colt/index.htm
• and get your hands dirty...
What is LHC++
• The OO replacement of CERNLIB
• Collaborative approach between the CERN/IT division and the LHC experiments
• Initial trend: favor commercial products (Objectivity, Iris Explorer)
• Iris Explorer has been rejected by the collaborations
• (No documents available!)
• Present focus: short-term effort to provide a new solution
View of Interactivity in LHC++
• Explorer-based analysis tool was not accepted by users
• Request to create a new tool
• “PAW-like” functionality (at least)
• “PAW-like” interface (command line)
• Early prototype required
• With restricted functionality
Requirements for analysis tool
• Based on abstract interfaces to packages:
• Histogramming
• Fitting
• Plotting
• Analysis
• UserInterface
• Implementation flexible
• Possible to replace packages with minimal impact on other parts
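A minimal Python sketch of what "abstract interfaces to packages" buys: the analysis code depends only on the interfaces, so a plotting package can be swapped without touching it. All names below (`Histogram`, `Plotter`, `FixedBinHistogram`, `AsciiPlotter`, `CsvPlotter`, `analyse`) are hypothetical and chosen for illustration; they are not the LHC++ interfaces.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class Histogram(ABC):
    """Abstract histogramming interface (hypothetical)."""

    @abstractmethod
    def fill(self, x: float, weight: float = 1.0) -> None: ...

    @abstractmethod
    def bin_contents(self) -> List[float]: ...


class Plotter(ABC):
    """Abstract plotting interface (hypothetical)."""

    @abstractmethod
    def plot(self, hist: Histogram) -> str: ...


class FixedBinHistogram(Histogram):
    """One possible histogramming package: equal-width bins on [lo, hi)."""

    def __init__(self, nbins: int, lo: float, hi: float):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self._bins = [0.0] * nbins

    def fill(self, x: float, weight: float = 1.0) -> None:
        if self.lo <= x < self.hi:  # out-of-range values are dropped
            i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self._bins[min(i, self.nbins - 1)] += weight

    def bin_contents(self) -> List[float]:
        return list(self._bins)


class AsciiPlotter(Plotter):
    """One plotting package: one row of '#' per bin."""

    def plot(self, hist: Histogram) -> str:
        return "\n".join("#" * int(c) for c in hist.bin_contents())


class CsvPlotter(Plotter):
    """A replacement plotting package: comma-separated bin contents."""

    def plot(self, hist: Histogram) -> str:
        return ",".join(str(c) for c in hist.bin_contents())


def analyse(hist: Histogram, plotter: Plotter, data: Iterable[float]) -> str:
    """Analysis code written only against the abstract interfaces."""
    for x in data:
        hist.fill(x)
    return plotter.plot(hist)
```

Calling `analyse(FixedBinHistogram(3, 0.0, 3.0), CsvPlotter(), data)` and then the same call with `AsciiPlotter()` changes the output format without any change to `analyse` itself.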
Components
• Services:
• HistogramFactory
• HistogramManager
• Fitter
• Plotter
• Analyzer (dynamically loaded C++)
• Uses HistogramManager to register created histograms
• Access to all experiment data/tags/...
Components (II)
• Basic classes
• Histograms (1D, 2D for a start)
• Points (1D, 2D)
• Coordinates (with (asymmetric) errors)
• Value (with (asymmetric) errors)
• VectorOfPoints
• Added value over vector<Point>
• Scaling, shifting, …
• Interface from histograms to fitting/plotting
User interface
• Re-use scripting language(s)
• Use SWIG for the interface to Python (Perl, Tcl, Java (alpha), …)
• Class model allows for “old-fashioned” and “new-style” analysis models
• hist.plot()
• vector.fromHistogram(hist); plotter.plot(vector)
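The two call styles on this slide, `hist.plot()` and `vector.fromHistogram(hist); plotter.plot(vector)`, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class bodies, the text output of `Plotter.plot`, the `scaled` method and the symmetric sqrt(N) errors are all invented here and do not come from the LHC++ prototype.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Point:
    """A value at coordinate x with (possibly asymmetric) errors."""
    x: float
    value: float
    err_low: float
    err_high: float


class VectorOfPoints:
    """A vector of points with extras such as scaling (hypothetical sketch)."""

    def __init__(self, points: Iterable[Point] = ()):
        self.points: List[Point] = list(points)

    def fromHistogram(self, hist: "Histogram") -> "VectorOfPoints":
        # Assumption: symmetric sqrt(N) errors on each bin content.
        self.points = [Point(x, n, math.sqrt(n), math.sqrt(n))
                       for x, n in zip(hist.centers(), hist.contents)]
        return self

    def scaled(self, factor: float) -> "VectorOfPoints":
        """Return a new vector with values and errors multiplied by factor."""
        f = abs(factor)
        return VectorOfPoints(Point(p.x, p.value * factor, p.err_low * f, p.err_high * f)
                              for p in self.points)


class Plotter:
    """Text 'plotter': one line per point (x value -err_low +err_high)."""

    def plot(self, vector: VectorOfPoints) -> str:
        return "\n".join(f"{p.x:g} {p.value:g} -{p.err_low:g} +{p.err_high:g}"
                         for p in vector.points)


class Histogram:
    """Equal-width 1D histogram holding already-filled bin contents."""

    def __init__(self, contents: Iterable[float], lo: float, hi: float):
        self.contents = list(contents)
        self.lo, self.hi = lo, hi

    def centers(self) -> List[float]:
        width = (self.hi - self.lo) / len(self.contents)
        return [self.lo + (i + 0.5) * width for i in range(len(self.contents))]

    def plot(self) -> str:
        """'Old-fashioned' style: the histogram plots itself."""
        return Plotter().plot(VectorOfPoints().fromHistogram(self))


if __name__ == "__main__":
    hist = Histogram([4.0, 9.0], 0.0, 2.0)
    print(hist.plot())                      # old-fashioned style
    vector = VectorOfPoints()
    vector.fromHistogram(hist)              # new style: explicit intermediate vector
    print(Plotter().plot(vector.scaled(0.5)))
```

In this sketch the old-fashioned call is just a thin wrapper over the new-style one, and the explicit vector is what makes steps such as scaling possible before plotting.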
Status
• Initial design (for prototype) done
• Implementing first prototype
• Histograms
• Plotter
• Fitter
• VectorOfPoints
• HistogramManager
• Analyzer
• … work in progress …
• More news soon ...
What is OpenScientist
• A toolkit developed by G. Barrand (LAL Orsay)
• Very strong focus on interoperability of various packages and collaborative development
• Integrated into the HEPVis collaboration
• Limited user base
The key to openness
• Use the adapter pattern
[Diagram labels: THisto::TObject (Rio, .root file); d_Histo::ooObj (Obj, .DB file); Histo; SoPlotter; SbPlottedHistogram]
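The adapter pattern named on this slide can be sketched in Python: two histogram classes with unrelated APIs are each wrapped so that a plotter written against one common `Histo` interface can use both. The classes `FileHisto` and `DbHisto` are invented stand-ins for a file-based and a database-based histogram; they do not reproduce the real ROOT, Rio or Objectivity APIs, and `plot` is a toy text plotter.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FileHisto:
    """Stand-in for a histogram read from a file (NOT a real library API)."""

    def __init__(self, contents: List[float]):
        self._contents = list(contents)

    def n_bins(self) -> int:
        return len(self._contents)

    def bin_content(self, i: int) -> float:
        return self._contents[i - 1]  # bins are numbered from 1 in this stand-in


class DbHisto:
    """Stand-in for a histogram stored as a database record (NOT a real API)."""

    def __init__(self, contents: List[float]):
        self.record: Dict[str, List[float]] = {"bins": list(contents)}


class Histo(ABC):
    """The common interface the plotter is written against."""

    @abstractmethod
    def bins(self) -> List[float]: ...


class FileHistoAdapter(Histo):
    """Adapts FileHisto's 1-based accessor API to Histo."""

    def __init__(self, wrapped: FileHisto):
        self._wrapped = wrapped

    def bins(self) -> List[float]:
        return [self._wrapped.bin_content(i)
                for i in range(1, self._wrapped.n_bins() + 1)]


class DbHistoAdapter(Histo):
    """Adapts DbHisto's record layout to Histo."""

    def __init__(self, wrapped: DbHisto):
        self._wrapped = wrapped

    def bins(self) -> List[float]:
        return list(self._wrapped.record["bins"])


def plot(histo: Histo) -> str:
    """Toy plotter: knows only the Histo interface, not the storage behind it."""
    return "\n".join("*" * int(b) for b in histo.bins())
```

Here `plot(FileHistoAdapter(FileHisto([1, 3, 2])))` and `plot(DbHistoAdapter(DbHisto([1, 3, 2])))` give the same text, which is the sense in which the plotter stays open to new storage back ends.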
The NxM issue
• A nice idea: automatic production of adapters.
• Example: SWIG
[Diagram labels: Histo; tcl_Histo (Tcl, prompt "Tcl> histo"); python_Histo (Python, prompt "~~:> histo"); jni_Histo (Java, "histo"), marked "SWIG ?"]
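One way to read the N×M issue, offered here as an interpretation rather than as the author's statement: with N packages and M clients, hand-written pairwise adapters grow as N×M, while adapting everything to one common interface grows as N+M, and generating adapters from a short description removes most of the remaining manual work. The Python sketch below shows the counting and a toy adapter generator; `make_adapter` and the other names are hypothetical and are unrelated to how SWIG itself works.

```python
from typing import Callable, Iterable


def direct_adapters(n_packages: int, m_clients: int) -> int:
    """Adapters needed when every package is bridged to every client by hand."""
    return n_packages * m_clients


def hub_adapters(n_packages: int, m_clients: int) -> int:
    """Adapters needed when both sides are bridged to one common interface."""
    return n_packages + m_clients


def make_adapter(name: str, get_bins: Callable[[object], Iterable[float]]) -> type:
    """Generate an adapter class exposing bins() from a one-line access rule."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def bins(self):
        return list(get_bins(self._wrapped))

    return type(name, (object,), {"__init__": __init__, "bins": bins})


# Two generated adapters for two differently shaped histogram objects.
TupleHistoAdapter = make_adapter("TupleHistoAdapter", lambda obj: obj)
DictHistoAdapter = make_adapter("DictHistoAdapter", lambda obj: obj["bins"])

if __name__ == "__main__":
    print(direct_adapters(4, 3), hub_adapters(4, 3))
    print(TupleHistoAdapter((1.0, 2.0)).bins())
    print(DictHistoAdapter({"bins": [1.0, 2.0]}).bins())
```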
Large Array Set
• Huge tuples break the UAF model for storage!
• Introduce the notion of “Large Array”.
[Diagram labels: Storage (.s file); StorageArray; Array; Storage2Array; Storage2 (.s2 file); VLargeArray; TBranch (Rio, .root file); ooArray (Obj, .DB file)]
OpenScientist Status
• Rio, Riot: the file IO system of ROOT put in a stand-alone package (free software).
• Objectivity: a commercial object database.
• Mesa: a free implementation of OpenGL.
• SoFree: a free implementation of Open Inventor.
• SGI or TGS Inventor: commercial implementations of Inventor.
• HEPVis: a free collaborative set of classes over Open Inventor.
• Tcl: a scripting language.
• KUIP: the CERN/PAW command language put in a stand-alone package.
• Lab: the top ‘Hub’ package that ties subpackages together to present a coherent environment to work with.
• HCL: a home-made histogram package.
• Midnight: the rewriting of Minuit in C++ by R. Brun, put in a stand-alone package.
• It runs on NT and UNIX. It now works together with Geant4 (display and plotting).
An OpenScientist session
The various approaches from the experiments
• ALICE: ROOT (AliROOT)
• CMS/ATLAS/LHCb: prospect/evaluate
• BABAR: no official tool, i.e. PAW (JAS online, + ROOT)
• CDF/D0: ROOT for Run II
Some words about AliROOT
• The ROOT framework will provide to ALICE:
• Data storage
• On-line monitoring
• Statistical analysis
• Event display
Alice Framework
CMS/ATLAS/LHCb approach
• Define their data model and framework independently (e.g. GAUDI/LHCb, CARF/CMS)
• Objectivity for persistency
• Close collaboration with the LHC++ effort
• Evaluate as many products as reasonable using test beam stands
• (Produce documents!)
• Invest in event displays (ATLAS, CMS)
LHCb strategy
LHCb strategy (2)
• Common problems in HEP analysis
• Foundation libraries (e.g. NAG, CLHEP)
• Toolkits (e.g. HTL)
• LHCb-specific analysis tools, some of which will make use of HEP-wide toolkits
• Mathematical libraries
• Histogramming
• Fitting and minimization
• Visualization
• Data access
• Components exist in different stages, but what about their interfaces?
• LHC++ is planning to create interfaces on existing packages
Atlas Web Page
CMS Software Task Breakdown
Tracker TestBeam Online Monitoring
The trends at HEPVis99
• Collaborative environment
• Try to define common interfaces
• The Open source approach
• How to get out of ‘One man-one tool’?
• Distributed environment
• IDL/CORBA/JAVA
• No ROOT participation