Status of the Anaphe Project: Status of the Anaphe Project Max Sang
CERN IT/API
max.sang@cern.ch ACAT 2002, Moscow, Russia, June 26 2002
What is Anaphe?: What is Anaphe? A project in the CERN IT division to provide a modular OO alternative to CERNLIB
Histograms & Ntuples
Persistency
Plotting and visualisation
Fitting and Minimisation
Interactive analysis
Prepare for the Long Term!: Prepare for the Long Term!
The CERNLIB Legacy: The CERNLIB Legacy Maintenance of CERNLIB still a burden in 2002 (after >20 years). Why?
Big scope - (necessarily) home-grown solutions to every problem; graphics, mathematical libraries, data storage...
High coupling (despite heroic efforts!) - f77 has few language features to help reduce it
Fragility has increased with time
Very complex - only real experts can maintain it
1) Scope - does it need to be home grown?: 1) Scope - does it need to be home grown? Example: mathematical library
Must be high quality and fast (CPU time = money)
CERN IT division has shrunk a lot since the 1970s - who writes it? Who maintains it?
But is it HEP specific?
No - and neither are graphics (HIGZ/HPLOT), memory or code management (ZEBRA, PATCHY), etc. etc.
Anaphe Approach I: Scope: Anaphe Approach I: Scope
If we can find a stable, high quality open-source product, we use it.
If we can’t, we pay for a commercial solution.
If it just doesn’t exist, we write it.
More efficient use of manpower
2) Coupling?: 2) Coupling? Fragility (inter-dependency) leads to much higher manpower costs over the s/w lifetime - but takes careful design to avoid
Modern languages help by providing
Classes (1980s)
help to insulate clients from implementation
Interfaces (1990s)
protocol “classes” - no concrete types in client code (programming by contract)
Run time choice (“component architecture”)
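The three ideas above (classes, interfaces, run-time choice) can be illustrated with a minimal Python sketch. All names here (`IHistogram1D`, `DenseHistogram`, `SparseHistogram`, `create_histogram`) are hypothetical and chosen for illustration only; they are not the AIDA or Anaphe API.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Protocol class: the only type that client code names (hypothetical)."""

    @abstractmethod
    def fill(self, x: float, weight: float = 1.0) -> None:
        """Add one weighted entry at x."""

    @abstractmethod
    def entries(self) -> int:
        """Return the number of in-range fills."""


class DenseHistogram(IHistogram1D):
    """Illustrative implementation: fixed-width bins kept in a list."""

    def __init__(self, nbins: int, lo: float, hi: float) -> None:
        self._lo, self._hi, self._n = lo, hi, nbins
        self._bins = [0.0] * nbins
        self._entries = 0

    def fill(self, x: float, weight: float = 1.0) -> None:
        if self._lo <= x < self._hi:
            width = (self._hi - self._lo) / self._n
            index = min(int((x - self._lo) / width), self._n - 1)
            self._bins[index] += weight
            self._entries += 1

    def entries(self) -> int:
        return self._entries


class SparseHistogram(IHistogram1D):
    """Illustrative alternative: only non-empty bins are stored."""

    def __init__(self, nbins: int, lo: float, hi: float) -> None:
        self._lo, self._hi, self._n = lo, hi, nbins
        self._bins = {}
        self._entries = 0

    def fill(self, x: float, weight: float = 1.0) -> None:
        if self._lo <= x < self._hi:
            width = (self._hi - self._lo) / self._n
            index = min(int((x - self._lo) / width), self._n - 1)
            self._bins[index] = self._bins.get(index, 0.0) + weight
            self._entries += 1

    def entries(self) -> int:
        return self._entries


# Run-time choice: the concrete component is picked by name, not at compile time.
_COMPONENTS = {"dense": DenseHistogram, "sparse": SparseHistogram}


def create_histogram(kind: str, nbins: int, lo: float, hi: float) -> IHistogram1D:
    return _COMPONENTS[kind](nbins, lo, hi)


def fill_all(hist: IHistogram1D, data) -> int:
    """Client code: depends on the interface only, never on a concrete type."""
    for x in data:
        hist.fill(x)
    return hist.entries()
```

Swapping `"dense"` for `"sparse"` in the call to `create_histogram` changes the implementation without touching `fill_all`, which is the decoupling the slide describes.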
Anaphe Approach II: Coupling: Anaphe Approach II: Coupling Much work to minimise dependencies
Strict partitioning into independent packages which communicate only through interfaces.
component architecture with NCCD ~ 1 (see L.Tuura, these proceedings.)
Normal in the ‘real world’, but less so in HEP. The Anaphe team is 3/7 computer scientists, which is a big help (see M.Kasemann, these proceedings).
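For readers unfamiliar with the NCCD figure quoted above, here is a small sketch of how it can be computed. It assumes the usual definition from Lakos: CCD is the sum, over all components, of the number of components each one needs (itself included), and NCCD divides that by the CCD of a balanced binary tree of the same size, (N+1)·log2(N+1) - N. The package names in the example graph are invented and do not reflect the real Anaphe dependency structure.

```python
import math


def reachable(graph, start):
    """Return every component needed by `start`, including `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def ccd(graph):
    """Cumulative component dependency; every component must be a key of `graph`."""
    return sum(len(reachable(graph, component)) for component in graph)


def nccd(graph):
    """CCD normalised to a balanced binary tree with the same number of components."""
    n = len(graph)
    tree_ccd = (n + 1) * math.log2(n + 1) - n
    return ccd(graph) / tree_ccd


# Hypothetical seven-package system laid out as a balanced binary tree.
tree_like = {
    "app": ["wrap_a", "wrap_b"],
    "wrap_a": ["lib_1", "lib_2"],
    "wrap_b": ["lib_3", "lib_4"],
    "lib_1": [], "lib_2": [], "lib_3": [], "lib_4": [],
}

# Hypothetical three-package chain: each package drags in everything below it.
chain = {"top": ["middle"], "middle": ["bottom"], "bottom": []}
```

Under this definition the tree-shaped system has CCD 17 and NCCD exactly 1, while the chain scores above 1. Values well above 1 indicate tightly coupled, vertically stacked or cyclic designs.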
AIDA (I): AIDA (I) Abstract Interfaces for Data Analysis (see V.Serbo, these proceedings).
Protocols defined (so far):
fitting
plotting
histograms and ntuples
XML format for exchange of histos/ntuples/...
Facade to hide management/persistency
AIDA (II): AIDA (II) Three AIDA2.2-compliant tools (Anaphe, JAS, OpenScientist)
Cross-tool (and cross-language!) uniform protocol for data analysis
Next release 3.x (September 2002)
large improvement in functionality and flexibility
Anaphe Approach III: Layering: Anaphe Approach III: Layering Lizard uses SWIG-generated Python classes which shadow...
...the AIDA interfaces which are implemented using...
...the Anaphe wrappers which are implemented using...
...the Anaphe foundation libraries, which use... CLHEP, Qt, CERNLIB, STL, etc.
Ignominy Analysis of Anaphe: Ignominy Analysis of Anaphe Distribution of tools and utilities for LHC era physics
Combination of commercial, free and HEP software
Claims to be a toolkit
Seems to live up to its toolkit claims
Good work on modularity
Clean design is evident in many places
Dependency diagrams often split naturally into functional units
Thanks to Lassi Tuura (CMS)
History: History Anaphe (then LHC++) started 1997
Foundation libs developed 1997-2000
Interfaces, Wrappers and Lizard prototyped in time for CHEP 2000
First production version Summer 2001
First open-source version Autumn 2001
AIDA 2.2 compliant version Summer 2002
AIDA 3.x compliant version Autumn 2002
Core Home-Grown Components: Core Home-Grown Components Histogramming (HTL)
Plotting (Qplotter)
Fitting and Minimisation (FML)
Ntuples (NtupleTag)
Wrappers implementing the interfaces in terms of other libraries
Interactive framework (Lizard)
“External” Packages: “External” Packages Commercial licensed packages
Objectivity (OO database), NagC
deliberately limited dependencies - will be replaced if and when it makes sense
HEP products
HBOOK, MINUIT, CLHEP, HepODBMS
Open source
Qt, OpenInventor, Python, SWIG, expat, Doxygen, ...
Anaphe Components I: Anaphe Components I Histogram Template Library (HTL)
High performance
Flexible treatment of binning, errors
Transient and persistent (HepODBMS) versions
Fitting and Minimisation Library (FML)
Powerful and flexible OO interface
uses Gemini (thin wrapper for NAG-C, MINUIT, or other future libraries)
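The idea of a thin wrapper that lets the fitter switch minimiser engines can be sketched as follows. This is not the FML or Gemini API: `Minimizer`, `GoldenSection`, `GridScan` and `fit_constant` are hypothetical names, and two toy one-dimensional engines stand in for NAG-C and MINUIT.

```python
import math
from abc import ABC, abstractmethod


class Minimizer(ABC):
    """Hypothetical thin wrapper: one call signature for every engine."""

    @abstractmethod
    def minimize(self, f, lo: float, hi: float) -> float:
        """Return the x in [lo, hi] that minimises f(x)."""


class GoldenSection(Minimizer):
    """Toy engine: golden-section search, valid for unimodal functions."""

    def __init__(self, tol: float = 1e-8) -> None:
        self.tol = tol

    def minimize(self, f, lo, hi):
        inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
        a, b = lo, hi
        while b - a > self.tol:
            c = b - inv_phi * (b - a)
            d = a + inv_phi * (b - a)
            if f(c) < f(d):
                b = d  # minimum lies in [a, d]
            else:
                a = c  # minimum lies in [c, b]
        return (a + b) / 2.0


class GridScan(Minimizer):
    """Toy engine: evaluate f on a regular grid and keep the best point."""

    def __init__(self, steps: int = 1000) -> None:
        self.steps = steps

    def minimize(self, f, lo, hi):
        points = [lo + (hi - lo) * i / self.steps for i in range(self.steps + 1)]
        return min(points, key=f)


def fit_constant(values, errors, minimizer: Minimizer, lo: float, hi: float) -> float:
    """Chi-square fit of a constant; the engine is supplied by the caller."""

    def chi2(c):
        return sum(((v - c) / e) ** 2 for v, e in zip(values, errors))

    return minimizer.minimize(chi2, lo, hi)
```

A call such as `fit_constant([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], GoldenSection(), 0.0, 4.0)` should land close to the mean of the values, and passing `GridScan()` instead changes the engine without changing the fitting code.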
Anaphe Components II: Anaphe Components II Qplotter
Based on Qt Free 3 - very powerful C++ graphics & GUI library released under GPL
Several layers of interface complexity
Stand-alone version available (request from US medical physics group)
Anaphe Components III: Anaphe Components III NtupleTag
Allows transparent navigation from ‘tags’ (small number of vars) back to original data
Designed for use with OODB
Faster - tags better clustered than full data
Original version used HepODBMS/Objectivity
Read/write of HBOOK RWN with same interface
Version based on LCG persistency scheme will be produced (when defined)
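The tag idea can be sketched in a few lines: cuts run over a small record per event, and the full event is loaded only when a tag passes. The types below (`Event`, `Tag`, `EventStore`) and the chosen tag variables are assumptions made for illustration; an in-memory list stands in for the object database, and none of this is the NtupleTag API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Event:
    """Stand-in for the full (large) event data."""
    run: int
    track_pts: List[float]


@dataclass
class Tag:
    """A few summary variables plus a reference back to the full event."""
    n_tracks: int
    max_pt: float
    event_ref: int


class EventStore:
    """Hypothetical store; counts loads to show how many full reads happen."""

    def __init__(self, events: List[Event]) -> None:
        self._events = list(events)
        self.reads = 0

    def load(self, ref: int) -> Event:
        self.reads += 1
        return self._events[ref]


def build_tags(events: List[Event]) -> List[Tag]:
    """Derive one small tag per event."""
    return [
        Tag(len(ev.track_pts), max(ev.track_pts, default=0.0), ref)
        for ref, ev in enumerate(events)
    ]


def select(tags: List[Tag], store: EventStore,
           cut: Callable[[Tag], bool]) -> Iterator[Event]:
    """Apply the cut to tags only; dereference the events that pass."""
    for tag in tags:
        if cut(tag):
            yield store.load(tag.event_ref)


events = [Event(1, [2.0, 5.0]), Event(1, [12.5, 3.0, 1.0]), Event(2, [])]
store = EventStore(events)
tags = build_tags(events)
selected = list(select(tags, store, lambda t: t.max_pt > 10.0))
```

Only one of the three events passes the cut, so the store is read once. That is the source of the speed-up the slide mentions: most of the data is never touched.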
Anaphe Components IV: Anaphe Components IV AIDA Wrappers
Wrapper classes to implement the AIDA interfaces using functionality from the foundation libraries
Alternatives available where appropriate - just link to whichever you need
Ntuple & Histo : Objectivity or HBOOK
Fitter : NAG-C or MINUIT
Others as they become available (GSL, LCG persistency solution etc.)
Anaphe Components V: Lizard: Anaphe Components V: Lizard Python framework for interactive analysis
All classes and methods from AIDA wrappers mapped into Python commands - plotting, fitting, ntuples, histos.
User modules can be plugged in as required
Analyzer module provides on-the-fly compilation and running of user’s C++ code
Lizard is a ‘Facade’ to the Anaphe libraries
Unified interface but only at top level
ANAPHE Components: ANAPHE Components [Diagram: User Interface (Python / SWIG), using Abstract Types; Objectivity; NAG-C; Qt; Analyzer running user’s C++ code]
Why Python?: Why Python? Object oriented and dynamically typed
maps well to OO languages ‘underneath’, quick and convenient to type
Easy ‘gluing’ of components together
Mapping from C++ and/or Java is automated (SWIG, Boost, Jython)
Huge user base
low risk of serious bugs (or its disappearance!)
lots of free software off the shelf - networking, GUI, OS, scientific etc
News I: AIDA Compliance: News I: AIDA Compliance Refactoring of wrappers to become AIDA compliant - requires change of user interfaces at Lizard level
Required no changes to foundation libs
Shows flexibility of layering
New documentation July 2002
Future major releases will be synchronized with AIDA releases
Next major release September 2002 (AIDA 3)
News II: Internal Development: News II: Internal Development Response to user feedback and (mostly) friendly criticism
Improvements in testing and installation
Clearer and more informative web site (available July 2002)
Lots of work on documentation (examples, tutorials, reference docs) and procedures for responding quickly to user requests
We hope you see a difference!
Medium Term Plans: Medium Term Plans AIDA 3.x compliant version (September)
Exposes much more functionality of underlying libraries
Finer-grained control of plotting, fitting etc at user (Lizard) level
Reading of Root (3.x) files
HBOOK column-wise ntuples
Improvements to foundation libs (ask us!)
Users I: Users I User community small but growing
Some pioneers on CMS & LHCb
Collaboration with JAS & OpenScientist growing - component sharing is now a real possibility
Geant 4 has adopted AIDA as a tool-independent analysis standard so some Geant 4 users are coming to Anaphe
Users II: Users II Data analysis of medical and space physics simulations using Geant 4 - see CERN Courier, June 2002. Thanks to M.G.Pia, INFN
Users III: Users III Planetary physics (X-ray fluorescence of simulated Mars soil)
UCL group on LISA (space gravitational wave detector). Thanks to A.Manero, INFN
Distributed Ntuple Analysis: Distributed Ntuple Analysis Would like to do distributed parallel analysis over large data sets with minimal assistance from the end user
e.g., distributed, parallelised ntuple analysis and projection into Histograms in Lizard Analyzer
Abstract Interface hides the complexity, no change in tool(s) or user code needed
First prototype for very general distributed parallel data analysis available (see J.Moscicki, these proceedings). Special case (ntuple analysis) already developed using Lizard and the Anaphe libraries
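The ntuple special case follows a split, fill and merge pattern, which the sketch below illustrates. It is an assumption-laden toy, not the prototype referred to above: a local thread pool stands in for remote workers, the ntuple is a plain list of numbers, and the function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_partial(rows, nbins, lo, hi):
    """Worker task: project one chunk of the ntuple into bin counts."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in rows:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), nbins - 1)] += 1
    return counts


def merge(partials, nbins):
    """Histograms with identical binning merge by adding bin contents."""
    if not partials:
        return [0] * nbins
    return [sum(column) for column in zip(*partials)]


def distributed_projection(rows, nbins, lo, hi, n_workers=4):
    """Split the rows, fill partial histograms in parallel, then merge."""
    chunk = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: fill_partial(c, nbins, lo, hi), chunks))
    return merge(partials, nbins)


rows = [i / 100 for i in range(100)]
result = distributed_projection(rows, 4, 0.0, 1.0)
```

Because the caller sees only `distributed_projection`, the same call could hide a serial loop or a farm of machines. This is the sense in which an abstract interface keeps tools and user code unchanged.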
Summary: Summary Anaphe is a layered set of loosely coupled C++ components for data analysis, plus an interactive Python framework (Lizard)
Only HEP-specific parts written in-house
Developed and maintained by CERN IT
Committed to AIDA compliance
Functionality, stability and user support improving rapidly; gaining users (you..?)
More Information: More Information
http://cern.ch/Anaphe
http://aida.freehep.org
HepLib.Support@cern.ch