acat cdf 2

Uploaded from authorPOINTLite
Views:
 
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

A Component Framework for Distributed Data Analysis in HEP: 

A Component Framework for Distributed Data Analysis in HEP

CERN IT/API R&D Project: 

CERN IT/API R&D Project goals: study requirements of semi-interactive parallel analysis in HEP middleware technology evaluation & choice CORBA, MPI, Condor, LSF... also see how to integrate API products with GRID prototyping (focus on ntuple analysis) young project: Jan 2001: start (0.75 FTE) June 2002: running prototype exists (1.25 FTE) sample Ntuple analysis with Anaphe run-level parallel Geant4 simulation (soon)

How does it fit with the Grid ?: 

How does it fit with the Grid ? Grid-enabled framework for HEP applications this framework will be a Grid component ...via a gateway that understands Grid/JDL framework uses lower level Grid components authentication, security, load balancing distribution aspects parallel cluster computation "institute" or "workgroup" level (Tier 1-3) local computing center remote analysis geographically unlimited

Distributed Analysis: Motivation: 

Distributed Analysis: Motivation why do we want distributed data analysis? move processing close to data for example ntuple job description ~ kB the data itself ~ MB, GB, TB ... rather than downloading gigabyte data let the remote server do the job do it in parallel – faster clusters of cheap PCs

Topology of I/O intensive app.: 

Topology of I/O intensive app. ntuple mostly I/O intensive rather than CPU intensive fast DB access from cluster slow network from user to cluster very small amount of data exchanged between the tasks in comparison to"input" data

Parallel ntuple analysis: 

Parallel ntuple analysis data driven all workers perform same task (similar to SPMD) synchronization quite simple (independent workers) master/worker model

HEP public/workgroup clusters: 

HEP public/workgroup clusters features many users, many jobs diverse applications: ntuple analysis, simulation, ... interactive ... semi-interactive ... batch ~ 100s of machines dynamic environment users may submit their analysis code mixed CPU and I/O intensive some applications may be preconfigured general analysis e.g. ntuple projections or experiment specific apps load balancing important

Example of ntuple projection: 

Example of ntuple projection example of semi-interactive analysis data: 30 MB HBOOK ntuple / 37K rows / 160 columns time: minutes .. hours timings desktop (400Mhz, 128MB RAM) - c.a. 4 minutes standalone lxplus (800Mhz, SMP, 512MB RAM) - c.a. 45 sec 6 lxplus workers - c.a. 18 sec why 6 * 18 = 45 ? job is small, so big fraction of the time is compilation and dll loading, rather than computation pre-installing application would improve the speed caveat: example running on AFS and public machines

Medicine applications: 

Medicine applications example: brachytherapy optimization of the treatment planning by MC simulation features CPU intensive few users, few jobs one preconfigured application interactive: seconds .. minutes ~ 10s of machines ongoing joint collaboration with G4 and hospital units in Torino, Italy to be deployed soon

Space science applications: 

Space science applications example: LISA MC simulation for gravitational waves experiment features CPU intensive big jobs (10 processor-years) preconfigured applications batch: days 1000+ machines requirements: error recovery important monitoring and diagnostics

Master/Worker model: 

Master/Worker model applications share the same computation model so also share a big part of the framework code but have different non-functional requirements

Architecture principles: 

Architecture principles framework core 100% application independent e.g. Anaphe/Lizard ntuple analysis is just one application thin client approach just create a well-formed job description in XML send via CORBA and read the results back in XML so client may be a standalone application in C++ or python, or integrated into analysis framework (e.g. Lizard) dynamic application repository plugin repository in XML dynamic loading on the server side + meta-tools (admin)

Architecture principles (2): 

Architecture principles (2) component design of the core framework find common parts for all use-cases plug-in use-case specific components do not over-generalize AIDA-based analysis applications using Lizard/Anaphe but any AIDA compliant tool could be used (JAS, OpenScientist) see ACAT talks by V.Serbo "AIDA" and M.Sang "Anaphe" integrated into python environment

Deployment of Distibuted Components: 

Deployment of Distibuted Components layering: abstract middleware dynamic application loading plugin components

Using CORBA and XML: 

Using CORBA and XML inter-operability (shown in the prototype ntuple application) cross-release (muchos gracias XML!) client running Lizard/Anaphe 3.6.6 server running 4.0.0-pre1 cross-language (muchos gracias CORBA!) python CORBA client (~30 lines) C++ CORBA server compact XML data messages 500 bytes to server, 22k bytes from server of XML description factor 106 less than original data (30 MB ntuple) thin client: no need to run Lizard on the client side as an alternative use case scenario

Facade for end-user analysis: 

Facade for end-user analysis 3 groups of user roles developers of distributed analysis applications brand new applications e.g. simulation advanced users with custom ntuple analysis code similar to Lizard Analyzer execute custom algorithm on the parallel ntuple scan interactive users do the standard projections just specify the histogram and ntuple to project user-friendly means: show only the relevant details hide the complexity of the underlying system

Facade for end-user analysis: 

Facade for end-user analysis

Choices for back end s/w: 

Choices for back end s/w For LHC not yet certain (outcome of LCG) Batch Job System (e.g. LSF) limited control -> submit jobs (black box) job queues with CPU limits automatic load balancing, scheduling (task creation and dispatch) prototype: deployed (~10s workers) Dedicated Interactive Cluster custom daemons more control -> explicit creation of tasks load balancing callbacks into specific application prototype: custom PULL load-balancing (~10s workers)

Dedicated Interactive Cluster (1): 

Dedicated Interactive Cluster (1) Daemons per node Dynamic process allocation

Dedicated Interactive Cluster (2): 

Dedicated Interactive Cluster (2) Daemons per user per node Thread pools, per-user policies

Towards a flexible architecture: 

Towards a flexible architecture Corba Component Model (CCM) pluggable components & services make a truly component system on the core architecture level common interface to the service components difficult due to different nature of the services implementations example: load-balancing service Condor - process migration LSF - black-box load balancing custom PULL implenetation - active load balancing but first results very encouraging

Error recovery service: 

Error recovery service The mechanisms daemon control layer make sure that the core framework process are alive periodical ping – need to be hierarchized to be scalable worker sandbox protect from the seg-faults in the user applications memory corruption exceptions signals based on standard Unix mechanisms: child processes and signals

Other services: 

Other services Interactive data analysis connection-oriented vs connectionless monitoring and fault recovery User environment replication do not rely on the common filesystem (e.g. AFS) distribution of application code binary exchange possible for homogeneous clusters distribution of local setup data configuration files, etc… binary dependencies (shared libraries etc)

Optimization: 

Optimization Optimizing distributed I/O access to data clustering of the data in the DB on the per-task basis depends on the experiment-specific I/O solution Load balancing framework is not directly addressing low level issues ...but the design must be LB-aware partition the initial data set and assign data chunks to tasks how big chunks? static/adaptive algorithm? push vs pull model for dispatching tasks etc.

Long term evolution: 

Long term evolution Full production in 2007 (LHC startup) software evolution and policy distributed technology (CORBA, RMI, DCOM, sockets, ...) persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO) programming/scripting languages (C++, Java, python,...) hardware evolution what will come out of Grid? Globus LCG, DataGrid, CrossGrid (interactive apps) ...

Limitations : 

Limitations Model limited to Master/Worker More complex synchronization patterns some particular cpu-intensive applications require fine-grained synchronization between workers - this is NOT provided by the framework and must be achieved by other means (e.g MPI) Intra-cluster scope: NOT a global metacomputer Grid-enabled gateway to enter Grid universe otherwise the framework is independent thanks to Abstract Interfaces

Similar project in HEP: 

Similar project in HEP PIAF (history) using PAW TOP-C G4 examples for parallelism at event-level BlueOx Java using JAS for analysis some space for communality via AIDA PROOF based on ROOT

Summary: 

Summary first prototype ready and working proof of concept for up to 50 workers ~1000 workers needs to be checked deployment comming soon integration with Lizard analysis tool medical apps active R&D in component architecture relation to LCG (?)

That's about it: 

That's about it cern.ch/moscicki/work cern.ch/anaphe aida.freehep.org

Data Exchange Protocol API: 

Data Exchange Protocol API /* NTupleProtocol.h */ class HistogramParams : public DXP::DataObject { public: HistogramParams(DXP::DataObject *parent) : DXP::DataObject(parent), nbins(this), xmin(this), xmax(this) {} DXP::Long nbins; DXP::Double xmin; DXP::Double xmax; }; class JobResult : public DXP::DataObject { public: JobResult(DXP::DataObject *parent) : DXP::DataObject(parent), histoXML(this), jobData(this) {} DXP::String histoXML; JobData jobData; };