Virtual Organizations, Security and Knowledge Discovery in the CrossGrid Project
Jesús Marco
CrossGrid WP4 (International Testbed)
Instituto de Física de Cantabria
Consejo Superior de Investigaciones Científicas, CSIC
Santander, SPAIN
http://www.eu-crossgrid.org
The EU CrossGrid Project
European Project (~5 M€, March 2002-2005)
proposed to CPA9, 6th IST call, V FP
Polish (Cracow & Poznan) / Spanish (CSIC & CESGA) / German (FZK) initiative with the support of CERN (thanks to Fab!)
CYFRONET (Cracow) is the coordinator of the project (Michal Turala, project leader)
Objectives:
Extension of GRID in Europe, assuring interoperability with DataGrid
Interactive Applications (“human in the loop”):
Environmental fields (meteorology/air pollution, flooding)
High Energy Physics (interactive analysis over distributed datasets)
Medicine (vascular surgery preparation)
Need:
Develop corresponding middleware and tools
Deploy on a pan-European testbed
Partners:
Poland (CYFRONET, PSNC, ICM, INP, INS), Spain (CSIC: IFCA, IFIC, RedIRIS, UAB, USC), Germany (FZK, USTUTT, TUM), Slovakia (II SAS), Ireland (TCD), Portugal (LIP), Austria (U.Linz), The Netherlands (UvA), Greece (DEMO, AuTH), Cyprus (UCY)
Industry: Datamat (I), Algosystems (Gr)
VO, SEC & KD in CrossGrid
CrossGrid interactive applications require:
Complex but Secure Virtual Organizations
CrossGrid middleware provides a framework for development
Friendly secure use: Roaming Access Server (Portal/Migrating Desktop)
Scheduling of collaborative work on VO resources
CrossGrid testbed:
Relies on local site support for management and security
uses Globus basic grid security: GSI
follows EU-DataGrid in deployment for interoperability:
Certification Authorities
Virtual Organization LDAP
Next: VOMS
Knowledge Discovery:
Development of Grid-adapted Data Mining Techniques accessing Distributed Databases with published Metadata Catalogs
Flood management
Goal:
Flooding risk prediction
Method:
Cascade of simulations
Meteorological
Hydrological
Hydraulic
Virtual Organization
Need the Grid in interactive mode (simulation results for “what-if” scenarios)
seamlessly connect together experts, data and computing resources needed for quick decisions
highly automated early warning system, based on hydro-meteorological (snowmelt) rainfall-runoff simulations
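The cascade of simulations on this slide can be sketched as three chained stages. This is only an illustrative toy, not the CrossGrid models: every function name, coefficient and threshold below is a made-up assumption, and `rain_scale` stands in for the kind of “what-if” parameter an expert might change interactively.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    rainfall_mm: float   # precipitation over the catchment
    snowmelt_mm: float   # water released by snowmelt


def meteorological(rain_scale: float = 1.0) -> Forecast:
    # Hypothetical fixed forecast; rain_scale is the "what-if" knob.
    return Forecast(rainfall_mm=40.0 * rain_scale, snowmelt_mm=10.0)


def hydrological(forecast: Forecast,
                 runoff_coeff: float = 0.5,
                 area_km2: float = 100.0) -> float:
    # Toy rainfall-runoff step: volume of water (m^3) reaching the river.
    water_m = (forecast.rainfall_mm + forecast.snowmelt_mm) / 1000.0
    return water_m * area_km2 * 1e6 * runoff_coeff


def hydraulic(volume_m3: float, channel_capacity_m3: float = 3.0e6) -> bool:
    # Toy hydraulic step: flooding occurs if the channel capacity is exceeded.
    return volume_m3 > channel_capacity_m3


def flood_risk(rain_scale: float = 1.0) -> bool:
    # The cascade: each stage consumes the output of the previous one.
    return hydraulic(hydrological(meteorological(rain_scale)))


if __name__ == "__main__":
    for scale in (1.0, 1.5):
        print(f"rain x{scale}: flood risk = {flood_risk(scale)}")
```

In a Grid setting each stage would run as a separate job on different resources; the point of the sketch is only the data dependency between the stages.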
Grid Security Infrastructure (GSI)
Globus Toolkit implements GSI protocols and APIs to address Grid security needs
GSI protocols extend standard, well-known public key protocols for authentication and message protection
X.509 identity certificates
SSL/TLS
GSI supports a standard API, GSS-API, which is used by a number of applications
SSH, GridFTP
Grid Security Infrastructure (GSI)
GSI is:
PKI (CAs and certificates) for credentials
SSL/TLS for authentication and message protection
Proxies and delegation (GSI extensions) for secure single sign-on
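The idea of proxies for single sign-on can be illustrated with a simplified model of a certificate chain. This sketch does no cryptography at all: it only checks issuer/subject linkage and expiry times, and all names (the CA and user DNs, the `Cert` class, `chain_is_valid`) are hypothetical. Real GSI validation also verifies signatures, revocation lists and proxy-specific extensions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cert:
    subject: str      # distinguished name of the holder
    issuer: str       # distinguished name of the signer
    not_after: float  # expiry time, in seconds on an arbitrary clock


def chain_is_valid(chain: List[Cert], trusted_ca: str, now: float) -> bool:
    """chain[0] is the user's identity certificate, later entries are proxies."""
    if not chain or chain[0].issuer != trusted_ca:
        return False
    for i, cert in enumerate(chain):
        if now >= cert.not_after:
            return False  # any expired link invalidates the chain
        if i > 0:
            parent = chain[i - 1]
            # A proxy is issued by the credential it is derived from ...
            if cert.issuer != parent.subject:
                return False
            # ... and its subject extends the parent's subject.
            if not cert.subject.startswith(parent.subject + "/"):
                return False
    return True


HOUR = 3600.0
CA = "/C=ES/O=ExampleCA"  # hypothetical CA name
user = Cert("/C=ES/O=Example/CN=Alice", CA, not_after=365 * 24 * HOUR)
proxy = Cert(user.subject + "/CN=proxy", user.subject, not_after=12 * HOUR)

if __name__ == "__main__":
    print(chain_is_valid([user, proxy], CA, now=1 * HOUR))   # proxy still valid
    print(chain_is_valid([user, proxy], CA, now=13 * HOUR))  # proxy expired
```

The short proxy lifetime is what limits the damage if the proxy's unprotected private key is stolen, while the long-lived user key is only needed once per session.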
EU-DataGrid Security Services
The CrossGrid Testbed
16 sites (small & large) in 9 countries, connected through Géant + NRENs
+ Grid Services: EDG middleware (based on Globus): RB, VO, RC…
Sites: UCY Nicosia, DEMO Athens, AuTH Thessaloniki, CYFRONET Cracow, ICM & IPJ Warsaw, PSNC Poznan, CSIC IFIC Valencia, UAB Barcelona, CSIC-UC IFCA Santander, CSIC RedIRIS Madrid, LIP Lisbon, USC Santiago, TCD Dublin, UvA Amsterdam, FZK Karlsruhe
Computing resources
Site testbed
LCFG configuration server
User Interface
Gatekeeper (Computing Element)
Worker Nodes
Storage Element
16 sites:
115 CPUs (Worker Nodes)
4 TB (Storage Elements)
Grid services (LIP)
Information Index
Top MDS Information Server, points to site Information Servers
Resource Broker
Matchmaking and load balancing scheduler
Replica Catalogue
Database for physical replica file location
Certificate Proxy Server
Short lived certificates for long lived processes, used by RB
Virtual Organization Server
Database for user authentication (CROSSGRID VO)
Monitoring
Mapcenter: network monitoring system
National Certification Authority machines
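The Resource Broker's role as a matchmaking and load balancing scheduler can be sketched as a filter-then-rank step. This is a minimal illustration under assumed rules (a site matches if it supports the job's VO and has enough free CPUs; the match with the most free CPUs wins); the class names and host names are hypothetical, and the real broker uses information published through the Information Index.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ComputingElement:
    name: str                     # hypothetical gatekeeper host name
    free_cpus: int                # idle worker-node CPUs
    supported_vos: FrozenSet[str]


@dataclass(frozen=True)
class Job:
    vo: str
    cpus: int


def match(job: Job, elements: List[ComputingElement]) -> Optional[ComputingElement]:
    # Matchmaking: keep only elements that satisfy the job's requirements.
    candidates = [
        ce for ce in elements
        if job.vo in ce.supported_vos and ce.free_cpus >= job.cpus
    ]
    if not candidates:
        return None
    # Load balancing (assumed ranking): prefer the least loaded element.
    return max(candidates, key=lambda ce: ce.free_cpus)


SITES = [
    ComputingElement("ce.site-a.example", 4, frozenset({"crossgrid"})),
    ComputingElement("ce.site-b.example", 12, frozenset({"crossgrid"})),
    ComputingElement("ce.site-c.example", 30, frozenset({"othervo"})),
]

if __name__ == "__main__":
    chosen = match(Job(vo="crossgrid", cpus=2), SITES)
    print(chosen.name if chosen else "no matching resource")
```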
CrossGrid CA page
Working on RA procedure
VO server in CrossGrid
Overview of VOMS
[Diagram: the user holds a CA-issued certificate (dn, ca, Pkey) and a short-lifetime proxy cert (dn, cert, Pkey, VOMS cred.); VOMS supplies the VOMS cred (VO, group(s), role(s)); MyProxy is used for delegation (cert+key, long lifetime) and renewal requests. Each service authenticates the proxy cert, authorizes the request (dn, attrs, acl, req. op. -> yes/no), optionally maps the user, and then performs the operation:
Java (TrustManager, WebServices Authz): coarse grained (e.g. Spitfire, map dn -> DB role), fine grained (e.g. RepMec, pre-process: parameters -> obj.id + req. op., obj.id -> acl)
C (GSI, LCAS, LCMAPS): coarse grained (e.g. gatekeeper, map dn -> userid, krb ticket), fine grained (e.g. SE, /grid, GACL: obj.id -> acl)
Web (mod_ssl): fine grained (e.g. GridSite, GACL: obj.id -> acl)]
Focus is on VOMS; details are in D7.6 Security Design
VOMS Overview
Provides information about the user’s relationship with his VO(s)
groups, roles (admin, student, ...), capabilities (free form string), temporal bounds
Features
single login: voms-proxy-init only at the beginning of the session (replaces grid-proxy-init);
expiration time: the authorization information is only valid for a limited period of time (possibly different from the proxy certificate itself);
backward compatibility: the extra VO related information is in the user’s proxy certificate, which can be still used with non VOMS-aware services;
multiple VOs: the user may authenticate himself with multiple VOs and create an aggregate proxy certificate;
security: all client-server communications are secured and authenticated.
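The attribute model described above (groups, roles, capabilities, temporal bounds, aggregation over several VOs) can be sketched as a small data structure. This is an illustration only: the class and function names are hypothetical, the string layout is modeled on VOMS-style attribute names and should be treated as an assumption, and the second VO name is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VomsAttribute:
    vo: str
    group: str                  # e.g. "/crossgrid/flood"
    role: Optional[str]         # e.g. "admin", "student"
    capability: Optional[str]   # free-form string
    not_after: float            # the attribute's own expiry time (seconds)

    def as_string(self) -> str:
        # Assumed layout: group, then optional role and capability parts.
        text = self.group
        if self.role:
            text += "/Role=" + self.role
        if self.capability:
            text += "/Capability=" + self.capability
        return text


def valid_attributes(proxy_not_after: float,
                     attrs: List[VomsAttribute],
                     now: float) -> List[str]:
    # Authorization info may expire before the proxy certificate itself,
    # and an expired proxy carries no usable attributes at all.
    if now >= proxy_not_after:
        return []
    return [a.as_string() for a in attrs if now < a.not_after]


# Aggregate credential with attributes from two VOs ("hep" is invented).
ATTRS = [
    VomsAttribute("crossgrid", "/crossgrid/flood", "admin", None, 3600.0),
    VomsAttribute("hep", "/hep", None, None, 7200.0),
]

if __name__ == "__main__":
    print(valid_attributes(43200.0, ATTRS, now=1800.0))
    print(valid_attributes(43200.0, ATTRS, now=5000.0))
```

A service that is not VOMS-aware would simply ignore these attributes and use the proxy's identity, which is what the backward compatibility feature relies on.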
VOMS Architecture
[Diagram: VOMS server with vomsd and DB; interfaces labelled JDBC, DBI, GSI, https, soap + SSL; clients voms-proxy-init and mkgridmap]
MySQL db, with history and audit records
User query server and client (C++)
Java Web Service based administration interface
Perl client (batch processing)
Web browser client (generic administrative tasks)
Web server interface for mkgridmap
User’s Authorization in EDG 2.x
[Diagram: users (user cert, long life) and services (host cert, long life) register with the VO-VOMS servers; voms-proxy-init produces a proxy cert (short life) and authz cert (short life); services hold a service cert (short life) and authz cert (short life); authentication & authorization info is checked by LCAS and edg-java-security, with crl update]
Local Site Authorization
Local Centre Authorization Service (LCAS)
Handles authorization requests to local fabric
authorization decisions based on proxy user certificate and job specification;
supports grid-mapfile mechanism.
Plug-in framework (hooks for external authorization plugins)
allowed users (grid-mapfile or allowed_users.db), banned users (ban_users.db), available timeslots (timeslots.db)
plugin for VOMS (to process authorization data)
Local Credential Mapping Service (LCMAPS)
provides local credentials needed for jobs in fabric
mapping based on user identity, VO affiliation, local site policy
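The LCAS plug-in idea and the LCMAPS mapping step can be sketched together as follows. This is not the LCAS or LCMAPS API: the function names are hypothetical, the plug-ins are plain Python callables, and the grid-mapfile parser only handles the simple `"DN" localuser` line form. The three checks mirror the allowed users, banned users and timeslots plug-ins listed above.

```python
import shlex
from typing import Callable, Dict, List, Optional, Set

# A plug-in takes the user's DN and the current hour and says yes or no.
Plugin = Callable[[str, int], bool]


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse lines of the form: "/C=.../CN=Some User" localaccount"""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles the quoted DN
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def allowed_users(gridmap: Dict[str, str]) -> Plugin:
    return lambda dn, hour: dn in gridmap


def banned_users(banned: Set[str]) -> Plugin:
    return lambda dn, hour: dn not in banned


def timeslots(start_hour: int, end_hour: int) -> Plugin:
    return lambda dn, hour: start_hour <= hour < end_hour


def authorize(dn: str, hour: int, plugins: List[Plugin]) -> bool:
    # Authorization step: every plug-in must agree.
    return all(plugin(dn, hour) for plugin in plugins)


def map_user(dn: str, hour: int, plugins: List[Plugin],
             gridmap: Dict[str, str]) -> Optional[str]:
    # Mapping step: only authorized users get a local account.
    return gridmap.get(dn) if authorize(dn, hour, plugins) else None


GRIDMAP = parse_gridmap('''
# hypothetical entries
"/C=ES/O=Example/CN=Alice Doe" alice
"/C=PL/O=Example/CN=Bob Roe" bob
''')
PLUGINS = [allowed_users(GRIDMAP),
           banned_users({"/C=PL/O=Example/CN=Bob Roe"}),
           timeslots(8, 20)]

if __name__ == "__main__":
    print(map_user("/C=ES/O=Example/CN=Alice Doe", 10, PLUGINS, GRIDMAP))
```

A VOMS plug-in would fit the same shape if the callable also received the attributes carried in the proxy, which this sketch leaves out.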
Knowledge Discovery
Will the CrossGrid VO “export” or “discover” knowledge?
Likely for Meteo applications
Only partially for HEP applications
First step: extending KDD to the Grid environment:
Data-mining on distributed databases (task 1.3-1.4, HEP & Meteo large databases)
Distributed query using:
Metadata + Replica catalogs
Interactive Database Server modules (e.g. O/R DBMS, PAW)
Queries in XML format
Distributed via MPICH-G2 in master-slave scheme
Mid-to-large size databases, O(TB)
Data-mining algorithms adapted to the Grid:
Distributed Neural Network training
Self-Organizing Maps
Distributed also using MPICH-G2
Tests started! Encouraging first results!
Modeling, benchmarking, performance prediction (CrossGrid WP2 tools)
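The master-slave scheme for distributed training can be sketched without MPI by simulating the slaves in a single process. The technique shown is synchronous data-parallel gradient averaging on a single linear neuron, chosen here as a stand-in; it is an assumption, not necessarily the algorithm used in CrossGrid. With MPICH-G2 the loop over shards would become a broadcast of the weights followed by a reduction of the partial gradients.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def partial_gradient(w: float, b: float,
                     shard: List[Sample]) -> Tuple[float, float, int]:
    # Slave side: summed squared-error gradient over the local data shard.
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(shard)


def train(shards: List[List[Sample]], epochs: int = 500,
          lr: float = 0.1) -> Tuple[float, float]:
    # Master side: send out weights, gather partial gradients, average, update.
    w = b = 0.0
    for _ in range(epochs):
        results = [partial_gradient(w, b, shard) for shard in shards]
        n = sum(count for _, _, count in results)
        w -= lr * sum(gw for gw, _, _ in results) / n
        b -= lr * sum(gb for _, gb, _ in results) / n
    return w, b


# Toy dataset following y = 2x + 1, split unevenly over two "sites".
DATA = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
SHARDS = [DATA[:2], DATA[2:]]

if __name__ == "__main__":
    w, b = train(SHARDS)
    print(f"w = {w:.4f}, b = {b:.4f}")
```

Because the master weights each partial gradient by the shard size, the result matches training on the full dataset in one place, up to floating-point rounding; only the gradients travel over the network, not the data.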
Architecture
Challenging issues to be discussed with other projects
On-line Authentication mechanisms?
Proxy use for portals/roaming access
User understanding of Virtual Organizations:
Membership features
Permanent storage (personal/group/vo/external)
Optimal use (from accounting, scheduling to replication and resilience)
Active Security Policies (Grid-patrols)
Metadata publication for distributed databases
Transition to OGSA/OGSI:
Adapting current middleware
OGSA-DAI use
Distributed mechanisms (MPICH-G3?)
New knowledge discovery mechanisms
Summary
Virtual Organizations & Security are key points in the CrossGrid project
Experience from a real working testbed, thanks to the use of Globus GSI and EU-DataGrid middleware
Considerable effort on deployment (CA, RA, VO, site management):
an interoperable pan-European community (CrossGrid + DataGrid)
VOMS (EDG) opens new possibilities for VO
CrossGrid will make clear to the user the VO possibilities but also the security issues to assure a friendly environment:
Portal proxy-based secure access also to be “almost transparent”
User group and roles together with resource discovery and monitoring
Knowledge discovery can be seen as a final ideal environment for specific application users, progressing along this direction:
Data Mining on Distributed Databases prototypes being tested on a realistic Grid environment