The Grid: From Parallel to Virtualized Parallel Computing: The Grid: From Parallel to Virtualized Parallel Computing Michael Welzl http://www.welzl.at
DPS NSG Team http://dps.uibk.ac.at/nsg
Institute of Computer Science
University of Innsbruck
Habilitation talk
TU Darmstadt
14 June 2007
Outline: Outline Grid introduction
Middleware
first step towards virtualization
Research efforts
further steps towards virtualization
Conclusion
Grid Computing: Grid Computing A brief introduction
Introducing the Grid: Introducing the Grid History: parallel processing at a growing scale
Parallel CPU architectures
Multiprocessor machines
Clusters
(“Massively Distributed“) computers on the Internet: the Grid
logical consequence of HPC
metaphor: power grid - just plug in, don‘t care where (processing) power comes from, don‘t care how it reaches you
Common definition: The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi institutional virtual organizations [Ian Foster, Carl Kesselman and Steven Tuecke, “The Anatomy of the Grid – Enabling Scalable Virtual Organizations”, International Journal on Supercomputer Applications, 2001]
Scope: Scope Definition quite broad (“resource sharing“)
Reasonable - e.g., computers also have hard disks
But also led to some confusion - e.g., new research areas / buzzwords: Wireless Grid, Data Grid, Semantic / Knowledge Grid, Pervasive Grid, [this space reserved for your favorite research area] Grid
Example of confusion due to broad Grid interpretation: “One of the first applications of Grid technologies will be in remote training and education. Imagine the productivity gains if we had routine access to virtual lecture rooms! (..) What if we were able to walk up to a local ‘power wall‘ and give a lecture fully electronically in a virtual environment with interactive Web materials to an audience gathered from around the country - and then simply walk back to the office instead of going back to a hotel or an airplane?“ [I. Foster, C. Kesselman (eds): “The Grid: Blueprint for a New Computing Infrastructure“, 2nd edition, Elsevier Inc. / MKP, 2004]
Clear, narrower scope is advisable for thinking/talking about the Grid
Traditional goal: processing power
Grid people = parallel people; thus, main goal has not changed much
The next Web?: The next Web? Ways of looking at the Internet
Communication medium (email)
Truly large kiosk (web)
The Grid way of looking at the Internet
Infrastructure for Virtual Teams
Most of the time...
the “real and specific goal“ is High Performance Computing
Virtual Organizations and Virtual Teams are well defined, i.e. not an “open“ system; e.g. security is a big issue
Virtual Teams
Geographically distributed
Organizationally distributed
Yet work on a common problem
It has been called “the next web“
But Web 2.0 is already here :-)
Virtual Organizations and Virtual Teams: Virtual Organizations and Virtual Teams Distributed resources and people
Linked by networks, crossing admin domains
Sharing resources, common goals
Dynamic
Austrian Grid E-science Grid applications: Austrian Grid E-science Grid applications Medical Sciences
Distributed Heart Simulation
Virtual Lung Biopsy
Virtual Eye Surgery
Medical Multimedia Data Management and Distribution
Virtual Arterial Tree Tomography and Morphometry
High-Energy Physics
CERN experiment analyses
Applied Numerical Simulation
Distributed Scientific Computing: Advanced Computational Methods in Life Science
Computational Engineering
High Dimensional Improper Integration Procedures
Astrophysical Simulations and Solar Observations
Astrophysical Simulations
Hydrodynamic Simulations
Federation of Distributed Archives of Solar Observation
Meteorological Simulations
Environmental GRID Applications
Example: CERN Large Hadron Collider: Example: CERN Large Hadron Collider Largest machine built by humans: particle accelerator and collider with a circumference of 27 kilometers
Will generate 10 Petabytes (10^7 Gigabytes) of information per year … starting 2007!
This information must be processed and stored somewhere
Beyond the scope of a single institution to manage this problem
Projects: LCG (LHC Computing Grid), EGEE (Enabling Grids for E-sciencE)
Complexity: Complexity Grid poses difficult problems
Heterogeneity and dynamicity of resources
Secure access to resources with different users in various roles, belonging to VTs which belong to VOs
Efficient assignment of data and tasks to machines (“scheduling“)
Grid requirements: Grid requirements Computer scientists can tackle these problems
Grid application users and programmers are often not computer scientists
Important goal: ease of use
Programmer should not worry (too much) about the Grid
User should worry even less
Ultimate goal: write and use an application as if using a single computer (power grid metaphor)
How do computer scientists simplify?
Abstraction.
We build layers.
In a Grid, we typically have Middleware.
Grid Middleware: Grid Middleware
Grid computing without middleware: Grid computing without middleware Example manual Grid application execution
scp code to 10 machines
log in to the 10 machines via ssh and start “application > result“ everywhere
Estimate running time, or let application tell you that it‘s done (e.g. via TCP/IP communication in app code)
retrieve result files via scp
Tedious process - so write a script file
Do this again for every application / environment?
What if your colleagues need something similar?
Standards needed, tools introduced
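The manual procedure above can be written down as a small script. The following is only a sketch: it builds the scp/ssh command lines for steps 1, 2 and 4 without executing them, the helper name `manual_grid_commands` and the host names are hypothetical, and step 3 (waiting for completion) is left out.

```python
import shlex


def manual_grid_commands(hosts, app, result="result"):
    """Build the command lines of the manual procedure (nothing is executed)."""
    # Step 1: copy the application to every machine.
    copy = [["scp", app, f"{host}:{app}"] for host in hosts]
    # Step 2: start "application > result" on every machine via ssh.
    start = [
        ["ssh", host, f"./{shlex.quote(app)} > {shlex.quote(result)}"]
        for host in hosts
    ]
    # Step 4: fetch the result files, one local file per host.
    fetch = [["scp", f"{host}:{result}", f"{result}.{host}"] for host in hosts]
    return copy, start, fetch


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = [f"node{i}.example.org" for i in range(1, 11)]
    for phase in manual_grid_commands(hosts, "application"):
        for cmd in phase:
            print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)`. The script still has to be adapted for every application and environment, which is the motivation for standards and tools.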
Toolkits: Toolkits Most famous: Globus Toolkit
Evolution from GT2 via GT3 to GT4 influenced the whole Grid community
Reference implementation of Open Grid Forum (OGF) standards
Other well-known examples
Condor
Exists since mid-1980‘s
No Grid back then - system gradually evolved towards it
Traditional goal: harvest CPU power of normal user workstations; many Grid issues always had to be addressed anyway
Special interfaces now enable Condor-Globus communication (“Condor-G“)
Unicore (used in D-Grid)
gLite (used in EGEE)
Issues that these middlewares (should) address
Load Balancing, error management
Authentication, Authorization and Accounting (AAA)
Resource discovery, naming
Resource access and monitoring
Resource reservation and QoS management
Grid Resource Allocation Manager (GRAM): Grid Resource Allocation Manager (GRAM) Globus tool for job execution
Unified, resource independent replacement for steps in “manual Grid“ example
Unified way to set environment variables: Resource Specification Language (RSL) (stdout = x, arguments = y, ..)
Steps 1-4 become
Blocking: “globus-job-run -stage hostname applicationname“
-stage option copies code to remote machine
Different architectures: recompilation needed – but not supported!
Nonblocking: scp code, then “globus-job-submit hostname applicationname“ (staging not yet supported)
Obtain unique URL, continuously use it to query job status
When done, use “globus-job-get-output URL stdout“ to retrieve stdout
More complex systems are built on top of GRAM
E.g. Message Passing Interface (MPI) for the Grid: MPICH-G2
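The nonblocking pattern (submit, obtain a job URL, query the status until done, then fetch the output) is a plain polling loop. The sketch below is not the Globus API: `query_status` is a hypothetical callable supplied by the caller (it could wrap whatever status tool the middleware offers), and the status strings "DONE" and "FAILED" are assumptions made for illustration.

```python
import time


def wait_for_job(query_status, job_url, poll_seconds=5.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll query_status(job_url) until it reports a final state.

    query_status is a caller-supplied (hypothetical) function returning a
    status string; "DONE" and "FAILED" are assumed to be the final states.
    """
    waited = 0.0
    while True:
        status = query_status(job_url)
        if status in ("DONE", "FAILED"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_url} still {status} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Fake status source standing in for a real middleware query.
    states = iter(["PENDING", "ACTIVE", "ACTIVE", "DONE"])
    print(wait_for_job(lambda url: next(states), "https://example.org/job/1",
                       sleep=lambda s: None))
```

Only after the loop returns "DONE" would the output be retrieved.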
GRAM /2: GRAM /2 GRAM leaves a lot of questions unanswered
How to recompile application for different architectures? (automatically + in a unified way)
What if your computer‘s IP address changes?
What if the IP addresses of the 10 accessed computers change?
What if two of the computers become unavailable?
What if 3 other users start to work with 5 of the 10 computers?
A tool for each problem...
General-purpose Architecture for Reservation and Allocation (GARA) Integrated QoS via “advance reservation“ of resources (CPU, Disk, Network)
Monitoring and Discovery System (MDS) for locating and monitoring resources
Resource Broker (Globus: do it yourself; Condor: “matchmaker“) translates requirement specification (CPU, memory, ..) into IP address
Diversity of complex tools standardized + available in Globus, addressing some but not all of the issues: need for an architecture
Evolution: moving towards an architecture: Evolution: moving towards an architecture OGSI / OGSA: Open Grid Service Infrastructure / Architecture
Open Grid Forum (OGF) standards
OGSA = service-oriented architecture; key concept for virtualization: use a resource = call a service
OGSI = Web Services + state management
failed: too complex, not compliant with Web Service standards
Source: Globus presentation by Ian Foster
Research towards the power outlet: Research towards the power outlet
Current SoA: Current SoA Standards are only specified when mechanisms are known to work
Globus only includes such working elements
Lots of important features missing
Practical issues with existing middlewares
Submitting a Globus job is very slow (Austrian Grid: approx. 20 seconds): significant granularity limit for parallelization!
Globus is a huge piece of software
Currently, some confusion about right location of features
On top of middleware? (research on top of Globus)
In middleware? (other Middleware projects)
In the OS? (XtreemOS)
Upcoming slides concern mechanisms which are mostly on top and partially within middleware
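Why a submission delay of about 20 seconds limits granularity can be made explicit with a small calculation. This is a simplified model, assuming the submission overhead is paid once per task and is not overlapped with computation; the symbols $T_{\text{task}}$, $T_{\text{submit}}$ and $E$ are introduced here for illustration.

```latex
E = \frac{T_{\text{task}}}{T_{\text{task}} + T_{\text{submit}}},
\qquad T_{\text{submit}} \approx 20\,\text{s}

T_{\text{task}} = 20\,\text{s} \;\Rightarrow\; E = \frac{20}{20 + 20} = 0.5

E \ge 0.9 \;\Longleftrightarrow\; T_{\text{task}} \ge 9\,T_{\text{submit}} = 180\,\text{s}
```

Under this model, tasks that run for less than a few minutes spend a noticeable share of their time in submission.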
Automatic parallelization in Grids: Automatic parallelization in Grids Scheduling; important issue for “power outlet“ goal!
Automatic distribution of tasks and inter-task data transmissions = scheduling
Grid scheduling encompasses
Resource Discovery
Authorization Filtering, Application Requirement Definition, Minimal Requirement Filtering
System Selection
Dynamic Information Gathering
System Selection
Job Execution
(optional) Advance Reservation
Job Submission
Preparation Tasks
Monitoring Progress
Job Completion
Clean-up Tasks
So far, most scheduling efforts consider embarrassingly parallel applications - typically parameter sweeps (no dependencies)
Condor case study: Condor case study Application name, parameters, etc. + requirements specified in ClassAds
“Requirements = Memory >= 256 && Disk > 10000; Rank = (KFLOPS*10000) + Memory“: only use computers which match requirements (else error), order them by rank
Explicit support for parameter sweeps: loop variables
Resources registered with description; “central manager“ checks pool against application ClassAds (“matchmaking“) every 5 minutes, assigns jobs
Checkpointing in Condor: need to recompile applications, link with special library (redirects syscalls)
Save current state for fault tolerance or vacating jobs
Because preempted by higher priority job, machine busy, or user demands it
Used in Grid Application Development Software Project (GrADS) for rescheduling (dynamic scheduling) and metascheduling (negotiation between multiple applications); ClassAds language extended
e.g., aggregation functions such as Max, Min, Sum
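The matchmaking idea behind the ClassAd example above (filter by Requirements, order by Rank) can be sketched in a few lines. This is not Condor code: the machine dictionaries and the function names are hypothetical, and only the two expressions quoted on the slide are modelled.

```python
def requirements(machine):
    # Requirements = Memory >= 256 && Disk > 10000
    return machine["Memory"] >= 256 and machine["Disk"] > 10000


def rank(machine):
    # Rank = (KFLOPS*10000) + Memory; a higher rank is preferred.
    return machine["KFLOPS"] * 10000 + machine["Memory"]


def matchmake(machines):
    """Return the machines that satisfy the requirements, best rank first."""
    matching = [m for m in machines if requirements(m)]
    if not matching:
        raise LookupError("no machine satisfies the requirements")
    return sorted(matching, key=rank, reverse=True)


if __name__ == "__main__":
    # Hypothetical pool, for illustration only.
    pool = [
        {"Name": "a", "Memory": 512, "Disk": 20000, "KFLOPS": 100},
        {"Name": "b", "Memory": 128, "Disk": 50000, "KFLOPS": 900},
        {"Name": "c", "Memory": 256, "Disk": 10001, "KFLOPS": 200},
    ]
    print([m["Name"] for m in matchmake(pool)])
```

Machine "b" is the fastest in this pool but is excluded because it fails the memory requirement; rank only orders the machines that pass the filter.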
Grid workflow applications: Grid workflow applications Dependencies between applications (or large parts of applications) typically specified in Directed Acyclic Graph (DAG)
Condor: DAG manager (DAGMan) uses .dag file for simple dependencies
“Do not run job ‘B’ until job ‘A’ has completed successfully”
DAGMan scheduling: for all tasks do...
Find task with earliest starting time
Allocate it to processor with Earliest Finish Time
Remove task from list
GriPhyN (Grid Physics Network) facilitates workflow design with “Pegasus“ (Planning for Execution in Grids) framework
Specification of abstract workflow: identify application components, formulate workflow specifying the execution order, using logical names for components and files
Automatic generation of concrete workflow (map components to resources)
Concrete workflow submitted to Condor-G/DAGMan
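A dependency rule such as “do not run job B until job A has completed successfully” amounts to executing the DAG in topological order and skipping everything downstream of a failure. The sketch below uses Python's standard `graphlib`; it is not DAGMan, and `run_workflow` and `run_job` are hypothetical names.

```python
from graphlib import TopologicalSorter


def run_workflow(deps, run_job):
    """Run jobs in dependency order.

    deps maps each job to the set of jobs it depends on; run_job(job)
    returns True on success. Jobs downstream of a failure are skipped.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished, skipped, failed = [], [], set()
    while sorter.is_active():
        for job in sorter.get_ready():
            if failed & set(deps.get(job, ())):
                failed.add(job)       # propagate the failure downstream
                skipped.append(job)
            elif run_job(job):
                finished.append(job)
            else:
                failed.add(job)
            sorter.done(job)
    return finished, skipped


if __name__ == "__main__":
    # Diamond-shaped workflow: A first, then B and C, then D.
    deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    print(run_workflow(deps, lambda job: True))
```

A real workflow manager would submit the ready jobs concurrently; the loop above runs them one after another to keep the dependency logic visible.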
Grid Workflow Applications /2: Grid Workflow Applications /2 Components are built, Web (Grid) Services are defined, Activities are specified
Several projects (e.g. K-WF Grid) and systems (e.g. ASKALON) exist
Most applications have simple workflows
E.g. Montage: dissects space image, distributes processing, merges results
Scheduling example: HEFT algorithm, Step 1 - task prioritizing: Scheduling example: HEFT algorithm, Step 1 - task prioritizing
Rank of a task: longest “distance“ to the end (mean processing + transfer costs)
Tasks are sorted by decreasing rank order
Step 2 - processor selection (EFT): Step 2 - processor selection (EFT)
FT(T1, P1) = 1
FT(T1, P2) = 1
FT(T2, P1) = 1+0.5=1.5
FT(T2, P2) = 1+3+1.5=5.5
FT(T4, P1) = 1.5+1.5=3
FT(T4, P2) = 1.5+2+2.5=6
FT(T3, P1) = 3+2=5
FT(T3, P2) = 1.5+1+2=4.5
FT(T5, P1) = 4.5+2+0.5=7
FT(T5, P2) = 3+7+0.5=10.5
[Figure legend: processor idle + task ready; data transfer; task processing]
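The two HEFT steps can be replayed in code. The task graph itself is not part of this transcript, so the per-processor costs and the edges with their transfer costs below are inferred from the FT arithmetic above and should be read as an assumption. The sketch also simplifies the algorithm: a task starts when its processor becomes free, without searching for earlier idle gaps.

```python
from functools import lru_cache

# Processing cost of each task on each processor (inferred, see lead-in).
COST = {
    "T1": {"P1": 1.0, "P2": 1.0},
    "T2": {"P1": 0.5, "P2": 1.5},
    "T3": {"P1": 2.0, "P2": 2.0},
    "T4": {"P1": 1.5, "P2": 2.5},
    "T5": {"P1": 0.5, "P2": 0.5},
}
# (predecessor, successor) -> transfer cost if they run on different processors.
TRANSFER = {
    ("T1", "T2"): 3.0, ("T2", "T3"): 1.0, ("T2", "T4"): 2.0,
    ("T3", "T5"): 2.0, ("T4", "T5"): 7.0,
}


@lru_cache(maxsize=None)
def rank(task):
    """Step 1: mean processing cost plus the longest path to the end."""
    mean = sum(COST[task].values()) / len(COST[task])
    tails = [c + rank(s) for (p, s), c in TRANSFER.items() if p == task]
    return mean + max(tails, default=0.0)


def heft_schedule():
    """Step 2: place tasks in decreasing rank order on the EFT processor."""
    order = sorted(COST, key=rank, reverse=True)
    proc_free = {"P1": 0.0, "P2": 0.0}
    placed = {}  # task -> (processor, finish time)
    for task in order:
        preds = [p for (p, s) in TRANSFER if s == task]
        best = None
        for proc in proc_free:
            # Data is ready once every predecessor finished and, if it ran
            # on another processor, its output has been transferred.
            ready = max(
                (placed[p][1] + (0.0 if placed[p][0] == proc else TRANSFER[(p, task)])
                 for p in preds),
                default=0.0,
            )
            finish = max(ready, proc_free[proc]) + COST[task][proc]
            if best is None or finish < best[1]:
                best = (proc, finish)
        placed[task] = best
        proc_free[best[0]] = best[1]
    return order, placed


if __name__ == "__main__":
    order, placed = heft_schedule()
    for task in order:
        print(task, rank(task), placed[task])
```

With these inferred inputs the task order is T1, T2, T4, T3, T5, and the finish times on the chosen processors are the smaller FT value of each pair listed above (for example T3 on P2 at 4.5 and T5 on P1 at 7).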
HEFT discussion: HEFT discussion HEFT is not a solution, just a heuristic
problem is known to be NP-complete
Outperformed competitors (DAGMan scheduling, genetic algorithm) in ASKALON real-life experiments
Still, many improvements are possible: e.g., other functions than mean, and an extension for rescheduling have been suggested
Heterogeneous network capacities and traffic interactions are ignored (not detected!)
Conclusion: Conclusion
How far have we come?: How far have we come? Remember: systems on last slides are still research
Not standardized, not part of reference middleware implementations
Right place (OS / Middleware / App) for some functions still undecided
A lot is still manual
Basically three choices for deploying an application on the Grid
Simply use it if it‘s a parameter sweep
“Gridify“ it (rewrite using customized allocation - e.g. MPICH-G2)
Utilize a workflow tool
Convergence between P2P systems and Grids has only just begun
Several issues and possible improvements
Large number of layers is a mismatch for high performance demands
Network usage simplistic, no customized mechanisms
Open issues: layering inefficiency. Example: loss of “connection“ semantics: Open issues: layering inefficiency. Example: loss of “connection“ semantics
[Figure: protocol stack IP, TCP, HTTP 1.0, SOAP, Web Service, Grid Service; each layer is labelled as stateless, keeping connection state, stateful, or “doesn‘t care, can do both“; caption: “Breaking the chain“]
Open issues: Open issues Strangely, parallel processing background seems to be ignored
E.g., work on task-processor mapping + P2P overlays such as hypercube = ?
[Figure: levels of parallelism: microcode, instruction level parallelism, arbitrary parallel applications, parameter sweeps, workflow applications]
Thank you!: Thank you! Questions?
Backup slides: Backup slides
Research gap: Grid-specific network enhancements: Research gap: Grid-specific network enhancements
[Figure: traditional Internet applications (web browser, ftp, ..) versus applications with special network properties and requirements; captions: “Driving a racing car on a public road“ and “Bringing the Grid to its full potential!“]
Grid-network peculiarities: Grid-network peculiarities Special behavior
Predictable traffic pattern - this is totally new to the Internet!
Web: users create traffic
FTP download: starts ... ends
Streaming video: either CBR or depends on content! (head movement, ..)
Could be exploited by congestion control mechanisms
Distinction: Bulk data transfer (e.g. GridFTP) vs. control messages (e.g. SOAP)
File transfers are often “pushed“ and not “pulled“
Distributed System which is active for a while
overlay based network enhancements possible
Multicast
P2P paradigm: “do work for others for the sake of enhancing the whole system (in your own interest)“ can be applied - e.g. act as a PEP, ...
sophisticated network measurements possible
can exploit longevity and distributed infrastructure
Special requirements
file transfer delay predictions
note: useless without knowing about shared bottlenecks
QoS, but for file transfers only (“advance reservation“)
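Why a file transfer delay prediction is useless without knowledge of shared bottlenecks can be shown with a back-of-the-envelope model. The sketch assumes that concurrent flows split the bottleneck capacity equally and ignores protocol overhead and start-up effects; the function name and the numbers are hypothetical.

```python
def predicted_transfer_seconds(size_bytes, bottleneck_bits_per_s, flows_sharing=1):
    """Idealized transfer time: file size divided by this flow's fair share."""
    share = bottleneck_bits_per_s / flows_sharing
    return size_bytes * 8 / share


if __name__ == "__main__":
    one_gigabyte = 10**9
    gigabit_link = 10**9
    # Alone on the bottleneck versus sharing it with one other transfer.
    print(predicted_transfer_seconds(one_gigabyte, gigabit_link))
    print(predicted_transfer_seconds(one_gigabyte, gigabit_link, flows_sharing=2))
```

A predictor that does not know a second transfer crosses the same bottleneck would be off by a factor of two in this example.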
What is EC-GIN?: What is EC-GIN? European project: Europe-China Grid InterNetworking
STREP in IST FP6 Call 6
2.2 MEuro, 11 partners (7 Europe + 4 China)
Networkers developing mechanisms for Grids
Research Challenges: Research Challenges Research Challenges:
How to model Grid traffic?
Much is known about web traffic (e.g. self-similarity) - but the Grid is different!
How to simulate a Grid-network?
Necessary for checking various environment conditions
May require traffic model (above)
Currently, Grid-Sim / Net-Sim are two separate worlds (different goals, assumptions, tools, people)
How to specify network requirements?
Explicit or implicit, guaranteed or “elastic“, various possible levels of granularity
How to align network and Grid economics?
Combined usage based pricing for various resources including the network
What P2P methods are suitable for the Grid?
What is the right means for storing short-lived performance data?
Problem: How Grid people see the Internet: Problem: How Grid people see the Internet Abstraction - simply use what is available
still: performance = main goal
Wrong. Quote from a paper review:
“In fact, any solution that requires changing the TCP/IP protocol stack is practically unapplicable to real-world scenarios, (..).“
How to change this view
Create awareness - e.g. GGF GHPN-RG published documents such as “net issues with grids“, “overview of transport protocols“
Develop solutions and publish them! (EC-GIN, GridNets)
Just like Web Service community
Absolutely not like Web Service community!
Existing transport system (TCP/IP + Routing + ..) works well
QoS makes things better, the Grid needs it!
we now have a chance for that, thanks to IPv6
A time-to-market issue: A time-to-market issue
Typical Grid project: result is thesis + running code; tests in collaboration with different research areas
Typical Network project: result is thesis + simulation code; perhaps early real-life prototype (if students did well)
Machine-only communication: Machine-only communication Trend in networks: from support of Human-Human Communication
email, chat
via Human-Machine Communication
web surfing, file downloads (P2P systems), streaming media
to Machine-machine Communication
Growing number of commercial web service based applications
New “hype“ technologies: Sensor nets, Autonomic Computing vision
Semantic Web (Services): first big step for supporting machine-only communication at a high level
So far, no steps at a lower level
This would be like RTP, RTCP, SIP, DCCP, ... for multimedia apps: not absolutely necessary, but advantageous
The long-term value of Grid-net research: The long-term value of Grid-net research A subset of Grid-net developments will be useful for other machine-only communication systems!
Key for achieving this: change viewpoint from “what can we do for the Grid“ to “what can the Grid do for us“ (or from “what does the Grid need“ to “what does the Grid mean to us“)
Large stacks: Large stacks IP TCP HTTP SOAP Middleware WS-RF Grid apps
The Grid and P2P systems: The Grid and P2P systems Look quite similar
Goal in both cases: resource sharing
Major difference: clearly defined VOs / VTs
No incentive considerations
Availability not such a big problem as in P2P case
It is an issue, but at larger time scales
(e.g. computers in student labs should be available after 22:00, but are sometimes shut down by tutors)
Scalability not such a big issue as in P2P case
...so far! Convergence as Grids grow
coordinated resource sharing and problem solving in dynamic, multi institutional virtual organizations (Grid, P2P)
How the tools are applied in practice: How the tools are applied in practice
[Figure: Users work with client applications; application services organize VOs & enable access to other services; collective services aggregate &/or virtualize resources; resources implement standard access & management interfaces. Components shown: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate authority, Data Catalog, Compute Servers, Database services, Cameras]
Source: Globus presentation by Ian Foster
Slide44: Example: Globus Toolkit version 4 (GT4)
[Figure: GT4 component overview, grouped into Data Mgmt, Security, Common Runtime, Execution Mgmt and Info Services, and marked as Web Services components, non-WS components, core, contrib/preview or deprecated. Components shown: Pre-WS Authentication Authorization, GridFTP, Pre-WS Grid Resource Alloc. & Mgmt, Pre-WS Monitoring & Discovery, C Common Libraries, Authentication Authorization, Reliable File Transfer, Data Access & Integration, Grid Resource Allocation & Management, Index, Java WS Core, Community Authorization, Replica Location, eXtensible IO (XIO), Credential Mgmt, Community Scheduling Framework, Delegation, Data Replication, Trigger, C WS Core, Python WS Core, WebMDS, Workspace Management, Grid Telecontrol Protocol]
Source: Globus presentation by Ian Foster
Automatic parallelization: Automatic parallelization Has been addressed in the past
Microcode parallelism (pipelining in CPU)
Relatively easy: simple dependencies
Instruction level parallelism
More complex dependencies
Can automatically be analyzed by compiler
Reordering, loop unrolling, ..
Example: automatic loop parallelization (Intel C++ compiler)
/* Original loop */
for (i=1; i<100; i++)
    a[i] = a[i] + b[i] * c[i];
/* Thread 1 */
for (i=1; i<50; i++)
    a[i] = a[i] + b[i] * c[i];
/* Thread 2 */
for (i=50; i<100; i++)
    a[i] = a[i] + b[i] * c[i];
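The loop split above works because the iterations are independent: each one reads and writes only its own index i. A minimal sketch of the same decomposition with two worker threads follows; the function names are hypothetical, and in CPython the threads illustrate the decomposition rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def update(a, b, c, start, stop):
    """Apply a[i] = a[i] + b[i] * c[i] for start <= i < stop."""
    for i in range(start, stop):
        a[i] = a[i] + b[i] * c[i]


def parallel_update(a, b, c):
    """Split the index range 1..99 into two halves, one per thread."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        halves = [
            pool.submit(update, a, b, c, 1, 50),    # "Thread 1"
            pool.submit(update, a, b, c, 50, 100),  # "Thread 2"
        ]
        for half in halves:
            half.result()  # re-raise any exception from the worker


if __name__ == "__main__":
    a = list(range(100))
    b = [2] * 100
    c = [3] * 100
    parallel_update(a, b, c)
    print(a[:5])
```

The split is safe only because no iteration depends on another; a loop such as a[i] = a[i-1] + b[i] could not be divided this way.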
Automatic parallelization /2: Automatic parallelization /2 Parallel Computing: complete applications parallelized
Very complex dependencies
Decomposition methods + mapping of tasks onto processors: usually not automatic (depends on problem and interconnection network)
Algorithm specific methods developed (matrix operations, sorting, ..)
Some parts can be automated, but not everything: explicit parallelism (OpenMP) and even allocation (MPI) quite popular
Some research efforts on half-automatic parallelization (“manual“ aid)
Programmer knows about problem-specific locality needs (interacting code elements)
Examples:
Java extensions such as JavaSymphony [Thomas Fahringer, Alexandru Jugravu]
HPF+ HALO concept [Siegfried Benkner]
Slide47: Source: http://www.dps.uibk.ac.at/projects/teuta/