ecsucresisoct19 06


Presentation Transcript

Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CReSIS:

CERSER and CReSIS http://nia.ecsu.edu/
Elizabeth City State University
October 19 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
gcf@indiana.edu
http://www.infomall.org

Abstract: 

Cyberinfrastructure supports eScience, or collaborative science, with distributed scientists, computers, data repositories and sensors.
We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid.
We give one example in detail: iSERVO, the International Solid Earth Research Virtual Organization, supporting Earthquake Science.
This illustrates Computing Grids, Geographical Information System Grids and Sensor Grids.
We suggest implications for CReSIS, the Center for Remote Sensing of Ice Sheets.

Why Cyberinfrastructure Useful: 

Supports distributed science: data, people, computers
Exploits Internet technology (Web2.0) adding management, security, supercomputers etc.
It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of parallel systems
Grids are made of services that are “just” programs or data sources packaged for distributed access

e-moreorlessanything and the Grid: 

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood.
People, computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported

TeraGrid: Integrating NSF Cyberinfrastructure: 

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 teraflops; tomorrow a petaflop; Indiana 20 teraflops today.

Virtual Observatory Astronomy Grid Integrate Experiments: 

[Figure panels: Radio; Far-Infrared; Visible; Visible + X-ray; Dust Map; Galaxy Density Map]

Grid Capabilities for Science: 

Open technologies for any large scale distributed system, adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
Security, Reliability, Management and state standards
Service and messaging specifications
User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
~20 TeraGrid Science Gateways (their name for portals)
OGCE Portal technology effort led by Indiana
Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
Data and meta-data architecture supporting real-time and archives as well as federation
Links to Semantic web and annotation
Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
http://www.nsf.gov/od/oci/ci-v7.pdf

APEC Cooperation for Earthquake Simulation: 

ACES is a seven year-long collaboration among scientists interested in earthquake and tsunami prediction
iSERVO is the infrastructure to support the work of ACES
SERVOGrid is the (completed) US Grid that is a prototype of iSERVO
http://www.quakes.uq.edu.au/ACES/
Chartered under APEC, the Asia Pacific Economic Cooperation of 21 economies

Slide9: 

[Diagram: Grid of Grids: Research Grid and Education Grid. Labels: Database; Analysis and Visualization Portal; Repositories, Federated Databases; Data Filter Services; Streaming Data; Sensors; SERVOGrid; Research Simulations; Research; Education; Customization Services From Research to Education; Education Grid Computer Farm; Sensor Grid; Database Grid; Compute Grid]

SERVOGrid and Cyberinfrastructure: 

Grids are the technology based on Web services that implement Cyberinfrastructure, i.e. support eScience or science as a team sport
Internet scale managed services that link computers, data repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for:
Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …..
Job management and monitoring web services for running the above codes.
File management web services for moving files between various machines.
Geographical Information System services
Quaketables earthquake specific database
Sensors as well as databases
Context (dynamic metadata) and UDDI system long term metadata services
Services support streaming real-time data

Some Grid Concepts I: 

Services are “just” (distributed) programs sending and receiving messages with well defined syntax
Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
Services can be in any language: Fortran, Shell scripts, C, C#, C++, Java, Python, Perl. Your choice!!
Web Services are supported by all vendors (IBM, Microsoft …)
Service overhead will be just a few milliseconds (more now), which is < typical network transit time
Any program that is distributed can be a Web service
Any program taking execution time ≥ 20ms can be an efficient Web service
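The efficiency claim above can be checked with simple arithmetic. The sketch below assumes a fixed per-call service overhead; the 2 ms figure and the function name service_efficiency are illustrative assumptions standing in for "a few milliseconds", not measurements from the talk.

```python
def service_efficiency(exec_ms: float, overhead_ms: float) -> float:
    """Fraction of total time spent on useful work when every call
    pays a fixed service overhead (both values in milliseconds)."""
    if exec_ms < 0 or overhead_ms < 0:
        raise ValueError("times must be non-negative")
    total = exec_ms + overhead_ms
    return exec_ms / total if total > 0 else 0.0


ASSUMED_OVERHEAD_MS = 2.0  # hypothetical stand-in for "a few milliseconds"

if __name__ == "__main__":
    for exec_ms in (1.0, 20.0, 1000.0):
        eff = service_efficiency(exec_ms, ASSUMED_OVERHEAD_MS)
        print(f"execution {exec_ms:7.1f} ms -> efficiency {eff:.1%}")
```

Under this assumption a 20 ms program spends roughly nine tenths of its time on useful work, while a 1 ms program spends most of its time on overhead, which is the sense in which only the longer-running program makes an efficient service.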

Web services: 

Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
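To make the message exchange concrete, here is a minimal sketch of building and reading a SOAP 1.1 style envelope with Python's standard library. The application namespace urn:example:servogrid, the operation name SubmitJob and its parameters are hypothetical; a real service would take them from its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:servogrid"  # hypothetical application namespace


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap an operation name and its parameters in a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{APP_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_soap_request(message: bytes):
    """Return (operation name, parameter dict) from a SOAP envelope."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    op = body[0]  # first child of Body is the operation element
    strip_ns = lambda tag: tag.split("}", 1)[1]
    return strip_ns(op.tag), {strip_ns(c.tag): c.text for c in op}


if __name__ == "__main__":
    msg = build_soap_request("SubmitJob", {"code": "GeoFEST", "nodes": 4})
    print(msg.decode("utf-8"))
    print(parse_soap_request(msg))
```

All values travel as text inside the XML, so the receiving side gets the node count back as a string and must convert it itself.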

A typical Web Service: 

In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, or totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
[Diagram labels: Payment Credit Card; Warehouse Shipping control; WSDL interfaces; Web Services]

Some Grid Concepts II: 

Systems are built from contributions from many different groups: you do not need one “vendor” for all components, as Web services allow interoperability between components
One reason DoD likes Grids (called Net-Centric computing)
Grids are distributed in services and data, allowing anybody to store their data and to produce “their” view
Some think that the University Library of the future will curate/store the data of their faculty
“2 level programming model”: classic programming of services, and services are composed using workflow consistent with industry standards (BPEL)
Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and “standards”; integrate separate Grids for Sensors, GIS, Visualization, computing etc. with the OGSA (Open Grid Service Architecture from OGF) system Grid (Security, registry) into a single Grid
Existing codes UNCHANGED; wrap as a service with metadata
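The "2 level programming model" can be illustrated with a small sketch: level one is ordinary code inside each service, level two is a workflow that only knows the order in which services exchange messages. This is plain Python standing in for a workflow engine, not BPEL; the two services and their message fields are hypothetical.

```python
from typing import Callable, Dict, List

# A "service" here is any callable that takes a message and returns a message.
Service = Callable[[Dict], Dict]


def mesh_service(msg: Dict) -> Dict:
    """Level 1: classic programming inside a service (toy mesh generator)."""
    return {**msg, "mesh_cells": msg["resolution"] ** 3}


def simulate_service(msg: Dict) -> Dict:
    """Level 1: another independently written service (toy simulation)."""
    return {**msg, "result": f"simulated {msg['mesh_cells']} cells"}


def run_workflow(steps: List[Service], message: Dict) -> Dict:
    """Level 2: compose services by passing each output message to the next step."""
    for step in steps:
        message = step(message)
    return message


if __name__ == "__main__":
    out = run_workflow([mesh_service, simulate_service], {"resolution": 4})
    print(out["result"])
```

Because the workflow depends only on the message format, either service could be replaced by one from another group without touching the composition.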

Slide16: 

TeraGrid User Portal

LEAD Gateway Portal: 

NSF Large ITR and TeraGrid Gateway
Adaptive response to mesoscale weather events
Supports data exploration, Grid workflow

Grid Workflow Data Assimilation in Earth Science: 

Grid services, triggered by abnormal events and controlled by workflow, process real time data from radar and high resolution simulations for tornado forecasts

SERVOGrid has a portal: 

The Portal is built from portlets, which provide user interface fragments for each service that are composed into the full interface. It uses OGCE technology, as does the planetary science VLAB portal with University of Minnesota

GIS and Sensor Grids: 

OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
GML (Geography Markup Language) defines the specification of geo-referenced data
SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors
Services like Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces to access GIS and sensor information
Grid workflow links services that are designed to support streaming input and output messages
We built Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
Use Google maps as front end to WMS and WFS
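As an illustration of what a client of a Web Map Service sends, the sketch below assembles a WMS 1.1.1 GetMap request URL from its standard query parameters. The endpoint http://example.org/wms, the layer name faults and the bounding box are hypothetical placeholders, not SERVOGrid addresses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (minx, miny, maxx, maxy) in the units of the chosen SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer covering part of Southern California.
example_url = wms_getmap_url(
    "http://example.org/wms", ["faults"], (-120.0, 32.0, -114.0, 36.0), 512, 512
)
example_query = parse_qs(urlsplit(example_url).query, keep_blank_values=True)

if __name__ == "__main__":
    print(example_url)
```

A map front end only needs to issue URLs of this shape and display the returned image, which is why the same service can sit behind different user interfaces.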

Grid Workflow Datamining in Earth Science: 

Work with Scripps Institute
Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
[Figure labels: NASA; GPS; Earthquake]

Slide22: 

[Diagram: Earth/Atmosphere Grids built as Grids of (library) Grids. Labels: Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Earthquake Data, Filters & Simulation Services; Earthquake SERVOGrid; Ice Sheet PolarGrid; … …; Physical Network; Registry; Metadata; Data Access/Storage; Portals; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid; Core Grid Services]

CReSIS PolarGrid: 

Important CReSIS-specific Cyberinfrastructure components include
Managed data from sensors and satellites
Data analysis such as SAR processing, possibly with parallel algorithms
Electromagnetic simulations (currently commercial codes) to design instrument antennas
3D simulations of ice-sheets (glaciers) with non-uniform meshes
GIS Geographical Information Systems
Also need capabilities present in many Grids
Portal i.e. Science Gateway
Submitting multiple sequential or parallel jobs

What should we do?: 

Identify existing programs that should be wrapped as Grid services
One can do this even for commercial codes, as one keeps existing codes (Fortran, C++) unchanged and constructs a “metadata” wrapper defining where the program and its data are located and how to invoke it
Identify where parallel versions are needed and if help is needed in creating these
Parallel codes can be Grid services
Electromagnetic codes are commercial; in principle parallel
Ice sheet models can be parallelized for high resolution simulations
Scope out system: computational needs (identify value of TeraGrid), data storage needs, network requirements
Examine data model and produce a data Grid architecture
Use databases? Distributed? Metadata? Files?
What are key performance issues?
Examine integration of GIS with Grid Services
Design and implement Science Gateway
Are there important visualization requirements outside GIS?
Are there key issues from security?
Bring up core services such as registries
Need infrastructure to run services (Linux PC)
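The "metadata" wrapper idea can be sketched in a few lines: the existing program stays untouched, and a small record says where it lives and how to invoke it. The ServiceMetadata fields and the invoke helper are hypothetical names for illustration, and a tiny Python one-liner stands in for a legacy Fortran or C++ executable.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceMetadata:
    """Describes an existing program without modifying it."""
    name: str
    executable: List[str]      # command that launches the unchanged program
    working_dir: str = "."     # where the program and its data are located
    description: str = ""


def invoke(meta: ServiceMetadata, args: List[str]) -> str:
    """Run the wrapped program with the given arguments and return its stdout."""
    completed = subprocess.run(
        meta.executable + list(args),
        cwd=meta.working_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with a non-zero status
    )
    return completed.stdout


# Stand-in for a legacy code: sums its command-line arguments.
demo_service = ServiceMetadata(
    name="sum-demo",
    executable=[sys.executable, "-c",
                "import sys; print(sum(float(x) for x in sys.argv[1:]))"],
    description="hypothetical placeholder for an existing simulation code",
)

if __name__ == "__main__":
    print(invoke(demo_service, ["1.5", "2.5"]))
```

A real Grid service would put a message interface in front of invoke, but the wrapped program itself would not need to change.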

Benefits of CReSIS PolarGrid: 

Shared resources support collaboration among CReSIS scientists
Integration of Polar related data with appropriate compute resources, enabling research on specific topics and studies across topics
Polar Science Gateway accessing common services (programs), data and their integration as workflow
Access to TeraGrid with same interface for large scale simulations
Can share common capabilities (SAR analysis, GIS) with related Grids such as SERVOGrid, GEON, LEAD etc.
Modular Grid services allow exchange of new capabilities preserving systems, e.g. change EM Simulation service
Management of dynamic heterogeneous data

SERVO/QuakeSim Services Eye Chart: 

SERVO/QuakeSim Services Eye Chart

Service Eye Chart Continued: 

Service Eye Chart Continued

Key GIS and Related Services: 

Key GIS and Related Services