WTEC Panel on High End Computing in Japan
Site visits: March 29 - April 3, 2004
Study commissioned by:
  National Coordination Office
  Department of Energy
  National Science Foundation
  National Aeronautics and Space Administration

WTEC Overview:
Provides assessments of research and development
This was one of 55 international technology assessments done by WTEC
WTEC process:
  Write proposals for NSF "umbrella" grants
  Put together a coalition of sponsors
  Recruit a panel of experts
  Conduct the study with on-site visits
  Publish a report
Full text reports at wtec.org

Purpose & Scope of this Study:
Gather information on current status and future trends in Japanese high end computing
  Government agencies, research communities, vendors
  Focus on long-term HEC research in Japan
Compare Japanese and U.S. HEC R&D
Provide review of the Earth Simulator (ES) development process and operational experience
  Include user experience and its impact on computer science and computational science communities
  Report on follow-on projects
Determine HEC areas amenable for Japan-U.S. cooperation to accelerate future advances

WTEC HEC Panel Members:
Al Trivelpiece (Panel Chair), Former Director, Oak Ridge National Laboratory
Rupak Biswas, Group Lead, NAS Division, NASA Ames Research Center
Jack Dongarra, Director, Innovative Computing Lab, University of Tennessee & Oak Ridge National Laboratory
Peter Paul, Deputy Director, S&T, Brookhaven National Laboratory
Horst Simon (Advisor), Director, NERSC, Lawrence Berkeley National Lab
Kathy Yelick, Computer Science Professor, University of California, Berkeley
Dan Reed (Advisor), Computer Science Professor, University of North Carolina, Chapel Hill
Praveen Chaudhari (Advisor), Director, Brookhaven National Laboratory

Sites Visited (1):
Earth Simulator Center
Frontier Research System for Global Change
National Institute for Fusion Science (NIFS)
Japan Aerospace Exploration Agency (JAXA)
University of Tokyo
Tokyo Institute of Technology
National Institute of Advanced Industrial S&T (AIST)
High Energy Accelerator Research Org. (KEK)
Tsukuba University
Inst. of Physical and Chemical Research (RIKEN)
National Research Grid Initiative (NAREGI)
Research Org. for Information Sci. & Tech. (RIST)
Japan Atomic Energy Research Institute (JAERI)

Sites Visited (2):
Council for Science and Technology Policy (CSTP)
Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
Ministry of Economy, Trade, and Industry (METI)
Fujitsu
Hitachi
IBM-Japan
Sony Computer Entertainment Inc. (SCEI)
NEC

HEC Business and Government Environment in Japan

Government Agencies:
Council for Science & Tech. Policy (CSTP)
  Cabinet Office; the PM presides over monthly meetings
  Sets strategic directions for S&T
  Rates proposals submitted to MEXT, METI and others
Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
  Funds most of the S&T R&D activities in Japan
  Funded the Earth Simulator
Ministry of Economy, Trade, & Industry (METI)
  Administers industrial policy
  Funds R&D projects with ties to industry
  Not interested in HEC, except for grids

Business and Government:
New Independent Administrative Institution (IAI) model
  Some research institutes had already converted
  Universities were being converted during our visit
  Government funds each institution as a whole; institutions control their own budgets
  Funding is being cut annually as well
Commercial viability of vector supercomputers is problematic
  Only NEC is still committed to this architectural model
Commodity PC clusters are increasingly prevalent
  All three Japanese vendors have cluster products

Business Partnerships:
Each of the Japanese vendors is partnered with a US vendor
  NEC and Cray ?
  Fujitsu and Sun Microsystems
  Hitachi and IBM

HEC Hardware in Japan

Architecture/Systems Continuum (from loosely coupled to tightly coupled):
Commodity processor with commodity interconnect
  Clusters: Pentium, Itanium, Opteron, Alpha, PowerPC
  GigE, Infiniband, Myrinet, Quadrics, SCI
  NEC TX7
  Fujitsu IA-Cluster
Commodity processor with custom interconnect
  SGI Altix (Intel Itanium 2)
  Cray Red Storm (AMD Opteron)
  Fujitsu PrimePower (Sparc based)
Custom processor with custom interconnect
  Cray X1
  NEC SX-7
  Hitachi SR11000

Fujitsu PRIMEPOWER HPC2500:
[Diagram labels: SMP nodes with 8-128 CPUs each; 128 nodes; high-speed optical interconnect; crossbar network for uniform memory access (SMP within node); system board x16 (CPUs and memory); DTU (Data Transfer Unit) boards to the high-speed optical interconnect; channel adapters to I/O devices (PCIBOX); channel, 4 GB/s x 4]

Fujitsu IA-Cluster: System Configuration:
[Diagram labels: FUJITSU PRIMERGY (1U); PRIMERGY BX300, max. 20 blades in a 3U chassis; PRIMERGY RXI600, IPF (1.5 GHz), 2-4 CPUs; compute nodes; compute network (InfiniBand or Myrinet switch); control network; Gigabit Ethernet switch]

Latest Installation of FUJITSU HPC Systems:
[Slide content not captured in the transcript]

Slide16: HITACHI's HPC system
[Chart: peak performance in GFLOPS (log scale, 0.01 to 100,000) against year ('77 to '05)]
Systems shown: M-200H IAP, M-280H IAP, M-680 IAP; S-810, S-820, S-3600, S-3800; SR2201; SR8000; SR11000 (POWER4+, AIX 5L)
Operating systems shown: VOS3/HAP, HI-OSF/1-MJ, HI-UX/MPP
Categories shown: Integrated Array Processor system; vector (automatic vectorization); scalar parallel (MPP type); vector-scalar combined type (automatic pseudo vectorization, auto parallelization)
Annotations:
  First Japanese vector supercomputer
  Single CPU peak performance 3 GFlops
  Single CPU peak performance 8 GFlops (fastest in the world)
  First commercially available distributed memory parallel processor
  First HPC machine combining vector processing and scalar processing

Slide17: SR8000 Pseudo Vector Processing (PVP)
[Diagram labels: Vector: MS, vector register, arithmetic unit, pipelining. Pseudo vector: MS, cache, load, floating-point registers (FPRs), arithmetic unit, pipelining]
Problems of conventional RISC:
  Reduction of performance for large scale simulations because of cache overflow
  Sustained: under 10% of peak
PVP features:
  Prefetch: read data from main memory to cache before calculation; accelerates sequential data access
  Preload: read data from main memory to floating-point registers before calculation; accelerates strided memory access and indirectly addressed memory access

Hitachi SR11000:
Based on IBM Power 4+
SMP with 16 processors/node
  109 Gflop/s per node (6.8 Gflop/s per processor)
  IBM uses 32 in their machine
IBM Federation switch
  Hitachi: 6 planes for 16 proc/node
  IBM uses 8 planes for 32 proc/node
Pseudo vector processing features
  Minimal hardware enhancements
  Fast synchronization
  No preload like the SR8000
Hitachi's compiler effort is separate from IBM
  Automatic vectorization, no plans for HPF
3 customers for the SR11000:
  National Institute for Materials Science, Tsukuba - 64 nodes (7 Tflop/s)
  Okazaki Institute for Molecular Science - 50 nodes (5.5 Tflop/s)
  Institute of Statistical Mathematics - 4 nodes

Slide19: SR11000 Pseudo Vector Processing (PVP)
[Diagram labels: Vector: MS, vector register, arithmetic unit, pipelining. Pseudo vector: MS, cache, load, floating-point registers (FPRs), arithmetic unit, pipelining]
Problems of conventional RISC:
  Reduction of performance for large scale simulations because of cache overflow
  Sustained: under 10% of peak
PVP feature:
  Prefetch: read data from main memory to cache before calculation; accelerates sequential data access

SR11000 Next Model:
Continuing IBM partnership
Power5 processor
Greatly enhanced memory bandwidth
  Flat memory interleaving
Hardware barrier synchronisation register

NEC HPC Products:
[Diagram labels: High-end capability computing; middle to small size capacity computing; parallel vector processors: SX-6/7 Series; IA-64 server: TX7 Series; Express5800/1160Xa; parallel PC clusters: Express 5800 Parallel Linux Cluster; IA-32 workstations: Express 5800/50 Series]

TX7 Itanium² Server:
cc-NUMA architecture: a chipset and crossbar switch developed in-house by NEC achieve near-uniform high-speed memory access
Up to 32 Itanium² processors
Up to 128 GB of RAM
Linux operating system with NEC enhancements
More than 100 GF on Linpack
File server functionality for SX

SX-Series Evolution:
The latest technology, always in the SX-Series
1983: SX Series - the first computer in the world surpassing 1 GFLOPS
1989: SX-3 Series - shared memory, multi-function processor; UNIX OS
1994: SX-4 Series - innovative CMOS technology; entirely air-cooled
1998: SX-5 Series - high sustained performance; large-capacity shared memory
2001: SX-6 Series - single-chip vector processor; greater scalability
Next-generation SX: use the latest technology to build up and develop the new supercomputer

NEC SX-7/160M5:
SX-6: 8 proc/node, 8 GFlop/s, 16 GB, processor to memory
SX-7: 32 proc/node, 8.825 GFlop/s, 256 GB, processor to memory

Special Purpose: GRAPE-6:
The 6th generation of the GRAPE (Gravity Pipe) project
Gravity (N-body) calculation for many particles, with 31 Gflop/s per chip
32 chips per board: 0.99 Tflop/s per board
The full system of 64 boards is installed at the University of Tokyo: 63 Tflop/s
On each board, all particle data are loaded into SRAM; each target particle's data is injected into the pipeline, and the acceleration is then calculated
No software!
Gordon Bell Prize at SC for a number of years (Prof. Makino, U. Tokyo)

Sony PlayStation2:
Emotion Engine: 6 Gflop/s peak
Superscalar MIPS 300 MHz core + vector coprocessor + graphics/DRAM
About $200; 70M sold
  PS1: 100M sold
8 KB D-cache; 32 MB memory, not expandable (the OS goes here as well)
32-bit floating point; not IEEE
2.4 GB/s to memory (.38 B/Flop)
Potential 20 floating-point ops/cycle
  FPU w/FMAC+FDIV
  VPU1 w/4FMAC+FDIV
  VPU2 w/4FMAC+FDIV
  EFU w/FMAC+FDIV

High-Performance Chips Embedded Applications:
The driving market is gaming (PC and game consoles)
  Motivation for almost all the technology developments.
  Demonstrate that arithmetic is quite cheap.
Today there are three big problems with these apparently non-standard "off-the-shelf" chips.
  Most of these chips have very limited memory bandwidth and little if any support for inter-node communication.
  Integer or only 32-bit floating point
  No software support to map scientific applications to these processors; minimal general-purpose programming tools.
  Poor memory capacity for program storage
Not clear that they do much for scientific computing.
  Developing "custom" software is much more expensive than developing custom hardware.

TOP500 Data

Top 20 Computers Where They are Located:
[Slide content not captured in the transcript]

Efficiency is Declining Over Time:
Analysis of the top 100 machines in 1994 and 2004
Shows the number of machines in the top 100 that achieve a given efficiency on the Linpack benchmark
In 1994, 40 machines had >90% efficiency
In 2004, 50 have <50% efficiency

ESS Impact on Climate Modeling:
NERSC IBM SP3: 1 simulated year per compute day on 112 processors
ORNL/NCAR IBM SP4: ~2 simulated years per compute day on 96 processors
ORNL/NCAR IBM SP4: 3 simulated years per compute day on 192 processors
ESS: 40 simulated years per compute day on an unknown number of processors (probably ~128)
Cray X1 rumor: 14 simulated years per compute day on 128 processors
Source: Michael Wehner

Technology Transfer from Research:
Numerical Wind Tunnel → Fujitsu VPP500
CP-PACS → Hitachi SR2201
Earth Simulator → NEC SX-6
Grape, MDM, eHPC, … → ? (MD-engine)
Government projects encouraged new architectures.
New technologies were commercialized.

Hardware Summary:
The commercial viability of "traditional" supercomputing architectures with vector processors and high-bandwidth memory subsystems is problematic.
  NEC is the only remaining vendor in Japan
Clusters are replacing traditional high-bandwidth systems

HEC Software in Japan

Software Overview:
Emphasis on vendor software
  Fujitsu, Hitachi, NEC
  Earth Simulator software
Languages and compilers
  Persistent effort in High Performance Fortran, including HPF/JA extensions
Use of common libraries
Little academic work for supercomputers: vendors supply tools
Support for clusters

Achievements HPF on the Earth Simulator:
PFES
  Oceanic General Circulation Model based on the Princeton Ocean Model
  Achieved 9.85 TFLOPS with 376 nodes
  41% of the peak performance
Impact3D
  Plasma fluid code using a Total Variation Diminishing (TVD) scheme
  Achieved 14.9 TFLOPS with 512 nodes
  45% of the peak performance

Slide37: HPF/JA Extensions
HPF research in language and compilers
  HPF 2.0 extends HPF 1.0 for irregular applications
  HPF/JA further extends HPF for performance
    REFLECT: placement of near-neighbor communication
    LOCAL: communication not needed for a scope
    Extended ON HOME: partial computation replication
  Compiler doesn't need full interprocedural communication and availability analyses
HPF/JA was a consortium effort by vendors
  NEC, Hitachi, Fujitsu

Vectorization and Parallelization on the Earth Simulator (NEC):
[Diagram labels: processor node with 8 APs]
Inter-node parallelization: HPF, MPI
Intra-node parallelization: HPF, OpenMP, automatic parallelization
Vectorization

Hitachi:
Example of applied image: a nested loop (DO j=1,m; DO i=1,l; DO k=1,n) mapped onto three levels
  Inter-node parallelization with parallel libraries (HPF, MPI, PVM, etc.)
  Intra-node elementwise parallel processing: COMPAS (automatic parallelization)
  Vector processing in the IP, inner DO loop: PVP (automatic pseudo vectorization)
PVP: Pseudo Vector Processing
COMPAS: CO-operative Micro-Processors in single Address Space
IP: Instruction Processor

Conclusions:
Longer sustained effort on HPF than in the US
  Part of the Earth Simulator vision
  Successful on two of the large codes, including the Gordon Bell prize
  Language extensions were also needed
MPI is the dominant model for inter-node communication
  Although larger nodes on vector/parallel machines mean a smaller degree of MPI parallelism
  Combined with automatic vectorization within nodes
Other familiar tools developed outside Japan: numerical libraries, debuggers, etc.

Grid Computing in Japan:
Kathy Yelick
U.C. Berkeley and Lawrence Berkeley National Laboratory

Outline:
Motivation for Grid computing in Japan
  E-Business, E-Government, Science
Summary of grid efforts
  Labs, universities
Grid research contributions
  Hardware
  Middleware
  Applications
Funding summary

Grid Motivation:
e-Japan: create a "knowledge-emergent society," where everyone can utilize IT
In 2001, Japan's internet usage was at the lowest level among major industrial nations
Four strategies to address this:
  Ultra high speed network infrastructure
  Facilitate electronic commerce
  Realize electronic government
    Key is information sharing across agencies and society
  Nurturing high quality human resources
    Training, support of researchers, etc.

Overview of Grid Projects in Japan:
Super-SINET (NII)
National Research Grid Initiative (NAREGI)
Campus Grid (Titech)
Grid Technology Research Center (AIST)
Information Technology Based Lab (ITBL)
Applications:
  VizGrid (JAIST)
  BioGrid (Osaka-U)
  Japan Virtual Observatory (JVO)

Slide45: SuperSINET: All Optical Production Research Network
10 Gbps photonic backbone
GbEther bridges for peer connection
6,000+ km dark fiber
100+ end-to-end lambdas and 300+ Gb/s
Operational since Jan. 2002

NAREGI: National Research Grid Initiative:
Funded by MEXT: Ministry of Education, Culture, Sports, Science and Technology
5-year project (FY2003-FY2007)
2 billion yen (~$17M) budget in FY2003
Collaboration of national labs, universities, and industry in the R&D activities
Applications in IT and nano-science
Acquisition of computer resources underway

NAREGI Goals:
Develop a Grid software system:
  R&D in Grid middleware and upper layer
  Prototype for future Grid infrastructure in scientific research in Japan
Provide a testbed
  100+ Tflop/s expected by 2007
Demonstrate that a high-end Grid computing environment can be applied to nano-science simulations over the Super SINET
Participate in international collaboration
  U.S., Europe, Asia Pacific
Contribute to standards activities, e.g., GGF

Slide48: NAREGI Phase 1 Testbed
~3000 CPUs, ~17 Tflops
Center for GRID R&D (NII): ~5 Tflops
Computational Nano-science Center (IMS): ~10 Tflops
Other sites shown: Osaka Univ. BioGrid, TiTech Campus Grid, AIST SuperCluster, Kyushu Univ., small test app clusters
Connected by Super-SINET (10 Gbps)

IT-Based Laboratory (ITBL):
Government labs: NAL, RIKEN, NIED, NIMS, JST, JAERI
Project period: 2001-2005 (3-stage project)
Total funding ~$105M
Applications: mechanical simulation, computational biology, material science, environment, earthquake engineering
Step 1: Supercomputer centers of government labs are networked via SuperSINET
Step 2: "Virtual Research Environment": Grid-enabling laboratory applications
Step 3: Sharing information among researchers from widely distributed disciplines and institutions

Titech Campus Grid:
Blade systems with 800 PC processors in total
  NEC Express 5800 Series blade servers
Spread over 13 locations on 2 Titech campuses (Suzukake-dai and Oo-okayama, 30 km)
Connected via Super TITANET (1-4 Gbps)
1.2 Tflops, 25 Tbytes storage
High-density GSIC main clusters: (256 processors) x 2 systems in just 5 cabinets, with Myrinet
24-processor satellite systems at each of 12 departments
Super SINET (10 Gbps) to other Grids
Grid-wide single system image via Grid middleware: Globus, Ninf-G, Condor, NWS, …

Slide51: NII System for Grid R&D (5 Tflops, 700 GB)

Slide52: IMS System for Grid R&D (10 Tflops, 5 TB)

AIST Super Cluster for Grid R&D:
[Floor plan labels: M64, P32, Myrinet; 10,800 mm, 10,200 mm]
P32: IBM eServer325, Opteron 2.0 GHz, 6 GB, 2-way x 1074 nodes, Myrinet 2000, 8.59 TFlops peak
M64: Intel Tiger 4, Madison 1.3 GHz, 16 GB, 4-way x 131 nodes, Myrinet 2000, 2.72 TFlops peak
F32: Linux Networx, Xeon 3.06 GHz, 2 GB, 2-way x 256+ nodes, GbE, 3.13 TFlops peak
Total: 14.5 TFlops peak, 3188 CPUs

NAREGI Grid Software Stack:
WP6: Grid-Enabled Apps
WP3: Grid PSE
WP3: Grid Workflow
WP3: Grid Visualization
WP2: Grid Programming (Grid RPC, Grid MPI)
WP1: SuperScheduler
WP1: Grid Monitoring & Accounting
WP1: Grid VM
(Globus, Condor, UNICORE, OGSA)
WP5: High-Performance & Secure Grid Networking
WP4: Packaging
Note: WP = "Work Package"

R&D in Grid Software and Networking Area (Work Packages):
WP-1: Lower and Middle-Tier Software for Resource Management: Matsuoka (Titech), Kohno (ECU), Aida (Titech)
WP-2: Grid Programming Middleware: Sekiguchi (AIST), Ishikawa (AIST)
WP-3: User-Level Grid Tools & PSE: Miura (NII), Sato (Tsukuba-u), Kawata (Utsunomiya-u)
WP-4: Packaging and Configuration Management: Miura (NII)
WP-5: Networking, Security & User Management: Shimojo (Osaka-u), Oie (Kyushu Tech.), Imase (Osaka-u)
WP-6: Grid-enabling tools for Nanoscience Apps: Aoyagi (Kyushu-u)

WP-1: Lower and Middle-Tier Software for Resource Management:
Unicore, Condor, Globus interoperability
  Adoption of ClassAds framework
Meta-scheduler
  Scheduling schema, workflow engine, broker function
Grid information service
  Attaches to multiple monitoring frameworks
  User and job auditing and accounting
Self-configurable management & monitoring
GridVM (lightweight Grid virtual machine)
  Support for co-scheduling, resource control
  Node (IP) virtualization
  Interfacing with OGSA (Open Grid Services Architecture)

WP-2: Grid Programming GridRPC/Ninf-G2:
[Diagram labels: Client side: client. Server side: GRAM, fork, remote executable, numerical library, IDL file, IDL compiler (generate), MDS, interface information, LDIF file (retrieve). Steps: 1. interface request; 2. interface reply; 3. invoke executable; 4. connect back]
GridRPC: programming with Remote Procedure Calls (RPC) on the Grid
GridRPC API standardization by GGF
Ninf-G is a reference implementation of GridRPC
  Implemented on the Globus Toolkit (C and Java APIs)
  Used by groups outside Japan

WP-2: Grid Programming GridMPI:
GridMPI: programming with MPI on the Grid
Environment to run MPI applications efficiently in the Grid.
Flexible and heterogeneous process invocation on each compute node
GridADI and latency-aware communication topology:
  Optimizes communication over non-uniform latency
  Hides the differences of lower-level communication libraries
Extremely efficient implementation based on MPI on SCore (not MPICH-PM)

WP-3: User-Level Grid Tools & PSEs:
Grid workflow
  Workflow language definition
  GUI (task flow representation)
Visualization tools
  Real-time volume visualization on the Grid
PSE / portals
  Multiphysics / coupled simulation
  Application pool
  Collaboration with the Nano-science Applications Group

WP-4: Packaging and Configuration Management:
Collaboration with WP1 management
Activities
  Selection of packagers to use
  Interface with autonomous configuration management (WP1)
  Test procedure and harness
  Testing infrastructure
  cf. NSF NMI packaging and testing

WP-5: Network Measurement, Management & Control:
Traffic measurement on SuperSINET
Optimal QoS routing based on user policies and network measurements
Robust TCP/IP control for Grids
Grid CA / user Grid account management and deployment

Slide62: ITBL Grid Applications
Plan to use a mixture of computational technologies

Grid for the Belle Detector:
The Belle detector (e+e- → B0 B0bar)
SuperSINET is the backbone of the Belle network
[Diagram labels: KEK computing center; Osaka U.; Nagoya U.; Tohoku U.; U. Tokyo; Tokyo Institute of Technology; USA, Korea, Taiwan, etc. Link labels: 1 TB/day, ~100 Mbps; ~1 TB/day (planned); 400 GB/day, ~45 Mbps; 170 GB/day; NFS; 10 Gbps]

Slide64: ITBL Grid Applications: Fusion Grid

Adaptation of Nano-science Applications to Grid Environment:
Analysis of nanoscience applications
  Parallel structure
  Granularity
  Resource requirement
  Latency tolerance
Coupled simulation
  RISM: Reference Interaction Site Model
  FMO: Fragment Molecular Orbital method

Riken Grid:
[Diagram labels: users; front end server; web portal; Globus; ITBL; computer resource pool; RIKEN Super Combined Cluster]

Grid Summary:
More emphasis on Grids than expected
  More government support
  More application involvement
  Higher level tools
Computational, data, and business grids included
Research contributions from Japan on:
  Cluster computing
  Grid middleware
Heavily involved in international collaborations
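
Several of the performance figures quoted on the slides can be cross-checked with simple arithmetic. The Python sketch below is an illustration added to the transcript, not part of the presentation. It rests on stated assumptions: that an Earth Simulator node has 8 processors at 8 Gflop/s each (the SX-6 figures quoted above), that the Opteron and Xeon nodes of the AIST cluster retire 2 floating-point operations per cycle and the Itanium 2 (Madison) nodes retire 4, and that 1 TB means 10^12 bytes. All function names are hypothetical.

```python
"""Cross-check a few figures quoted on the slides (illustrative sketch only)."""

# Assumption: one Earth Simulator node = 8 processors x 8 Gflop/s peak,
# taken from the SX-6 figures quoted on the NEC slide.
ES_NODE_PEAK_GFLOPS = 8 * 8


def fraction_of_peak(sustained_tflops, nodes, node_peak_gflops=ES_NODE_PEAK_GFLOPS):
    """Sustained performance divided by the aggregate peak of the nodes used."""
    peak_tflops = nodes * node_peak_gflops / 1000.0
    return sustained_tflops / peak_tflops


def cluster_peak_tflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Peak = number of CPUs x clock rate x floating-point ops per cycle."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle / 1000.0


def average_mbps(bytes_per_day):
    """Average line rate needed to move a given data volume in one day."""
    return bytes_per_day * 8 / 86_400 / 1e6


if __name__ == "__main__":
    # HPF on the Earth Simulator: the slide quotes 41% and 45% of peak.
    print("PFES     :", fraction_of_peak(9.85, 376))
    print("Impact3D :", fraction_of_peak(14.9, 512))

    # GRAPE-6: 31 Gflop/s per chip, 32 chips per board, 64 boards.
    board_tflops = 31 * 32 / 1000.0
    print("GRAPE-6 board :", board_tflops, "Tflop/s")
    print("GRAPE-6 system:", board_tflops * 64, "Tflop/s")

    # AIST Super Cluster: flops-per-cycle values are assumptions (see lead-in).
    print("P32:", cluster_peak_tflops(1074, 2, 2.0, 2))
    print("M64:", cluster_peak_tflops(131, 4, 1.3, 4))
    print("F32:", cluster_peak_tflops(256, 2, 3.06, 2))

    # Belle: 1 TB/day expressed as an average rate in Mbps.
    print("1 TB/day:", average_mbps(1e12), "Mbps")
```

Under these assumptions the computed values line up with the slide figures for the HPF efficiencies, the GRAPE-6 board and system peaks, and the three AIST sub-cluster peaks, which suggests the quoted numbers are peak-rate products rather than measured results.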
casc yelick Pumbaa Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 62 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: October 09, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript WTEC Panel on High End Computing in JapanSite visits: March 29 - April 3, 2004: WTEC Panel on High End Computing in Japan Site visits: March 29 - April 3, 2004 Study Commissioned By: National Coordination Office Department of Energy National Science Foundation National Aeronautics and Space Administration WTEC Overview: WTEC Overview Provides assessments of research and development This was one of 55 international technology assessments done by WTEC WTEC Process Write proposals for NSF “umbrella” grants Put together a coalition of sponsors Recruit a panel of experts Conduct the study with on-site visits Publish a report Full text reports at wtec.orgPurpose & Scope of this Study: Purpose & Scope of this Study Gather information on current status and future trends in Japanese high end computing Govt agencies, research communities, vendors Focus on long-term HEC research in Japan Compare Japanese and U.S. HEC R&D Provide review of ES development process and operational experience Include user experience and its impact on computer science and computational science communities Report on follow-on projects Determine HEC areas amenable for Japan-U.S. 
cooperation to accelerate future advancesWTEC HEC Panel Members: WTEC HEC Panel Members Al Trivelpiece (Panel Chair) Former Director Oak Ridge National Laboratory Rupak Biswas Group Lead, NAS Division NASA Ames Research Center Jack Dongarra Director, Innovative Computing Lab University of Tennessee & Oak Ridge National Laboratory Peter Paul Deputy Director, S&T Brookhaven National Laboratory Horst Simon (Advisor) Director, NERSC Lawrence Berkeley National Lab Kathy Yelick Computer Science Professor University of California, Berkeley Dan Reed (Advisor) Computer Science Professor University of North Carolina, Chapel Hill Praveen Chaudhari (Advisor) Director Brookhaven National LaboratorySites Visited (1): Sites Visited (1) Earth Simulator Center Frontier Research System for Global Change National Institute for Fusion Science (NIFS) Japan Aerospace Exploration Agency (JAXA) University of Tokyo Tokyo Institute of Technology National Institute of Advanced Industrial S&T (AIST) High Energy Accelerator Research Org. (KEK) Tsukuba University Inst. of Physical and Chemical Research (RIKEN) National Research Grid Initiative (NAREGI) Research Org. for Information Sci. & Tech. (RIST) Japan Atomic Energy Research Institute (JAERI)Sites Visited (2): Sites Visited (2) Council for Science and Technology Policy (CSTP) Ministry of Education, Culture, Sports, Science, and Technology (MEXT) Ministry of Economy, Trade, and Industry (METI) Fujitsu Hitachi IBM-Japan Sony Computer Entertainment Inc. (SECI) NECHEC Business and Government Environment in Japan: HEC Business and Government Environment in JapanGovernment Agencies: Government Agencies Council for Science & Tech. 
Policy (CSTP) Cabinet Office, PM resides over monthly meetings Sets strategic directions for S&T Rates proposals submitted to MEXT, METI and others Ministry of Education, Culture, Sports, Science, and Technology (MEXT) Funds most of S&T R&D activities in Japan Funded the Earth Simulator Ministry of Economy, Trade, & Industry (METI) Administers industrial policy Funds R&D projects with ties to industry Not interested in HEC, except for gridsBusiness and Government: Business and Government New Independent Administrative Institution (IAI) model Some research institutes had already converted Universities were being converted during our visit Govt. funds institution as whole; control own budget Funding being cut annually as well Commercial viability of vector supers is problematic. Only NEC still committed to this architectural model Commodity PC clusters increasingly prevalent All three Japanese vendors have cluster productsBusiness Partnerships: Business Partnerships Each of the Japanese vendors is partnered with a US vendor NEC and Cray ? Fujitsu and Sun Microsystems Hitachi and IBM HEC Hardware in Japan: HEC Hardware in JapanArchitecture/Systems Continuum: Architecture/Systems Continuum Commodity processor with commodity interconnect Clusters Pentium, Itanium, Opteron, Alpha, PowerPC GigE, Infiniband, Myrinet, Quadrics, SCI NEC TX7 Fujitsu IA-Cluster Commodity processor with custom interconnect SGI Altix Intel Itanium 2 Cray Red Storm AMD Opteron Fujitsu PrimePower Sparc based Custom processor with custom interconnect Cray X1 NEC SX-7 Hitachi SR11000 Loosely Coupled Tightly CoupledFujitsu PRIMEPOWER HPC2500: Fujitsu PRIMEPOWER HPC2500 SMP Node 8‐128CPUs ・ ・ ・ ・ High Speed Optical Interconnect 128Nodes Crossbar Network for Uniform Mem. 
Access (SMP within node) to High Speed Optical Interconnect ・ ・ ・ System Board x16 Channel to I/O Device D T U to Channels <DTU Board> memory CPU Adapter Adapter CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU <System Board> DTU : Data Transfer Unit PCIBOX … … D T U D T U D T U SMP Node 8‐128CPUs SMP Node 8‐128CPUs SMP Node 8‐128CPUs memory <System Board> Channel 4GB/s x4 Fujitsu IA-Cluster: System Configuration: Fujitsu IA-Cluster: System Configuration FUJITSU PRIMERGY (1U) PRIMERGY BX300 Max. 20 blades in a 3U chassis PRIMERGY RXI600 IPF(1.5GHz): 2~4CPU Giga Ethernet Switch InfiniBand or Myrinet Switch Compute Nodes Compute Network InfiniBand, Myrinet Control Network Compute Node Compute Network System Configuration Latest Installation of FUJITSU HPC Systems: Latest Installation of FUJITSU HPC SystemsSlide16: HITACHI’s HPC system VOS3/HAP,HI-OSF/1-MJ HI-UX/MPP '77 '78 '79 '80 '81 '82 '83 '84 '85 '86 '87 '88 '89 '90 '91 M-200H IAP '92 '93 '94 '95 '96 '97 '98 '99 '00 0.01 0.1 1 10 S-820 S-810 20 10 5 80 60 40 20 15 M-280H IAP M-680 IAP 100 S-3800 S-3800 S-3600 Peak Performance [GFLOPS] 480 140 180 120 SR2201 Vector-Scalar combined type B A 1,000 10,000 F1 E1 C SR8000 SR11000 100,000 '01 '02 '03 G1 D Vector Automatic Vectorization Automatic Pseudo Vectorization Auto Parallelization ‘04 ‘05 POWER4+ AIX 5L Vector-Scalar Combined type H1 First Japanese Vector Supercomputer Single CPU peak performance 8GFlops (Fastest in the world) First commercially available distributed memory parallel processor Single CPU peak performance 3GFlops Integrated Array Processor system First HPC machine combined with vector processing and scalar processing Scalar Parallel (MPP type)Slide17: Vector Register MS Vector SR8000 Pseudo Vector Processing (PVP) Arithmetic Unit Pipelining Pipelining Prefetch Read data from main memory to cache before calculation - Accelerate sequential data access Preload - Read data from main memory to Floating Registers before calculation - Accelerate 
stride memory access and indirectly addressed memory access Problems of conventional RISC - Reduction of performance for large scale simulations because of cache-overflow - Sustained : Under 10% of peak PVP Feature MS Cache Load Floating-point Registers (FPRs) Arithmetic Unit Pseudo VectorHitachi SR11000: Hitachi SR11000 Based on IBM Power 4+ SMP with 16 processors/node 109 Gflop/s / node(6.8 Gflop/s / p) IBM uses 32 in their machine IBM Federation switch Hitachi: 6 planes for 16 proc/node IBM uses 8 planes for 32 proc/node Pseudo vector processing features Minimal hardware enhancements Fast synchronization No preload like SR 8000 Hitachi’s Compiler effort is separate from IBM Automatic vectorization, no plans for HPF 3 customers for the SR 11000, National Institute for Material Science Tsukuba - 64 nodes (7 Tflop/s) Okasaki Institute for Molecular Science - 50 nodes (5.5 Tflops) Institute for Statistic Math Institute - 4 nodesSlide19: Vector Register MS Vector SR11000 Pseudo Vector Processing (PVP) Arithmetic Unit Pipelining Pipelining Prefetch Read data from main memory to cache before calculation - Accelerate sequential data access Problems of conventional RISC - Reduction of performance for large scale simulations because of cache-overflow - Sustained : Under 10% of peak PVP Feature MS Cache Load Floating-point Registers (FPRs) Arithmetic Unit Pseudo VectorSR11000 Next Model: Continuing IBM partnership Power5 processor Greatly enhanced memory bandwidth - Flat Memory Interleaving Hardware Barrier Synchronisation Register SR11000 Next ModelNEC HPC Products: Express5800/ 1160Xa Middle - Small Size Capacity Computing SX-6/7 Series NEC HPC Products Parallel Vector Processors IA-64 SERVER TX7 Parallel PC- Clusters IA-32 Workstations TX7 SERIES Express 5800/50 Series High-End Capability Computing Express 5800 Parallel Linux ClusterTX7 Itanium² Server: TX7 Itanium² Server cc-NUMA architecture employs a chipset and crossbar switch developed in-house by NEC achieves near 
uniform high-speed memory access. Up to 32 Itanium² processors; up to 128 GB of RAM; Linux operating system with NEC enhancements; more than 100 Gflop/s on Linpack; file server functionality for SX.

SX-Series Evolution: The latest technology always in SX-Series. SX Series (1983): the first computer in the world surpassing 1 GFLOPS. SX-3 Series (1989): shared memory, multi-function processor; UNIX OS. SX-4 Series (1994): innovative CMOS technology; entirely air-cooled. SX-5 Series (1998): high sustained performance; large-capacity shared memory. SX-6 Series (2001): single-chip vector processor; greater scalability. Next-generation SX: use the latest technology to build up and develop the new supercomputer.

NEC SX-7/160M5: SX-6: 8 proc/node, 8 GFlop/s, 16 GB, processor to memory. SX-7: 32 proc/node, 8.825 GFlop/s, 256 GB, processor to memory.

Special Purpose: GRAPE-6: The 6th generation of the GRAPE (Gravity Pipe) project. Gravity (N-body) calculation for many particles at 31 Gflop/s per chip; 32 chips per board (0.99 Tflop/s per board); the full 64-board system is installed at the University of Tokyo (63 Tflop/s). On each board, all particle data are loaded into SRAM, each target particle's data is injected into the pipeline, and the acceleration is then calculated. No software! Gordon Bell Prize at SC for a number of years (Prof. Makino, U. Tokyo).

Sony PlayStation2: Emotion Engine: 6 Gflop/s peak. Superscalar MIPS 300 MHz core + vector coprocessor + graphics/DRAM. About $200; 70M sold (PS1: 100M sold). 8K D cache; 32 MB memory, not expandable (the OS goes here as well). 32-bit floating point; not IEEE. 2.4 GB/s to memory (0.38 B/Flop). Potential 20 floating-point ops/cycle: FPU w/FMAC+FDIV; VPU1 w/4FMAC+FDIV; VPU2 w/4FMAC+FDIV; EFU w/FMAC+FDIV.

High-Performance Chips Embedded Applications: The driving market is gaming (PC and game consoles), the motivation for almost all the technology developments.
These chips demonstrate that arithmetic is quite cheap. Today there are three big problems with these apparently non-standard "off-the-shelf" chips. Most of these chips have very limited memory bandwidth and little if any support for inter-node communication. They offer integer or only 32-bit floating-point arithmetic. There is no software support to map scientific applications to these processors, and only minimal general-purpose programming tools. Poor memory capacity for program storage. It is not clear that they do much for scientific computing. Developing "custom" software is much more expensive than developing custom hardware.

TOP500 Data

Top 20 Computers: Where They Are Located

Efficiency is Declining Over Time: Analysis of the top 100 machines in 1994 and 2004, showing the number of machines in the top 100 that achieve a given efficiency on the Linpack benchmark. In 1994, 40 machines had >90% efficiency. In 2004, 50 have <50% efficiency.

ESS Impact on Climate Modeling: NERSC IBM SP3: 1 simulated year per compute day on 112 processors. ORNL/NCAR IBM SP4: ~2 simulated years per compute day on 96 processors. ORNL/NCAR IBM SP4: 3 simulated years per compute day on 192 processors. ESS: 40 simulated years per compute day on an unknown number of processors (probably ~128). Cray X1 rumor: 14 simulated years per compute day on 128 procs. Source: Michael Wehner.

Technology Transfer from Research: Numerical Wind Tunnel → Fujitsu VPP500. CP-PACS → Hitachi SR2201. Earth Simulator → NEC SX-6. Grape, MDM, eHPC, … → ? (MD-engine). Government projects encouraged new architectures. New technologies were commercialized.

Hardware Summary: The commercial viability of "traditional" supercomputing architectures with vector processors and high-bandwidth memory subsystems is problematic.
NEC is the only one remaining in Japan. Clusters are replacing traditional high-bandwidth systems.

HEC Software in Japan

Software Overview: Emphasis on vendor software: Fujitsu, Hitachi, NEC; Earth Simulator software. Languages and compilers: persistent effort in High Performance Fortran, including HPF/JA extensions. Use of common libraries. Little academic work for supercomputers: vendors supply tools. Support for clusters.

Achievements: HPF on the Earth Simulator: PFES: Oceanic General Circulation Model based on the Princeton Ocean Model; achieved 9.85 TFLOPS with 376 nodes, 41% of peak performance. Impact3D: plasma fluid code using a Total Variation Diminishing (TVD) scheme; achieved 14.9 TFLOPS with 512 nodes, 45% of peak performance.

Slide37: HPF/JA Extensions: HPF research in language and compilers. HPF 2.0 extends HPF 1.0 for irregular apps. HPF/JA further extends HPF for performance: REFLECT: placement of near-neighbor communication; LOCAL: communication not needed for a scope; extended ON HOME: partial computation replication. The compiler doesn’t need full interprocedural communication and availability analyses. HPF/JA was a consortium effort by the vendors NEC, Hitachi and Fujitsu.

Vectorization and Parallelization on the Earth Simulator (NEC): Diagram labels: Processor Node (eight APs); Intra-node Parallelization (HPF, OpenMP); Vectorization; Inter-node Parallelization (HPF, MPI); Automatic parallelization.

Hitachi: Example of applied image. Diagram labels: DO i=1,l; DO j=1,m; DO k=1,n; intra-node elementwise parallel processing (COMPAS); vector processing in IP (with PVP); inter-node parallelization (with parallel libraries); inner DO loop; parallelized with parallel libraries (HPF, MPI, PVM, etc.);
COMPAS (automatic parallelization); PVP (automatic pseudo vectorization); node; IP; inter-node. Legend: PVP: Pseudo Vector Processing; COMPAS: CO-operative Micro-Processors in single Address Space; IP: Instruction Processor.

Conclusions: Longer sustained effort on HPF than in the US: part of the Earth Simulator vision; successful on two of the large codes, including a Gordon Bell prize; language extensions were also needed. MPI is the dominant model for inter-node communication, although larger nodes on vector/parallel machines mean a smaller degree of MPI parallelism; combined with automatic vectorization within nodes. Other familiar tools were developed outside Japan: numerical libraries, debuggers, etc.

Grid Computing in Japan: Kathy Yelick, U.C. Berkeley and Lawrence Berkeley National Laboratory.

Outline: Motivation for Grid computing in Japan: e-business, e-government, science. Summary of grid efforts: labs, universities. Grid research contributions: hardware, middleware, applications. Funding summary.

Grid Motivation: e-Japan: create a "knowledge-emergent society," where everyone can utilize IT. In 2001, Japan's internet usage was at the lowest level among major industrial nations. Four strategies to address this: ultra-high-speed network infrastructure; facilitate electronic commerce; realize electronic government (key is information sharing across agencies and society); nurture high-quality human resources (training, support of researchers, etc.).

Overview of Grid Projects in Japan: Super-SINET (NII); National Research Grid Initiative (NAREGI); Campus Grid (Titech); Grid Technology Research Center (AIST); Information Technology Based Lab (ITBL). Applications: VizGrid (JAIST); BioGrid (Osaka-U); Japan Virtual Observatory (JVO).

Slide45: SuperSINET: all-optical production research network. 10 Gbps photonic backbone; GbEther bridges for peer connection; 6,000+ km dark fiber; 100+ e-e lambda and 300+ Gb/s. Operational since Jan.
2002.

NAREGI: National Research Grid Initiative: Funded by MEXT (Ministry of Education, Culture, Sports, Science and Technology). 5-year project (FY2003-FY2007); 2 billion yen (~$17M) budget in FY2003. Collaboration of national labs, universities and industry in the R&D activities. Applications in IT and nano-science. Acquisition of computer resources underway.

NAREGI Goals: Develop a Grid software system: R&D in Grid middleware and upper layer; prototype for future Grid infrastructure in scientific research in Japan. Provide a testbed: 100+ Tflop/s expected by 2007. Demonstrate that a high-end Grid computing environment can be applied to nano-science simulations over the Super SINET. Participate in international collaboration: U.S., Europe, Asia-Pacific. Contribute to standards activities, e.g., GGF.

Slide48: Diagram labels: ~3000 CPUs, ~17 Tflops; Center for GRID R&D (NII), ~5 Tflops; Comp. Nano-science Center (IMS), ~10 Tflops; Osaka Univ. BioGrid; TiTech Campus Grid; AIST SuperCluster; Kyushu Univ.
Small Test App Clusters; NAREGI Phase 1 Testbed; Super-SINET (10 Gbps).

IT-Based Laboratory (ITBL): Government labs: NAL, RIKEN, NIED, NIMS, JST, JAERI. Project period: 2001-2005 (3-stage project). Total funding ~$105M. Applications: mechanical simulation, computational biology, material science, environment, earthquake engineering. Step 1: supercomputer centers of government labs are networked via SuperSINET. Step 2: “Virtual Research Environment”: Grid-enabling laboratory applications. Step 3: sharing information among researchers from widely distributed disciplines and institutions.

Titech Campus Grid: Blade systems with 800 PC processors total, spread over 13 locations on 2 Titech campuses and connected via Super TITANET (1-4 Gbps). 1.2 Tflops; 25 Tbytes storage. 24-processor satellite systems at each of 12 departments. Super SINET (10 Gbps) to other Grids. Grid-wide single system image via Grid middleware: Globus, Ninf-G, Condor, NWS, … Diagram labels: Super TITANET (1-4 Gbps); Suzukake-dai Campus; Oo-okayama Campus; 30 km; NEC Express 5800 Series blade servers; high-density GSIC main clusters (256 processors) x 2 systems in just 5 cabinets with Myrinet.

Slide51: NII System for Grid R&D (5 Tflops, 700 GB)

Slide52: IMS System for Grid R&D (10 Tflops, 5 TB)

AIST Super Cluster for Grid R&D: Floor-plan labels: M64; P32; Myrinet; 10,800 mm; 10,200 mm. P32: IBM eServer325, Opteron 2.0 GHz, 6 GB, 2-way x 1074 nodes, Myrinet 2000, 8.59 TFlops peak. M64: Intel Tiger 4, Madison 1.3 GHz, 16 GB, 4-way x 131 nodes, Myrinet 2000, 2.72 TFlops peak. F32: Linux Networx, Xeon 3.06 GHz, 2 GB, 2-way x 256+ nodes, GbE, 3.13 TFlops peak. Total: 14.5 TFlops peak, 3188 CPUs.

NAREGI Grid Software Stack: WP6: Grid-Enabled Apps; WP3: Grid PSE; WP3: Grid Workflow; WP1: SuperScheduler; WP1: Grid Monitoring & Accounting; WP2: Grid Programming (Grid RPC, Grid MPI); WP3: Grid Visualization; WP1: Grid VM; (Globus, Condor, UNICORE, OGSA); WP5: High-Performance & Secure Grid Networking; WP4:
Packaging. Note: WP = “Work Package”.

R&D in Grid Software and Networking Area (Work Packages): WP-1: Lower and Middle-Tier Software for Resource Management: Matsuoka (Titech), Kohno (ECU), Aida (Titech). WP-2: Grid Programming Middleware: Sekiguchi (AIST), Ishikawa (AIST). WP-3: User-Level Grid Tools & PSE: Miura (NII), Sato (Tsukuba-u), Kawata (Utsunomiya-u). WP-4: Packaging and Configuration Management: Miura (NII). WP-5: Networking, Security & User Management: Shimojo (Osaka-u), Oie (Kyushu Tech.), Imase (Osaka-u). WP-6: Grid-enabling tools for Nanoscience Apps: Aoyagi (Kyushu-u).

WP-1: Lower and Middle-Tier Software for Resource Management: Unicore/Condor/Globus interoperability: adoption of ClassAds framework. Meta-scheduler: scheduling schema, workflow engine, broker function. Grid information service: attaches to multiple monitoring frameworks; user and job auditing and accounting. Self-configurable management & monitoring. GridVM (lightweight Grid virtual machine): support for co-scheduling, resource control; node (IP) virtualization; interfacing with OGSA (Open Grid Services Architecture).

WP-2: Grid Programming: GridRPC/Ninf-G2: Diagram labels: client side (Client); server side (GRAM, fork, MDS, Interface Information LDIF File, IDL File, IDL Compiler, Numerical Library, Remote Executable); retrieve; generate. Numbered steps: 1. interface request; 2. interface reply; 3. invoke executable; 4. connect back. GridRPC: programming with Remote Procedure Calls (RPC) on the Grid. GridRPC API standardization by GGF. Ninf-G is a reference implementation of GridRPC, implemented on the Globus Toolkit (C and Java APIs) and used by groups outside Japan.

WP-2: Grid Programming: GridMPI: Programming with MPI on the Grid. Environment to run MPI applications efficiently in the Grid.
Flexible and heterogeneous process invocation on each compute node. GridADI and latency-aware communication topology: optimizes communication over non-uniform latency; hides the differences of lower-level communication libraries. Extremely efficient implementation based on MPI on SCore (not MPICH-PM).

WP-3: User-Level Grid Tools & PSEs: Grid workflow: workflow language definition; GUI (task flow representation). Visualization tools: real-time volume visualization on the Grid. PSE/portals: multiphysics/coupled simulation; application pool; collaboration with the Nano-science Applications Group.

WP-4: Packaging and Configuration Management: Collaboration with WP1 management. Activities: selection of packagers to use; interface with autonomous configuration management (WP1); test procedure and harness; testing infrastructure (cf. NSF NMI packaging and testing).

WP-5: Network Measurement, Management & Control: Traffic measurement on SuperSINET. Optimal QoS routing based on user policies and network measurements. Robust TCP/IP control for Grids. Grid CA/user Grid account management and deployment.

Slide62: ITBL Grid Applications: plan to use a mixture of computational technologies.

Grid for the Belle Detector: Diagram labels: KEK computing center; Osaka U.; Nagoya U.; 1 TB/day, ~100 Mbps; ~1 TB/day (planned); 400 GB/day, ~45 Mbps; 170 GB/day; NFS; e+e-; B0 B0; Tohoku U.; 10 Gbps; the Belle detector; SuperSINET backbone of the Belle network; U.
Tokyo; Tokyo Institute of Technology; USA, Korea, Taiwan, etc.

Slide64: ITBL Grid Applications: Fusion Grid

Adaptation of Nano-science Applications to Grid Environment: Analysis of nanoscience applications: parallel structure; granularity; resource requirement; latency tolerance. Coupled simulation: RISM (Reference Interaction Site Model); FMO (Fragment Molecular Orbital method).

Riken Grid: Diagram labels: users; front-end server; web portal; Globus; ITBL; computer resource pool; RIKEN Super Combined Cluster.

Grid Summary: More emphasis on Grids than expected: more government support; more application involvement; higher-level tools; computational, data, and business grids included. Research contributions from Japan on cluster computing and Grid middleware. Heavily involved in international collaborations.
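
Worked check of quoted peak figures: Several slides in this transcript quote peak and percent-of-peak numbers (the AIST Super Cluster partitions, and the HPF results on the Earth Simulator). The Python sketch below shows one way those figures could be reproduced from node counts and clock rates. The flops-per-cycle values and the 64 Gflop/s Earth Simulator node peak (8 APs at 8 Gflop/s each) are assumptions brought in for illustration and are not stated on the slides; the function names are hypothetical.

```python
# Assumed peak double-precision flops per cycle per CPU (not given on the slides).
FLOPS_PER_CYCLE = {"Opteron": 2, "Itanium2": 4, "Xeon": 2}

# (partition, nodes, CPUs per node, clock in GHz, CPU family) as quoted for AIST.
# The slide lists "256+" nodes for F32; 256 is used here.
AIST_PARTITIONS = [
    ("P32", 1074, 2, 2.0, "Opteron"),
    ("M64", 131, 4, 1.3, "Itanium2"),
    ("F32", 256, 2, 3.06, "Xeon"),
]


def peak_tflops(nodes, cpus_per_node, ghz, flops_per_cycle):
    """Theoretical peak in Tflop/s: nodes * CPUs/node * GHz * flops/cycle / 1000."""
    return nodes * cpus_per_node * ghz * flops_per_cycle / 1000.0


def aist_peaks():
    """Return per-partition peaks (Tflop/s), their sum, and the CPU count."""
    peaks = {
        name: peak_tflops(nodes, cpus, ghz, FLOPS_PER_CYCLE[family])
        for name, nodes, cpus, ghz, family in AIST_PARTITIONS
    }
    total_cpus = sum(nodes * cpus for _, nodes, cpus, _, _ in AIST_PARTITIONS)
    return peaks, sum(peaks.values()), total_cpus


def fraction_of_peak(sustained_tflops, nodes, node_peak_gflops=64.0):
    """Sustained / peak, assuming each node peaks at node_peak_gflops (assumed 8 x 8)."""
    return sustained_tflops / (nodes * node_peak_gflops / 1000.0)


if __name__ == "__main__":
    peaks, total, cpus = aist_peaks()
    for name, value in peaks.items():
        print(f"{name}: {value:.2f} Tflop/s peak")
    print(f"AIST total: {total:.1f} Tflop/s peak on {cpus} CPUs")
    print(f"PFES: {fraction_of_peak(9.85, 376):.0%} of peak")
    print(f"Impact3D: {fraction_of_peak(14.9, 512):.0%} of peak")
```

Under these assumptions the per-partition and percent-of-peak values agree with the slides after rounding. The CPU count comes out slightly below the quoted 3188, which is consistent with the F32 partition having a few more than 256 nodes.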