GGF9 Gfarm


Presentation Transcript

Grid Datafarm and File System Services: 

Osamu Tatebe
Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)

ATLAS/Grid Datafarm project: CERN LHC Experiment: 

ATLAS detector: 40 m x 20 m, 7000 tons
LHC perimeter: 26.7 km
~2000 physicists from 35 countries
Collaboration between KEK, AIST, Titech, and ICEPP, U Tokyo
[Figure labels: Truck, ATLAS Detector]

Petascale Data-intensive Computing Requirements: 

Peta/Exabyte-scale files
Scalable parallel I/O throughput
  > 100 GB/s, hopefully > 1 TB/s, within a system and between systems
Scalable computational power
  > 1 TFLOPS, hopefully > 10 TFLOPS
Efficient global sharing with group-oriented authentication and access control
Resource management and scheduling
System monitoring and administration
Fault tolerance / dynamic re-configuration
Global computing environment

Grid Datafarm (1): Global virtual file system [CCGrid 2002]: 

World-wide virtual file system
  Transparent access to dispersed file data in a Grid
  Map from virtual directory tree to physical files
  Fault tolerance and access-concentration avoidance by file replication
[Figure labels: Grid File System, File replica creation, Virtual Directory Tree mapping]
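The mapping from a virtual path to physical replicas can be illustrated with a minimal Python sketch. It is not Gfarm's implementation: the class, its methods, the host names, the physical paths, and the least-loaded selection rule are all assumptions made for illustration. It shows how replication can serve both goals named on the slide, by skipping unavailable hosts (fault tolerance) and by spreading accesses across replicas (access-concentration avoidance).

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFileSystem:
    # Logical path -> list of (host, physical_path) replicas. Hypothetical layout.
    replicas: dict = field(default_factory=dict)
    # Host -> number of accesses served so far, used to spread load.
    load: dict = field(default_factory=dict)

    def add_replica(self, logical, host, physical):
        """Register one physical copy of a logical file."""
        self.replicas.setdefault(logical, []).append((host, physical))
        self.load.setdefault(host, 0)

    def resolve(self, logical, down=frozenset()):
        """Return (host, physical_path) of the least-loaded replica whose host is up."""
        live = [r for r in self.replicas.get(logical, []) if r[0] not in down]
        if not live:
            raise FileNotFoundError(logical)
        host, physical = min(live, key=lambda r: self.load[r[0]])
        self.load[host] += 1
        return host, physical


vfs = VirtualFileSystem()
vfs.add_replica("gfarm:input", "host1.example", "/data/a")
vfs.add_replica("gfarm:input", "host2.example", "/data/b")

first = vfs.resolve("gfarm:input")    # both idle, so the first registered replica
second = vfs.resolve("gfarm:input")   # the other replica, to spread the load
third = vfs.resolve("gfarm:input", down={"host2.example"})  # falls back to host1
```

The caller only ever names the logical path; which physical copy is read is decided at resolution time.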

Grid Datafarm (2): High-performance data processing [CCGrid 2002]: 

World-wide parallel and distributed processing
  Aggregate of files = superfile
  Data processing of superfiles = parallel and distributed data processing of member files
    Local file view
    File-affinity scheduling
[Figure labels: Grid File System, Virtual CPU, Newspapers in a year (superfile), 365 newspapers, World-wide parallel & distributed processing]

Extreme I/O bandwidth support example: gfgrep - parallel grep: 

% gfrun -G gfarm:input gfgrep -o gfarm:output regexp gfarm:input

  open("gfarm:input", &f1)
  create("gfarm:output", &f2)
  set_view_local(f1)
  set_view_local(f2)
  grep regexp
  close(f1); close(f2)

[Figure: file-affinity scheduling. Labels: gfmd; sites CERN.CH and KEK.JP; hosts Host1.ch, Host2.ch, Host3.ch, Host4.jp, Host5.jp; fragments input.1 to input.5 of gfarm:input]
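The idea behind file-affinity scheduling, running each process on a node that already stores the fragment it will read, can be sketched in a few lines of Python. This is an illustration only, not the gfrun scheduler: the function names, the greedy tie-breaking rule, the pairing of input.N with HostN, and the fragment contents are all assumptions.

```python
import re


def file_affinity_schedule(replica_hosts):
    """Assign each fragment to a host that stores it.

    Among the hosts holding a fragment, a greedy rule (an assumption of this
    sketch) picks the one with the fewest fragments assigned so far.
    """
    tasks = {}
    for fragment in sorted(replica_hosts):
        host = min(replica_hosts[fragment],
                   key=lambda h: (len(tasks.get(h, [])), h))
        tasks.setdefault(host, []).append(fragment)
    return tasks


def local_grep(pattern, lines):
    """What each process does on its local view: keep the matching lines."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


# Assumed placement: one fragment per host, as the slide's figure suggests.
placement = {
    "input.1": ["Host1.ch"],
    "input.2": ["Host2.ch"],
    "input.3": ["Host3.ch"],
    "input.4": ["Host4.jp"],
    "input.5": ["Host5.jp"],
}
schedule = file_affinity_schedule(placement)

# Hypothetical fragment contents, standing in for the data on each node.
fragments = {
    "input.1": ["muon event", "noise"],
    "input.2": ["noise"],
    "input.3": ["muon pair"],
    "input.4": [],
    "input.5": ["dimuon"],
}
output = {}
for host, assigned in schedule.items():
    for fragment in assigned:
        # In a real run this loop body would execute on `host`, in parallel.
        output[fragment] = local_grep(r"muon", fragments[fragment])
```

Because every process reads and writes only its local fragment, aggregate I/O bandwidth grows with the number of nodes instead of being limited by the network.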

Design of AIST Gfarm Cluster I: 

Cluster node (high density and high performance)
  1U, dual 2.8 GHz Xeon, GbE
  800 GB RAID with four 3.5" 200 GB HDDs + 3ware RAID
  97 MB/s on writes, 130 MB/s on reads
80-node experimental cluster (operational from Feb 2003)
  Force10 E600
  181st position in TOP500 (520.7 GFlops, peak 1000.8 GFlops)
  70 TB Gfarm file system with 384 IDE disks
  7.7 GB/s on writes, 9.8 GB/s on reads for a 1.7 TB file
  1.6 GB/s (= 13.8 Gbps) on file replication of a 640 GB file with 32 streams
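As a rough cross-check of these figures, the per-node RAID rates can be scaled up to the cluster and the replication rate converted to bits per second. The sketch below assumes that all 80 nodes took part in the 1.7 TB measurement and that "GB" in the replication figure means 2^30 bytes; the slide states neither.

```python
NODES = 80                        # from the slide
WRITE_MB_S, READ_MB_S = 97, 130   # per-node RAID rates from the slide

# Ideal aggregate if every node ran at its local RAID rate (decimal GB/s).
ideal_write_gb_s = NODES * WRITE_MB_S / 1000
ideal_read_gb_s = NODES * READ_MB_S / 1000

# Replication rate in Gbit/s under the two possible readings of "GB".
GIB = 2**30
replication_gbps_binary = 1.6 * GIB * 8 / 1e9
replication_gbps_decimal = 1.6 * 1e9 * 8 / 1e9

print(f"ideal write: {ideal_write_gb_s:.2f} GB/s (slide: 7.7)")
print(f"ideal read:  {ideal_read_gb_s:.2f} GB/s (slide: 9.8)")
print(f"replication: {replication_gbps_binary:.2f} Gbps binary, "
      f"{replication_gbps_decimal:.2f} Gbps decimal (slide: 13.8)")
```

Under these assumptions the measured write rate is close to the ideal 80 x 97 MB/s, the read rate sits somewhat below 80 x 130 MB/s, and the binary reading of 1.6 GB/s lands nearer the quoted 13.8 Gbps than the decimal one does.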

World-wide Grid Datafarm Testbed: 

Total disk capacity: 80 TB, disk I/O bandwidth: 12 GB/s
Sites: KEK, Titech, AIST, SDSC, Indiana U, Tsukuba U, Kasetsart U (Thailand)

Gfarm filesystem metadata: 

File status
  File ID
  Owner, file type, access permission, access times
  Num. of fragments, a command history
File fragment status
  File ID, fragment index
  Fragment file size, checksum type, checksum
Directories
  List of file IDs and logical filenames
Replica catalog
  File ID, fragment index, filesystem node
Filesystem node status
  hostname, architecture, #CPUs, . . .
[Figure labels: File status, File fragment, Directories, Replica catalog, Filesystem node; Gfarm filesystem metadata; Virtual File System Metadata Services; Replica Location Services]
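The metadata records listed above can be pictured as a handful of small record types. The Python sketch below is only a reading aid: the field names, types, and the `locate` helper are assumptions, not Gfarm's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    file_id: int
    owner: str
    file_type: str
    mode: int               # access permission
    atime: float            # one of the access times
    num_fragments: int
    command_history: str


@dataclass
class FragmentStatus:
    file_id: int
    fragment_index: int
    size: int
    checksum_type: str
    checksum: str


@dataclass
class ReplicaEntry:         # one row of the replica catalog
    file_id: int
    fragment_index: int
    node: str


@dataclass
class NodeStatus:
    hostname: str
    architecture: str
    ncpus: int


def locate(logical_name, directory, replicas):
    """Map a logical filename to {fragment index: [nodes holding a replica]}."""
    file_id = directory[logical_name]
    nodes = {}
    for entry in replicas:
        if entry.file_id == file_id:
            nodes.setdefault(entry.fragment_index, []).append(entry.node)
    return nodes


# Hypothetical contents: a directory and a replica catalog for one file.
directory = {"input": 1}
replicas = [
    ReplicaEntry(1, 0, "Host1.ch"),
    ReplicaEntry(1, 1, "Host2.ch"),
    ReplicaEntry(1, 1, "Host4.jp"),
]
where = locate("input", directory, replicas)
```

The directory resolves names to file IDs, and the replica catalog resolves (file ID, fragment index) to nodes, which is the information a scheduler needs to place work next to the data.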

Filesystem metadata operation: 

No direct manipulation
  Metadata is consistently managed via file operations only
  open() refers to the metadata
  close() updates or checks the metadata
  rename(), unlink(), chown(), chmod(), utime(), . . .
New replication API
  Creation and deletion
  Inquiry and management
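The rule that metadata changes only as a side effect of file operations can be sketched as follows. This is a toy model under stated assumptions: the `FragmentWriter` class, the in-memory `metadata` dictionary, and the choice of MD5 as the checksum type are all hypothetical and stand in for the real metadata service.

```python
import hashlib
import os
import tempfile

# (logical name, fragment index) -> fragment status.
# Hypothetical in-memory stand-in for the metadata service.
metadata = {}


class FragmentWriter:
    """Writes one fragment; the caller never touches `metadata` directly."""

    def __init__(self, logical_name, index, physical_path):
        self.key = (logical_name, index)
        self.path = physical_path
        self.fh = open(physical_path, "wb")

    def write(self, data):
        self.fh.write(data)

    def close(self):
        """Close the file, then record its size and checksum in the metadata."""
        self.fh.close()
        with open(self.path, "rb") as fh:
            digest = hashlib.md5(fh.read()).hexdigest()  # MD5 is an assumption
        metadata[self.key] = {
            "size": os.path.getsize(self.path),
            "checksum_type": "md5",
            "checksum": digest,
        }


with tempfile.TemporaryDirectory() as tmp:
    writer = FragmentWriter("gfarm:output", 0, os.path.join(tmp, "output.0"))
    writer.write(b"hello\n")
    writer.close()

entry = metadata[("gfarm:output", 0)]
```

Routing every update through close() keeps the recorded size and checksum consistent with what was actually written, which is the point of forbidding direct manipulation.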