ahm poster gridblast 2004

Presentation Transcript

Slide1: 

A GT3 based BLAST grid service for biomedical research

Micha Bayer1, Aileen Campbell2 & Davy Virdee2
1 National e-Science Centre, e-Science Hub, Kelvin Building, University of Glasgow, Glasgow G12 8QQ
2 Edikt, National e-Science Centre, e-Science Institute, 15 South College Street, Edinburgh EH8 9AA

Overview

- BLAST is a well-known program for biological sequence comparison
- it is used to compare query sequences to a set of target sequences in order to find similar sequences in the target set
- it can be extremely compute intensive
- we present a parallel implementation of BLAST delivered via a GT3 grid service
- this work is part of the BRIDGES project, a UK e-Science project aimed at providing a grid based environment for research into the genetic causes of hypertension (http://www.brc.dcs.gla.ac.uk/projects/bridges/)

Parallel BLAST

- to achieve maximum performance in a grid context, we have parallelised BLAST
- multiple query sequences are partitioned into sub-jobs on the basis of the number of idle compute nodes available and then processed on these in batches
- we have provided our own Java based scheduler which distributes sub-jobs across an array of resources

System Architecture

- grid service uses GT3.0.2 core only
- we have provided our own wrappers for the OpenPBS client side and the Condor submission components
- a scheduler component examines the input, polls resources for available processors and farms out subtasks to the resources
- details of resources (i.e. clusters) are held in a single XML config file, so adding new resources is easy
- target databases are located on execute nodes or on the cluster master node to minimise stage-in time; these need updating regularly

Design Issues

- no suitable metaschedulers were available at the time of designing the system, so we had to write our own
- system only uses GT3 core as a thin layer between client side and scheduler, since full GT3 was due to be replaced by WSRF; this minimises future porting effort

Compute Resources Used

- ScotGRID compute cluster at Glasgow Univ.: a 250 processor Linux cluster
- Condor pool at National e-Science Centre, Glasgow Univ.: 25 desktop machines, single processors

Client Side

- users of the service range from expert to low computer literacy
- delivery mechanism chosen was therefore via the BRIDGES web portal (see below)
- Java based graphical client to the service is downloaded via Java Web Start, which allows for easy, centralised updates
- also provides a good opportunity to explore client side Globus

Scheduler Algorithm

parse input and count no. of query sequences
poll resources and establish total no. of idle nodes
set number of sub-jobs to be run equal to total no. of idle nodes
calculate no. of sequences to be run per sub-job, n (= no. of sequences / no. of idle nodes)
while there are sequences left
    save n sequences to a sub-job input file
if the number of idle nodes is 0
    make up a small number of sub-jobs (currently hardcoded to 5) and distribute these evenly into queues across resources
else
    for each resource
        send i sub-jobs to the resource as separate threads, where i is the number of idle nodes on the resource
when results are complete
    save to file in the original input file order
    return this to the user

Summary

We have constructed a parallelised BLAST service that farms out multiple query sequences as sub-jobs to a pool of resources. Our scheduler runs over OpenPBS and Condor resources via our own Java wrappers. Client side delivery is through a Java GUI delivered via a web portal and Java Web Start.

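The partitioning and distribution steps of the scheduler algorithm can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BRIDGES code: the class SubJobPlanner and its methods are hypothetical, the resource names are only map keys chosen for the example, and two details are assumed because the poster does not specify them (the per-sub-job sequence count is rounded up, and the fallback case distributes sub-jobs round-robin). Submission to OpenPBS or Condor and the merging of results are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of sub-job planning; names and rounding rules are assumptions. */
class SubJobPlanner {

    /** Number of sub-jobs used when no node is idle (the poster states 5). */
    static final int FALLBACK_SUB_JOBS = 5;

    /** Splits sequences into at most subJobCount contiguous batches, keeping input order. */
    static List<List<String>> partition(List<String> sequences, int subJobCount) {
        List<List<String>> batches = new ArrayList<>();
        if (sequences.isEmpty() || subJobCount <= 0) {
            return batches;
        }
        // n = sequences per sub-job, rounded up so that no sequence is left over (assumption)
        int n = (sequences.size() + subJobCount - 1) / subJobCount;
        for (int start = 0; start < sequences.size(); start += n) {
            int end = Math.min(start + n, sequences.size());
            batches.add(new ArrayList<>(sequences.subList(start, end)));
        }
        return batches;
    }

    /** Maps each resource name to the sub-jobs (batches of sequences) it should run. */
    static Map<String, List<List<String>>> plan(List<String> sequences,
                                                Map<String, Integer> idleNodes) {
        int totalIdle = 0;
        for (int idle : idleNodes.values()) {
            totalIdle += idle;
        }
        int subJobCount = totalIdle > 0 ? totalIdle : FALLBACK_SUB_JOBS;
        List<List<String>> batches = partition(sequences, subJobCount);

        Map<String, List<List<String>>> assignment = new LinkedHashMap<>();
        List<String> names = new ArrayList<>(idleNodes.keySet());
        for (String name : names) {
            assignment.put(name, new ArrayList<>());
        }
        if (names.isEmpty()) {
            return assignment;
        }
        if (totalIdle > 0) {
            // Each resource receives as many sub-jobs as it has idle nodes.
            int next = 0;
            for (String name : names) {
                for (int i = 0; i < idleNodes.get(name) && next < batches.size(); i++) {
                    assignment.get(name).add(batches.get(next++));
                }
            }
        } else {
            // No idle nodes: queue the fallback sub-jobs round-robin across resources.
            for (int i = 0; i < batches.size(); i++) {
                assignment.get(names.get(i % names.size())).add(batches.get(i));
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            queries.add("seq" + i);
        }
        Map<String, Integer> idle = new LinkedHashMap<>();
        idle.put("scotgrid", 3);
        idle.put("condor", 1);
        plan(queries, idle).forEach((resource, jobs) ->
                System.out.println(resource + " -> " + jobs));
    }
}
```

In this example, 10 query sequences and 3 + 1 idle nodes give n = 3, so the first resource receives three sub-jobs of three sequences each and the second receives one sub-job holding the remaining sequence. Because the batches are contiguous slices of the input, results can be written back in the original input order by concatenating them batch by batch.
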
Contact / Further Information

BRIDGES website and portal: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Email contact: michab@dcs.gla.ac.uk