FuGE springPSI2006


Presentation Transcript

FuGE: A framework for developing standards for functional genomics: 

Angel Pizarro, University of Pennsylvania
Andrew Jones, University of Manchester

Overview: 

Challenge of building data standards
Introduction to FuGE
Current status
Formats developed using FuGE

Data Standards for HT Genomics: 

Major challenges in developing standards:
    Technology is still evolving
    Heterogeneous data formats (and data types) from software and instruments
    “Important” information about the starting sample is almost unlimited
    Large quantities of metadata are needed to validate results
BUT: most of these problems are shared by microarrays, proteomics, metabolomics, etc.

Experiment Workflow: 

Diagram: Material -> Treatment -> Material, repeated through successive treatments; then Material -> Data Acquisition -> Data -> Data Transformation -> Data.
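The workflow on this slide can be read as a chain of steps, each taking materials or data as input and producing materials or data as output. A minimal sketch of that idea follows. The class names (`Material`, `Data`, `Step`) and the example sample names are hypothetical and chosen for illustration only; they are not the classes of the FuGE model.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Material:
    name: str


@dataclass
class Data:
    name: str


@dataclass
class Step:
    """One workflow step: a treatment, an acquisition, or a transformation."""
    kind: str
    inputs: List[Union[Material, Data]]
    outputs: List[Union[Material, Data]] = field(default_factory=list)


def run_workflow() -> List[Step]:
    # Hypothetical names; the slide only shows the generic pattern.
    sample = Material("starting sample")
    treated = Material("treated sample")
    raw = Data("raw data")
    processed = Data("processed data")
    return [
        Step("Treatment", [sample], [treated]),
        Step("Data Acquisition", [treated], [raw]),
        Step("Data Transformation", [raw], [processed]),
    ]


def trace(steps: List[Step]) -> str:
    """Render the chain as 'input -> [step] -> output' text."""
    parts = [steps[0].inputs[0].name]
    for step in steps:
        parts.append(f"[{step.kind}]")
        parts.append(step.outputs[0].name)
    return " -> ".join(parts)


if __name__ == "__main__":
    print(trace(run_workflow()))
```

The output of one step is reused as the input of the next, which is what lets a single description cover sample preparation and data processing alike.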

Functional Genomics Experiment (FuGE) Object Model: 

Merges of the MAGE and PEDRo models were attempted
    The result was an even more complex model that still left other FG technologies untouched
    Main motivation was to reuse the MAGE sample prep and ontology components
The FuGE project was created as a project independent of MGED and PSI
    Model of common components across FG to enable synergy between standards
    Sample description, protocols, investigation structure
http://fuge.sourceforge.net

Architecture Details: 

FuGE is mainly represented as a UML model
    UML 1.4 using MagicDraw 9.5
Uses AndroMDA to produce platform-specific models
    XML Schema
    Language bindings and APIs: Java, Perl, C, etc.
    Database schema

FuGE Structure: 

Diagram: FuGE is divided into Common (Description, Audit, Ontology, Protocol, Reference) and Bio (Investigation, Data, Material, Conceptual Molecule).
Common:
    General data format management
    Auditing
    Referencing external resources
    Protocols
Bio:
    Investigation structure
    Data
    Materials (organisms, solutions, compounds)
    Theoretical molecules, e.g. sequences

FuGE Workflow: 

FuGE Workflow

FuGE is an Enabler: 

Serves as a basis for developing new formats
    PSI-GPS and MGED are using FuGE to develop their new data formats
Existing formats can be tied together using FuGE
    mzData does not describe biosource separation procedure (gels, LC, etc.)
    CPAS from FHCRC does this

Use 1: Extending FuGE: 

Use 1: Extending FuGE

Use 2: Tie Together External Formats: 

Diagram: a ProtocolApplication references a Protocol, takes a Material as inputMaterial, and produces ExternalData (an mzData file) as outputData.
Protocol definition says “See ExternalData file for parameters” (rather than storing params in Protocol)
ExternalData points to the mzData file and its file format definition
    A parser will exist to extract data / parameters from the mzData file
Material can be used to describe the sample
    This connects the MS data with a separation workflow
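The linkage described on this slide can be sketched as a few small records: the protocol application points at the sample and at an external file, and the protocol itself only notes where the parameters live. This is a minimal sketch under stated assumptions. The class and field names mirror the labels on the slide, but the fields, the `describe` helper, and the file name `run01.mzData.xml` are hypothetical and not taken from the FuGE schema.

```python
from dataclasses import dataclass


@dataclass
class Material:
    description: str


@dataclass
class ExternalData:
    location: str     # path or URI of the external file (hypothetical value below)
    file_format: str  # name of the external format, e.g. "mzData"


@dataclass
class Protocol:
    name: str
    parameters_note: str  # a pointer to the parameters, not the parameters themselves


@dataclass
class ProtocolApplication:
    protocol: Protocol
    input_material: Material
    output_data: ExternalData


def describe(app: ProtocolApplication) -> str:
    """Summarise how the sample is connected to the external data file."""
    return (
        f"{app.input_material.description} --[{app.protocol.name}]--> "
        f"{app.output_data.location} ({app.output_data.file_format})"
    )


if __name__ == "__main__":
    app = ProtocolApplication(
        protocol=Protocol("Mass spectrometry", "See ExternalData file for parameters"),
        input_material=Material("LC fraction 3"),
        output_data=ExternalData("run01.mzData.xml", "mzData"),
    )
    print(describe(app))
```

Keeping the parameters in the external file avoids duplicating them, at the cost of needing a format-specific parser to read them back.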

Status of FuGE: 

Milestone 1 release: Sep 2005
Milestone 2 release: Dec 2005
    Acceptance by PSI and MGED at this time
Milestone 3: Spring 2006
    Milestone 2 of GelML and spML
Version 1.0: Fall 2006

FuGE Extensions: 

MAGE V2
    Format for microarray data and annotations
GelML
    Format for methods + results of 2D gels
    Milestone 1 Dec 2005
    Release scheduled for Spring/Summer 2006
spML
    Sample processing: liquid chromatography, capillary electrophoresis, centrifugation
    Milestone 1 Dec 2005
CPAS uses a FuGE-inspired manifest for experiments
Metabolomics community considering
PRIDE contemplating FuGE for data format
Flow Cytometry community interested
MIACA?

Summary: 

FuGE should help convergence of omics data formats:
    Single description of the sample for all types of experiment
    Shared representation of protocols
    Investigation and workflow structure for integrating different omics projects
Good starting point, proven development methodology

Acknowledgements: 

Other FuGE developers
    Andrew Jones (Manchester)
    Michael Miller (Rosetta)
    Paul Spellman (Lawrence Berkeley)
MGED, PSI, Fred Hutch CRC, Genologics, and various
Contact: angel@mail.med.upenn.edu

While I have your attention…: 

Space cost
    Ultra expensive: ~$19/GB ($380 for 20GB)
    Cheap (TeraStation NAS): ~$0.80/GB ($16 for 20GB)
    Ultra cheap ($500 PC): ~$0.50/GB ($10 for 20GB)
MIAPE confounding factors
    Will never have a complete list
    We are implicitly telling investigators that they don’t know how to do good science (a Bad Thing)
    Instead, require quality assessment statistics on the data (variance, reproducibility, etc.)
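As an illustration of the kind of quality assessment statistics the last point asks for, the sketch below summarises replicate measurements of one quantity. It is only an assumption about what such a report might contain: the function name `quality_summary` is hypothetical, and the coefficient of variation is used here as one possible stand-in for reproducibility, which the slide does not define.

```python
import statistics
from typing import Dict, List


def quality_summary(replicates: List[float]) -> Dict[str, float]:
    """Summarise replicate measurements of one quantity.

    Requires at least two values, since sample variance is undefined otherwise.
    """
    mean = statistics.mean(replicates)
    variance = statistics.variance(replicates)  # sample variance (n - 1 denominator)
    # Coefficient of variation: spread relative to the mean (assumed proxy
    # for reproducibility; undefined when the mean is zero).
    cv = statistics.stdev(replicates) / mean if mean else float("nan")
    return {"mean": mean, "variance": variance, "cv": cv}


if __name__ == "__main__":
    print(quality_summary([10.0, 12.0, 14.0]))
```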