Lecture 6.3: From DNA to Protein : Lecture 6.3: From DNA to Protein Dr. Joanne Fox
Day 6: Saturday February 21st, 2004
13:45 – 15:15
Objectives : Objectives Review protein sequence features and databases
Review the structural diversity of amino acids and protein sequences
Highlight several physicochemical and structural features which can be calculated from protein sequences
Show how proteomics utilizes methods and techniques for measuring, comparing and assessing protein features
Outline: : Outline: Protein sequence features
Databases of protein sequences
Basics of protein structure
1° structure, prediction of Mw and pI
2° structure, prediction methods
3° structure, methods for predicting folds
Proteomics
Current methods
Cutting edge technology
Amino Acids : Amino Acids The general formula for an amino acid
R is commonly one of 20 different side chains
At pH 7 both the amino and carboxyl groups are ionized
[Figure labels: amino group, alpha carbon, carboxyl group, side chain group]
Peptide Bonds : Peptide Bonds Amino acids are joined together by an amide linkage called a peptide bond.
The two bonds on either side of the rigid planar peptide unit exhibit a high degree of rotation
[Figure labels: peptide bonds; rotation occurs here]
Families of Amino Acids : Families of Amino Acids The common amino acids are grouped according to whether their side chains are:
acidic D, E
basic K, R, H
uncharged polar N, Q, S, T, Y
nonpolar G, A, V, L, I, P, F, M, W, C
Hydrophilic amino acids (uncharged polar) are usually on the outside of a protein, whereas nonpolar residues cluster on the inside of the protein
Basic or acidic amino acids are very polar and are generally found on the outside of protein molecules
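The family grouping on this slide can be written down as a small lookup table. The sketch below is illustrative only: the function name `family_composition` is made up for this example, and the grouping simply mirrors the one-letter codes listed above.

```python
# Side-chain families exactly as listed on the slide (one-letter codes).
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "uncharged polar": "NQSTY",
    "nonpolar": "GAVLIPFMWC",
}

# Invert the table so that each residue maps to its family name.
RESIDUE_FAMILY = {aa: name for name, letters in FAMILIES.items() for aa in letters}


def family_composition(sequence):
    """Count how many residues of a sequence fall into each family."""
    counts = {name: 0 for name in FAMILIES}
    for aa in sequence.upper():
        counts[RESIDUE_FAMILY[aa]] += 1  # raises KeyError for non-standard letters
    return counts
```

For instance, `family_composition("DKNG")` counts one residue in each of the four families.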
Protein Sequence Features : Protein Sequence Features Proteins exhibit far more sequence and chemical complexity than DNA or RNA
Properties and structure are defined by the sequence and side chains of their constituent amino acids
The “engines” of life
>95% of all drugs target proteins
Favorite topic of post-genomic era
Protein Sequence Databases : Protein Sequence Databases Where does protein sequence information reside?
Entrez Cross Database Search
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Swissprot & TrEMBL
http://ca.expasy.org/sprot/
PIR
http://pir.georgetown.edu/
As of December 2003, all of this information is integrated into a unified protein database called Uniprot.
Uniprot
http://www.pir.uniprot.org/
Entrez Cross Database Search : Entrez Cross Database Search Protein: sequence database gives access to translated protein sequences from Genbank/EMBL/DDBJ
Complete set of deduced protein sequences
Redundancy problem
Swissprot & TrEMBL : Swissprot & TrEMBL Swissprot is an expert curated database
Function, domain structure, post-translational modifications, variants, reactions, similarities
TrEMBL (translated EMBL)
Computer annotated supplement to Swissprot
PIR – Protein Information Resource : PIR – Protein Information Resource Annotated database which includes protein family classification information
The Uniprot Knowledgebase : The Uniprot Knowledgebase Contains all of the information in Swiss-Prot, TrEMBL, and PIR. This new unified database was launched in December 2003.
Basics of Protein Structure : Basics of Protein Structure Primary
Secondary
Tertiary
Molecular Weight : Molecular Weight Quick formula: Mw ≈ 110 Da × number of residues
Accurate determination of mass by mass spectrometry
Tools exist for accurately calculating mass of peptides based on amino acid composition
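As a rough sketch of the two approaches on this slide, the code below compares the quick 110 Da per residue estimate with a sum over the amino acid composition. The residue masses are the commonly tabulated average masses (in Da) and should be checked against a reference table before serious use. The function names are invented for this example.

```python
# Commonly tabulated average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def quick_mass(sequence):
    """Quick estimate: 110 Da per residue."""
    return 110.0 * len(sequence)


def composition_mass(sequence):
    """Average mass of an unmodified peptide from its amino acid composition."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS
```

The quick formula works reasonably for long, typical sequences, but it can be far off for short or compositionally biased peptides (a glycine-rich peptide is much lighter than 110 Da per residue).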
Molecular Weight & Proteomics : Molecular Weight & Proteomics 2-D Gel QTOF Mass Spectrometry
Isoelectric Point : Isoelectric Point The pH at which a protein has a net charge of 0
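One common way to estimate the pI from sequence is to model each ionizable group with the Henderson-Hasselbalch equation and search for the pH where the net charge crosses zero. The sketch below does this by bisection. The pKa values are approximate and are an assumption of this example. Different tools use different pKa sets, so predicted pI values vary between programs.

```python
# Approximate pKa values (an assumption; published sets differ slightly).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of an unmodified peptide at a given pH."""
    seq = sequence.upper()
    # Fraction protonated (charge +1) for each basic group.
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    positive += sum(1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
                    for aa in seq if aa in POSITIVE_PKA)
    # Fraction deprotonated (charge -1) for each acidic group.
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    negative += sum(1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
                    for aa in seq if aa in NEGATIVE_PKA)
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid  # negative or zero: the pI lies at lower pH
    return (low + high) / 2.0
```

Bisection is sufficient here because the net charge decreases monotonically as the pH rises.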
Basics of Protein Structure : Basics of Protein Structure Primary
Secondary
Tertiary
Common Secondary Structure Elements : Common Secondary Structure Elements The Alpha Helix
Common Secondary Structure Elements : Common Secondary Structure Elements The Beta Sheet
Secondary Structure:Phi & Psi Angles Defined : Secondary Structure: Phi & Psi Angles Defined
Rotational constraints emerge from interactions with bulky groups (i.e., side chains).
Phi & Psi angles define the secondary structure adopted by a protein.
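Phi and psi are dihedral (torsion) angles. By the usual convention, phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below computes a dihedral angle from four 3D points using only the standard library. The function name and the plain-tuple coordinate format are assumptions of this example, not part of any particular structure toolkit.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

Given backbone coordinates, phi for residue i would be `dihedral(C_prev, N, CA, C)` and psi would be `dihedral(N, CA, C, N_next)`. Plotting the resulting (phi, psi) pairs for every residue gives a Ramachandran plot.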
Ramachandran Plot : Ramachandran Plot
Supersecondary Structure : Supersecondary Structure
Secondary Structure & Protein Folding : Secondary Structure & Protein Folding Understanding the forces of hydrophobicity:
Hydrogen bonds can form with polar side chains on the outside of the protein
The hydrophobic core contains nonpolar side chains
[Figure: unfolded or partially folded polypeptide, with nonpolar and polar side chains labeled, next to the folded conformation]
Hydrophobicity is a property which can be calculated for protein sequences : Hydrophobicity is a property which can be calculated for protein sequences Hydrophobicity Scales:
Used to calculate hydrophobicity
Based on experimental evidence indicating hydrophobic/hydrophilic properties of each aa
Solubility, Stability, Location and/or Globularity of protein sequences can be predicted
Hydrophobicity Profile : Hydrophobicity Profile Moving segment approach
Correlation of this technique with 3D structure
[Figure: hydrophobicity score (hydrophobic +, hydrophilic -) plotted along the protein sequence from NH2 to COOH, with interior and exterior residues marked]
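The moving segment approach can be sketched as a sliding-window average over a hydrophobicity scale. The example below uses the Kyte-Doolittle hydropathy scale as one widely used choice. The slide does not name a scale or a window size, so both the scale and the default window of 7 residues are assumptions of this sketch.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Average hydropathy of every full window along the sequence.

    Returns one value per window position, so the result has
    len(sequence) - window + 1 entries (empty if the sequence is too short).
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

Stretches with a positive average tend to be buried (interior) residues, and stretches with a negative average tend to be exposed (exterior) residues. Longer windows are typically chosen when looking for membrane-spanning segments.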
The α-helix is a common secondary structure element : The α-helix is a common secondary structure element A helical wheel is a representation of the 3D structure of the α-helix.
Projection of aa side chains onto a plane perpendicular to axis of helix
Hydrophobic arcs stabilize helical interactions
Amphipathic helices are common
nonpolar acidic
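A helical wheel follows from the geometry of an ideal α-helix: with 3.6 residues per turn, each residue is rotated 100° from the previous one when viewed down the helix axis. The sketch below computes those projected positions on a unit circle. The function name and the returned tuple layout are choices made for this example.

```python
import math

DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn


def helical_wheel(sequence):
    """Return (residue, angle_in_degrees, x, y) for each position on a unit wheel."""
    wheel = []
    for index, aa in enumerate(sequence.upper()):
        angle = (index * DEGREES_PER_RESIDUE) % 360.0
        radians = math.radians(angle)
        wheel.append((aa, angle, math.cos(radians), math.sin(radians)))
    return wheel
```

Residues whose angles fall within the same arc lie on the same face of the helix. In an amphipathic helix the nonpolar residues cluster on one arc and the polar or charged residues on the opposite one.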
Secondary Structure Prediction : Secondary Structure Prediction The presence of secondary structure elements can be predicted.
Current algorithms rely on:
statistics (Chou-Fasman, GOR)
homology or nearest neighbor comparisons (Levin)
physico-chemical properties (Lim, Eisenberg)
pattern matching (Cohen, Rooman)
neural networks (Qian & Sejnowski, Karplus)
evolutionary methods (Barton, Niemann)
and combined approaches (Rost, Levin, Argos)
Chou-Fasman Algorithm : Chou-Fasman Algorithm 1. Assign each residue a Pα, Pβ, Pc value
2. Take a window of 7 residues and calculate a window-averaged value for all Pα, Pβ, Pc
3. Assign the average value for each of the secondary structures to the middle residue
4. Move down one residue and repeat steps 2 and 3 until finished
5. Scan and assign SS to the highest P/residue
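The window-averaging steps on this slide can be sketched as follows. This is a simplified illustration, not the full published Chou-Fasman procedure, which also uses nucleation and extension rules. The propensity values are transcribed from commonly reproduced Chou-Fasman tables and should be checked against the original publication. Two further assumptions are made here: the turn propensity is used as a stand-in for the coil value, and the window is truncated at the sequence ends.

```python
# (helix, strand, turn) propensities per residue; verify before real use.
PROPENSITY = {
    "A": (1.42, 0.83, 0.66), "R": (0.98, 0.93, 0.95), "N": (0.67, 0.89, 1.56),
    "D": (1.01, 0.54, 1.46), "C": (0.70, 1.19, 1.19), "Q": (1.11, 1.10, 0.98),
    "E": (1.51, 0.37, 0.74), "G": (0.57, 0.75, 1.56), "H": (1.00, 0.87, 0.95),
    "I": (1.08, 1.60, 0.47), "L": (1.21, 1.30, 0.59), "K": (1.16, 0.74, 1.01),
    "M": (1.45, 1.05, 0.60), "F": (1.13, 1.38, 0.60), "P": (0.57, 0.55, 1.52),
    "S": (0.77, 0.75, 1.43), "T": (0.83, 1.19, 0.96), "W": (1.08, 1.37, 0.96),
    "Y": (0.69, 1.47, 1.14), "V": (1.06, 1.70, 0.50),
}
STATES = "HEC"  # H = helix, E = strand, C = coil/turn


def window_averages(sequence, window=7):
    """Window-averaged propensities, assigned to the middle residue of each window."""
    seq = sequence.upper()
    half = window // 2
    averages = []
    for i in range(len(seq)):
        # Assumption: windows are truncated at the sequence ends.
        rows = [PROPENSITY[aa] for aa in seq[max(0, i - half):i + half + 1]]
        averages.append(tuple(sum(column) / len(rows) for column in zip(*rows)))
    return averages


def predict(sequence, window=7):
    """Assign each residue the state with the highest window-averaged propensity."""
    return "".join(STATES[max(range(3), key=lambda k: avg[k])]
                   for avg in window_averages(sequence, window))
```

The output is a string of the same length as the input, with one state letter per residue.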
Chou-Fasman Statistics : Chou-Fasman Statistics
The PhD Approach : The PhD Approach PRFILE...
The PhD Algorithm : The PhD Algorithm Search the SWISS-PROT database and select high scoring homologues
Create a sequence “profile” from the resulting multiple alignment
Include global sequence info in the profile
Input the profile into a trained two-layer neural network to predict the structure and to “clean-up” the prediction
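The profile step can be illustrated with a minimal sketch that turns a multiple alignment into per-column residue frequencies. This covers only the profile construction. The database search and the trained neural network are not shown. The function name and the toy alignment in the usage note are hypothetical, and the actual PHD profile encoding may differ from plain frequencies.

```python
from collections import Counter


def sequence_profile(alignment):
    """Per-column symbol frequencies for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({symbol: n / len(column) for symbol, n in counts.items()})
    return profile
```

For the toy alignment `["ACD", "ACE", "AGD", "ACD"]`, the first column is fully conserved (A at frequency 1.0), while the second column holds C at 0.75 and G at 0.25. Each column's frequency vector, rather than a single residue, is what a profile-based predictor takes as input.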
Predicting via Neural Nets & PSSM : Predicting via Neural Nets & PSSM PHDhtm
http://www.embl-heidelberg.de/predictprotein/
TMAP
http://www.mbb.ki.se/tmap/index.html
TMPred
http://www.ch.embnet.org/software/TMPRED_form.html
Prediction Performance : Prediction Performance
Best of the Best : Best of the Best PredictProtein-PHD (72%)
http://cubic.bioc.columbia.edu/predictprotein
Jpred (73-75%)
http://www.compbio.dundee.ac.uk/~www-jpred/
PREDATOR (75%)
http://www.hgmp.mrc.ac.uk/Registered/Option/predator.html
PSIpred (77%)
http://bioinf.cs.ucl.ac.uk/psipred/
Basics of Protein Structure : Basics of Protein Structure Primary
Secondary
Tertiary
Tertiary Structure : Tertiary Structure
Protein Structure Databases : Protein Structure Databases Where does protein structural information reside?
PDB:
http://www.rcsb.org/pdb/
MMDB:
http://www.ncbi.nlm.nih.gov/Structure/
FSSP:
http://www.ebi.ac.uk/dali/fssp/
SCOP:
http://scop.mrc-lmb.cam.ac.uk/scop/
CATH:
http://www.biochem.ucl.ac.uk/bsm/cath_new/
Structural Proteomics : Structural Proteomics Aim to delineate total repertoire of protein folds
Provide 3D portraits for all proteins in an organism
Goal: Use structure to infer function.
Compare structure of unknown protein to known set of structures
More sensitive than primary sequence comparisons
The Protein Fold Universe : The Protein Fold Universe How big is it? 500? 2000? 10000? ∞?
Structures in PDB : Structures in PDB PDB = 19860 structures (Jan 03)
PDB = 23997 structures (Jan 04)
“structural genomics” search = 156 structures (Jan 03)
“structural genomics” search = 478 structures (Jan 04)
Structural Proteomics : Structural Proteomics [Figure: number of sequences compared with number of structures, axis from 0 to 100000]
Unique folds in PDB : Unique folds in PDB
Prediction Methods for 3D structure : Prediction Methods for 3D structure Intermediate Steps
Predict secondary structure
Calculate solvent accessibility
Methods for 3D structure prediction based on:
Threading, Homology Modeling or Fold recognition
Similarity in amino acid sequence implies similar structure/function
Ab Initio Techniques
Numerical methods designed to simulate the structure and dynamics of macromolecules
Proteomics : Proteomics The study of the expression, location, interaction, function and structure of all the proteins in a given cell or organism
Expressional Proteomics
Functional Proteomics
Structural Proteomics
Proteomics : Proteomics Expressional Proteomics
2D or Capillary Electrophoresis, protein chips
Mass Spectrometry, Laser induced fluorescence
Functional Proteomics
Mass Spectrometry, micro-assays, protein chips
Yeast or Bacterial 2-hybrid systems
Structural Proteomics
High throughput X-ray crystallography
High throughput NMR spectroscopy
2D Gel Principles : 2D Gel Principles SDS-PAGE
Mass Spec Principles : Mass Spec Principles [Figure labels: Sample, Ionizer (+/-), Mass Filter, Detector]
Ionization Methods : Ionization Methods MALDI: 370 nm UV laser, cyano-hydroxy cinnamic acid
ESI: gold tip needle, fluid (no salt), +/-
Protein ID Protocol : Protein ID Protocol
Computational Tools for Protein Identification : Computational Tools for Protein Identification PeptIdent
http://us.expasy.org/tools/peptident.html
Mascot
http://www.matrixscience.com/search_form_select.html
ProteinProspector
http://prospector.ucsf.edu/
MOWSE
http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse
PeptideSearch
http://www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html
AACompSim/AACompIdent
http://www.expasy.ch/tools
Covered in Lab 6.4
Proteomics : Proteomics Human proteome estimated to contain 500,000+ proteins
The next “big wave” in bioinformatics
How to deal with so much data?
How to link structure to function to sequence?
How to show or store temporal and spatial data?
How to use it in drug discovery & development?
Proteomics Workshop: July 19 – 24th, 2004, Calgary, Alberta
The Cutting Edge of Proteomics : The Cutting Edge of Proteomics Evolution of Proteomes
Structural Genomics
Quantitative Mass Spectrometry and Protein Chip Technology
Chemical Proteomics
Proteome Scale Analysis of Networks, i.e., signal transduction, Y2H experiments
Global Proteome Interaction Mapping in C. elegans : Global Proteome Interaction Mapping in C. elegans Science 23 January 2004, 303: 540
see also Science 7 January 2000, 287: 116
Yeast Two Hybrid (Y2H) on the genomic scale : Yeast Two Hybrid (Y2H) on the genomic scale Global interaction map of C. elegans
Use proteome as bait in Y2H experiment
Detect all pairwise interactions
Create global protein:protein interaction network
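Once pairwise interactions have been detected, the network itself is just an undirected graph. The sketch below builds one from (bait, prey) pairs and lists highly connected proteins. The function names and the protein labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from (bait, prey) pairs."""
    network = defaultdict(set)
    for bait, prey in pairs:
        # An interaction is symmetric, so record it in both directions.
        network[bait].add(prey)
        network[prey].add(bait)
    return dict(network)


def hubs(network, min_degree):
    """Proteins with at least min_degree distinct interaction partners."""
    return sorted(protein for protein, partners in network.items()
                  if len(partners) >= min_degree)
```

With the hypothetical pairs `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]`, protein A has three partners and `hubs(network, 3)` returns `["A"]`. Storing partners in sets means an interaction observed more than once is only counted once.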
Protein:Protein Interaction Networks : Protein:Protein Interaction Networks
DNA vs Protein Chip Technology : DNA vs Protein Chip Technology DNA microtechnology
Can successfully read thousands of side-by-side measurements of RNA levels
BUT RNA ≠ protein = function
Protein Microarray Technology
Goal: develop protein chip with proteins in active state.
Proteins more challenging to prepare than DNA/RNA
Protein functionality depends on state, modifications, binding partners, localization etc.
Protein Chip - Methods : Protein Chip - Methods Attachment Methods:
Diffusion
Adsorption
nitrocellulose
Covalent Crosslinking
Reactive surfaces
Affinity Attachment
Affinity tags
Protein Chip - Applications : Protein Chip - Applications Antibody Chip
Detect Ag-Ab interactions
Protein Chip
Protein:protein
Protein:drug
Enzyme:substrate
Ligand Chip
And more….
Protein Chips : Protein Chips
Summary : Summary Protein sequences, and consequently protein sequence databases, are much more complex than those of DNA
Prediction of protein structure is a complex problem at both the secondary (2°) and tertiary (3°) levels
Proteomics initiatives based on different technologies are making inroads into the study of protein structure and function on a global level