PDB


Protein Database:

Bioinformatics Lab

Sequence Databases:

GenBank -- DNA sequences and derived protein sequences
EMBL -- DNA sequences and derived protein sequences
DDBJ -- DNA sequences and derived protein sequences
SWISS-PROT -- protein sequences
PDB -- three-dimensional structures of proteins

GenBank, EMBL & DDBJ:

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

GenBank, EMBL & DDBJ:

Database   Release   Date            Sequence records   Bases
GenBank    122.0     Feb. 15, 2001   10,897,000         11,720,000,000
EMBL       66        Mar. 2, 2000    11,169,673         11,916,112,872

DDBJ is operated by a dedicated center at the National Institute of Genetics (NIG), Japan, established in April 1995.

Protein Databases:

Protein databases come in many styles: protein sequences, motifs, classification, structure, structure alignment, and curation.
GenBank, EMBL and DDBJ (derived sequences, http://www.ncbi.nlm.nih.gov/gorf/gorf.html)
SWISS-PROT, PIR (sequences)
PROSITE, PRINTS (sequence motifs)
HSSP, FSSP (classification, alignment)
PDB (3-D structure)

SWISS-PROT/TrEMBL:

Annotated protein sequences.
Established in 1986.
Developed by the SWISS-PROT groups at SIB and at EBI.
Maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation - The European Bioinformatics Institute (EBI)).
Website: http://www.expasy.ch/

Different Features of SWISS-PROT:

Curated protein sequence database.
Format follows as closely as possible that of EMBL.
Three differences:
Strives to provide a high level of annotation
Minimal level of redundancy
High level of integration with other databases

Three Distinct Criteria:

1. Annotation
Core data: the sequence data; the citation information (bibliographical references); and the taxonomic data (description of the biological source of the protein).
Annotation: items such as protein functions, post-translational modifications, domains and sites, secondary structure, quaternary structure, similarities to other proteins, diseases associated with deficiencies in the protein, sequence conflicts, variants, etc.

2. Minimal Redundancy:

Many sequence databases contain, for a given protein sequence, separate entries that correspond to different literature reports. SWISS-PROT tries as much as possible to merge all these data so as to minimize redundancy. If conflicts exist between various sequencing reports, they are indicated in the feature table of the corresponding entry.

3. Integration With Other Databases:

SWISS-PROT and TrEMBL - Protein sequences
PROSITE - Protein families and domains
SWISS-2DPAGE - Two-dimensional polyacrylamide gel electrophoresis
SWISS-3DIMAGE - 3D images of proteins and other biological macromolecules
SWISS-MODEL Repository - Automatically generated protein models
CD40Lbase - CD40 ligand defects
ENZYME - Enzyme nomenclature
SeqAnalRef - Sequence analysis bibliographic references

SWISS-PROT/TrEMBL:

TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
SWISS-PROT Release 39.15 of 19-Mar-2001: 94,152 entries
TrEMBL Release 16.2 of 23-Mar-2001: 436,924 entries

SWISS-PROT FORMAT:

Line code   Content                          Occurrence in an entry
ID          Identification                   Once; starts the entry
AC          Accession number(s)              One or more
DT          Date                             Three times
DE          Description                      One or more
GN          Gene name(s)                     Optional
OS          Organism species                 One or more
OG          Organelle                        Optional
OC          Organism classification          One or more
RN          Reference number                 One or more
RP          Reference position               One or more
RC          Reference comment(s)             Optional
RX          Reference cross-reference(s)     Optional
RA          Reference authors                One or more
RT          Reference title                  Optional
RL          Reference location               One or more
CC          Comments or notes                Optional
DR          Database cross-references        Optional
KW          Keywords                         Optional
FT          Feature table data               Optional
SQ          Sequence header                  Once
(blanks)    Sequence data                    One or more
//          Termination line                 Once; ends the entry
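The line-code layout above lends itself to a very small reader. The following is a minimal sketch, assuming the flat-file convention that the two-letter code occupies columns 1-2 and the data starts at column 6, with sequence lines beginning with blanks. The function name `parse_swissprot_entries`, the `SEQ` key, and the sample entry are made up for illustration and are not real database content.

```python
from collections import defaultdict

def parse_swissprot_entries(text):
    """Split flat-file text into entries; each entry maps a line code to its lines."""
    entries = []
    current = defaultdict(list)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith("//"):           # termination line ends the entry
            entries.append(dict(current))
            current = defaultdict(list)
            continue
        code, content = raw[:2], raw[5:]   # code in columns 1-2, data from column 6
        if code == "  ":                   # blank code: sequence data
            current["SEQ"].append(content.replace(" ", ""))
        else:
            current[code].append(content.strip())
    return entries

# Hypothetical entry, invented for this sketch.
SAMPLE = "\n".join([
    "ID   TEST_HUMAN     STANDARD;      PRT;    12 AA.",
    "AC   P00000;",
    "DE   HYPOTHETICAL EXAMPLE PROTEIN.",
    "OS   Homo sapiens (Human).",
    "SQ   SEQUENCE   12 AA;",
    "     MKTAYIAKQR QI",
    "//",
])

entries = parse_swissprot_entries(SAMPLE)
sequence = "".join(entries[0]["SEQ"])
print(entries[0]["ID"][0].split()[0], len(sequence))
```

Keeping every line code as a list mirrors the "One or more" occurrences in the table; codes that occur once simply yield a one-element list.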

Access to SWISS-PROT and TrEMBL:

SRS - Access to SWISS-PROT, TrEMBL and other databases using the Sequence Retrieval System
Full text search in SWISS-PROT and TrEMBL
Search by accession number or ID (AC or ID line; SWISS-PROT and TrEMBL)
Search by description or identification (any word in the DE, OS, OG, GN and ID lines; SWISS-PROT and TrEMBL)
Search by author (RA line; SWISS-PROT and TrEMBL)
Search by citation (RL line; SWISS-PROT only)
Retrieve a list of SWISS-PROT/TrEMBL entries
Randomly retrieve a SWISS-PROT/TrEMBL entry

Protein Data Bank:

PDB is an archive of three-dimensional structures of proteins; some nucleic acids are also included.
PDB is operated by RCSB (Research Collaboratory for Structural Bioinformatics), funded by NSF, DOE, and two units of NIH: NIGMS (National Institute of General Medical Sciences) and NLM (National Library of Medicine).
Established at BNL (Brookhaven National Laboratory) in 1971 as an archive for biological macromolecular crystal structures.
In the 1980s, the number of deposited structures began to increase dramatically.
In October 1998, the management of the PDB became the responsibility of RCSB.
Website: http://www.rcsb.org

PDB Holdings List: 27-Mar-2001:

Exp. Tech.                    Proteins, Peptides, and Viruses   Protein/Nucleic Acid Complexes   Nucleic Acids   Carbohydrates   Total
X-ray Diffraction and other   11045                             526                              552             14              12137
NMR                           1832                              71                               366             4               2273
Theoretical Modeling          281                               19                               21              0               321
Total                         13158                             616                              939             18              14731

Structure Factor Files 968 NMR Restraint Files

PDB Content Growth:

PDB Content Growth

PDB Growth in New Folds:

PDB Growth in New Folds

PDB Data File Format:

There are mainly two formats: PDB and CIF.
PDB is a fixed-column format.
CIF is a free format.

PDB Format:

HEADER : First line of the entry, contains PDB ID code, classification, and date of deposition.
OBSLTE : Statement that the entry has been removed from distribution and list of the ID code(s) which replaced it.
TITLE  : Description of the experiment represented in the entry.
CAVEAT : Severe error indicator. Entries with this record must be used with care.
COMPND : Description of macromolecular contents of the entry.
SOURCE : Biological source of macromolecules in the entry.
KEYWDS : List of keywords describing the macromolecule.
EXPDTA : Experimental technique used for the structure determination.
AUTHOR : List of contributors.
REVDAT : Revision date and related information.
SPRSDE : List of entries withdrawn from release and replaced by current entry.
JRNL   : Literature citation that defines the coordinate set.
REMARK : General remarks, some are structured and some are free form.
DBREF  : Reference to the entry in the sequence database(s).
SEQADV : Identification of conflicts between PDB and the named sequence database.
SEQRES : Primary sequence of backbone residues.
MODRES : Identification of modifications to standard residues.
HET    : Identification of non-standard groups or residues (heterogens).
HETNAM : Compound name of the heterogens.
HETSYN : Synonymous compound names for heterogens.
FORMUL : Chemical formula of non-standard groups.
HELIX  : Identification of helical substructures.
SHEET  : Identification of sheet substructures.
TURN   : Identification of turns.
SSBOND : Identification of disulfide bonds.
LINK   : Identification of inter-residue bonds.
HYDBND : Identification of hydrogen bonds.
SLTBRG : Identification of salt bridges.
CISPEP : Identification of peptide residues in cis conformation.
SITE   : Identification of groups comprising important sites.
CRYST1 : Unit cell parameters, space group, and Z.
ORIGXn : Transformation from orthogonal coordinates to the submitted coordinates (n = 1, 2, or 3).
SCALEn : Transformation from orthogonal coordinates to fractional crystallographic coordinates (n = 1, 2, or 3).
MTRIXn : Transformations expressing non-crystallographic symmetry (n = 1, 2, or 3). There may be multiple sets of these records.
TVECT  : Translation vector for infinite covalently connected structures.
MODEL  : Specification of model number for multiple structures in a single coordinate entry.
ATOM   : Atomic coordinate records for standard groups.
SIGATM : Standard deviations of atomic parameters.
ANISOU : Anisotropic temperature factors.
SIGUIJ : Standard deviations of anisotropic temperature factors.
TER    : Chain terminator.
HETATM : Atomic coordinate records for heterogens.
ENDMDL : End-of-model record for multiple structures in a single coordinate entry.
CONECT : Connectivity records.
MASTER : Control record for bookkeeping.
END    : Last record in the file.
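Because every record starts with its name in the first six columns, a reader can dispatch on that prefix. The sketch below only tallies record names; the function name `count_record_types` and the shortened sample lines are illustrative assumptions, not complete PDB records.

```python
from collections import Counter

def count_record_types(lines):
    """Count PDB records by the record name held in columns 1-6."""
    return Counter(line[:6].strip() for line in lines if line.strip())

# Abbreviated lines for illustration only; real records are longer.
SAMPLE_LINES = [
    "HEADER    IMMUNOGLOBULIN",
    "CRYST1   72.300   72.300  185.900",
    "ORIGX1      0.013831  0.007985  0.000000",
    "ATOM      1  N   PCA 1   1",
    "ATOM      2  CA  PCA 1   1",
    "TER",
    "END",
]

counts = count_record_types(SAMPLE_LINES)
print(counts["ATOM"], counts["HEADER"])
```

A fuller reader would replace the counter with a table mapping each record name to its own column-slicing function.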

An Example of PDB:

HEADER IMMUNOGLOBULIN 09-MAY-89 2MCG 2MCG 2
COMPND IMMUNOGLOBULIN LAMBDA LIGHT CHAIN DIMER (/MCG$) 2MCG 3
COMPND 2 (TRIGONAL FORM) 2MCG 4
SOURCE HUMAN (HOMO $SAPIENS) 2MCG 5
AUTHOR K.R.ELY,J.N.HERRON,A.B.EDMUNDSON 2MCG 6
REVDAT 2 15-JUL-92 2MCGA 1 SPRSDE 2MCGA 1
SPRSDE 15-OCT-90 2MCG 1MCG 2MCGA 2
JRNL AUTH K.R.ELY,J.N.HERRON,M.HARKER,A.B.EDMUNDSON 2MCG 9
JRNL TITL THREE-DIMENSIONAL STRUCTURE OF A LIGHT CHAIN 2MCG 10
REMARK 1 REFERENCE 1 2MCG 16
REMARK 1 AUTH A.B.EDMUNDSON,K.R.ELY,J.N.HERRON,B.D.CHESON 2MCG 17
SEQRES 1 1 216 PCA SER ALA LEU THR GLN PRO PRO SER ALA SER GLY SER 2MCG 183
FORMUL 3 HOH *318(H2 O1) 2MCG 217
SSBOND 1 CYS 1 22 CYS 1 90 2MCG 218
CRYST1 72.300 72.300 185.900 90.00 90.00 120.00 P 31 2 1 6 2MCG 223
ORIGX1 0.013831 0.007985 0.000000 0.00000 2MCG 224
ORIGX2 0.000000 0.015971 0.000000 0.00000 2MCG 225
ORIGX3 0.000000 0.000000 0.005379 0.00000 2MCG 226
SCALE1 0.013831 0.007985 0.000000 0.00000 2MCG 227
SCALE2 0.000000 0.015971 0.000000 0.00000 2MCG 228
SCALE3 0.000000 0.000000 0.005379 0.00000 2MCG 229
ATOM 1 N PCA 1 1 23.624 -24.231 101.873 1.00 17.85 2MCG 230
ATOM 2 CA PCA 1 1 23.296 -22.902 102.481 1.00 17.38 2MCG 231
ATOM 3 C PCA 1 1 24.304 -22.495 103.531 1.00 16.74 2MCG 232
ATOM 4 O PCA 1 1 23.962 -21.756 104.487 1.00 16.81 2MCG 233
ATOM 5 CB PCA 1 1 21.845 -23.057 103.035 1.00 18.02 2MCG 234
ATOM 6 CG PCA 1 1 21.816 -24.552 103.492 1.00 18.36 2MCG 235
ATOM 7 CD PCA 1 1 23.109 -25.217 102.974 1.00 18.57 2MCG 236
ATOM 8 OE PCA 1 1 23.354 -26.423 103.256 1.00 19.02 2MCG 237
TER 3214 SER 2 216 2MCG3443
HETATM 3215 O HOH 1 26.302 -28.430 111.973 1.00 4.66 2MCG3444
CONECT 145 144 660 2MCG3762
MASTER 170 0 0 0 0 0 0 6 3530 2 10 34 2MCGA 5
END 2MCG3773
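Since PDB is a fixed-column format, an ATOM record is read by slicing columns rather than by splitting on whitespace. The sketch below assumes the usual ATOM layout (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54, occupancy 55-60, temperature factor 61-66). The names `Atom`, `parse_atom_line`, and `format_atom_line` are illustrative. The sample line is rebuilt from the values of atom 2 in the example, because the original column spacing is not preserved in this transcript.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float

def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-column ATOM record (0-based slices of 1-based columns)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
    )

def format_atom_line(a: Atom) -> str:
    """Write an ATOM record back out with the same column layout."""
    name = a.name if len(a.name) == 4 else " " + a.name.ljust(3)
    return "ATOM  %5d %4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        a.serial, name, a.res_name, a.chain_id, a.res_seq,
        a.x, a.y, a.z, a.occupancy, a.temp_factor)

atom2 = Atom(2, "CA", "PCA", "1", 1, 23.296, -22.902, 102.481, 1.00, 17.38)
line = format_atom_line(atom2)
parsed = parse_atom_line(line)
print(line)
```

Column slicing matters because adjacent numeric fields can touch when values are wide or negative, in which case whitespace splitting would merge them.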

Fragment of CIF example:

####################
# ATOM_SITE        #
####################
loop_
_atom_site.label_seq_id
_atom_site.group_PDB
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.auth_seq_id
_atom_site.label_alt_id
_atom_site.cartn_x
_atom_site.cartn_y
_atom_site.cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.footnote_id
_atom_site.label_entity_id
_atom_site.id
1 ATOM N N GLY A 1 . -8.863 16.944 14.289 1.00 21.88 1 1 1
1 ATOM C CA GLY A 1 . -9.929 17.026 13.244 1.00 22.85 1 1 2
1 ATOM C C GLY A 1 . -10.051 15.625 12.618 1.00 43.92 1 1 3
1 ATOM O O GLY A 1 . -9.782 14.728 13.407 1.00 25.22 1 1 4
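In contrast to the fixed columns of PDB, a CIF loop is free format: the item names are listed first and the values follow as whitespace-separated tokens. The sketch below is a deliberately simplified reader, assuming a single `loop_` with no quoted strings or multi-line values. The function name `parse_cif_loop` is illustrative, and the sample reuses two rows of the fragment shown above.

```python
def parse_cif_loop(text):
    """Read one loop_ block into a list of {item name: value} rows."""
    names, values = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):    # skip blanks and comments
            continue
        if line == "loop_":
            in_loop = True
            continue
        if not in_loop:
            continue
        if line.startswith("_") and not values:  # item names precede the data
            names.append(line)
        else:
            values.extend(line.split())          # free format: tokens, not columns
    if not names:
        return []
    n = len(names)
    return [dict(zip(names, values[i:i + n])) for i in range(0, len(values), n)]

SAMPLE = "\n".join([
    "loop_",
    "_atom_site.label_seq_id", "_atom_site.group_PDB", "_atom_site.type_symbol",
    "_atom_site.label_atom_id", "_atom_site.label_comp_id", "_atom_site.label_asym_id",
    "_atom_site.auth_seq_id", "_atom_site.label_alt_id", "_atom_site.cartn_x",
    "_atom_site.cartn_y", "_atom_site.cartn_z", "_atom_site.occupancy",
    "_atom_site.B_iso_or_equiv", "_atom_site.footnote_id",
    "_atom_site.label_entity_id", "_atom_site.id",
    "1 ATOM N N GLY A 1 . -8.863 16.944 14.289 1.00 21.88 1 1 1",
    "1 ATOM C CA GLY A 1 . -9.929 17.026 13.244 1.00 22.85 1 1 2",
])

rows = parse_cif_loop(SAMPLE)
print(rows[1]["_atom_site.label_atom_id"], rows[1]["_atom_site.cartn_x"])
```

Because each value is looked up by item name, the reader does not depend on the order in which a file lists its columns.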

3-D Structure from PDB:

20 amino acids
http://www.clunet.edu/BioDev/omm/aa/aa.htm
http://www.nyu.edu/pages/mathmol/library/life/
http://inquiry.uiuc.edu/bioweb/tutorial/amino_acids.htm

Phenylalanine, Glycine, Histidine, Isoleucine, Lysine, Leucine, Methionine, Asparagine, Proline, Glutamine, Arginine, Serine, Threonine, Valine, Tryptophan, Glutamic acid, Alanine, Cysteine, Aspartic acid, Tyrosine

How to Construct 3-D Molecule:

Read coordinates from PDB (find the matching structure)
Set up data structure of molecules
Form bonds among atoms and groups
Calculate secondary structure
Implement 3-D graphical algorithms
Render the 3-D graphics in various styles: wires, sticks, balls, ribbons, and the like.
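The "set up data structure" step can be sketched as a nested mapping from chain to residue to atom. This layout, and the name `build_molecule`, are assumptions made for illustration; the slides do not say which structure is used. The coordinates are taken from the SER 2 and ALA 3 records of the 2MCG example.

```python
def build_molecule(atom_records):
    """Group atoms as molecule[chain_id][res_seq] = {"name": ..., "atoms": {...}}."""
    molecule = {}
    for chain_id, res_seq, res_name, atom_name, xyz in atom_records:
        chain = molecule.setdefault(chain_id, {})
        residue = chain.setdefault(res_seq, {"name": res_name, "atoms": {}})
        residue["atoms"][atom_name] = xyz
    return molecule

# (chain, residue number, residue name, atom name, coordinates)
RECORDS = [
    ("1", 2, "SER", "N",  (25.548, -22.930, 103.333)),
    ("1", 2, "SER", "CA", (26.608, -22.758, 104.327)),
    ("1", 2, "SER", "C",  (27.351, -24.076, 104.604)),
    ("1", 3, "ALA", "N",  (27.758, -24.228, 105.876)),
    ("1", 3, "ALA", "CA", (28.328, -25.397, 106.456)),
]

molecule = build_molecule(RECORDS)
print(list(molecule["1"]), molecule["1"][3]["name"])
```

Grouping by residue up front makes the later steps local: bonds within a residue, bonds between neighboring residues, and lookup of the sequence position for a clicked atom.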

Bonds among atoms:

ATOM 20 N LEU 1 4 30.279 -25.716 105.041 1.00 10.60 2MCG 249
ATOM 21 CA LEU 1 4 31.406 -26.518 104.496 1.00 9.39 2MCG 250
ATOM 22 C LEU 1 4 32.658 -25.786 105.165 1.00 8.90 2MCG 251
ATOM 23 O LEU 1 4 32.890 -24.586 104.967 1.00 8.74 2MCG 252
ATOM 24 CB LEU 1 4 31.615 -26.794 103.141 1.00 8.79 2MCG 253
ATOM 25 CG LEU 1 4 31.552 -27.440 101.860 1.00 8.37 2MCG 254
ATOM 26 CD1 LEU 1 4 32.732 -26.945 100.970 1.00 7.99 2MCG 255
ATOM 27 CD2 LEU 1 4 31.706 -28.963 102.016 1.00 8.09 2MCG 256

Leucine (LEU, L)
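One simple way to form bonds within a residue is a distance cutoff: two atoms are joined if they lie closer than a threshold. This is a heuristic sketch and not necessarily the method used by the authors; the 1.9 Å cutoff and the name `find_bonds` are assumptions. The coordinates are those of LEU 4 listed above.

```python
import math
from itertools import combinations

LEU4 = {
    "N":   (30.279, -25.716, 105.041),
    "CA":  (31.406, -26.518, 104.496),
    "C":   (32.658, -25.786, 105.165),
    "O":   (32.890, -24.586, 104.967),
    "CB":  (31.615, -26.794, 103.141),
    "CG":  (31.552, -27.440, 101.860),
    "CD1": (32.732, -26.945, 100.970),
    "CD2": (31.706, -28.963, 102.016),
}

def find_bonds(atoms, cutoff=1.9):
    """Return pairs of atom names whose distance is at most `cutoff` angstroms."""
    bonds = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(atoms.items(), 2):
        if math.dist(pos_a, pos_b) <= cutoff:
            bonds.append((name_a, name_b))
    return bonds

bonds = find_bonds(LEU4)
for a, b in bonds:
    print(a, b, round(math.dist(LEU4[a], LEU4[b]), 2))
```

For these coordinates the cutoff recovers the backbone (N-CA, CA-C, C-O) and the side chain (CA-CB, CB-CG, CG-CD1, CG-CD2). An alternative is a per-residue template of known bonds, which avoids false positives in poorly resolved structures.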

Bonds between groups:

ATOM 9 N SER 1 2 25.548 -22.930 103.333 1.00 16.05 2MCG 238
ATOM 10 CA SER 1 2 26.608 -22.758 104.327 1.00 15.38 2MCG 239
ATOM 11 C SER 1 2 27.351 -24.076 104.604 1.00 14.81 2MCG 240
ATOM 12 O SER 1 2 27.530 -24.949 103.740 1.00 15.00 2MCG 241
ATOM 13 CB SER 1 2 25.887 -22.406 105.682 1.00 15.73 2MCG 242
ATOM 14 OG SER 1 2 25.193 -23.586 106.117 1.00 15.14 2MCG 243
ATOM 15 N ALA 1 3 27.758 -24.228 105.876 1.00 13.72 2MCG 244
ATOM 16 CA ALA 1 3 28.328 -25.397 106.456 1.00 12.33 2MCG 245
ATOM 17 C ALA 1 3 29.255 -26.303 105.686 1.00 11.58 2MCG 246
ATOM 18 O ALA 1 3 29.033 -27.552 105.641 1.00 11.28 2MCG 247
ATOM 19 CB ALA 1 3 27.101 -26.228 106.998 1.00 12.39 2MCG 248
ATOM 20 N LEU 1 4 30.279 -25.716 105.041 1.00 10.60 2MCG 249
ATOM 21 CA LEU 1 4 31.406 -26.518 104.496 1.00 9.39 2MCG 250
ATOM 22 C LEU 1 4 32.658 -25.786 105.165 1.00 8.90 2MCG 251
ATOM 23 O LEU 1 4 32.890 -24.586 104.967 1.00 8.74 2MCG 252
ATOM 24 CB LEU 1 4 31.615 -26.794 103.141 1.00 8.79 2MCG 253
ATOM 25 CG LEU 1 4 31.552 -27.440 101.860 1.00 8.37 2MCG 254
ATOM 26 CD1 LEU 1 4 32.732 -26.945 100.970 1.00 7.99 2MCG 255
ATOM 27 CD2 LEU 1 4 31.706 -28.963 102.016 1.00 8.09 2MCG 256
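Consecutive amino acid residues are joined by a peptide bond between the carbonyl C of one residue and the backbone N of the next. The sketch below checks that link for the SER 2, ALA 3, and LEU 4 records above; the 1.5 Å cutoff and the name `peptide_bonds` are assumptions for illustration.

```python
import math

# Backbone N and C coordinates taken from the records above.
RESIDUES = [
    (2, "SER", {"N": (25.548, -22.930, 103.333), "C": (27.351, -24.076, 104.604)}),
    (3, "ALA", {"N": (27.758, -24.228, 105.876), "C": (29.255, -26.303, 105.686)}),
    (4, "LEU", {"N": (30.279, -25.716, 105.041), "C": (32.658, -25.786, 105.165)}),
]

def peptide_bonds(residues, cutoff=1.5):
    """Link residue i to residue i+1 when C(i) and N(i+1) are within `cutoff`."""
    links = []
    for (seq_a, _, atoms_a), (seq_b, _, atoms_b) in zip(residues, residues[1:]):
        distance = math.dist(atoms_a["C"], atoms_b["N"])
        if distance <= cutoff:
            links.append((seq_a, seq_b, distance))
    return links

links = peptide_bonds(RESIDUES)
for seq_a, seq_b, distance in links:
    print(seq_a, seq_b, round(distance, 3))
```

Checking the distance instead of assuming that consecutive residue numbers are bonded lets the same loop detect chain breaks, where the C to N distance is much larger.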

Nucleic Acid Database (NDB):

The NDB Project is funded by the National Science Foundation and the Department of Energy.
The goal of the NDB Project is to assemble and distribute structural information about nucleic acids.
The format of NDB is the same as that of PDB.

Molvie 1.0:

A visual and interactive environment to display, analyze, fold, and compare molecular structures.
Developed by us in Java AWT.
Runs as a Java application or as an applet embedded in a web page (http://www.cs.ucsb.edu/~mli/Bioinf/software/index.html).

Some features:

Molvie 1.0 is programmed in Java, hence it is platform-independent.
There is no limit on the number of molecules, atoms, residues, or animation frames displayed, as long as there is enough computer memory.
Molvie has many rendering styles.
Molvie can display two molecules simultaneously and allows the user to align secondary structure by dragging the mouse.
Molvie also allows the user to click on a part of the 3-D structure of a protein and displays the corresponding primary amino acid sequence.

Molvie Application Screen:

Molvie Application Screen

Molvie Applet Screen:

Molvie Applet Screen

PowerPoint Presentation:

Show Molvie
