Day1 915am Rieder


Presentation Transcript

Slide1: 

SNP Resources: Finding SNPs Discovery and Databases Mark J. Rieder, PhD NIEHS Variation Workshop January 30-31, 2006

Slide2: 

SNP Resources: SNP discovery and cataloging SNP discovery/genotyping: Genome-wide approaches SNP Consortium HapMap The current state of SNP resources Comprehensive SNP discovery NIEHS SNPs - Environmental Genome Project SNP Databases - 'How to' Manual for finding SNPs In class - Tutorial

Slide3: 

Genetic Markers: Overview RFLPs (SNPs circa 1980) Microsatellites (SSLP; di-, tri-, tetranucleotide repeats) 1/50,000 bp Linkage Studies - 300-400 markers (~1 Mbp) Multi-allelic/High heterozygosity/informative Complex genotyping assays Single Nucleotide Polymorphisms (SNPs) Most frequent genetic variant (base substitutions) 1/1000 bp (comparing randomly selected chromosomes) Biallelic/less informative Simplified genotyping platforms (+/- calling)

Slide4: 

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp How has SNP discovery progressed toward this goal?

Slide5: 

Finding SNPs: Marker Discovery and Methods SNP discovery has proceeded in two distinct phases: 1 - SNP Identification Define the alleles Map this to a unique place in the genome 2 - SNP Characterization Determination of the genotype in many individuals Population frequency of SNPs

Slide6: 

Finding SNPs: Marker Discovery and Methods SNP Discovery has proceeded in two distinct phases: 1 - SNP Discovery**/Characterization 2 - SNP Discovery/Characterization**

Slide7: 

Finding SNPs: Marker Discovery and Methods $ 45 Million - 2 years (1999, 2001 - 2003) Goals: Identify 300,000 SNPs and map 150,000 (April 1999) Determine allele frequency of SNPs If you don’t have a reference genome - how do you find SNPs?

Slide8: 

Finding SNPs: Sequence-based SNP Mining Sequence Overlap - SNP Discovery GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC DNA SEQUENCING RT errors? Sequencing Quality
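The overlap comparison on this slide can be sketched in a few lines of Python. This is only an illustration, not the pipeline used by the projects described in the talk: the function name `find_mismatches` is hypothetical, the two reads are assumed to be already aligned and of equal length, and no base-quality filtering is applied, even though the slide flags sequencing quality as a real concern.

```python
def find_mismatches(read_a, read_b):
    """Return (position, base_a, base_b) for each mismatch between two
    pre-aligned, equal-length reads. Positions are 0-based."""
    if len(read_a) != len(read_b):
        raise ValueError("reads must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(read_a, read_b))
        if a != b
    ]


# The two overlapping sequences shown on the slide.
read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

# Each mismatch is a candidate SNP that still needs quality checks.
for position, base_a, base_b in find_mismatches(read_1, read_2):
    print(f"candidate SNP at offset {position}: {base_a}/{base_b}")
```

A single mismatch between two reads is only a candidate: it could equally be a sequencing or cloning error, which is why the later slides stress validation and genotyping.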

Slide9: 

Finding SNPs: Sequence-based SNP Mining RRS = Reduced Representation Sequencing Genomic DNA (multiple individuals) RE to generate fragments Clone DNA fragments into plasmid vectors Sequence and align and cluster From overlap identify mismatches = SNPs GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC Altshuler, et al. Nature (2000)

Slide10: 

Finding SNPs: Sequence-based SNP Mining BAC = Bacterial Artificial Chromosome Primary vector for DNA cloning in the HGP Fragment DNA DNA from multiple individuals Clone large fragments into BACs (unknown sequence) Sequence and Reassemble (known sequence) Assembly with other overlapping BACs GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC

Slide11: 

Feb. 2001 - Human Genome Project and TSC TSC and HGP: High-Resolution SNP Map

Slide12: 

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp Feb 2001 - 1.42 million (1/1900 bp)

Slide13: 

dbSNP -NCBI SNP database SNP Discovery: dbSNP database

Slide14: 

SNP data submitted to dbSNP: Clustering dbSNP processing of SNPs

Slide15: 

Finding SNPs: Marker Discovery and Methods SNP Discovery has proceeded in two distinct phases: 1 - SNP Identification**/Discovery 2 - SNP Discovery/Characterization**

Slide16: 

HapMap Project Proposed: Map more SNPs and genotype Increase SNP density over the first 6 - 12 months Ultimately produce a fine scale genetic map (HapMap) which would serve as a common resource for all biomedical researchers Genotype 600,000 SNPs genome-wide Four populations: CEPH (Europe), Yoruban (Africa), Japanese/Chinese (Asian)

Slide17: 

Initiation of project planning (July 2001): 2.8 million SNPs (1.4 million validated) - 1/1900 bp HapMap SNP Discovery: Prior to Genotyping TACGCCTATA TCAAGGAGAT Generate more SNPs: Other Sources of SNPs: Perlegen (Affymetrix chips) SNP data (chr22) Sequence chromatograms from Celera project

Slide18: 

HapMap Discovery Increased SNP Density and Validated SNPs 10 million rs SNPs 5 million validated rs SNPs

Slide19: 

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp When will we have them all? Feb 2001 - 1.42 million (1/1900 bp) Nov 2003 - 2.0 million (1/1500 bp) Feb 2004 - 3.3 million (1/900 bp) Mar 2005 - 5.0 million (validated - 1/600 bp)
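The "1 SNP per N bp" figures on this slide follow from dividing genome length by SNP count. The sketch below assumes a round genome size of 3 billion bp, which is an assumption of this example rather than a number given in the talk; the helper name `bp_per_snp` is likewise hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed round figure, not from the slides


def bp_per_snp(snp_count, genome_size_bp=GENOME_SIZE_BP):
    """Average spacing between SNPs, in base pairs."""
    return genome_size_bp / snp_count


# SNP counts as listed on the slide.
milestones = {
    "Feb 2001": 1_420_000,
    "Nov 2003": 2_000_000,
    "Feb 2004": 3_300_000,
    "Mar 2005": 5_000_000,
    "Estimated common SNPs": 10_000_000,
}

for label, count in milestones.items():
    print(f"{label}: about 1 SNP per {bp_per_snp(count):.0f} bp")
```

With this assumed genome size the later milestones land on the slide's rounded values (1/1500, about 1/900, 1/600, 1/300). The Feb 2001 entry comes out somewhat above 1/1900, which suggests that figure was computed against a shorter sequenced length.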

Slide20: 

Finding SNPs: Sequence-based SNP Mining RANDOM Sequence Overlap - SNP Discovery GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC Genomic RRS Library Shotgun Overlap DNA SEQUENCING Random Shotgun Align to Reference

Slide21: 

SNP discovery is dependent on your sample population size GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC (2 chromosomes)
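Why sample size matters can be made concrete with a simple probability model. The sketch below assumes a biallelic SNP and chromosomes sampled independently at random from one population; under those assumptions a SNP is "discovered" when both alleles appear at least once. The function name and the sample sizes in the loop are illustrative choices, not values from the talk.

```python
def discovery_probability(maf, n_chromosomes):
    """Probability that both alleles of a biallelic SNP are seen at least
    once among n independently sampled chromosomes."""
    all_minor = maf ** n_chromosomes
    all_major = (1 - maf) ** n_chromosomes
    return 1 - all_minor - all_major


# Illustrative sample sizes: two chromosomes up to a resequencing panel.
for n in (2, 8, 48, 190):
    p = discovery_probability(0.05, n)
    print(f"MAF 5%, {n} chromosomes: discovery probability {p:.3f}")
```

Comparing only two chromosomes finds a 5% SNP less than one time in ten, which is why overlap-based discovery favors common variants and why resequencing larger panels recovers rarer ones.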

Slide22: 

SNP Characterization/Genotyping Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp Mar 2005 - 5.0 million (validated/mapped - 1/600 bp) 5.0/10.0 = 50% of all common SNPs (validated)!

Slide23: 


Slide24: 

Finding SNPs: Genotype Data Adds Value to SNPs HapMap Genotyping Confirms SNP as 'real' and 'informative' Minor Allele Frequency (MAF) - common or rare MAF in different populations Detection of SNP x SNP correlations (Linkage Disequilibrium) Determine haplotypes
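Two of the quantities named on this slide, minor allele frequency and SNP x SNP correlation, can be computed directly from genotype data. The sketch below is a minimal illustration under stated assumptions: SNPs are biallelic, genotypes are given as two-letter strings, and the haplotypes passed to `r_squared` are already phased. Both function names and the toy data are hypothetical; r-squared is used here as one common LD measure, and the talk does not say which measure HapMap reports.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF from unphased genotypes such as "AG", one string per individual."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # only one allele observed in this sample
    return min(counts.values()) / sum(counts.values())


def r_squared(haplotypes):
    """LD (r^2) between two biallelic SNPs from phased haplotypes, given as
    (allele_at_snp1, allele_at_snp2) pairs, one per chromosome."""
    n = len(haplotypes)
    ref_1, ref_2 = haplotypes[0]  # arbitrary reference allele at each SNP
    p_1 = sum(h[0] == ref_1 for h in haplotypes) / n
    p_2 = sum(h[1] == ref_2 for h in haplotypes) / n
    p_12 = sum(h[0] == ref_1 and h[1] == ref_2 for h in haplotypes) / n
    denominator = p_1 * (1 - p_1) * p_2 * (1 - p_2)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_12 - p_1 * p_2  # the disequilibrium coefficient D
    return d * d / denominator


# Toy data: five individuals at one SNP, four chromosomes at a SNP pair.
print(minor_allele_frequency(["AA", "AG", "AA", "AG", "AA"]))
print(r_squared([("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]))
```

Running the MAF function separately on each population's genotypes gives the per-population frequencies the slide refers to.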

Slide25: 

Few SNPs in dbSNP had Genotype Data

Slide26: 

1.58 million SNPs genotyped in 71 individuals from 3 American populations of European, African and Asian ancestry Perlegen Large-scale Genotyping Capacity

Slide27: 

HapMap Completion Nature - Oct 27 (2005) HapMap + Perlegen

Slide28: 

Perlegen Data dbSNP: Increasing numbers of SNPs now have genotype data HapMap Phase II Perlegen

Slide29: 

Current State of dbSNP Many SNPs left to validate and characterize.

Slide30: 

Increasing SNP Density: HapMap ENCODE Project ENCODE = ENCyclopedia Of DNA Elements Catalog all functional elements in 1% of the genome (30 Mb) 10 Regions x 500 kb/region (Pilot Project) David Altshuler (Broad), Richard Gibbs (Baylor) 16 CEU, 16 YRI, 8 HCB, 8 JPT Comprehensive PCR based resequencing across these regions

Slide31: 

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp Mar 2005 - 5.0 million (validated - 1/600 bp) ~4.0 million validated SNPs with genotypes! (HapMap confirmed, allele frequency/population, SNPxSNP correlations (LD), haplotypes)

Slide32: 

SNP discovery is dependent on your sample population size GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC (2 chromosomes)

Slide33: 

Goal: Comprehensively identify all common sequence variation in candidate genes Initial biological focus: Candidate environmental response genes involved in DNA repair, cell cycle, apoptosis, metabolism, cell signaling, and oxidative stress. Approach: Direct resequencing of genes Samples: PDR = 90 ethnically diverse individuals representative of U.S. population (397 genes) EGP95 = 95 samples from 4 ethnic groups (23 HapMap Asians, 22 HapMap Europeans, 15 HapMap Yorubans, 12 African Americans, 24 Hispanic) (170 genes)

Slide34: 

Targeted SNP Discovery 5’ 3’ Directed analysis: cSNPs 5’ 3’ Complete analysis: cSNP and Haplotype Structure Analysis Arg-Cys Val-Val Arg-Cys Val-Val PCR amplicons PCR amplicons Generate SNP data from complete genomic resequencing (i.e. 5’ regulatory, exon, intron, 3’ regulatory sequence)

Slide35: 

Nov 2005 - Zaitlen et al. Genome Research 15:1594-1600 Summary of NIEHS SNP genotypes in dbSNP Current numbers 554 genes sequenced 12.76 Mb scanned 75,580 genotyped SNPs identified 7 million genotypes deposited in dbSNP
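The totals on this slide can be turned into a SNP density and an average number of genotypes per SNP. The sketch below uses only the numbers listed above; the helper name `region_bp_per_snp` is hypothetical, and "7 million genotypes" is treated as a rounded figure.

```python
def region_bp_per_snp(scanned_bp, snp_count):
    """Average spacing between SNPs across the resequenced regions."""
    return scanned_bp / snp_count


SCANNED_BP = 12_760_000        # 12.76 Mb scanned
SNP_COUNT = 75_580             # genotyped SNPs identified
GENOTYPES = 7_000_000          # rounded total deposited in dbSNP

spacing = region_bp_per_snp(SCANNED_BP, SNP_COUNT)
genotypes_per_snp = GENOTYPES / SNP_COUNT

print(f"about 1 SNP per {spacing:.0f} bp")
print(f"about {genotypes_per_snp:.0f} genotypes per SNP")
```

The spacing is close to the 1/180 bp density quoted for NIEHS SNPs elsewhere in the talk, and the genotypes-per-SNP figure is in line with resequencing panels of 90 to 95 individuals.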

Slide36: 

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp NIEHS SNPs = 1/180 bp (n = 95, 4 pops) HapMap ENCODE = 1/160 (n = 48, 3 pops) Comprehensive resequencing can identify the vast majority of SNPs in a region

Slide37: 

Rarer and population specific SNPs are found by resequencing SNP Discovery: dbSNP database [Figure: Minor Allele Freq. (MAF), dbSNP (Perlegen/HapMap) compared with NIEHS SNPs; values shown: 15%, 25%]

Slide38: 

NIEHS SNPs Characterization PDR = 90 ethnically diverse individuals representative of U.S. population (397 genes - ~55,000 SNPs) Selection of informative (high frequency, coding, etc) SNPs to be genotyped in defined populations (~7600 SNPs) HapMap Populations European (CEU, n = 60) African (YRI, n = 60) Asian (HCB, n = 45 and JPT, n = 45) Non-HapMap Populations Hispanic (n = 60) African-American (n = 62)

Slide39: 

Illumina NIEHS SNPs Genotyping Each well samples 1536 SNPs in one individual For each HapMap sample 5 x 1536 (7680 genotyped SNPs) 3,000,000 genotypes generated (total ~400 samples)
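The genotype totals on this slide are straightforward multiplication, sketched below with the slide's own numbers. The constant names are hypothetical, and the sample count of 400 is the slide's approximate figure, so the result is an estimate.

```python
SNPS_PER_WELL = 1536      # SNPs assayed per well, one individual per well
WELLS_PER_SAMPLE = 5      # wells run for each sample
APPROX_SAMPLES = 400      # "total ~400 samples" on the slide

snps_per_sample = SNPS_PER_WELL * WELLS_PER_SAMPLE
total_genotypes = snps_per_sample * APPROX_SAMPLES

print(f"SNPs genotyped per sample: {snps_per_sample:,}")
print(f"approximate genotypes generated: {total_genotypes:,}")
```

The product is consistent with the roughly 3,000,000 genotypes reported.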

Slide40: 

Population Allele Frequency Correlations Illumina NIEHS SNP Genotyping

Slide41: 

NIEHS SNPs Genotype Data PDR (397 genes) SNPs characterized in six different major populations.

Slide42: 

Summary: The Current State of SNP Resources SNPs have been rapidly adopted as the genetic marker of choice. Approximately 10 million common SNPs exist in the human genome (1/300 bp). Random SNP discovery processes generate many SNPs (TSC and HapMap). Random approaches to SNP discovery have reached limits of discovery and validation (1/600 bp; 50% SNP validation). Most validated SNPs (5 million) will be genotyped by the HapMap (3 pops). Resequencing approaches continue to catalog important variants (rarer). NIEHS SNPs has generated SNP data on >550 candidate genes and 75K SNPs.
