Computational Human Genetics:
Itsik Pe'er
Department of Computer Science Columbia University
Fall 2006
Reminder:
Structural variants:
Detected by analysis
Detected by technology
Also:
Project outline presentations
Model organisms
What can we do to improve technology?
Meeting #10:
Genotyping technologies
Genotyping Technologies:
Basic techniques in molecular biology
Genotyping technologies:
Overview + examples
Focused examination: Affymetrix
Future of sequencing
Model organisms:
Mouse
Dog
How can we examine DNA?:
Cut up DNA
Paste DNA
Copy DNA
Observe presence of DNA
Measure DNA
Detect DNA
Sequence DNA
Cutting DNA with Enzymes:
Non-specific digestion
Specific digestion: restriction enzymes
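Specific digestion can be sketched in a few lines. This is a minimal single-strand illustration, not a model of the chemistry: the function name `digest` and the test sequence are hypothetical, and the default site is EcoRI's recognition sequence GAATTC, cut after the first G.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    The cut falls `cut_offset` bases into the site (G^AATTC for EcoRI).
    Only one strand is modeled, so sticky ends are not represented.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


fragments = digest("AAGAATTCCCGGGAATTCTT")
print(fragments)  # ['AAG', 'AATTCCCGGG', 'AATTCTT']
```

Joining the fragments in order gives back the input sequence, which is a quick sanity check on any digestion routine.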
Pasting DNA to Sticky Ends:
Complementary DNA sticky ends hybridize and are ligated
Allows insertion of DNA
Copying DNA by PCR:
Polymerase Chain Reaction: a technique to create many copies of a short genome segment
Requires:
Synthesizing flanking sequences (primers)
[Figure: double-stranded DNA; region of interest; primer sequences]
Copying DNA by PCR:
[Figure: primers; polymerase]
Copying DNA by PCR:
Exponential increase in the number of amplicon molecules
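A rough in-silico sketch of the idea follows: locate the segment delimited by two primers, then count copies under ideal doubling. The names `amplicon` and `copies_after` and the toy template are hypothetical, and perfect per-cycle efficiency is an assumption that real reactions do not meet.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the template segment spanned by the two primers, or None.

    The forward primer matches the template strand directly; the reverse
    primer anneals to the opposite strand, so its reverse complement is
    searched for downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


def copies_after(cycles: int, initial: int = 1) -> int:
    """Idealized copy number: every molecule is duplicated in every cycle."""
    return initial * 2 ** cycles


print(amplicon("AAACCGGTTTTGATTCAAA", "CCGG", "GAATC"))  # CCGGTTTTGATTC
print(copies_after(30))  # 1073741824
```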
Observing Presence of DNA:
Radioactive phosphate in nucleotides
Fluorescent labels (attached molecules)
Sizing DNA with Electrophoresis:
Negatively charged DNA molecules are lined up at the cathode
They travel toward the anode through the buffer
Longer molecules travel more slowly
Photograph the labeled DNA
[Figure labels: buffer; starting line]
Detect DNA by Hybridization:
Probes: short, single-stranded DNA molecules
Apply mixture to array of probes, wash, photograph
Only probes whose reverse complements are present in the mixture light up
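The hybridization rule above can be sketched as an exact-match search. This is a simplification under stated assumptions: real hybridization tolerates some mismatches and depends on temperature and wash conditions, and `lit_probes` plus the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def lit_probes(probes: list[str], targets: list[str]) -> list[str]:
    """Return the probes whose reverse complement occurs in some target."""
    return [
        probe
        for probe in probes
        if any(reverse_complement(probe) in target for target in targets)
    ]


# The reverse complement of ACGTT is AACGT, which occurs in the target.
print(lit_probes(["ACGTT", "GGGCC", "TTTAA"], ["CCAACGTGG"]))  # ['ACGTT']
```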
Sequencing DNA:
A polymerase mix with A-stop bases creates all A-terminating prefixes
Run electrophoresis
Repeat for all bases
[Figure: sequences shown are A, ATTA, ATTATGCTA and TAGCATAAT; gel lanes labeled A, C, G, T]
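The read-out logic can be sketched as follows, using the ATTATGCTA sequence from the figure. The function names are hypothetical, and the gel is idealized: every terminated prefix is assumed to be present and resolved to single-base length.

```python
def terminated_prefixes(seq: str, base: str) -> list[str]:
    """All prefixes of `seq` that end in `base` (one chain-termination reaction)."""
    return [seq[: i + 1] for i, b in enumerate(seq) if b == base]


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from fragment lengths observed in the four lanes.

    Electrophoresis orders fragments by length; reading the bands from
    shortest to longest and noting each band's lane spells the sequence.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


seq = "ATTATGCTA"
print(terminated_prefixes(seq, "A"))  # ['A', 'ATTA', 'ATTATGCTA']

lanes = {b: [len(p) for p in terminated_prefixes(seq, b)] for b in "ACGT"}
print(read_gel(lanes))  # ATTATGCTA
```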
Principles:
Allele-dependent chemical event
Hybridization
Extension
Ligation
Reading of signal from the event
Detecting probe/extension/ligand
Considerations:
Robustness
Throughput
Cost
Example: MassArray (Sequenom):
Event: Extension of SNP-specific primer (amplified)
Detection: Mass spectrometry
Specs:
Up to ~20 SNPs x ~400 samples at a time
$0.10 per call; requires SNP-specific PCR + probe
Computation:
Design primers of different weights
Molecular Inversion Probes:
Design a probe with hybridizing flanks
Molecular Inversion Probes:
Event: Allele-dependent ligation, then PCR
Detection: "bar-code" tag hybridizes to array
Specs:
Up to 500k SNPs x ~100 samples at a time
$0.02 per call; SNP-specific probe; single PCR
Computation:
Choose tagging sequences
Example: BeadArray (Illumina):
[Figure labels: A/C; T; G]
Example: BeadArray (Illumina):
Event: Allele-specific ligation, then PCR
Detection: "bar-code" tag hybridizes to array
Specs:
Up to 500k SNPs x ~100 samples at a time
$0.002 per call; SNP-specific probe; per-SNP PCR
Computation:
Make calls
(clustering in polar coordinates)
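A minimal sketch of calling in polar coordinates, assuming each SNP yields one normalized intensity per allele. The fixed angle cutoffs stand in for a real clustering step, and `to_polar`, `call`, and all thresholds are hypothetical values chosen for illustration, not the vendor's algorithm.

```python
import math


def to_polar(a: float, b: float) -> tuple[float, float]:
    """Map allele intensities (a, b) to (r, theta), with theta scaled to [0, 1]."""
    r = math.hypot(a, b)
    theta = math.atan2(b, a) / (math.pi / 2)
    return r, theta


def call(a: float, b: float, min_r: float = 0.2,
         low: float = 1 / 3, high: float = 2 / 3) -> str:
    """Call a genotype from the angle; weak signals are left uncalled."""
    r, theta = to_polar(a, b)
    if r < min_r:
        return "NoCall"
    if theta < low:
        return "AA"  # signal almost entirely in the A channel
    if theta > high:
        return "BB"  # signal almost entirely in the B channel
    return "AB"


for a, b in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.05, 0.05)]:
    print((a, b), call(a, b))
```

The angle separates the three genotype classes regardless of overall brightness, while the radius flags failed assays; in practice the cutoffs would be learned per SNP from many samples.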
Example: Affymetrix GeneChip:
Genomic DNA with SNPs
Event: Hybridization to array
Detection: multiple probes; fluorescent target
Specs:
Up to 500k SNPs x ~100 samples at a time
$0.001 per call; SNP-specific probe; single PCR
Inflexible
Computation:
Genotype calls
Calling Affymetrix Genotypes:
"Dynamic Model":
For each quartet:
Rank hypotheses (AA/AB/BB/Null)
Score how consistently the quartets give the same ranking
Ranking hypotheses:
Log-likelihoods: L(AA), L(AB), L(BB), L(Null)
Assume:
Normally distributed signal
Different means for true signal and noise
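The ranking step can be sketched as below. Several things here are assumptions for illustration only: a quartet is taken to be four intensities ordered (PM_A, MM_A, PM_B, MM_B), the means and standard deviation are fixed hypothetical numbers rather than estimates from data, and a simple majority vote stands in for the score of how well the rankings agree.

```python
import math
from collections import Counter

# Which of the four probes (PM_A, MM_A, PM_B, MM_B) carries true signal
# under each hypothesis; the remaining probes are treated as noise.
HYPOTHESES = {
    "AA": (True, False, False, False),
    "AB": (True, False, True, False),
    "BB": (False, False, True, False),
    "Null": (False, False, False, False),
}


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def rank_hypotheses(quartet, mu_signal=1000.0, mu_noise=200.0, sigma=150.0):
    """Return hypothesis names ordered from highest to lowest log-likelihood."""
    scores = {
        name: sum(
            normal_logpdf(x, mu_signal if is_signal else mu_noise, sigma)
            for x, is_signal in zip(quartet, mask)
        )
        for name, mask in HYPOTHESES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def consensus(quartets, **params):
    """Majority vote over the top-ranked hypothesis of each quartet."""
    top_calls = [rank_hypotheses(q, **params)[0] for q in quartets]
    best, votes = Counter(top_calls).most_common(1)[0]
    return best, votes / len(top_calls)


quartets = [(950, 210, 180, 190), (1020, 180, 230, 210), (900, 250, 190, 170)]
print(consensus(quartets))  # ('AA', 1.0)
```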
Clustering Approach:
Given signals from many individuals:
Better estimates of mean signal
Allows clustering in high dimension:
Bayesian Robust Linear Model with Mahalanobis distance
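The distance itself can be sketched as follows: each sample is assigned to the genotype cluster with the smallest squared Mahalanobis distance. The sketch is restricted to two dimensions so the covariance can be inverted by hand, and the cluster means and covariances are hypothetical numbers, not fitted values.

```python
def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance (x - m)^T S^-1 (x - m) for a 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def assign(point, clusters):
    """Return the name of the cluster closest to `point`."""
    return min(clusters, key=lambda name: mahalanobis_sq(point, *clusters[name]))


# Hypothetical cluster centers and covariances in (A signal, B signal) space.
clusters = {
    "AA": ((1.0, 0.1), ((0.01, 0.0), (0.0, 0.01))),
    "AB": ((0.6, 0.6), ((0.01, 0.0), (0.0, 0.01))),
    "BB": ((0.1, 1.0), ((0.01, 0.0), (0.0, 0.01))),
}
print(assign((0.9, 0.2), clusters))  # AA
```

Unlike Euclidean distance, this measure scales each direction by the cluster's own spread, so a point far along a cluster's long axis can still be close to it.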
Which Probes Really Type?
The X-Prize for Genomics:
Announced: Oct 2006
$10,000,000 cash
Sequence
100 humans
<10 days
< 0.00001 error rate
<2% missing data
<$10000 recurrent cost
Semi-annual competition, till 2013
How far are we?:
Genotyping?
If made to work everywhere
If we know all SNPs
If we type all structural variants
If we shave half an order of magnitude off cost
If the X-Prize committee accepts that
Standard sequencing:
Human Genome: $0.09/finished bp
Today: ~$5M/genome
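A back-of-envelope check of the cost figures, as a sketch. The genome size of about 3 billion base pairs is an assumption not stated on the slide; the per-base rate, today's cost, and the prize target are taken from the slides. Integer cents are used to avoid floating-point rounding.

```python
GENOME_BP = 3_000_000_000   # assumed human genome size (~3 Gb)
HGP_CENTS_PER_BP = 9        # $0.09 per finished bp
TODAY_COST = 5_000_000      # ~$5M per genome
XPRIZE_TARGET = 10_000      # < $10,000 per genome

hgp_cost = GENOME_BP * HGP_CENTS_PER_BP // 100
fold_drop_needed = TODAY_COST // XPRIZE_TARGET

print(f"Genome at $0.09/bp: ${hgp_cost:,}")              # $270,000,000
print(f"Drop needed from today: {fold_drop_needed}x")    # 500x
```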
Candidate: 454:
~$1M/genome
Candidate: Helicos:
http://helicosbio.com/B38AD5C6BCE640D9B97A44977D5E1CEF.asp?ie_key=4546BA83C53E4D988C4F5E1B5CE5E2CE
Probably $100k/genome
Candidate: Solexa:
http://www.solexa.com/technology/demo.html
Probably $100k/genome
Summary:
Diverse technologies, diverse problems
An affordable personal genome in our time
Model Organisms:
Advantages (+):
Controlled breeding
Controlled environment
Disadvantages (-):
Controlled past breeding
Relevance to human disease
Mouse History
Genetic Archaeology
SNP Rate: Same or Different?
Mosaic Structure
Segmental Phylogenies
Dogs
Sequencing an Inbred Boxer
Linkage Disequilibrium in Dogs:
Larger Ne (ancestral)
Strong bottleneck
Dog History:
Two bottleneck events
Suggests 2-stage disease mapping
Further Reading:
Di X, Matsuzaki H, Webster TA, Hubbell E, Liu G, Dong S, Bartell D, Huang J, Chiles R, Yang G, Shen MM, Kulp D, Kennedy GC, Mei R, Jones KW, Cawley S. Bioinformatics. 2005 May 1;21(9):1958-63.
Rabbee N, Speed TP. Bioinformatics. 2006 Jan 1;22(1):7-12.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005 Sep 15;437(7057):376-80.
Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, …, Lander ES. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005 Dec 8;438(7069):803-19.
Frazer KA, Wade CM, Hinds DA, Patil N, Cox DR, Daly MJ. Segmental phylogenetic relationships of inbred mouse strains revealed by fine-scale analysis of sequence variation across 4.6 Mb of mouse genome. Genome Res. 2004 Aug;14(8):1493-500.
Wade CM, Daly MJ. Genetic variation in laboratory mice. Nat Genet. 2005 Nov;37(11):1175-80.
Wade CM, Kulbokas EJ 3rd, Kirby AW, Zody MC, Mullikin JC, Lander ES, Lindblad-Toh K, Daly MJ. The mosaic structure of variation in the laboratory mouse genome. Nature. 2002 Dec 5;420(6915):574-8.