CS374 2004 Lecture8 Haplotypes

Ronnie A. Sebro
Haplotype reconstruction
BMI 374, 10/21/2004

Mendelian Laws of Inheritance:
  Law of Segregation
    Alleles separate when gametes are formed
  Law of Independent Assortment
    Allele pairs separate independently during formation of gametes
  Mendelian Inheritance
    Each offspring receives one allele from the male parent and the other from the female parent

Complex Diseases:
  Polygenic or multifactorial diseases
  Run in families, but do not show Mendelian (monogenic) inheritance
  Complex interaction between disease susceptibility genes and environmental factors
  Examples: asthma, schizophrenia

Finding disease genes:
  Two common methods employed
    Pedigree analysis (linkage analysis)
      Affected individuals inherit/share the same portion of the genome
    Case-control analysis (association analysis)
      Affected individuals have different allele frequencies (higher or lower) than controls

Definitions:
  Marker: a small segment of DNA with specific features
  Types of markers
    SNPs: AATAA vs. AACAA
    Microsatellites (STRs): -CAGCAGCAG- vs. -CAGCAGCAGCAGCAG-
  Locus: physical position of a marker on a chromosome
  Homozygous: both alleles at a locus are the same
  Heterozygous: the alleles at a locus are different

Definitions:
  Haplotype: all alleles, one from each locus, that are on the same chromosome
  Recombinant: an individual who inherited a haplotype not identical to that inherited by his/her parent
  Phase: information about which alleles are inherited from each parent

Example:
  [Figure: genotypes and the corresponding haplotypes]

Enumerating Haplotypes:
  Consider an individual heterozygous at 3 loci, e.g. 1 2   1 2   1 2
  Several possible haplotypes
  Haplotype space can be potentially huge
  For n SNPs: 2^n haplotypes

Finding disease genes:
  Both tests (association-based tests and pedigree linkage analysis tests) tentatively converge
  Convergence is at the point of needing to find a haplotype/allele in tight association (LD) or inherited by all affected individuals
  Putative disease locus thereby identified

Why Haplotype?:
  Single allele vs. haplotype
  Advantages of using haplotypes
    Improved power
  Disadvantages of using haplotypes
    Haplotypes aren't readily known

Problem:
  Data generated from sequencer in the following format (SNPs)
    1 1 0 0 1 1 1 1 1 1 2
    1 2 0 0 0 2 2 2 2 1 2
    1 3 1 2 1 1 2 1 2 1 2
    1 4 1 2 0 1 2 1 2 2 2
  Genotypes are known
  Haplotypes are unknown
  [Figure: pedigree]

Haplotyping:
  Haplotyping can be done at the molecular level: whole-genome derived haplotypes (ref. Douglas et al., 2001)
  Algorithms preferred because
    Lower cost of genotyping
    Fast and accurate algorithms

Current Haplotyping Algorithms:
  Algorithms used for unphased data
    Clark Algorithm (Andy Clark @ Penn State)
    E-M Algorithm (Stephens et al.)
    Bayesian Haplotype Inference (Jun Liu et al.)

Clark Algorithm:
  Enumerates haplotypes that exist with certainty in the sample (individuals heterozygous at 0 or 1 loci)
  Assigns ambiguous haplotypes to those in the known list
  Solutions depend on the order in which individuals with unresolved haplotype phase are entered
  The algorithm does not assume HW equilibrium

E-M Algorithm:
  Estimates population haplotype probabilities via maximum likelihood estimation: finding the values of the haplotype probabilities that maximize the probability of the observed data
  The maximum likelihood estimates of the haplotype probabilities are obtained by maximization of the likelihood
  This is a missing-data problem
  Assumes HW equilibrium
  Software: EH (Xie and Ott, 1993) and EH+ (Zhao, Curtis and Sham)

Bayesian Algorithm:
  A Dirichlet prior distribution is used for the haplotype frequencies
  Uses a Gibbs sampler, which enables handling of many SNP loci
  Implemented in the program HAPLOTYPER

Errata in data:
  Genotyping errors (quite common, esp. with SNPs)
  Missing data
    MCAR
    MAR
    Non-ignorable missingness
  Marker order errors

Overview:
  Discuss a paper dealing with estimation of haplotypes in pedigrees (i.e. some information about phase is available)
  Minimum-Recombinant Haplotyping in Pedigrees (Qian & Beckmann)
  Useful for the HapMap project
  Also useful for association analyses with the Transmission Disequilibrium Test (TDT)

Paper 1:
  Minimum-Recombinant Haplotyping in Pedigrees
    Notation
    Methods (Algorithm)
    Results
    Shortcomings of algorithm

Recombination Principles:
  Minimum-Recombination Principle
    In nature, recombination is a rare event
    The most probable haplotypes are those that minimize the total number of recombinations needed in the pedigree
  Double recombinants
    Naturally these are even rarer events, especially over such short intervals (10 cM)

Notation:
  Consider a pedigree of J family members and a set of L linked marker loci
  Individual: any family member
  Parent: a family member with at least 1 child
  Founder: a parent without his/her own parents in the pedigree
  Offspring: a family member with at least one parent in the pedigree

Notation:
  Define an individual as "genotyped" at locus l iff:
    The genotype at locus l is known (from DNA), or
    The genotype data can be determined from first-degree relatives
  Ungenotyped parent (other parent genotyped)
    Informative if both haplotypes are transmitted
    Partially informative if only one haplotype is transmitted
  Genotyped offspring
    Informative if at least one parent is genotyped

Notation:
  Parental source (PS): whether an allele is maternally or paternally derived
  Grandparental source (GS): the parental source of each parental allele

Notation:
  For a nuclear family:
    [symbol] denote the alleles of parent 1
    [symbol] denote the alleles of parent 2
    [symbol] denote the alleles of offspring j
    [symbol] denote the paternal and maternal alleles of parent 1
    [symbol] denote the paternal and maternal alleles of parent 2
    [symbol] denote the paternal and maternal alleles of offspring j
    [symbol] denote the GS of the paternal and maternal alleles of individual j
    [symbol] denote the minimum and maximum allele values, respectively

Notation:
  [symbol] denotes a PS-unknown genotype with alleles a and b
  [symbol] denotes a PS-known haplotype with paternal allele A and maternal allele B
  (ab) = (cd) denotes that genotypes (ab) and (cd) are equal
  (ab) ≠ (cd) denotes that genotypes (ab) and (cd) are not equal
  [symbol] denotes that allele c is a constituent allele of genotype (ab)
  [symbol] denotes that allele c is not a constituent allele of genotype (ab)

Flexible Locus:
  Type 1: the trio are all heterozygotes, and at least 1 parent and the offspring are not haplotyped

Flexible Locus:
  Type 2: two alternative haplotype assignments at locus l in a founder result in an equal number of recombinant offspring

Flexible Locus:
  Type 3: two alternative haplotype assignments at locus l in an offspring result in an equal number of recombinants

Rules:
  Divide the pedigree into nuclear trios
  Apply the rules to each trio until all individuals are haplotyped or no further inference is possible
  Rule 1: Impute a missing genotype at each unambiguous locus in each parent, conditional on spouse and child genotypes
  Rule 2: Assign a haplotype at each unambiguous locus in each offspring, conditional on parental genotypes
  Rule 3: Assign haplotypes at each unambiguous locus in each founder, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Rules:
  Rule 4: Assign haplotypes at each unambiguous locus in each offspring, conditional on haplotypes in parents and the criterion of minimum recombinants in each trio
  Rule 5: Impute a missing genotype at each unambiguous locus in each parent, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family
  Rule 6: Locate a locus at which at least one individual in a nuclear family is flexible, expand the haplotype configuration into multiple configurations, and retain all configurations with the minimum number of recombinants

Implementation:
  Raw genotype data

Implementation:
  Rule 1: Impute a missing genotype at each unambiguous locus in each parent, conditional on genotypes in spouse and offspring

Implementation:
  Rule 2: Assign a haplotype at each unambiguous locus in each offspring, conditional on genotypes in parents in each parent-offspring trio

Implementation:
  Rule 3: Assign haplotypes at each unambiguous locus in each founder, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Implementation:
  Rule 4: Assign haplotypes at each unambiguous locus in each offspring, conditional on haplotypes in parents and the criterion of minimum recombinants in each trio

Implementation:
  Rule 5: Impute a missing genotype at each unambiguous locus in each parent, conditional on the haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Implementation:
  Second application of Rules 2 and 3

Implementation:
  Rule 6: Locate a locus at which at least one individual in a nuclear family is flexible, and expand the haplotype configuration into multiple configurations with alternative haplotype assignments at each flexible locus in these individuals. Retain all configurations with the minimum number of recombinants
  Reapplication of Rule 3

Results:
  A pedigree with episodic ataxia
    29 total individuals
    Genotyped at 9 polymorphic markers
    2 individuals not genotyped
  Simulation study
    Looped marriage structure in a pedigree with ataxia telangiectasia

Results:
  High degree of concordance with the maximum-likelihood method
  Identical haplotype configuration obtained with GENEHUNTER (ML-based) in >99% of pedigrees analyzed
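The minimum-recombinant criterion that Rules 3 to 6 rely on can be illustrated with a small sketch. It is not the procedure from the paper: it only counts the fewest recombination events needed for one parent, whose two haplotypes are already phased, to have transmitted a given haplotype to a child. The function name `min_recombinants` and the encoding (equal-length sequences of allele labels in marker order) are assumptions made for this illustration.

```python
from typing import Hashable, Optional, Sequence


def min_recombinants(
    hap_a: Sequence[Hashable],
    hap_b: Sequence[Hashable],
    transmitted: Sequence[Hashable],
) -> Optional[int]:
    """Fewest switches between hap_a and hap_b that reproduce `transmitted`.

    Assumes all three sequences list alleles at the same loci, in map order.
    Returns None if some transmitted allele matches neither parental haplotype
    (a Mendelian inconsistency at that locus).
    """
    inf = len(transmitted) + 1  # larger than any achievable switch count
    # cost[0] / cost[1]: fewest switches so far if the current allele is
    # copied from hap_a / hap_b respectively.
    cost = [0, 0]
    for a, b, t in zip(hap_a, hap_b, transmitted):
        from_a, from_b = (t == a), (t == b)
        if not (from_a or from_b):
            return None
        # A locus where the parent is homozygous allows both sources, so it
        # never forces a switch; a heterozygous locus fixes the source.
        cost = [
            min(cost[0], cost[1] + 1) if from_a else inf,
            min(cost[1], cost[0] + 1) if from_b else inf,
        ]
    return min(cost)


# One crossover between loci 2 and 3:
print(min_recombinants([1, 1, 1, 1], [2, 2, 2, 2], [1, 1, 2, 2]))  # 1
# A double recombinant around the middle locus:
print(min_recombinants([1, 1, 1], [2, 2, 2], [1, 2, 1]))  # 2
```

Under these assumptions, a rule-based haplotyper would compare this count across the alternative phase assignments of a founder or offspring and keep the assignments that give the smallest total.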
Simulation Results:
  [Figure: simulation results]

Genotype Errors:
  Impact of genotype errors investigated
  Generated genotype data on 1000 pedigrees, each containing one incorrect allele in a random individual at a random marker
  Mean number of recombinants increased from 5 to 6.2 (1.2)
  44% of these additional recombinants were double recombinants
  All four correct MRHCs (minimum-recombinant haplotype configurations) were reconstructed in 84% of pedigrees

Marker errors:
  The consequence of incorrect marker order on imputing haplotypes was investigated
  Marker loci 2-7 (of the 9 loci in the EA study) were permuted (6! - 1 = 719 ways)
  Of the 719 orderings
    None produced MRHCs with fewer than 5 recombinants
    Only 5% had the same number of recombinants as the correct ordering
  Chances of recovering all four MRHCs were 20% and 0% when 2 and 6 marker loci, respectively, were incorrectly ordered

Conclusions:
  Both genotype errors and incorrect marker order can produce additional recombinants in reconstructing haplotypes
  Sensitivity analyses suggest that incorrect marker orderings may have a larger impact than genotyping errors

Conclusions:
  This haplotyping method is applicable to both STR and SNP data
  Total computational requirement due to enumeration in a pedigree with J family members and L loci is on the order of O(J^2 L^3)
  Computational requirements for SNP data are 3-10 times larger than for STRs (more flexible loci)

Shortcomings:
  A genotyped individual with neither genotyped parents nor genotyped offspring cannot be analyzed by this algorithm
  The same problem remains even if multiple siblings and other relatives are genotyped
  Likelihood-based methods are able to assign haplotypes to individuals who are uninformative under this rule-based method
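The count of marker orderings in the marker-order experiment (6! - 1 = 719) can be checked by enumeration. The sketch below also tallies how many loci sit in the wrong position in each ordering. Reading "2 and 6 marker loci incorrectly ordered" as "2 and 6 loci out of position" is an assumption made here, not something the slides state, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import permutations

CORRECT_ORDER = tuple(range(1, 10))  # loci 1..9 in their true map order


def incorrect_orderings(correct=CORRECT_ORDER, first=2, last=7):
    """Every ordering that permutes loci first..last, excluding the correct one."""
    head = correct[:first - 1]        # loci before the permuted block
    block = correct[first - 1:last]   # loci first..last (inclusive)
    tail = correct[last:]             # loci after the permuted block
    return [
        head + perm + tail
        for perm in permutations(block)
        if perm != block
    ]


def loci_out_of_place(order, correct=CORRECT_ORDER):
    """Number of positions at which `order` differs from the correct order."""
    return sum(1 for got, want in zip(order, correct) if got != want)


orderings = incorrect_orderings()
by_displaced = Counter(loci_out_of_place(o) for o in orderings)

print(len(orderings))   # 719
print(by_displaced[2])  # 15  (two adjacent or non-adjacent loci swapped)
print(by_displaced[6])  # 265 (all six permuted loci out of position)
```

Each ordering would then be passed to the haplotyping procedure in place of the true map; that step is not shown because it depends on the full rule set.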