Next Generation Sequencing (NGS)
Vasanthan V, Project Trainee @ RGCB (uploaded by vasu2891, March 17, 2013)

CONTENTS
- Introduction
- History
- Use of Sequencing
- Sanger Sequencing
- NGS
- Sanger vs NGS
- Types of NGS
- Methodology
- Software
- File Formats
- Bioinformatics for NGS
- Recent Trends

INTRODUCTION
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. The term covers any method or technology used to determine the order of the four bases, adenine (A), guanine (G), cytosine (C) and thymine (T), in a strand of DNA.
The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Knowledge of DNA sequences has become indispensable for basic biological research and for numerous applied fields such as diagnostics, biotechnology, forensic biology and biological systematics.

The speed attained by modern DNA sequencing technology has been instrumental in sequencing complete genomes of numerous types and species of life, including the human genome and the genomes of many animal, plant and microbial species.

Reasons DNA sequencing was slow to develop:
- the chemical properties of DNA
- chain length
- the existence of only four bases in DNA
- the lack of base-specific DNases

RNA sequencing came first: yeast alanine tRNA was the first nucleic acid molecule to be sequenced, by Holley and coworkers in 1965. DNA sequencing technologies have been developed since 1975.

Principle of Sequencing
- Sequencing reaction: based on PCR, with a reaction mixture containing ddNTPs.
- Separation by gel electrophoresis.
- Detection on an automated sequencer.
- Assembly of the sequenced parts of a gene.

HISTORY
The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with automated analysis, DNA sequencing has become easier and orders of magnitude faster.

EVOLUTION IN SEQUENCING TECHNOLOGY
Sequencing the Human Genome (the original chart plots log10 of price against year):
- 2001: Human Genome Project, $2.7 billion, 11 years
- 2001: Celera, $100 million, 3 years
- 2007: 454, $1 million, 3 months
- 2008: ABI SOLiD, $60K, 2 weeks
- 2009: Illumina, Helicos, $40K–50K
- 2010: $5K, a few days?
- 2012: $100, under 24 hours?

USE OF SEQUENCING
- To determine the sequence of individual genes, larger genetic regions (e.g. clusters of genes or operons), full chromosomes or entire genomes.
- Depending on the method used, sequencing may provide the order of nucleotides in DNA or RNA isolated from cells of animals, plants, bacteria, archaea, or virtually any other source of genetic information.
- The resulting sequences may be used by researchers in molecular biology or genetics to further scientific progress, or by medical personnel to make treatment decisions or aid in genetic counselling.

SANGER SEQUENCING
- Uses dideoxynucleotide triphosphates (ddNTPs) to terminate DNA chain elongation.
- Molecules are separated by gel or capillary electrophoresis, and the dye-labelled terminator is detected.
- Can be used to interrogate the sequence of single samples.
- 96 samples can be run in SBS format, with 24–36 runs per day.
- A single instrument can generate 1–2 million bases per day.

SANGER SEQUENCING PIPELINE
Stages shown in the pipeline diagram: library construction; picking and growth; template preparation (PCR amplicons or plasmids); PCR cycling and clean-up (PCR clean-up, PCR normalizer); sequencing setup and cycling of ready-to-sequence templates; analysis on the 3730xl.

NEXT GENERATION SEQUENCING - NGS
- Employs micro- and nanotechnologies to reduce the size of sample components, reducing reagent costs and enabling massively parallel sequencing reactions.
- Highly multiplexed, allowing simultaneous sequencing and analysis of millions of templates.
- Became commercially available around 2005, the first platform being 454 pyrosequencing.
- Several different sequencing methods have since been developed, all of which are continually being improved at a rapid rate.

'Massively parallel' sequencing - NGS
All commercially available sequencers share the following attributes:
- Random fragmentation of the starting DNA
- Ligation with custom linkers (adapters)
- Library amplification on a solid surface (either bead or glass)
- Direct step-by-step detection of each nucleotide base incorporated during the sequencing reaction
- Hundreds of thousands to hundreds of millions of reactions imaged per instrument run, hence "massively parallel sequencing"
- Shorter read lengths than capillary sequencers

Types of NGS
NGS methods can largely be grouped into three main types: sequencing by synthesis, sequencing by ligation, and single-molecule sequencing.

Sequencing by synthesis
Like Sanger sequencing, most NGS techniques determine base composition by detecting a signal (fluorescence, chemiluminescence or a chemical change) created by nucleotide incorporation as DNA polymerase synthesizes the complementary strand. Sanger sequencing uses dideoxynucleotide chain termination to determine the sequence, which produces many fragments of varying lengths, each ending at a terminated base. In sequencing by synthesis, DNA is fragmented to the appropriate size, ligated to adapter sequences, and then clonally amplified to enhance the fluorescent or chemical signal. The templates are then separated and immobilized in preparation for flow-cell cycles. Although the various methods use different chemistries, all rely on sequential washes of nucleotides and on fluorescent or chemical detection of nucleotide incorporation.
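To make the chain-termination contrast concrete, the small simulation below mimics the Sanger principle: every fragment stops at a dideoxy base, and sorting all fragments by length (as electrophoresis does) spells out the newly synthesized strand. It is a hypothetical sketch; the function names and the simplifications listed in the comments are assumptions, not part of any instrument's software.

```python
# Hypothetical sketch: simulate dideoxy chain termination and "read the gel".
# Simplifying assumptions (not from the slides): the primer ends just before
# the first template base, every position is terminated at least once, and
# fragment length alone sets migration order (shortest fragments run fastest).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> dict[str, list[int]]:
    """For each ddNTP, list the lengths of the fragments it terminates.

    For simplicity the template is written in the order the polymerase copies
    it, so base i of the new strand pairs with base i of the template.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # chain stopped here by a dd<base>TP
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Sort every fragment by length and report the terminator that ends it."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = terminated_fragments("TACGGA")
    print(lanes)            # {'A': [1], 'C': [4, 5], 'G': [3], 'T': [2, 6]}
    print(read_gel(lanes))  # ATGCCT, the complement of the template
```

Sequencing by synthesis avoids this size-separation step entirely: each wash reports the next base directly, which is what allows millions of templates to be read in parallel.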
The three sequencing-by-synthesis methods described here (Roche 454, Illumina and Ion Torrent) also differ in read length and in how templates are amplified and immobilized.

Roche 454 pyrosequencing
In Roche 454 pyrosequencing (http://www.my454.com), a single primed DNA template is attached to a microbead and amplified using emulsion PCR. Each bead is then placed in a well of a PicoTiterPlate, which is put into a flow cell where it is incubated with DNA polymerase, ATP sulfurylase, luciferase and apyrase, along with the substrates luciferin and adenosine 5′-phosphosulfate (APS). When DNA polymerase incorporates an appropriate dNTP into the new strand, pyrophosphate is released, which ATP sulfurylase converts to adenosine triphosphate (ATP) in the presence of APS. Luciferase then uses this ATP to convert luciferin to oxyluciferin, releasing light in proportion to the amount of ATP produced by dNTP incorporation (Ronaghi et al., 1998; Nyren, 2007). Apyrase then degrades any unused ATP and nucleotides, these are washed away, and a new chemical mixture is washed over the DNA templates. This procedure is repeated many times until the DNA template has been elongated.

The light produced by nucleotide incorporation is detected by a camera and analysed to produce the string of nucleotides that is the DNA sequence. Although Roche 454 started out with read lengths of ∼100 bp, the technology has steadily improved to read lengths comparable to those of Sanger sequencing (∼800 bp), producing ∼700 Mb in total from ∼1 million reads.
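The pyrosequencing chemistry above produces a "flowgram": one light signal per nucleotide wash, proportional to how many bases were incorporated in that wash. The sketch below simulates this and then calls bases by rounding each signal. It is hypothetical: the flow order "TACG", the function names and the noise-free signal model are assumptions for illustration.

```python
# Hypothetical sketch of a 454-style flowgram. Assumptions: the flow order
# "TACG" is illustrative, and in the noise-free simulation each flow's signal
# equals the number of identical bases incorporated during that flow.

FLOW_ORDER = "TACG"


def flowgram(read: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Simulate one signal per nucleotide flow, finishing the last flow cycle."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    signals: list[int] = []
    pos = 0
    while pos < len(read):
        for nucleotide in flow_order:
            run = 0
            # All consecutive matching bases are incorporated in a single flow.
            while pos < len(read) and read[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append(run)
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert (possibly noisy) flow signals back to bases by rounding."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


if __name__ == "__main__":
    print(flowgram("TTACCCG"))               # [2, 1, 3, 1]
    print(call_bases([2.1, 0.9, 2.4, 1.1]))  # TTACCG: the CCC run is under-called
```

Rounding is where homopolymer errors, listed as a 454 and Ion Torrent disadvantage in the comparison table later in the deck, come from: a three-base C run whose light signal comes out at 2.4 is called as CC.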
Because of its longer reads, the 454 platform is often used for genomic or transcriptomic sequencing when de novo assembly is involved, and it has been employed or reviewed in several studies (e.g., Buggs et al., 2012; He et al., 2012; Lai et al., 2012; Strickler et al., 2012; Zalapa et al., 2012).

[Figure: Sequencing by pyrosequencing (454); complementary strand elongation by DNA polymerase.]

[Figure: Roche 454 technology.]

GS FLX+ System
Now delivering sequencing reads up to 1,000 bp in length!

| Sequencing kit | GS FLX Titanium XL+ (new) | GS FLX Titanium XLR70 |
|---|---|---|
| Read length | Up to 1,000 bp | Up to 600 bp |
| Mode read length | 700 bp | 450 bp |
| Throughput profile | 85% of total bases from reads >500 bp; 45% from reads >700 bp | 85% of total bases from reads >300 bp; 20% from reads >500 bp |
| Typical throughput | 700 Mb | 450 Mb |
| Reads per run | ~1,000,000 shotgun | ~1,000,000 shotgun; ~700,000 amplicon |
| Consensus accuracy | 99.997% | 99.995% |
| Run time | 23 hours | 10 hours |
| Sample input | gDNA or cDNA | gDNA, cDNA or amplicons (PCR products) |

Illumina
Initially developed by Solexa, the Illumina Genome Analyzer uses solid-phase bridge amplification, in which 5′ and 3′ adapters are ligated to each end of a DNA template (http://www.illumina.com). One end of the fragment is then attached to the substrate. The adapters hybridize to immobilized forward or reverse primers, creating a bridge that facilitates amplification and generates amplicons that remain attached to the substrate. This forms clusters of identical templates, which enhances fluorescent detection. Millions of such clusters are formed within each channel of the flow cell.
Following amplification, the DNA amplicons are denatured and primed.

Elongation is carried out through a series of cyclical washes, the first being the addition of a mixture of all four nucleotides, each labelled with a different fluorophore and modified as a 3′-O-azidomethyl reversible terminator. After image capture, the fluorescent dye moiety is cleaved and the 3′-OH group is restored by reaction with tris(2-carboxyethyl)phosphine, allowing elongation to continue. This cycle is repeated until the DNA fragment has been synthesized to its target length. On a dual flow cell, the HiSeq 2000 system can now produce ∼6 billion paired-end reads, for a total of 540–600 Gb in ∼11 days of run time. This is currently the most widely used NGS platform.

[Figure: Sequencing with fluorescently labeled nucleotides (Solexa); complementary strand elongation by DNA polymerase.]

[Figure: Illumina Solexa.]

HiSeq Systems Comparison
Any application. Any study.

| System | HiSeq 2500 | HiSeq 2500 | HiSeq 2000 | HiSeq 1500 | HiSeq 1500 | HiSeq 1000 |
|---|---|---|---|---|---|---|
| Run mode | High Output | Rapid Run | High Output | High Output | Rapid Run | High Output |
| Output (2 × 100 bp) | 600 Gb | 120 Gb | 600 Gb | 300 Gb | 60 Gb | 300 Gb |
| Run time (2 × 100 bp) | ~11 days | ~27 hours | ~11 days | ~8.5 days | ~27 hours | ~8.5 days |
| Paired-end reads | 6 billion | 1.2 billion | 6 billion | 3 billion | 600 million | 3 billion |
| Single reads | 3 billion | 600 million | 3 billion | 1.5 billion | 300 million | 1.5 billion |
| Maximum read length | 2 × 100 bp | 2 × 150 bp | 2 × 100 bp | 2 × 100 bp | 2 × 150 bp | 2 × 100 bp |

Ion Torrent
The Ion Torrent system (http://www.iontorrent.com) is unique among NGS technologies in that detection is not based on fluorescent dyes; instead, semiconductor technology measures the pH change caused by the release of an H+ ion upon nucleotide incorporation (Rothberg et al., 2011).
By adding nucleotides sequentially, the machine can detect which nucleotide has been incorporated into the growing strand. Two systems currently use this technology: the Ion PGM, for laboratory applications, and the new Ion Proton, which provides higher throughput. The Proton system is touted to have 165 million sensors, with read lengths of up to 250 bp on release of the next hardware chip, which is projected to have 660 million sensors.

For both the PGM and the Proton, each hardware chip improvement increases throughput. With the new 318 chip, the PGM sequencer can produce over 1000 Mb of sequence with 11.1 million sensors. Another attraction of the Ion systems is that sample preparation costs are relatively low compared with other systems. Published research using the Ion Torrent system currently focuses on shotgun sequencing of microbial genomes (e.g., Howden et al., 2011; Rothberg et al., 2011), but the system has clearly made its way into programs pursuing plant-based objectives.

Ion Proton™ System performance specifications with Ion PI™ Chip
- Throughput: up to 10 Gb. (Note: the Ion PII™ Chip will be available about six months after the Ion PI™ Chip. The Ion PII™ Chip will enable sample-to-variant analysis of a human genome in a single day, at up to 20x coverage.)
- Read length: up to 200-base fragment reads
- Number of reads passing filter: 60–80 million
- Sequencing run time: 2–4 hours
- Key applications: human-scale genome sequencing, ChIP sequencing, whole-transcriptome sequencing, exome sequencing, methylation analysis, gene expression by sequencing, small genome sequencing, de novo sequencing, small RNA sequencing, gene sequencing
- Areas of interest: agricultural research, stem cell research, cancer research, epigenomics, ancient DNA genomics, forensic science, metagenomics
- Data formats: industry-standard FASTQ, SFF, BAM and VCF outputs

Sequencing by ligation
Sequencing by synthesis uses DNA polymerase as the elongation engine. Sequencing-by-ligation methods instead harness the mismatch sensitivity of DNA ligase to determine the sequence of nucleotides in a DNA strand (Landegren et al., 1988). These methods use oligonucleotide probes of varying lengths, labelled with fluorescent tags according to the nucleotide(s) to be determined. The fragmented DNA templates are primed with a short, known anchor sequence, which allows the probes to hybridize.

DNA ligase is added to the flow cell and joins the fluorescently labelled probe to the primer and template. Fluorescence imaging then identifies which probe was incorporated. The process is repeated with different sets of probes to query the DNA template and build up its sequence. The methods described here differ in their probe usage and read length.

SOLiD
Life Technologies/Applied Biosystems (http://www.appliedbiosystems.com) created the Support Oligonucleotide Ligation Detection (SOLiD) platform, which uses sequencing by ligation to determine DNA sequence. Fragment or mate-pair libraries are primed and enriched by emulsion PCR on microbeads, which are then attached to a glass slide.
A set of 1,2-probes, each labelled with one of four fluorophores, is added to the flow cell. The first two positions of each probe are a known di-base pair encoded by its fluorophore; these bases query the first and second positions following the hybridized primer. Bases three to five are degenerate, and they are separated from bases six to eight, which are universal inosine bases, by a phosphorothiolate linkage.

A matching 1,2-probe is ligated to the primer by DNA ligase. After fluorescence imaging to determine which probe was ligated, silver ions cleave the phosphorothiolate link, regenerating the 5′ phosphate group for the next ligation (McKernan et al., 2005). This cycle of ligation, detection and cleavage is repeated several times, extending the complementary strand to a length determined by the number of cycles. Once sufficient length is reached, the extended product is removed and the template is reset with a primer complementary to the n − 1 position of the previous round's primer.

The template is extended through the series of ligations and then reset four more times. This primer-reset process results in each template base being queried twice, a built-in check that is resolved by aligning a series of color images across space and time to determine the actual DNA sequence.
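The "each base queried twice" idea can be illustrated with two-base color encoding, in which every color reports an overlapping pair of bases. The sketch below assumes the commonly described coding A=0, C=1, G=2, T=3, under which a dinucleotide's color is the XOR of its two base codes; the function names are hypothetical.

```python
# Hypothetical sketch of SOLiD-style two-base ("color space") encoding.
# Assumption: bases are coded A=0, C=1, G=2, T=3; with that coding the color
# assigned to each overlapping dinucleotide is the XOR of the two base codes.

BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per overlapping base pair, so each base is in two colors."""
    codes = [BASE_TO_CODE[base] for base in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases; the first (primer-derived) base must be known."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    reference = encode_colors("ATGGC")  # [3, 1, 0, 3]
    variant = encode_colors("ATCGC")    # single G->C change: [3, 2, 3, 3]
    print(decode_colors("A", reference))  # ATGGC
    changed = [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
    print(changed)  # [1, 2]: a true SNP alters two adjacent colors
```

Because every base contributes to two adjacent colors, a genuine single-base difference changes two neighbouring colors, whereas a single miscalled color would alter every base decoded after it. That asymmetry is what lets the two kinds of difference be told apart.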
SOLiD is often used in resequencing studies (e.g., Ashelford et al., 2011), in transcriptomics, or in genomic sequencing alongside other technologies (e.g., Shulaev et al., 2011).

[Figure: Sequencing with fluorescently labeled nucleotides (ABI SOLiD); complementary strand elongation by DNA ligase.]

[Figure: Sequencing with fluorescently labeled nucleotides (ABI SOLiD); 5 reading frames, each position is read twice.]

5500 W Series Genetic Analyzers
Key advancements:
- Lower cost: up to 50% savings in running cost with direct library amplification
- Simplified workflow: template preparation time reduced to 1/4 by replacing beads with direct amplification on the FlowChip
- Increased throughput: faster time to result from high-density sequencing colonies that provide 2–4X higher throughput

Single Molecule Sequencing: HeliScope
- Direct sequencing of DNA molecules: no amplification stage
- DNA fragments are attached to an array
- Potential benefits: higher throughput, fewer errors

Single Molecule Real Time Sequencing (SMRT)
A single DNA polymerase enzyme, bound to its DNA template, is affixed to the bottom of a zero-mode waveguide (ZMW). The ZMW is a structure that creates an illuminated observation volume small enough to observe only a single nucleotide being incorporated by the DNA polymerase.

SMRT SEQUENCING ADVANTAGE
Sequencing with the PacBio RS system, based on single molecule, real-time (SMRT) technology, offers several benefits:
- Extraordinarily long reads: average read lengths of 3,000 to 5,000 bp, with the longest reads over 20,000 bp.
- Extremely high accuracy: de novo assembly of genomes and variant detection with greater than 99.999% accuracy.
  Individual molecules can be sequenced with 99% accuracy at greater than Sanger lengths.
- Exquisite sensitivity: detects minor variants present at a frequency below 0.1%.
Additionally, the PacBio RS offers:
- Shortest run time: sequence for as little as 30 minutes and still get reads longer than Sanger reads.
- Least GC bias: sequence through regions of extremely high or low GC content, resulting in more even coverage.
- No amplification bias: samples need not be amplified, improving coverage uniformity and avoiding artifacts.

Comparison of next-generation sequencing methods

| Method | Advantages | Disadvantages |
|---|---|---|
| Single-molecule real-time sequencing (Pacific Bio) | Longest read length. Fast. Detects 4mC, 5mC, 6mA. | Low yield at high accuracy. Equipment can be very expensive. |
| Ion semiconductor (Ion Torrent sequencing) | Less expensive equipment. Fast. | Homopolymer errors. |
| Pyrosequencing (454) | Long read size. Fast. | Runs are expensive. Homopolymer errors. |
| Sequencing by synthesis (Illumina) | Potential for high sequence yield, depending on sequencer model and desired application. | Equipment can be very expensive. |
| Sequencing by ligation (SOLiD sequencing) | Low cost per base. | Slower than other methods. |
| Chain termination (Sanger sequencing) | Long individual reads. Useful for many applications. | More expensive and impractical for larger sequencing projects. |

Assessing quality: phred scores
$Q = -10 \log_{10} P$, where $P$ is the error probability of a given base call.

Software tools for NGS
Galaxy provides a web-based application for the analysis of sequence data.
- Includes tools for the analysis and manipulation of NGS data.
- Simple and extensible interface.

Alignment
If a "reference" genome exists for the organism being sequenced, reads can be "aligned" to the reference. This involves finding the place in the reference genome that each read matches.
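Finding where a read matches can be sketched with the Burrows-Wheeler Transform, the technique that BWA and Bowtie (listed on the next slide) build on. This minimal version, with hypothetical names, builds the index naively by sorting all suffixes and finds exact matches only; it is a teaching illustration, not how those aligners are implemented.

```python
# Hypothetical teaching sketch: exact read lookup with the Burrows-Wheeler
# Transform (BWT) and FM-index backward search. The index is built naively
# (sorting all suffixes), and only exact matches are found; production
# aligners use far more compact structures and also tolerate mismatches
# and gaps.


def build_index(reference: str):
    text = reference + "$"  # '$' sorts before A, C, G, T and marks the end
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)  # i == 0 wraps to '$'
    # first_col[c]: number of characters in text that sort before c
    first_col, total = {}, 0
    for ch in sorted(set(text)):
        first_col[ch] = total
        total += text.count(ch)
    # occ[c][k]: occurrences of c in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in first_col}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return suffix_array, first_col, occ


def find_read(read: str, index) -> list[int]:
    """Return every 0-based reference position where `read` occurs exactly."""
    suffix_array, first_col, occ = index
    lo, hi = 0, len(suffix_array)  # half-open range of matching sorted suffixes
    for ch in reversed(read):  # extend the match one base to the left at a time
        if ch not in first_col:
            return []
        lo = first_col[ch] + occ[ch][lo]
        hi = first_col[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


if __name__ == "__main__":
    index = build_index("ACGTACGTTACG")
    print(find_read("ACG", index))  # [0, 4, 9]
    print(find_read("GTT", index))  # [6]
```

The attraction of backward search is that each read base costs only a couple of table lookups, independent of genome length, which is why BWT-based aligners scale to millions of short reads.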
Because members of the same species share highly similar sequences, most reads should map to the reference.

Tools for generating alignments
There are many software packages for aligning data from next generation sequencing experiments. Two of the most popular are:
- BWA: http://bio-bwa.sourceforge.net
- Bowtie: http://bowtie-bio.sourceforge.net
Both use the Burrows-Wheeler Transform.

Alignment formats
The SAM (Sequence Alignment/Map) format has become the de facto standard for storing alignment data. BAM is a binary version of SAM that allows more efficient storage.

SAMtools
SAMtools provides a command-line interface for manipulating SAM/BAM-formatted data.
- Open source and multi-platform (an R package is available: Rsamtools).
- Able to extract reads from a specific genomic region, operate on remote files, and much more.

Visualization
Many genome browsers are available:
- UCSC Genome Browser
- Ensembl
- GBrowse
- 1000 Genomes Browser
- Integrative Genomics Viewer (IGV)
- SeqMonk

Sequence file formats
There are many sequence file formats, and they store different information about each sequence. The most common formats in the NGS world are SFF, FASTQ and FASTA. Every program has different requirements, so different programs ask for different file formats.

FASTQ
The FASTQ format stores both the sequence and the quality information for each read. This compact text-based format has become the de facto standard for storing data from next generation sequencing experiments.
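Each FASTQ record spans four lines: an "@" header, the sequence, a "+" separator and a quality string with one character per base. A minimal reader under the assumptions noted in the comments might look like this; the class and function names are hypothetical.

```python
# Hypothetical sketch of a minimal FASTQ reader. Assumptions: every record is
# exactly four lines (no wrapped sequence or quality lines) and the file ends
# cleanly after the last record. Established parsers (e.g. Biopython's SeqIO)
# handle more edge cases.
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading '@'
    sequence: str  # the read
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    example = io.StringIO(
        "@HWUSI-EAS582_157:6:1:1:1501/1\n"
        "NCACAGACACACACGAACACACAAAGACATGCCCATATGAAGAT\n"
        "+\n"
        "%.7786867:778556858746575058873/347777476035\n"
    )
    for record in read_fastq(example):
        print(record.name, len(record.sequence))  # HWUSI-EAS582_157:6:1:1:1501/1 44
```

Using a generator keeps memory use flat, which matters when a single run yields millions of records.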
Example FASTQ records (header, sequence, "+" separator, quality string):

    @HWUSI-EAS582_157:6:1:1:1501/1
    NCACAGACACACACGAACACACAAAGACATGCCCATATGAAGAT
    +
    %.7786867:778556858746575058873/347777476035
    @HWUSI-EAS582_157:6:1:1:1606/1
    NCTGGCACCTTGATTTTGGACTTCCCAGCCTCCAGAACTGTGAG
    +
    %1948988888798988366898888648998788898888588
    @HWUSI-EAS582_157:6:1:1:453/1
    NCTGCTTGCACCCCTGAAGTCACTGATCACATTTCAGGGTCACC
    +
    %/868998988888867668888986644788988413488885
    @HWUSI-EAS582_157:6:1:1:1844/1
    NGATTGACATTGGCAAAGAGGACAACTGATTGCAAACTTCACAC
    +
    %-7;:::::;86499;75574586::635:62687666887879
    @HWUSI-EAS582_157:6:1:1:1707/1
    NAGGCTCAGGCGCACGGCCTACATCGTCGCTGTCGGCCAAGGGG
    +

The second line of each record is the "read" (sequence); the fourth line holds the quality scores (phred-33).

FastQC
A simple Java-based tool for quality assessment of next generation sequencing data. It takes a FASTQ file as input and generates multiple QC plots, but offers no ability to customize or interact with the plots.

SFF
SFF (Standard Flowgram Format) files are the 454 equivalent of ABI chromatogram files. They hold the flowgram, the called sequence, the qualities, and the recommended quality and adapter clipping.

FASTA
FASTA is the simplest format. Each sequence starts with a ">" followed by the sequence name, a space and, optionally, a description, with the sequence on the following line(s):

    >seq_1 description
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    >seq_2
    ATCGTAGTCTAGTCTATGCTAGTGCGATGCTAGTGCTAGTCGTATGCATGGCTATGTGTG

Illumina FASTQ
This format is almost identical to a Sanger FASTQ file, but the quality scores are encoded differently. When handling a FASTQ file we have to be sure which kind we are dealing with, Illumina or Sanger, and unfortunately the two are not easy to tell apart. There was also a third, now mostly obsolete variant, the Solexa FASTQ. Illumina has since switched to distributing its files as Sanger FASTQ, so the Illumina FASTQ variant is no longer being produced.
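Decoding quality strings connects the FASTQ variants to the phred formula given earlier: subtract the encoding offset from each character's ASCII code to get Q, then invert the formula as P = 10^(-Q/10). The sketch assumes the usual offsets (33 for Sanger, 64 for Illumina 1.3+) and, in the spirit of the seq_crumbs approach described next, guesses the encoding from characters that only Sanger files can contain. The function names are hypothetical.

```python
# Hypothetical helpers for FASTQ quality strings. Assumptions (common
# conventions, not stated in the slides): Sanger FASTQ is phred+33, Illumina
# 1.3+ FASTQ is phred+64, and the obsolete Solexa variant (ASCII 59 upwards,
# with an odds-based score) is not handled. Only Sanger encoding can use
# characters below ASCII 59 (';'), which is what the guess relies on.


def guess_phred_offset(quality_strings) -> int:
    """Return 33 if any Sanger-only character appears, otherwise assume 64.

    A Sanger file whose qualities are all high can be misclassified as 64,
    so treat the answer as a guess.
    """
    for quality in quality_strings:
        if any(ord(ch) < 59 for ch in quality):
            return 33
    return 64


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode one quality string into integer phred scores Q."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P), giving P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    quality = "%.7786867:778556858746575058873/347777476035"
    offset = guess_phred_offset([quality])  # '%' is ASCII 37, so phred+33
    scores = phred_scores(quality, offset)
    print(offset, scores[:4])                      # 33 [4, 13, 22, 22]
    print(round(error_probability(scores[0]), 2))  # 0.4
```

Applied to the first example record, the leading "%" decodes to Q4, an error probability of about 0.4, which is consistent with that position having been called "N".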
The seq_crumbs utility guess_seq_format can distinguish the Sanger from the Illumina version by looking for quality characters that occur only in the Sanger encoding.

Compressed files
Sequence text files are sometimes compressed to save disk space. The most common compression formats are gzip and bgzip. bgzip is a gzip variant commonly used in genomics because, although its compression ratio is slightly lower, it allows random access. More and more software is becoming compatible with these formats.

SRA
The Sequence Read Archive (SRA) stores raw sequencing data from next generation sequencing platforms, including the Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System, Helicos HeliScope®, Complete Genomics® and others.

Read/Sequence Format
- SRF (Sequence Read Format) was designed to be a single format capable of storing data generated by any DNA sequencing technology. However, it appears to have seen little use after 2008.
- FASTA is one of the earliest standard formats and is supported by BLAST. Files are sometimes given the extension FNA or FAA (FASTA nucleic acid or FASTA amino acid).
- FASTQ stores sequence data together with quality scores and is used as input by many programs. Quality values are sometimes supplied instead in a separate QUAL file.
- SFF (Standard Flowgram Format) is the output format of the 454 sequencers.
- SCARF is a standard Illumina output format for sequence data with quality.
- AB1 files are chromatograms produced by instruments from Applied Biosystems.
- EMBL is a flat-file format used by EMBL to represent database records for nucleotide and peptide sequences from the EMBL databases.

Browser/Visualization Format
Some commonly used formats are listed below:
- SAM is a compact and indexable representation of alignment results.
  It is the output format of many popular alignment tools, e.g. Bowtie, BWA, SOAP2, the Illumina GA pipeline, MAQ, BLAST, etc.
- BAM is the compressed binary version of SAM. It provides an efficient way to display very large alignment results in the UCSC browser.
- WIG is for display of dense, continuous data such as GC percent, probability scores and transcriptome data.
- BED is the main format for defining the data lines displayed in an annotation track.
- GFF is a format for describing genes and other features associated with DNA, RNA and protein sequences.
- GTF is a refinement of GFF that tightens the specification.

Next-Gen Sequencing of mRNAs
cDNA = complementary (or copy) DNA; EST = expressed sequence tag.
- A microarray can be described as a "closed system" because information about RNAs is limited by the targets available for hybridization; RNAs not represented on the array are not interrogated.
- Direct sequencing of cDNAs overcomes this problem by large-scale random sampling of sequences from a whole-cell RNA extract.
- Statistical counting of distinct sequences provides a precise estimate of expression level.
- The cDNA library can be normalized to capture rare messages.
- The approach has been dramatically enabled by large-scale sequencing.

[Figure: mRNA sequencing: preparation of a cDNA library in a phage λ vector.]

BIOINFORMATICS FOR NEXT GENERATION SEQUENCING
The transforming aspect of the Human Genome Project was not the completion of the genome sequence itself, but rather the technologies that enabled, and were enabled by, the sequencing of that first reference genome. The evolution of 'omic science through microarray transcriptomics, metabolomics, proteomics and whole-genome SNP-omics has in many ways come full circle, with a new focus on genomics and genome sequencing.
Next-generation sequencing technologies have begun to revolutionize genomics, and their effects are becoming increasingly widespread.

The 1000 Genomes Project (http://www.1000genomes.org/) will create a new map of human genetic variation going far beyond the detail captured in the HapMap. Other projects are cataloguing, for example, genes involved in cancer, alternative splicing in different tissues, and transcription factor binding. The growing number of robust applications and the steadily falling cost of generating sequence-based data suggest that these technologies will continue to open new applications in the biological sciences and to create new opportunities for software and algorithm development.

Given the vast amount of data produced (currently more than a gigabase per run, and constantly increasing), a sound data storage and management solution and informatics tools that analyze the data effectively are essential for successful application of the technology. During 2008, a large number of new software applications and algorithms were developed to deal with these data.

A recent advert in Nature from Illumina, one of the providers of next-generation sequencing technology, highlighted 16 significant papers in bioinformatics. Beyond those cited in the advertisement, many other tools and algorithms relevant to next-generation sequencing have been published in Bioinformatics. To celebrate this contribution, the journal gathered them into a 'Bioinformatics for Next Generation Sequencing' virtual issue (http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html).
The virtual issue is intended as a living resource, continually updated with the latest papers in this area, to help researchers keep abreast of new developments.

To date, most of these papers describe methods that take the short sequences produced by the Illumina Genome Analyzer and Applied Biosystems SOLiD machines and align them to a reference genome. This is a crucial, basic requirement for many applications, and a variety of techniques have been used to make the tools fast enough to handle millions of sequences. Now that many such tools are available, the bioinformatics community has begun to build applications for specific tasks, such as identifying likely sites of interaction in ChIP-seq.

ngs_backbone
ngs_backbone is a bioinformatics application for sequence analysis of NGS (Next Generation Sequencing) and Sanger sequences. It can clean reads, perform de novo assembly or mapping against a reference, and annotate SNPs, SSRs, ORFs, GO terms and sequence descriptions.

A new generation of non-Sanger-based sequencing technologies has delivered on its promise of sequencing DNA at unprecedented speed, enabling impressive scientific achievements and novel biological applications. However, before stepping into the limelight, next-generation sequencing had to overcome the inertia of a field that had relied on Sanger sequencing for 30 years.

REFERENCES
- Bioinformatics, Vol. 25, no. 4, 2009, page 429. doi:10.1093/bioinformatics/btp037
- NGS File Format. https://wiki.nbic.nl/index.php/NGS_File_Format
- Metzker, M. L. Sequencing technologies: the next generation (Applications of next-generation sequencing series). Nature Reviews Genetics 11, 31–46 (January 2010).
- Black, M. (University of Otago) and Print, C. (The University of Auckland). Introduction to the analysis of next generation sequencing data.
- Shendure, J. and Ji, H. Next-generation DNA sequencing. Nature Biotechnology 26(10), October 2008.