Introduction to Microarrays
Stephen Sontum
Middlebury College
sontum@middlebury.edu
Chapter 4: Basic Research with DNA Microarrays
A. Malcolm Campbell & Laurie J. Heyer
What we hope to learn: Microarrays
Understand the principles of the microarray technique.
Appreciate the limitations of microarrays and problems associated with the technique.
Know what types of output are generated from different microarray analysis packages and what they mean.
Understand and be able to evaluate research papers about microarrays.
Why Learn about Microarrays?
Extremely useful and powerful technology: given a sample of human tissue, it allows you to determine the expression level of all human genes within that tissue.
Now extremely widely used, not only in research laboratories but also within commercial companies and diagnostically in hospitals.
Many research articles involve microarray data; bioinformatics is vital for understanding these data and results.
Microarrays are Popular
The NYU Med Center collects about 3 GB of microarray data per week.
NCBI GEO: 80K curated sample sets
PubMed search "microarray" = 13,948 papers
2005 = 4406
2004 = 3509
2003 = 2421
2002 = 1557
2001 = 834
2000 = 294
What is a Microarray?
Mark Schena, one of the founders of the technology at Stanford in the early 1990s, says microarrays need to be:
Microscopic, ordered arrays of specific probes on a planar surface.
[Figure: Sample; 1. Label sample]
For Example - Protein Microarrays
[Figure: Sample; 1. Label sample; Different Proteins; Different Antibodies]
Our Focus – Gene Expression Microarrays
[Figure: Sample; 1. Label sample; Different mRNAs; Complementary DNA sequences]
Gene Expression Microarrays – Key Concepts
[Figure: 1. mRNA extraction; Tissue biopsy or cell culture]
NCBI GEO Overview
Platform: A Platform record describes the list of elements on the array (e.g., cDNAs, oligonucleotide probesets, ORFs, antibodies) or the list of elements that may be detected and quantified in that experiment (e.g., SAGE tags, peptides).
Sample: A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it.
Series: A Series record defines a set of related Samples considered to be part of a group, how the Samples are related, and if and how they are ordered. A Series provides a focal point and description of the experiment as a whole. Series records may also contain tables describing extracted data, summary conclusions, or analyses.
Goals of a Microarray Experiment
Find the genes that change expression between experimental and control samples.
Classify samples based on a gene expression profile.
Find patterns: groups of biologically related genes that change expression together across samples/treatments.
Two Major Gene Expression Microarray Technologies
Spotted Arrays
Affymetrix GeneChips
We will examine their manufacture and their use.
Spotted Microarrays – Manufacture
DNA probes spotted onto the microarray can be either cDNA created by PCR or synthetic oligonucleotides.
The probes are physically spotted onto particular positions on a glass slide using a robot and immobilized using specific surface chemistry. Pat Brown has plans (about $30K).
Spotted microarrays can also be manufactured ‘by hand’.
Spotted Microarrays – Example Use
As an example, we will look at detecting differences between acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). This was one of the first successful uses of microarrays in cancer classification.
Total mRNA was extracted from bone marrow taken from patients with ALL and AML.
It was converted to cDNA by reverse transcription (since mRNA is sensitive to degradation).
T.R. Golub et al. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, vol. 286, 531-537.
Spotted Microarrays – Two Channels
[Figure: AML cDNAs; ALL cDNAs; Cy3 dye label; Cy5 dye label; mix and hybridize onto slide]
Spot Interpretation:
Green Spot – higher expression in ALL.
Red Spot – higher expression in AML.
Yellow Spot – equal expression in both.
Black Spot – not expressed in either.
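The spot interpretation above can be sketched as a small rule on the two channel intensities. This is only an illustration: it assumes ALL is the Cy3 (green) channel and AML the Cy5 (red) channel, which is what the colour key implies, and the `floor` and `fold` thresholds and the example intensities are made up rather than taken from the slides.

```python
import math

def spot_color(cy3, cy5, floor=100.0, fold=2.0):
    """Classify a two-channel spot from background-corrected intensities.

    cy3 is the green channel (assumed ALL), cy5 the red channel (assumed AML).
    floor and fold are illustrative thresholds, not values from the slides.
    """
    if cy3 < floor and cy5 < floor:
        return "black"   # no detectable expression in either sample
    # Clamp both channels to the floor so the ratio is always defined.
    log_ratio = math.log2(max(cy5, floor) / max(cy3, floor))
    if log_ratio >= math.log2(fold):
        return "red"     # higher in the Cy5 sample (AML)
    if log_ratio <= -math.log2(fold):
        return "green"   # higher in the Cy3 sample (ALL)
    return "yellow"      # roughly equal in both

# Hypothetical (cy3, cy5) intensities for four spots.
spots = {
    "geneA": (5000.0, 400.0),
    "geneB": (300.0, 6000.0),
    "geneC": (2000.0, 2200.0),
    "geneD": (20.0, 35.0),
}
colors = {name: spot_color(cy3, cy5) for name, (cy3, cy5) in spots.items()}
```

In practice the colour is a continuous blend of the two channels; the hard thresholds here only make the four cases on the slide explicit.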
Affymetrix GeneChip – Manufacture
Oligonucleotides are synthesized one nucleotide at a time on the surface of a quartz wafer using photolithographic chemistry in clean room conditions.
Affymetrix “Gene chip” system
Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene); 20,000 genes/chip
RNA labeled and scanned in a single “color”
Arrays get smaller every year (more genes)
Chips are expensive
Proprietary system: “black box” software, can only use their chips
Affymetrix GeneChip – Multiple Probes per Gene
Affymetrix GeneChip – Single Channel
[Figure: AML cDNAs; ALL cDNAs; Biotin label; Hybridize; Scan]
Example: Yeast grown in Oxygen
© 2003 Discovering genomics AM Campbell LJ Heyer GenomicsPlace
Measuring Fluorescence
Why is there a dark center in the middle of each spot?
What differences and similarities does a DNA chip have with a Southern blot?
Northern Blot Comparison
Oxygen and Gene Expression
What color could we use to represent gene expression?
Were any of the genes transcribed similarly?
Image Analysis – Spotted Arrays
Gridding
Segmentation and background extraction
Complications include spots lying on curves, spots of different shapes and sizes, variation in background fluorescence, etc.
Image Analysis – Affymetrix
Affymetrix manufacturing and processing are much more controlled, and thus image analysis is more straightforward (generally using their own software).
But Affymetrix arrays can still suffer from surface defects such as scratches, so visual inspection is important.
Data Acquisition
Scan the arrays
Quantitate each spot
Subtract background
Normalize
Export a table of fluorescent intensities for each gene in the array
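The last three steps above (subtract background, normalize, export) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular scanner software: the `raw_spots` values, the clamp `minimum`, and the scaling `target` are all hypothetical, and normalization is reduced to scaling the array to a fixed mean.

```python
import csv
import io

# Hypothetical quantitated spots: gene -> (spot foreground, local background).
raw_spots = {
    "geneA": (1250.0, 250.0),
    "geneB": (640.0, 240.0),
    "geneC": (300.0, 260.0),
}

def subtract_background(spots, minimum=1.0):
    """Foreground minus local background, clamped so values stay positive."""
    return {gene: max(fg - bg, minimum) for gene, (fg, bg) in spots.items()}

def scale_to_mean(intensities, target=1000.0):
    """Scale all values so the array's mean intensity equals target."""
    mean = sum(intensities.values()) / len(intensities)
    return {gene: value * target / mean for gene, value in intensities.items()}

def export_table(intensities):
    """Return tab-delimited text with a header and one row per gene."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "intensity"])
    for gene in sorted(intensities):
        writer.writerow([gene, f"{intensities[gene]:.1f}"])
    return buf.getvalue()

corrected = subtract_background(raw_spots)
normalized = scale_to_mean(corrected)
table = export_table(normalized)
```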
Why do we need Normalization?
Variation in gene expression values can be either:
Variations that we are interested in, for example caused by particular sample treatments or disease states (Signal)
Variations that we are not interested in, for example caused by differences in the chip spotting process, sample handling and labelling, quantity of sample applied to the microarrays, hybridization time and temperature, image scanning parameters, image processing, etc. (Systematic Bias or Noise)
Normalization is a term used to describe a collection of methods which try to eliminate the unwanted variations between sample expression values while retaining the variation that we are interested in.
These problems can be demonstrated by running the same sample on two microarrays and comparing the expression values.
10 Minute Activity
The paper examines two types of acute leukemia: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL).
Bone marrow samples were taken from 27 ALL and 11 AML patients.
RNA was extracted and hybridized to Affymetrix microarrays, which were then scanned.
First spend 5 minutes by yourself writing down:
- The variation that this experiment wishes to detect (signal).
- Causes of variation that this experiment is not interested in (noise).
- If this experiment used solid tissue biopsies, what other sources of variation might result?
Then form into groups of 2 or 3 people for 5 minutes and compare your lists. Do you agree on the causes of variation?
Sources of Variability
Image analysis (identifying and quantitating each spot on the array)
Scanning (laser and detector, chemistry of the fluorescent label)
Hybridization (temperature, time, mixing, etc.)
Probe labeling
RNA extraction
Sample variability
Self-against-Self Experiment
Ideally, all data points should lie on this line (the diagonal, where both measurements of the same sample are equal). This shows random noise and systematic bias.
MVA Plot Shows This Better
M = log(chip1/chip2) = log(chip1) - log(chip2)
Essentially the ‘difference’ in log intensity between chip1 and chip2.
A = (log(chip1) + log(chip2))/2
Essentially the ‘average log intensity’ for chip1 and chip2, or average signal.
The red lowess regression line indicates that we have systematic bias which depends on signal.
http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm
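The M and A definitions can be computed directly from two lists of intensities. The sketch below assumes base-2 logarithms (the slide only says "log") and strictly positive intensities; the `chip1` and `chip2` values are made up so the arithmetic is easy to follow.

```python
import math

def ma_values(chip1, chip2):
    """Return (M, A) lists for paired, strictly positive intensities.

    M = log2(chip1) - log2(chip2)        (difference in log intensity)
    A = (log2(chip1) + log2(chip2)) / 2  (average log intensity)
    """
    m_vals, a_vals = [], []
    for x1, x2 in zip(chip1, chip2):
        l1, l2 = math.log2(x1), math.log2(x2)
        m_vals.append(l1 - l2)
        a_vals.append((l1 + l2) / 2)
    return m_vals, a_vals

# Hypothetical intensities for three genes on two chips.
chip1 = [1024.0, 256.0, 64.0]
chip2 = [512.0, 256.0, 128.0]
M, A = ma_values(chip1, chip2)
# M == [1.0, 0.0, -1.0]; A == [9.5, 8.0, 6.5]
```

With base 2, M = 1 means a two-fold higher intensity on chip1, and M = 0 means the two chips agree.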
Lowess Normalization
This type of normalization corrects any systematic bias which is intensity dependent. It assumes similar numbers of genes will be upregulated and downregulated, and shifts the lowess fit to lie along the x-axis.
[Figure: BEFORE; AFTER]
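One way to picture this correction: fit a smooth trend of M against A, then subtract the trend from each M so the fit lies on M = 0. The sketch below is a simplified lowess-style smoother written for illustration (local linear fits with tricube weights, no robustness iterations), not the exact algorithm used by any analysis package. The function names, the `frac` parameter, and the synthetic data are assumptions.

```python
def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at each x.

    Simplified: tricube weights over roughly the nearest frac of the
    points, and no robustness iterations.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        # Bandwidth: distance to the k-th nearest point (x0 itself counts).
        h = dists[k - 1] or 1e-12
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def lowess_normalize(m_vals, a_vals, frac=0.5):
    """Subtract the fitted M-versus-A trend from each M value."""
    trend = lowess_fit(a_vals, m_vals, frac)
    return [m - t for m, t in zip(m_vals, trend)]

# Synthetic self-against-self data with a purely intensity-dependent bias.
a_vals = [6 + 0.5 * i for i in range(20)]
m_vals = [0.1 * a - 1 for a in a_vals]
normalized = lowess_normalize(m_vals, a_vals)
```

Because the synthetic bias is an exact straight line, the normalized values come out at essentially zero; with real data the scatter around the trend remains and only the intensity-dependent shift is removed.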
Normalization
Can control for many of the experimental sources of variability (systematic, not random or gene specific)
Bring each image to the same average brightness
Can use simple math or fancy:
- divide by the mean (whole chip or by sectors)
- LOESS (locally weighted regression)
No sure biological standards
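The "divide by the mean" option can be sketched for both the whole chip and per sector. The sector labels and intensities below are hypothetical; a real array would derive sectors from the print layout.

```python
def divide_by_mean(values):
    """Whole-chip normalization: divide every spot by the chip mean."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def normalize_by_sector(intensities, sectors):
    """Divide each spot by the mean intensity of its own sector."""
    groups = {}
    for value, sector in zip(intensities, sectors):
        groups.setdefault(sector, []).append(value)
    sector_mean = {s: sum(vals) / len(vals) for s, vals in groups.items()}
    return [value / sector_mean[s] for value, s in zip(intensities, sectors)]

# Hypothetical chip where the bottom sector is uniformly brighter.
intensities = [100.0, 300.0, 400.0, 1200.0]
sectors = ["top", "top", "bottom", "bottom"]

whole_chip = divide_by_mean(intensities)             # about [0.2, 0.6, 0.8, 2.4]
by_sector = normalize_by_sector(intensities, sectors)  # [0.5, 1.5, 0.5, 1.5]
```

In this made-up case the whole-chip mean leaves the bottom sector looking brighter, while per-sector scaling removes that spatial difference.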
Other Types of Biases
It is always better, but not always possible, to minimize these unwanted variations through good Experimental Design and Experimental Procedures.
For instance, dye bias can be removed by running replicate samples in which the dyes have been swapped (dye swap).
Other types of biases which may need correcting include:
Dye biases due to different behaviours of fluorescent dyes.
Spatial biases across the microarray surface.
Print tip biases due to physical differences between printing tips.
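A small numerical sketch shows why a dye swap removes dye bias. It assumes the bias is additive on the log-ratio scale and the same in both hybridizations, which is a modelling assumption rather than something stated on the slide; all values are invented.

```python
def dye_swap_average(m_forward, m_swapped):
    """Combine log ratios from a dye-swap pair.

    m_forward: log(sample/reference) with the original dye assignment.
    m_swapped: log(reference/sample) measured after swapping the dyes.
    An additive dye bias appears with the same sign in both, so half
    the difference cancels it and keeps the true log ratio.
    """
    return [(f - s) / 2 for f, s in zip(m_forward, m_swapped)]

true_m = [1.0, 0.0, -2.0]        # hypothetical true log ratios
dye_bias = [0.25, 0.5, -0.25]    # hypothetical gene-specific dye bias

m_forward = [t + b for t, b in zip(true_m, dye_bias)]
m_swapped = [-t + b for t, b in zip(true_m, dye_bias)]
recovered = dye_swap_average(m_forward, m_swapped)
```

Here `recovered` equals `true_m`, because the bias terms cancel in the subtraction.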
Sample Replicates
In the ALL and AML study we have 27 independent samples from patients with ALL and 11 independent samples from patients with AML.
These are biological replicates, in that they provide information about the variation between biological samples.
Replicates are essential in microarray studies: they not only make the mean expression values more accurate (reducing random noise), but also provide information about the variability of a particular expression value in the natural population (essential for hypothesis testing … to come later).
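Both roles of replicates can be seen in a per-gene summary: the standard error of the mean shrinks as more replicates are added, and the standard deviation estimates variability between patients. The numbers below are invented log2 expression values for one gene, not data from the ALL/AML study.

```python
import statistics

def summarize(values):
    """Mean, sample standard deviation, and standard error for one gene."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    sem = sd / n ** 0.5             # standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}

# Hypothetical log2 expression values for one gene in each group.
all_values = [8.0, 9.0, 10.0, 9.0, 9.0]
aml_values = [11.0, 12.0, 13.0]

all_summary = summarize(all_values)
aml_summary = summarize(aml_values)
```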