Presentation Transcript
MicroArray Image Analysis: Robin Liechti
robin.liechti@ie-bpv.unil.ch
Microarray analysis: Array construction, hybridisation, scanning
Quantitation of fluorescence signals
Data visualisation
Meta-analysis (clustering)
More visualisation
Technical:
Experimental design: Track what’s on the chip
which spot corresponds to which gene
Duplicate experimental spots
reproducibility
Controls
DNAs spotted on glass
positive probe (induced or repressed)
negative probe (bacterial genes on human chip)
oligos on glass or synthesised on chip (Affymetrix)
point mutants (hybridisation plus/minus)
Images from scanner: Resolution
standard 10 µm [currently, max 5 µm]
100 µm spot on chip = 10 pixels in diameter
Image format
TIFF (tagged image file format) 16 bit (65’536 levels of grey)
1 cm x 1 cm image at 16 bit = 2 MB (uncompressed, at 10 µm resolution)
other formats exist, e.g. SCN (used at Stanford University)
Separate image for each fluorescent sample
channel 1, channel 2, etc.
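The file size quoted above follows from simple arithmetic. A minimal sketch, assuming a 10 µm pixel size and no compression; the function name and parameters are illustrative and not part of any scanner software:

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel=16):
    """Uncompressed size in bytes of a scan at the given pixel size."""
    pixels_x = round(width_cm * 10_000 / pixel_um)   # 1 cm = 10 000 µm
    pixels_y = round(height_cm * 10_000 / pixel_um)
    return pixels_x * pixels_y * bits_per_pixel // 8


# 1 cm x 1 cm at 10 µm: 1000 x 1000 pixels x 2 bytes
print(image_size_bytes(1, 1, 10))
# The same area at 5 µm has four times as many pixels
print(image_size_bytes(1, 1, 5))
```

Each additional channel is a separate image of the same size.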
Images in analysis software: The two 16-bit images (cy3, cy5) are compressed into 8-bit images
Goal : display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
RGB image :
Blue values (B) are set to 0
Red values (R) are used for cy5 intensities
Green values (G) are used for cy3 intensities
Qualitative representation of results
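A minimal sketch of such an overlay. The slides do not say how the 16-bit values are compressed to 8 bits; dropping the low byte (a linear rescale) is an assumption here, and real packages may use another transform. Images are plain nested lists and all names are illustrative:

```python
def to_8bit(value16):
    """Compress a 16-bit intensity (0..65535) to 8 bits (0..255) by dropping the low byte."""
    return value16 >> 8


def rgb_overlay(cy5, cy3):
    """Build a 24-bit RGB image (rows of (R, G, B) tuples) from two 16-bit channels.

    R carries cy5, G carries cy3, and B is always 0.
    """
    return [
        [(to_8bit(r), to_8bit(g), 0) for r, g in zip(row5, row3)]
        for row5, row3 in zip(cy5, cy3)
    ]


cy5 = [[65535, 0], [32768, 256]]
cy3 = [[0, 65535], [32768, 256]]
overlay = rgb_overlay(cy5, cy3)
print(overlay[0][0])  # cy5 only: pure red
print(overlay[1][0])  # equal cy5 and cy3: yellow
```

Because 256 input levels collapse onto each output level, the overlay is only suitable for visual inspection, not for quantitation.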
Images : examples
Processing of images: Addressing or gridding
Assigning coordinates to each of the spots
Segmentation
Classification of pixels either as foreground or as background
Intensity extraction (for each spot)
Foreground fluorescence intensity pairs (R, G)
Background intensities
Quality measures
Addressing (I): The basic structure of the images is known (determined by the arrayer)
Addressing (II): The measurement process depends on the addressing procedure
Addressing efficiency can be enhanced by allowing user intervention (slow!)
Most software systems now provide for both manual and automatic gridding procedures
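Since the arrayer determines the layout, automatic gridding can start from nominal spot positions. A minimal sketch, assuming a single rectangular grid with equal spacing in both directions; the origin, spacing, and function name are hypothetical, and real software additionally refines these positions against the image:

```python
def nominal_spot_centres(origin, spacing, rows, cols):
    """Return {(row, col): (x, y)} pixel coordinates for a regular grid of spots."""
    x0, y0 = origin
    return {
        (r, c): (x0 + c * spacing, y0 + r * spacing)
        for r in range(rows)
        for c in range(cols)
    }


centres = nominal_spot_centres(origin=(20, 20), spacing=25, rows=4, cols=4)
print(centres[(0, 0)], centres[(3, 2)])
```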
Segmentation (I): Classification of pixels as foreground or background -> fluorescence intensities are calculated for each spot as a measure of transcript abundance
Production of a spot mask : set of foreground pixels for each spot
Segmentation (II): Segmentation methods :
Fixed circle segmentation
Adaptive circle segmentation
Adaptive shape segmentation
Histogram segmentation
Fixed circle segmentation: Fits a circle with a constant diameter to all spots in the image
Easy to implement
The spots need to be of the same shape and size
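A minimal sketch of a fixed circle spot mask, assuming integer pixel coordinates for the spot centres; the function names are illustrative:

```python
def circle_mask(centre, radius):
    """Return the set of (x, y) pixels lying within `radius` of `centre`."""
    cx, cy = centre
    r = int(radius)
    return {
        (x, y)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    }


def fixed_circle_segmentation(centres, radius):
    """Apply the same circle to every spot centre; returns {spot_id: mask}."""
    return {spot: circle_mask(c, radius) for spot, c in centres.items()}


masks = fixed_circle_segmentation({"A1": (10, 10), "A2": (35, 10)}, radius=2)
print(len(masks["A1"]), len(masks["A2"]))  # every spot gets the same number of pixels
```

Every mask has the same size whatever the real spot looks like, which is why the method only works for spots of uniform shape and size.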
Adaptive circle segmentation: The circle diameter is estimated separately for each spot
Problematic if spots are oval
Adaptive shape segmentation: Specification of starting points or seeds
Regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region.
Histogram segmentation: Uses a target mask chosen to be larger than any spot
Foreground and background intensity are determined from the histogram of pixel values for pixels within the masked area
Example : QuantArray
Background : mean between 5th and 20th percentile
Foreground : mean between 80th and 95th percentile
Unstable when a large target mask is set to compensate for variation in spot size
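The percentile rule quoted for QuantArray can be sketched as follows. This is an illustration only: the exact rank convention QuantArray uses is not given in the slides, so the index calculation below is an assumption, and the function name is hypothetical:

```python
from statistics import mean


def percentile_band_mean(values, low_pct, high_pct):
    """Mean of the sorted values lying between two percentiles (simple rank convention)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = n * low_pct // 100
    hi = n * high_pct // 100
    if hi <= lo:
        raise ValueError("not enough pixels in the target mask for this band")
    return mean(ordered[lo:hi])


# Pixel values inside one target mask (toy data: 100 pixels with values 0..99)
mask_pixels = list(range(100))
background = percentile_band_mean(mask_pixels, 5, 20)
foreground = percentile_band_mean(mask_pixels, 80, 95)
print(background, foreground)
```

If the target mask is much larger than the spot, the 80th to 95th percentile band starts to include background pixels, which is one way to see the instability mentioned above.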
Information extraction:
Spot intensity: The total amount of hybridization for a spot is proportional to the total fluorescence at the spot
Spot intensity = sum of pixel intensities within the spot mask
Since later calculations are based on ratios between cy5 and cy3, we compute the average* pixel value over the spot mask
*alternative : use ratios of medians instead of means
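A small sketch comparing the two summaries on made-up pixel values (the numbers and the function name are illustrative). It assumes the same spot mask is used in both channels:

```python
from statistics import mean, median


def spot_ratio(cy5_pixels, cy3_pixels, summary=mean):
    """cy5/cy3 ratio of a per-spot summary (mean by default) over the spot mask."""
    return summary(cy5_pixels) / summary(cy3_pixels)


cy5 = [1200, 1300, 1250, 9000]  # one outlying pixel, e.g. a dust speck
cy3 = [600, 650, 625, 640]

ratio_of_means = spot_ratio(cy5, cy3)
ratio_of_medians = spot_ratio(cy5, cy3, summary=median)
print(ratio_of_means, ratio_of_medians)
```

A single outlying pixel shifts the ratio of means noticeably, while the ratio of medians stays close to 2.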
Background intensity: Motivation : a spot’s measured intensity includes a contribution from non-specific hybridization and other chemicals on the glass
Fluorescence from regions not occupied by DNA should be different from that of regions occupied by DNA -> it could be useful to use local negative controls (spotted DNA that should not hybridize)
Different background methods : Local background, morphological opening, constant background, no adjustment
Local background: Focusing on small regions surrounding the spot mask.
Median of pixel values in this region
Most software packages implement such an approach
By not considering the pixels immediately surrounding the spots, the background estimate is less sensitive to the performance of the segmentation procedure
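A minimal sketch of a local background estimate. The ring-shaped region and its radii are assumptions for illustration; packages differ in the shape of the region they sample:

```python
from statistics import median


def local_background(image, centre, inner_radius, outer_radius):
    """Median of the pixels in a ring around the spot centre.

    Pixels closer than inner_radius are skipped, so pixels immediately
    surrounding the spot do not enter the estimate.
    """
    cx, cy = centre
    ring = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if inner_radius ** 2 < d2 <= outer_radius ** 2:
                ring.append(value)
    return median(ring)


# Toy 11 x 11 image: background 100, one spot of radius 2 at (5, 5) with value 1000
image = [
    [1000 if (x - 5) ** 2 + (y - 5) ** 2 <= 4 else 100 for x in range(11)]
    for y in range(11)
]
print(local_background(image, (5, 5), inner_radius=3, outer_radius=5))
```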
Morphological opening (spot): Applied to the original images R and G
Use a square structuring element with side length at least twice as large as the spot separation distance
Removes all the spots and generates an image that is an estimate of the background for the entire slide
For individual spots, the background is estimated by sampling this background image at the nominal center of the spot
Gives a lower and less variable background estimate
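A minimal pure-Python sketch of a grey-scale morphological opening (local minimum followed by local maximum) with a square window. It assumes an odd window size and clips the window at the image border; the toy image holds a single 3 x 3 spot, so a 7 x 7 window is enough here, whereas a real slide would follow the sizing rule given above. All names are illustrative:

```python
def _window_filter(image, size, pick):
    """Apply `pick` (min or max) over a size x size window, clipped at the border."""
    h, w = len(image), len(image[0])
    half = size // 2
    return [
        [
            pick(
                image[j][i]
                for j in range(max(0, y - half), min(h, y + half + 1))
                for i in range(max(0, x - half), min(w, x + half + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def morphological_opening(image, size):
    """Erosion (local minimum) followed by dilation (local maximum)."""
    return _window_filter(_window_filter(image, size, min), size, max)


# Toy 15 x 15 image: background 100 with a bright 3 x 3 spot centred at (7, 7)
image = [
    [1000 if 6 <= x <= 8 and 6 <= y <= 8 else 100 for x in range(15)]
    for y in range(15)
]
background_image = morphological_opening(image, size=7)

# Background for the spot: sample the opened image at the nominal spot centre
print(background_image[7][7])
```

The erosion step removes any bright feature smaller than the window, and the dilation step restores the slowly varying background level.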
Constant background: Global method which subtracts a constant background for all spots
Some findings suggest that the binding of fluorescent dyes to ‘negative control spots’ is lower than the binding to the glass slide
-> More meaningful to estimate background based on a set of negative control spots
If no negative control spots : approximation of the average background = third percentile of all the spot foreground values
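A sketch of the fallback rule in the last line, assuming a nearest-rank definition of the percentile (the slides do not specify the convention) and illustrative function names:

```python
def third_percentile(values):
    """Nearest-rank 3rd percentile: the value at rank ceil(0.03 * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, -(-3 * len(ordered) // 100))  # integer ceiling of 3n/100
    return ordered[rank - 1]


def subtract_constant_background(foreground_values):
    """Subtract one global background estimate from every spot foreground value."""
    background = third_percentile(foreground_values)
    return [v - background for v in foreground_values]


foregrounds = list(range(100, 200))  # toy foreground values for 100 spots
print(third_percentile(foregrounds))
print(subtract_constant_background(foregrounds)[:5])
```

With this rule the lowest few spots end up with zero or negative adjusted intensities, so they need separate handling before taking log ratios.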
No adjustment: Do not consider the background
Quality measures (-> Flag): How good are foreground and background measurements ?
Variability measures in pixel values within each spot mask
Spot size
Circularity measure
Relative signal to background intensity
b-value : fraction of background intensities less than the median foreground intensity
p-score : extent to which the position of a spot deviates from a rigid rectangular grid
Based on these measurements, one can flag a spot
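As an illustration, the b-value defined above can be computed and used to flag a spot. The 0.9 threshold is a hypothetical choice, not a recommended value, and the function names are illustrative:

```python
from statistics import median


def b_value(foreground_pixels, background_pixels):
    """Fraction of background intensities below the median foreground intensity."""
    med = median(foreground_pixels)
    below = sum(1 for b in background_pixels if b < med)
    return below / len(background_pixels)


def flag_spot(foreground_pixels, background_pixels, min_b=0.9):
    """Flag the spot (True) when too much of the background reaches the foreground level."""
    return b_value(foreground_pixels, background_pixels) < min_b


good = flag_spot([500, 520, 510], [100, 110, 90, 95])
bad = flag_spot([500, 520, 510], [100, 120, 600, 90])
print(good, bad)
```

In practice several of the listed measures would be combined before deciding on a flag.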
Summary: The choice of background correction method has a larger impact on the log-intensity ratios than the segmentation method used
The morphological opening method provides a better estimate of background than other methods
Low within- and between-slide variability of the log2 R/G
Background adjustment has a larger impact on low intensity spots
Farmer group in Lausanne: Using Imagene 4.2
To avoid great variability of the ratios around the low-intensity spots, we use a cut-off value for the intensity minus background values (e.g. 1000). Loss of information, but no bad information !
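A minimal sketch of such a cut-off. Applying it to both channels is an assumption, since the slide does not say whether one or both background-subtracted intensities must pass; the function name is hypothetical and this is not Imagene code:

```python
from math import log2


def log_ratio_or_none(r_fg, r_bg, g_fg, g_bg, cutoff=1000):
    """log2(R/G) of background-subtracted intensities, or None below the cut-off."""
    r = r_fg - r_bg
    g = g_fg - g_bg
    if r < cutoff or g < cutoff:
        return None  # spot discarded: information is lost, but no unreliable ratio is kept
    return log2(r / g)


print(log_ratio_or_none(5000, 1000, 3000, 1000))  # both channels pass the cut-off
print(log_ratio_or_none(1500, 1000, 3000, 1000))  # R minus background is only 500
```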
References: Yang, Y. H., Buckley, M. J., Dudoit, S. and Speed, T. P. (2001), ‘Comparisons of methods for image analysis on cDNA microarray data’. Technical report #584, Department of Statistics, University of California, Berkeley.
Yang, Y. H., Buckley, M. J. and Speed, T. P. (2001), ‘Analysis of cDNA microarray images’. Briefings in Bioinformatics, 2 (4), 341-349.
Imagene demo