Presentation Transcript
Gene expression patterns of breast cancer phenotype revealed by molecular profiling: Gabriela Alexe, IBM Research
DIMACS Workshop on Detecting and Processing Regularities in High Throughput Biological Data
June 20 - 22, 2005
Slide2: Peter L Hammer
Sorin Alexe
David E Axelrod
Endre Boros
Gyan Bhanot
Jorge Lepre
Gustavo Stolovitzky
Ram Ramaswamy
Lillian Chiang
Babu Vengatharagavan
Arnold J Levine
Michael Reiss
Outline: Motivation
Finding relevant molecular profiles for breast cancer
Consensus clustering
Multi-gene biomarker selection
Robust pattern-based diagnosis models
Future work
Breast cancer incidence: most commonly diagnosed cancer after nonmelanoma skin cancer
second leading cause of cancer deaths after lung cancer
US 2005: an estimated 213,000 new BCA cases and 41,000 deaths; 1.2 million worldwide
1/8 chance to develop BCA
1/33 chance of death
5-10% hereditary
Breast cancer, an extensively heterogeneous disease: both genetic (5-10% BRCA1/2) and non-genetic
highly variable with regard to pathological and clinical features at the molecular level
pathological and molecular heterogeneity among
different breast cancers
different areas within individual neoplasms
personalized treatment: genuine need to identify parameters that might
accurately predict the effectiveness of treatment
Stages of breast cancer:
Histology: Hormone receptor status
ER+/-, PR+/-, HER2/neu+/-
DNA Cytometry: 2/3 aneuploid (abnormal DNA content) / diploid
Image Cytometry
S-phase
Genetic mutations
BCAs with a similar histopathological appearance may follow divergent clinical and prognostic courses
Major need to develop specific and alternative therapies
Molecular profiling of BCA:
Measurement of global expression patterns towards identification of individual genes that mediate particular aspects of cellular physiology
DNA microarrays
systematic method to study the mRNA variation between cancer/healthy cells
identification of clinically relevant tumor entities and subclasses
prognostic biomarkers / pathways/ potential therapeutic targets
Molecular profiling of BCA:
Perou et al., Nature 2000
Molecular portraits of human breast tumours
genome-www.stanford.edu/breast_cancer
identified multiple tumor classes which differ in expression of the ER
Luminal A
Luminal B
ERBB+
Basal
Normal
Biomedical data: Sorlie et al., PNAS 2003. Breast cancer data (Stanford & Norway)
cDNA gene expression data
122 breast cancer samples
552 “intrinsic genes”
Hierarchical clustering
5 major subgroups of samples / genes
Used same techniques to validate findings on external datasets (van’t Veer, West)
Biomedical problem: Sources of noise
- data measurements: experimental noise, 7% missing data
- data analysis techniques: hierarchical clustering sensitive to data perturbations
- selection of biomarkers: dependent on chip / data analysis technique
Goal
Robust approach to assess molecular profiles
Methods: Preprocessing data: Stochastic kNN imputation method
similar to kNN imputation (Troyanskaya et al., 2001)
Dynamic programming: ensemble of imputations
530 genes, 118 samples
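The kNN imputation step that the stochastic method builds on can be illustrated with a minimal sketch. This is plain kNN imputation in the spirit of Troyanskaya et al., not the stochastic variant or the dynamic-programming ensemble used in the talk, whose details are not given here. The function name `knn_impute`, the use of `None` for missing values, and the default `k` are assumptions made for illustration.

```python
import math

def knn_impute(matrix, k=3):
    """Fill None entries in a genes x samples matrix (hypothetical helper).

    Each missing value is replaced by the mean of that column's values
    in the k most similar genes that have the column observed.
    """
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other genes with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for g, other in enumerate(matrix)
                if g != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] < math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

A stochastic variant would presumably sample neighbors or repeat the imputation under perturbation and combine the results; that part is left out because the slides do not specify it.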
Consensus clustering: Assesses the stability of hierarchical clustering across multiple perturbations of the data by simulated stratified re-sampling of 80% of the cases (Monti et al., 2003)
Implemented in GenePattern http://www.broad.mit.edu/cancer/software/genepattern/
Consensus (core) clusters: maximal bicliques in agreement matrix (incremental polynomial algorithm, 2004)
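The resampling idea behind the agreement matrix can be sketched as follows. This is a simplified illustration, not the GenePattern implementation: it uses plain random subsampling rather than stratified re-sampling, a small hand-written average-linkage clustering, and it stops at the agreement matrix without extracting maximal bicliques. The names `average_linkage` and `agreement_matrix` and all default parameters are assumptions.

```python
import random

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of indices."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def agreement_matrix(points, n_clusters, n_runs=50, fraction=0.8, seed=0):
    """Fraction of runs in which two samples co-cluster, among runs containing both."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(n_clusters, round(fraction * n))
    for _ in range(n_runs):
        subset = rng.sample(range(n), size)  # simple (unstratified) subsample
        for i in subset:
            for j in subset:
                sampled[i][j] += 1
        sub_points = [points[i] for i in subset]
        for cluster in average_linkage(sub_points, n_clusters):
            members = [subset[c] for c in cluster]
            for i in members:
                for j in members:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Entries near 1 mark pairs of samples that cluster together under almost every perturbation, and entries near 0 mark pairs that almost never do; intermediate values flag unstable assignments.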
Agreement matrix:
Finding multi-gene biomarkers: Logical Analysis of Data (Hammer, 1988)
Discretization (noise reduction)
Pattern extraction (efficient algorithms, 2004)
Model construction (weighted voting)
Validation
Additional information (prominent classes, important features)
Applied to various biomedical datasets
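The steps listed above (discretization, pattern extraction, weighted voting) can be sketched in a toy form. This is not the pattern-generation algorithm referenced in the talk: it enumerates conjunctions by brute force up to a small degree, assumes the cutpoints are given, and weights each pattern by its prevalence in its own class. All function names and parameters are hypothetical.

```python
from itertools import combinations

def binarize(samples, cutpoints):
    """Discretization: 1 if a gene's expression exceeds its cutpoint, else 0."""
    return [tuple(int(v > c) for v, c in zip(s, cutpoints)) for s in samples]

def covers(pattern, sample):
    """A pattern is a tuple of (gene_index, required_bit) literals."""
    return all(sample[g] == bit for g, bit in pattern)

def pure_patterns(target, other, max_degree=2):
    """Conjunctions of up to max_degree literals that cover at least one
    target-class sample and no sample of the other class."""
    n_genes = len(target[0])
    literals = [(g, bit) for g in range(n_genes) for bit in (0, 1)]
    found = []
    for degree in range(1, max_degree + 1):
        for pattern in combinations(literals, degree):
            if len({g for g, _ in pattern}) < degree:
                continue  # skip contradictory literals on the same gene
            hits = sum(covers(pattern, s) for s in target)
            if hits and not any(covers(pattern, s) for s in other):
                found.append((pattern, hits / len(target)))  # weight = prevalence
    return found

def classify(sample, pos_patterns, neg_patterns):
    """Weighted voting: a positive score favors the positive class."""
    pos = sum(w for p, w in pos_patterns if covers(p, sample))
    neg = sum(w for p, w in neg_patterns if covers(p, sample))
    total_pos = sum(w for _, w in pos_patterns) or 1.0
    total_neg = sum(w for _, w in neg_patterns) or 1.0
    return pos / total_pos - neg / total_neg
```

A pattern such as "gene 0 high and gene 1 low" covering many samples of one subtype and none of the others is the kind of multi-gene combination the approach aims to surface.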
Slide17: Patterns, Models, Classifiers Positive Patterns Negative Patterns Model
Slide18: P N
Examples of patterns:
Multi-gene biomarkers: e.g., combinations of genes highly predictive of phenotype,
not identified in Sorlie et al.
Luminal A: 10
Luminal B: 9
ERBB+: 9
Basal: 12
Normal: 12
Extensive multi-gene biomarker annotations:
BIOCARTA, KEGG, DAVID, GENMAPP,
GOMINER, PANTHER, I-HOP
Pattern-based diagnosis model: Prediction Classification
Validation: Classification accuracy of pattern models assessed through
leave-one-out cross-validation experiments
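Leave-one-out cross-validation can be sketched generically: each sample is held out once, a model is fit on the rest, and the held-out prediction is scored. The nearest-centroid classifier below is only a stand-in for the pattern-based models of the talk, and every function name is an assumption.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

def fit_centroids(samples, labels):
    """Stand-in classifier: one mean expression vector per class."""
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(y, []).append(s)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in groups.items()}

def predict_centroid(model, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2
                                        for a, b in zip(model[y], sample)))
```

Any step that looks at the labels, such as choosing cutpoints or selecting patterns, belongs inside `fit` so that the held-out sample never influences the model it is scored on.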
Conclusions and Future work: Provide a robust classification that overlaps significantly with previous analyses
Clusters Luminal B and ERBB+ unreliable – need further analyses
Sample reproducibility
Validate on novel external BCA gene expression datasets
Slide25:
Thank you for your attention