Analysis of Microarray Gene Expression Data
Author: Mei-Ling Ting Lee
Publisher: Kluwer
Copyright: 2004
ISBN-13: 978-0-7923-7087-1
Pages: 371; hardcover
Price: $94.50
Comment from the Stata technical group
Microarrays are ordered sets of DNA molecules of known sequence. Usually
rectangular in shape, they can carry probes for thousands of genes.
Microarray technology allows the expression levels of thousands of genes
to be measured simultaneously in each sample. The resulting data
can then be used in various ways, such as diagnosing tumors, profiling
drug effects, and grouping genes with similar expression patterns to
identify genes that contribute to common functions.
Microarray technology produces datasets with a small-to-moderate number of
observations on (potentially) thousands of variables. This is the opposite of
the way data are usually collected (many observations on relatively few
variables), so this book can help you identify statistical methods that are
appropriate (and which methods must be modified to become appropriate).
Because thousands of gene expression levels are measured simultaneously, the
data are inherently noisy. The text also describes ways to minimize noise and
experimental variation.
The statistical analysis methods covered in this text include
transformations to normality, ANOVA and basic design, multiple testing,
permutation tests, mixture models, power and sample size, cluster analysis
and other multivariate methods, and advanced topics, such as neural
networks.
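To illustrate why multiple testing matters when thousands of genes are tested at once, here is a minimal sketch (not taken from the book) of the per-gene significance levels implied by the Bonferroni and Sidak familywise Type I error corrections, both of which the book covers. The number of genes (10,000) and the familywise level (0.05) are hypothetical values chosen for illustration; the Sidak level is exact only when the tests are independent.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-gene significance level under the Bonferroni correction: alpha / G."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Per-gene significance level under the Sidak correction: 1 - (1 - alpha)^(1/G).

    Computed as -expm1(log1p(-alpha) / G) to avoid precision loss when the
    result is very small. Exact for independent tests.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)


if __name__ == "__main__":
    G = 10_000    # hypothetical number of genes on the array
    alpha = 0.05  # hypothetical desired familywise Type I error rate
    print(f"Bonferroni per-gene level: {bonferroni_threshold(alpha, G):.3e}")
    print(f"Sidak per-gene level:      {sidak_threshold(alpha, G):.3e}")
```

With thousands of genes, both corrections push the per-gene level down to the order of 1e-6. The Sidak level is always slightly larger (less conservative) than the Bonferroni level.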
Table of contents
List of Figures
List of Tables
Preface
Part I Genome probing using microarrays
1 Introduction
2 DNA, RNA, proteins, and gene expression
2.1 The Molecules of Life
2.2 Genes
2.3 DNA
2.4 RNA
2.5 The Genetic Code
2.6 Proteins
2.7 Gene Expression and Microarrays
2.8 Complementary DNA (cDNA)
2.9 Nucleic Acid Hybridization
3 Microarray technology
3.1 Transcriptional Profiling
3.1.1 Sequencing-based Transcriptional Profiling
3.1.2 Hybridization-based Transcriptional Profiling
3.2 Microarray Technological Platforms
3.3 Probe Selection and Synthesis
3.4 Array Manufacturing
3.5 Target Labeling
3.6 Hybridization
3.7 Scanning and Image Analysis
3.8 Microarray Data
3.8.1 Spotted Array Data
3.8.2 In-situ Oligonucleotide Array Data
3.9 So I Have My Microarray Data—What’s Next?
3.9.1 Confirming Microarray Results
3.9.2 Northern Blot Analysis
3.9.3 Reverse-transcription PCR and Quantitative Real-time RT-PCR
4 Inherent variability in array data
4.1 Genetic Populations
4.2 Variability in Gene Expression Levels
4.2.1 Variability Due to Specimen Sampling
4.2.2 Variability Due to Cell Cycle Regulation
4.2.3 Experimental Variability
4.3 Test the Variability by Replication
4.3.1 Duplicated Spots
4.3.2 Multiple Arrays and Biological Replications
5 Background noise
5.1 Pixel-by-pixel Analysis of Individual Spots
5.2 General Models for Background Noise
5.2.1 Additive Background Noise
5.2.2 Correction for Background Noise
5.2.3 Example: Replication Test Data Set
5.2.4 Noise Models for GeneChip Arrays
5.2.5 Elusive Nature of Background Noise
6 Transformation and normalization
6.1 Data Transformations
6.1.1 Logarithmic Transformation
6.1.2 Square Root Transformation
6.1.3 Box–Cox Transformation Family
6.1.4 Affine Transformation
6.1.5 The Generalized-log Transformation
6.2 Data Normalization
6.2.1 Normalization Across G Genes
6.2.2 Example: Mouse Juvenile Cystic Kidney Data Set
6.2.3 Normalization Across G Genes and N Samples
6.2.4 Color Effects and MA Plots
6.2.5 Normalization Based on LOWESS Function
6.2.6 Normalization Based on Rank-invariant Genes
6.2.7 Normalization Based on a Sample Pool
6.2.8 Global Normalization Using ANOVA Models
6.2.9 Other Normalization Issues
7 Missing values in array data using microarrays
7.1 Missing Values in Array Data
7.1.1 Sources of Problem
7.2 Statistical Classification of Missing Data
7.3 Missing Values in Replicated Designs
7.4 Imputation of Missing Values
8 Saturated intensity readings
8.1 Saturated Intensity Readings
8.2 Multiple Power-levels for Spotted Arrays
8.2.1 Imputing Saturated Intensity Readings
8.3 High Intensities in Oligonucleotide Arrays
Part II Statistical Models and analysis
9 Experimental design
9.1 Factors Involved in Experiments
9.2 Types of Design Structures
9.3 Common Practice in Microarray
9.3.1 Reference Design
9.3.2 Time-course Experiment
9.3.3 Color Reversal
9.3.4 Loop Design
9.3.5 Example: Time-course Loop Design
10 ANOVA models for microarray data
10.1 A Basic Log-linear Model
10.2 ANOVA with Multiple Factors
10.2.1 Main Effects
10.2.2 Interaction Effects
10.3 A Generic Fixed-Effects ANOVA Model
10.3.1 Estimation for Interaction Effects
10.4 Two-stage Estimation Procedures
Example
10.5 Identifying Differentially Expressed Genes
10.5.1 Standard MSE-based Approach
10.5.2 Other Approaches
10.5.3 Modified MSE-based Approach
10.6 Mixed-effects Models
10.7 ANOVA for Split-plot Design
10.8 Log Intensity Versus Log Ratio
11 Multiple testing in microarray studies
11.1 Hypothesis Testing for Any Individual Gene
11.2 Multiple Testing for the Entire Gene Set
11.2.1 Framework for Multiple Testing
11.2.2 Test Statistic for Each Gene
11.2.3 Two Error Control Criteria in Multiple Testing
11.2.4 Implementation Algorithms
11.2.5 Example of Multiple Testing Algorithms
11.2.6 Concluding Remarks
12 Permutation tests in microarray data
12.1 Basic Concepts
12.2 Permutation Tests in Microarray Studies
12.2.1 Exchangeability in Microarray Designs
12.2.2 Limitation of Having Few Permutations
12.2.3 Pooling Test Results Across Genes
12.3 Lipopolysaccharide-E. coli Data Set
12.3.1 Statistical Model
12.3.2 Permutation Testing and Results
13 Bayesian methods for microarray data
13.1 Mixture Model for Gene Expression
13.1.1 Variations on the Mixture Model
13.1.2 Example of Gamma Models
13.2 Mixture Model for Differential Expression
13.2.1 Mixture Model for Color Ratio Data
13.2.2 Relation of Mixture Model to ANOVA model
13.2.3 Bayes Interpretation of Mixture Model
13.3 Empirical Bayes Methods
Example of Empirical Bayes Fitting
13.4 Hierarchical Bayes Models
13.4.1 Example of Hierarchical Modeling
14 Power and sample size considerations
14.1 Test Hypotheses in Microarray Studies
14.2 Distributions of Estimated Differential Expression
14.3 Summary Measures of Estimated Differential Expression
14.4 Multiple Testing Framework
14.5 Dependencies of Estimation Errors
14.6 Familywise Type I Error Control
14.6.1 Type I Error Control: the Sidak Approach
14.6.2 Type I Error Control: the Bonferroni Approach
14.7 Familywise Type II Error Control
14.7.1 Type II Error Control: the Sidak Approach
14.7.2 Type II Error Control: the Bonferroni Approach
14.8 Contrast of Planning and Implementation in Multiple Testing
14.9 Power Calculations for Different Summary Measures
14.9.1 Designs with Linear Summary Measure
14.9.2 Numerical Example for Linear Summary
14.9.3 Designs with Quadratic Summary Measure
14.9.4 Numerical Example for Quadratic Summary
14.10 A Bayesian Perspective on Power and Sample Size
14.10.1 Connection to Local Discovery Rates
14.10.2 Representative Local True Discovery
14.10.3 Numerical Example for TDR and FDR
14.11 Applications to Standard Designs
14.11.1 Treatment-control Designs
14.11.2 Sample Size for Treatment-control Design
14.11.3 Multiple-treatment Designs
14.11.4 Power Table for a Multiple-treatment Design
14.11.5 Time-course and Similar Multiple-treatment Designs
14.12 Relation Between Power, Replication and Design
14.12.1 Effects of Replication
14.12.2 Controlling Sources of Variability
14.13 Assessing Power from Microarray Pilot Studies
14.13.1 Example 1: Juvenile Cystic Kidney Disease
14.13.2 Example 2: Opioid Dependence
Part III Unsupervised exploratory analysis
15 Cluster analysis
15.1 Distance and Similarity Measures
15.2 Distance Measures
15.2.1 Properties of Distance Measures
15.2.2 Minkowski Distance Measures
15.2.3 Mahalanobis Distance
15.3 Similarity Measures
15.3.1 Inner Product
15.3.2 Pearson Correlation Coefficient
15.3.3 Spearman Rank Correlation Coefficient
15.4 Inter-cluster Distance
15.4.1 Mahalanobis Inter-cluster Distance
15.4.2 Neighbor-based Inter-cluster Distance
15.5 Hierarchical Clustering
15.5.1 Single Linkage Method
15.5.2 Complete Linkage Method
15.5.3 Average Linkage Method
15.5.4 Centroid Linkage Method
15.5.5 Median Linkage Clustering
15.5.6 Ward’s Clustering Method
15.5.7 Applications
15.5.8 Comparisons of Clustering Algorithms
15.6 K-means Clustering
15.7 Bayesian Cluster Analysis
15.8 Two-way Clustering Methods
15.9 Reliability of Clustering Patterns for Microarray Data
16 Principal components and singular value decomposition
16.1 Principal Component Analysis
16.1.1 Applications of Dominant Principal Components
16.2 Singular-value Decomposition
16.3 Computational Procedures for SVD
16.4 Eigengenes and Eigenarrays
16.5 Fraction of Eigenexpression
16.6 Generalized Singular Value Decomposition
16.7 Robust Singular Value Decomposition
17 Self-organizing maps
17.1 The Basic Logic of a SOM
17.2 The SOM Updating Algorithm
17.3 Program GENECLUSTER
17.4 Supervised SOM
17.5 Applications
Using SOM to Cluster Genes
Using SOM to Cluster Tumors
Multiclass Cancer Diagnosis
Part IV Supervised learning methods
18 Discrimination and classification
18.1 Fisher’s Linear Discriminant Analysis
18.2 Maximum Likelihood Discriminant Rules
18.3 Bayesian Classification
18.4 k-Nearest Neighbor Classifier
18.5 Neighborhood Analysis
18.6 A Gene-casting Weighted Voting Scheme
18.7 Example: Classification of Leukemia Samples
19 Artificial neural networks
19.1 Single-layer Neural Networks
19.1.1 Separating Hyperplanes
19.1.2 Class Labels
19.1.3 Decision Rules
19.1.4 Risk Functions
19.1.5 Gradient Descent Procedures
19.1.6 Rosenblatt’s Perceptron Method
19.2 General Structure of Multilayer Neural Networks
19.3 Training a Multilayer Neural Network
19.3.1 Sigmoid Functions
19.3.2 Mathematical Formulation
19.3.3 Training Algorithm
19.3.4 Discussion
19.4 Cancer Classifications Using Neural Networks
20 Support vector machines
20.1 Geometric Margins for Linearly Separable Groups
20.2 Convex Optimization in the Dual Space
20.3 Support Vectors
20.4 Linearly Nonseparable Groups
20.5 Nonlinear Separating Boundary
20.5.1 Kernel Functions
20.5.2 Kernels Defined by Symmetric Functions
20.5.3 Use of SVM for Classifying Genes
20.6 Examples
20.6.1 Functional Classification of Genes
20.6.2 SVM and One-versus-all Classification Scheme
Appendices
Glossary of Notation
Author Index
Topic Index