(2013) Quantitative analysis of time-series microarray data, with application to investigating responses to environmental stresses in arabidopsis. PhD thesis, University of Warwick.
High-throughput technologies have made it possible to perform genome-scale analyses in
a wide variety of research areas. These analyses generate vast amounts of potentially
noisy data, and this noise can obscure the underlying signal.
In this thesis, a high-throughput regression analysis approach was developed in which a
variety of linear and nonlinear models were fitted to gene expression profiles from time-course
experiments. These models included the logistic, Gompertz, exponential, critical
exponential, linear+exponential, Gaussian, and hyperbolic functions. The fitted parameters
of these models reflect aspects of the model shape and are therefore biologically
interpretable. Examining the fitted parameters allowed the gene expression profiles to be
interpreted in terms of the underlying biology, such as the time of initial expression.
This provides a potentially more mechanistic approach to studying genetic responses
to stimuli. The analysis was applied to three time-series gene expression experiments:
a Saccharomyces cerevisiae time course, used to validate the method, and two time-course
experiments on Arabidopsis thaliana investigating stress responses to senescence and to
infection by the pathogen Botrytis cinerea.
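To illustrate how a fitted parameter can carry biological meaning, the sketch below fits a three-parameter logistic curve, y(t) = A / (1 + exp(-k (t - t0))), to a single expression profile. This parameterisation, the names `logistic` and `fit_logistic`, and the grid-search fitting strategy are assumptions made so that the example needs only the Python standard library; a real pipeline would more likely use a nonlinear least-squares optimiser. For each candidate rate k and midpoint t0 the amplitude A enters linearly, so its least-squares value has a closed form, and the grid point with the smallest residual sum of squares is kept. The midpoint t0 is the time at which expression reaches half of its plateau.

```python
import math
from typing import Optional, Sequence, Tuple


def logistic(t: float, amplitude: float, rate: float, midpoint: float) -> float:
    """Logistic curve rising from 0 towards `amplitude`, half-way at `midpoint`."""
    return amplitude / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(
    times: Sequence[float],
    values: Sequence[float],
    rates: Sequence[float],
    midpoints: Sequence[float],
) -> Tuple[float, float, float, float]:
    """Grid-search least-squares fit; returns (amplitude, rate, midpoint, rss).

    Assumption: a coarse grid over rate and midpoint stands in for a proper
    nonlinear optimiser. For a fixed rate and midpoint the model is linear in
    the amplitude, so its least-squares value is sum(y * g) / sum(g * g),
    where g is the unit-amplitude curve evaluated at the sampling times.
    """
    best: Optional[Tuple[float, float, float, float]] = None
    for rate in rates:
        for midpoint in midpoints:
            shape = [logistic(t, 1.0, rate, midpoint) for t in times]
            amplitude = sum(y * g for y, g in zip(values, shape)) / sum(g * g for g in shape)
            rss = sum((y - amplitude * g) ** 2 for y, g in zip(values, shape))
            if best is None or rss < best[3]:
                best = (amplitude, rate, midpoint, rss)
    assert best is not None, "rates and midpoints must be non-empty"
    return best


# Hypothetical, noise-free profile sampled every hour for 12 h.
times = [float(t) for t in range(13)]
profile = [logistic(t, 5.0, 1.0, 4.0) for t in times]
rates = [0.25 * i for i in range(1, 13)]      # 0.25 .. 3.0
midpoints = [0.5 * i for i in range(25)]      # 0.0 .. 12.0

amplitude, rate, midpoint, rss = fit_logistic(times, profile, rates, midpoints)
print(f"A = {amplitude:.3f}, k = {rate}, t0 = {midpoint} h")
```

Because the synthetic profile lies exactly on a grid point, the fit recovers A = 5, k = 1 and t0 = 4 h; with noisy data the same code returns the best grid approximation, and a finer grid or a local optimiser would refine it.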
A cluster analysis, named ShapeCluster, was developed as an application of the fitted
models. Using this analysis, it was possible to cluster on aspects of the shape of the
expression profiles using different combinations of parameters. This added flexibility to
the analysis and allowed the data to be investigated in multiple ways. Specifically,
performing the cluster analysis on a particular parameter permitted the identification of
genes that are co-regulated or that participate in the response to the biological stress in question.
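Clustering on a chosen subset of fitted parameters can be sketched as follows. The clustering algorithm used by ShapeCluster is not described here, so plain k-means serves purely as a stand-in; the function names, the parameter names (`amplitude`, `rate`, `midpoint`) and the gene names are hypothetical. Each selected parameter is z-scored so that parameters on different scales carry equal weight.

```python
import math
from typing import Dict, List, Sequence


def standardise(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column so parameters on different scales weigh equally."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        out_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*out_cols)]


def kmeans(points: List[List[float]], k: int, iterations: int = 50) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Assumption: k-means is a stand-in, not necessarily ShapeCluster's algorithm.
    """
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def shape_cluster(params: Dict[str, Dict[str, float]], use: Sequence[str], k: int) -> Dict[str, int]:
    """Cluster genes on the fitted parameters named in `use` (hypothetical helper)."""
    genes = sorted(params)
    rows = [[params[g][name] for name in use] for g in genes]
    return dict(zip(genes, kmeans(standardise(rows), k)))


# Hypothetical fitted logistic parameters for four genes.
fitted = {
    "geneA": {"amplitude": 5.0, "rate": 1.0, "midpoint": 2.0},
    "geneB": {"amplitude": 1.0, "rate": 0.5, "midpoint": 2.5},
    "geneC": {"amplitude": 3.0, "rate": 2.0, "midpoint": 9.0},
    "geneD": {"amplitude": 4.0, "rate": 1.5, "midpoint": 9.5},
}
by_timing = shape_cluster(fitted, use=["midpoint"], k=2)
print(by_timing)
```

Clustering on the midpoint alone groups the early responders (geneA, geneB) apart from the late ones (geneC, geneD) even though their amplitudes differ widely; passing `use=["amplitude"]` instead would group the same genes by response magnitude.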
Several methods of producing clusters with combinations of parameters, namely simultaneous
parameter clustering, sequential meta-clustering, and cross meta-clustering,
provided additional means of interrogating the data. Clusters from these methods were
assessed for significance by testing for over-represented annotation terms and motifs,
and the methods were found to produce biologically relevant sets of genes.
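The text does not say which statistic was used to judge over-representation; a common choice is the one-sided hypergeometric test, sketched below with the standard library's `math.comb` (Python 3.8+). With `total` genes, `annotated` of which carry a term, and a cluster of `cluster_size` genes containing `hits` annotated genes, the p-value is the probability of drawing at least `hits` annotated genes by chance. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, annotated: int, cluster_size: int, hits: int) -> float:
    """One-sided p-value P(X >= hits) for X ~ Hypergeometric(total, annotated, cluster_size).

    Assumption: the hypergeometric test stands in for whatever enrichment
    statistic the thesis used.
    """
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, x) * comb(total - annotated, cluster_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / comb(total, cluster_size)


# Toy numbers: 20 genes, 5 carry a term, and all 4 genes in a cluster carry it.
p = hypergeom_enrichment_p(total=20, annotated=5, cluster_size=4, hits=4)
print(p)
```

Here the p-value is 5 / 4845, roughly 0.001. When many terms and clusters are tested at once, such p-values would normally be corrected for multiple testing before a cluster is called significant.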
Experiments using quantitative PCR and luciferase transcriptional reporters were designed
to determine the response to combined Botrytis and senescence stress. A predictive
model was identified by fitting a factor model to the experimental data and
identifying the most significant model effects. This model removed noise from the biological
data and confirmed that the effects of the two stresses were additive.
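What "additive" means here can be illustrated with a minimal 2 × 2 factorial layout (control, Botrytis only, senescence only, both). This is a simplified stand-in for the factor model described above, which is not specified in detail here; the function name, the treatment labels and the replicate values are hypothetical. Under additivity the combined response equals the sum of the two single-stress effects, so the interaction contrast (both − Botrytis) − (senescence − control) should be close to zero.

```python
from statistics import mean
from typing import Dict, List


def factorial_effects(cells: Dict[str, List[float]]) -> Dict[str, float]:
    """Main effects and interaction for a 2x2 design, from cell means.

    Assumption: simple treatment-coded contrasts stand in for the full factor
    model. Expects keys 'control', 'botrytis', 'senescence' and 'both'.
    """
    m = {name: mean(values) for name, values in cells.items()}
    return {
        "botrytis": m["botrytis"] - m["control"],
        "senescence": m["senescence"] - m["control"],
        # Zero when the combined response is the sum of the single-stress effects.
        "interaction": (m["both"] - m["botrytis"]) - (m["senescence"] - m["control"]),
    }


# Hypothetical replicate expression values (e.g. log scale) for one gene.
cells = {
    "control": [1.0, 1.2, 0.8],
    "botrytis": [3.0, 3.1, 2.9],
    "senescence": [2.0, 2.2, 1.8],
    "both": [4.0, 4.1, 3.9],
}
effects = factorial_effects(cells)
print(effects)
```

These are point estimates only; deciding whether an interaction is genuinely non-zero would require comparing it with the replicate variance, for example with a two-way ANOVA.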
In cross-sectional data, each sample is obtained from a separate individual (plant), so
replicate samples taken at the same time point may differ in biological age. An iterative,
cross-validated multivariate regression approach, termed time shifting, was developed to
estimate the true biological age of the replicate samples, and this approach was shown to
give better model fits for a large proportion of the genes.
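The time-shifting idea can be sketched in a simplified form: for each sample in turn, leave it out, fit a per-gene model to the remaining samples using their current age estimates, then choose the candidate age at which the left-out sample's expression across all genes best matches the fitted curves, and repeat for several rounds. The per-gene straight-line model, the candidate-age grid, the sequential in-place updating and all names below are illustrative assumptions rather than the thesis's procedure, which used the nonlinear models described earlier.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def time_shift(
    ages: List[float],
    expression: List[List[float]],
    candidates: Sequence[float],
    rounds: int = 3,
) -> List[float]:
    """Re-estimate each sample's biological age by leave-one-out regression.

    expression[s][g] is the level of gene g in sample s. Assumption: a
    straight line per gene stands in for the nonlinear models, and ages are
    updated in place, sample by sample.
    """
    ages = list(ages)
    n_samples, n_genes = len(ages), len(expression[0])
    for _ in range(rounds):
        for s in range(n_samples):
            others = [i for i in range(n_samples) if i != s]
            xs = [ages[i] for i in others]
            fits = [fit_line(xs, [expression[i][g] for i in others]) for g in range(n_genes)]

            def sse(age: float) -> float:
                # Squared mismatch across all genes if sample s had this age.
                return sum((expression[s][g] - (a + b * age)) ** 2 for g, (a, b) in enumerate(fits))

            ages[s] = min(candidates, key=sse)
    return ages


# Hypothetical data: sample 0 was recorded at 2 h but is biologically at 0 h.
true_ages = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
nominal_ages = [2.0, 1.0, 2.0, 3.0, 4.0, 5.0]
expression = [[1.0 + 2.0 * a, 5.0 - a, 0.5 * a] for a in true_ages]
candidates = [0.5 * i for i in range(13)]  # 0.0 .. 6.0 h

estimated = time_shift(nominal_ages, expression, candidates)
print(estimated)
```

Fitting on the other samples before re-estimating each one is what keeps a single mislabelled sample from pulling the fitted curves towards its own recorded time. In practice, restricting the candidate ages to a window around each nominal time would also keep the estimates anchored to the experimental time scale.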
In this thesis, a number of novel analytical approaches for obtaining information
from gene expression microarray datasets were developed. These analyses provided biologically
oriented descriptions of individual gene expression profiles, allowing profiles obtained
from time-series experiments to be modelled and interpreted more fully.
Through careful choice of appropriate models, such statistical regression approaches
allow gene expression profiles to be compared more effectively, and may improve
understanding of the regulatory mechanisms shared between genes.