Tomovic, Andrija. Computational analysis of promoters and DNA-protein interactions. 2009, Doctoral Thesis, University of Basel, Faculty of Science.
Official URL: http://edoc.unibas.ch/diss/DissB_8870
Abstract
The investigation of promoter activity and DNA-protein interactions is very important for
understanding many crucial cellular processes, including transcription, recombination and
replication. Promoter activity and DNA-protein interactions can be studied in the lab (in
vitro or in vivo) or using computational methods (in silico). Computational approaches
for analysing promoters and DNA-protein interactions have become more powerful as
more and more complete genome sequences, 3D structural data, and high-throughput data
(such as ChIP-chip and expression data) have become available. Modern scientific
research into promoters and DNA-protein interactions represents a high level of cooperation
between computational and laboratory methods.
This thesis covers several aspects of the computational analysis of promoters and
DNA-protein interactions: analysis of transcription factor binding sites (investigating position
dependencies in transcription factor binding sites); computational prediction of
transcription factor binding sites (a new scanning method for the in silico prediction of
transcription factor binding sites is described); computational analysis of crystal
structures of DNA-protein interactions (multiple proteins bound to DNA); and
computational predictions of transcription factor co-operations (investigating
dependencies between transcription factors in human, mouse and rat genomes, and a new
method of in silico prediction of cis-regulatory motifs and transcription start sites is
described). In addition, this thesis reports how one statistical method for the analysis of
transcription factor binding sites can be used for estimating the quality of multiple
sequence alignments.
The main finding reported in this thesis is that it is wrong to assume, a priori, that
positions in transcription factor binding sites are all either independent or dependent on
one another. Position dependencies should be tested using rigorous statistical methods on
a case-by-case basis. When dependencies are detected, they can be modelled in a very
simple way, without complex mathematical tools that need many parameters
and more data. An example of such a model, including a web-based implementation of
the algorithm, is reported in this thesis. It has also been shown that the conformational
energy (indirect readout) of DNA in complexes with transcription factors which have
dependent positions in their binding sites is significantly higher than in those with
transcription factors which do not have dependent positions in their binding sites.
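As an illustration of what testing position dependencies case by case can look like, the following minimal sketch scores the dependency between two columns of a set of aligned binding sites using mutual information. This is not the Bayesian method developed in the thesis; mutual information is substituted here only because it is compact. The function name `mutual_information` and the toy alignment `sites` are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites.

    `sites` is a list of equal-length strings (hypothetical input format).
    A value near 0 suggests the two positions vary independently;
    larger values suggest a dependency worth testing more rigorously.
    """
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    i_counts = Counter(s[i] for s in sites)
    j_counts = Counter(s[j] for s in sites)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n                 # observed joint frequency
        p_a = i_counts[a] / n            # marginal frequency at column i
        p_b = j_counts[b] / n            # marginal frequency at column j
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made up): columns 0 and 1 always co-vary (A with C, T with G),
# while columns 0 and 3 occur in every combination equally often.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]

dependent_score = mutual_information(sites, 0, 1)
independent_score = mutual_information(sites, 0, 3)
print(dependent_score, independent_score)
```

With real data, a raw score like this would still need a significance assessment (for example by comparing against shuffled columns), since small alignments produce non-zero mutual information by chance.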
The structural analysis of multiple protein-DNA interactions showed that the formation
of interactions between multiple proteins and DNA results in a decrease in
protein-protein affinity and an increase in protein-DNA affinity, with a net gain in overall
stability of complexes where multiple proteins are bound to DNA. This effect is clearly
important for modelling transcription factor co-operativity. In addition, the physical
overlap of two factors does not simply relate to the region on the DNA where the binding
site is found. Two factors may lie very close together but possibly not physically overlap
because their side-chains can interlink with one another. In this way, it is possible to find
a large overlap between two transcription factor binding sites, but from a 3D perspective
it is still possible for both factors to bind simultaneously. It may also be that one
transcription factor binds to the minor and another to the major groove of DNA. That
information is also useful for modelling transcription factor co-operativity.
Moreover, this thesis reports the results from a computational prediction of dependencies
(co-operativities) between transcription factors which usually act together in gene
regulation in human, mouse and rat genomes. It is shown that the computational
analysis of transcription factor site dependencies is a valuable complement to
experimental approaches for discovering transcription regulatory interactions and
networks. Scanning promoter sequences with dependent groups of transcription factor
binding sites improves the quality of transcription factor predictions. It has also been
demonstrated that modelling transcription factor co-operativities improves the quality of
transcription start site predictions. For three genes (ctmp, gap-43 and ngfrap), in vivo
validation of the predicted transcription start sites was performed.
Finally, the Bayesian method for the detection of dependencies between positions in
transcription factor binding sites can easily be converted into a method for estimating the
quality of multiple sequence alignments. The method is simple, has linear complexity,
is easy to implement, and performs better than other state-of-the-art methods that
are more complex.
Advisors: | Engel, Andreas |
---|---|
Committee Members: | Schwede, Torsten and Matthias, Patrick D. |
Faculties and Departments: | 05 Faculty of Science > Departement Biozentrum > Former Organization Units Biozentrum > Structural Biology (Engel) |
UniBasel Contributors: | Schwede, Torsten |
Item Type: | Thesis |
Thesis Subtype: | Doctoral Thesis |
Thesis no: | 8870 |
Thesis status: | Complete |
Number of Pages: | 111 Bl. |
Language: | English |
Identification Number: |
|
edoc DOI: | |
Last Modified: | 02 Aug 2021 15:07 |
Deposited On: | 30 Apr 2010 09:15 |