Protein Sequence and Structure Comparison based on vectorial Representations.
Technische Universität, Darmstadt
[Ph.D. Thesis], (2009)
Available under Creative Commons Attribution Non-commercial No Derivatives, 2.5.
|Item Type:||Ph.D. Thesis|
|Title:||Protein Sequence and Structure Comparison based on vectorial Representations|
Proteins are very complex physical objects consisting of thousands of atoms and hundreds of amino acids, with complicated local and global interactions on length scales ranging from the microscopic neighbourhood of atoms to the macroscopic size of organisms. Despite this, the spatial configuration is encoded with a single character per amino acid from a twenty-character alphabet, an apparent contradiction that is not fully understood to date. This thesis is concerned with problems of protein structure and the relationship between protein sequence and structure. It attempts to integrate the approach typical of physicists in the field, who investigate highly simplified model systems such as single helices, with the bioinformatics approach of building powerful analysis tools. The first approach often leads to oversimplified systems that do not describe native proteins as a whole, while the second can be too heuristic and too involved to answer fundamental questions. We start by defining vectorial descriptions of protein structure, similar in form to sequence descriptions, first to compare protein structures, i.e. to perform structure alignments, and we discuss several measures of structural similarity. From these we derive a statistical structural similarity score for pairs of protein structures based on their spatial superposition. We then use a previously known ansatz to exploit the sequence-structure correlation in order to predict vectorial structure descriptions from protein sequence. These predicted profiles are then used within the same alignment framework to align protein sequences. For these alignments a basic evolutionary similarity measure between protein sequences is derived. A large part of this thesis is dedicated to the objective assessment of alignment methods, including the new method presented and a number of established programs.
A commonly used measure of structural similarity, the Percentage of Structural Identity (PSI), is discussed and generalized to cover an internal degree of freedom in structure that was previously ignored. The improvement is achieved by very simple but powerful reasoning. The resulting scheme can also be used to detect hinges in protein structures. We conclude that protein structure, despite its complexity, is indeed to a large extent one-dimensional. The unification of structure and sequence alignments under a single formalism gives some insight into the relation between sequence and structure in proteins.
|Place of Publication:||Darmstadt|
|Classification DDC:||500 Natural sciences and mathematics > 500 Natural sciences
500 Natural sciences and mathematics > 530 Physics
500 Natural sciences and mathematics > 570 Life sciences, biology|
|Divisions:||05 Department of Physics > Institute for condensed matter physics|
|Date Deposited:||20 Mar 2009 10:42|
|Last Modified:||07 Dec 2012 11:55|
|Referees:||Porto, Prof. Dr. Markus and Drossel, Prof. Dr. Barbara|
|Refereed:||16 February 2009|