Article ID: | iaor2004525 |
Country: | United States |
Volume: | 18 |
Issue: | 8 |
Start Page Number: | 1116 |
End Page Number: | 1123 |
Publication Date: | Aug 2002 |
Journal: | Bioinformatics |
Authors: | Pupko T., Pe'er I., Hasegawa M., Graur D., Friedman N. |
Keywords: | programming: branch and bound, programming: integer |
Motivation: We developed an algorithm to reconstruct ancestral sequences, taking into account the rate variation among sites of the protein sequences. Our algorithm maximizes the joint probability of the ancestral sequences, assuming that the rate is gamma distributed among sites, and provably finds the global maximum. The use of ‘joint’ reconstruction is motivated by studies that use the sequences at all the internal nodes in a phylogenetic tree, such as the inference of patterns of amino-acid replacement or the tracing of biochemical changes that occurred during the evolution of a given protein family.
Results: We give an algorithm that is guaranteed to find the global maximum. The efficient search method makes our method applicable to datasets with a large number of sequences. We analyze ancestral sequences of five gene families, exploring the effect of the amount of among-site rate variation and the degree of sequence divergence on the resulting ancestral states.
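
As a rough illustration of the objective described in the abstract, the sketch below scores every assignment of states to the internal nodes of a tiny tree and keeps the one with the highest joint probability, averaging the site likelihood over a few rate categories. It is not the authors' method: the record's keywords point to branch and bound, whereas this toy uses exhaustive enumeration, which is only feasible for very small trees. The tree, the branch lengths, the three equal-weight rates (standing in for a discretized gamma distribution), the four-letter alphabet, and the Jukes-Cantor style substitution model are all assumptions made for the example.

```python
import itertools
import math

STATES = "ACGT"  # toy alphabet (assumption); the paper deals with protein sequences

# Hypothetical rooted tree: root -> {inner, leaf3}, inner -> {leaf1, leaf2}
EDGES = [("root", "inner", 0.1), ("root", "leaf3", 0.3),
         ("inner", "leaf1", 0.2), ("inner", "leaf2", 0.2)]
INTERNAL = ["root", "inner"]

# Hypothetical equal-weight rate categories standing in for a discretized gamma
RATES = [0.2, 1.0, 1.8]


def jc_prob(a, b, t):
    """Jukes-Cantor style probability of state a becoming b over branch length t."""
    k = len(STATES)
    e = math.exp(-k * t / (k - 1))
    if a == b:
        return 1.0 / k + (1.0 - 1.0 / k) * e
    return (1.0 / k) * (1.0 - e)


def joint_score(assign):
    """Joint probability of one full assignment at one site, averaged over rates."""
    total = 0.0
    for rate in RATES:
        p = 1.0 / len(STATES)  # uniform prior on the root state
        for parent, child, t in EDGES:
            p *= jc_prob(assign[parent], assign[child], rate * t)
        total += p / len(RATES)
    return total


def best_joint(leaves):
    """Exhaustively search internal-node states; return (assignment, score)."""
    best = None
    for combo in itertools.product(STATES, repeat=len(INTERNAL)):
        internal = dict(zip(INTERNAL, combo))
        score = joint_score({**leaves, **internal})
        if best is None or score > best[1]:
            best = (internal, score)
    return best


if __name__ == "__main__":
    states, score = best_joint({"leaf1": "C", "leaf2": "C", "leaf3": "G"})
    print(states, score)
```

Because the search space grows as the alphabet size raised to the number of internal nodes, enumeration like this does not scale; a bounded search that still guarantees the global maximum, as the abstract claims, is what makes the approach usable on larger trees.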