Article ID: | iaor20003514 |
Country: | United States |
Volume: | 183 |
Issue: | 2 |
Start Page Number: | 195 |
End Page Number: | 205 |
Publication Date: | Nov 1996 |
Journal: | Journal of Theoretical Biology |
Authors: | Stanfel Larry E. |
Keywords: | programming: integer, statistics: multivariate |
Each amino acid is represented by a vector of numerical measurements for the attributes of volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. Inter-residue distances are then calculated according to common metrics, and we introduce a new clustering objective function derived from information-theoretic considerations. The arguments of the function are the inter-object distances of the things to be clustered, in this case the amino acids. The residues are then partitioned into clusters by approximating the solution of an integer programming problem. The clusters obtained are compared with groups obtained in substitution/mutation studies and found to be similar. Thus, probably the strongest and most objective evidence to date is supplied for believing that physico-chemical properties account for the viability of substitutions and that the important similarities/differences are explained by a relatively small and simple set of properties.
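The pipeline the abstract describes (attribute vectors, pairwise distances, then a partition chosen by optimizing an objective over cluster assignments) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the object labels `r1` to `r5` and their two-component vectors are invented toy values, not amino acid measurements; Euclidean distance stands in for the "common metrics"; and the sum of within-cluster distances stands in for the paper's information-theoretic objective, which the abstract does not specify. The assignment problem is solved here by exhaustive enumeration, which is feasible only for a handful of objects, whereas the abstract describes an approximate solution of an integer program.

```python
import math
from itertools import product

# Hypothetical toy attribute vectors. These are invented for illustration
# and are NOT real amino acid measurements.
objects = {
    "r1": (0.0, 0.1),
    "r2": (0.1, 0.0),
    "r3": (1.0, 1.1),
    "r4": (1.1, 1.0),
    "r5": (0.9, 1.0),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Return the sorted object names and their pairwise distance matrix."""
    names = sorted(vectors)
    dist = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, dist


def within_cluster_cost(labels, dist):
    """Stand-in objective: sum of distances over pairs sharing a cluster."""
    n = len(labels)
    return sum(
        dist[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


def best_partition(vectors, k):
    """Enumerate every assignment of objects to k non-empty clusters and
    return the partition with the lowest stand-in cost."""
    names, dist = distance_matrix(vectors)
    best_labels, best_cost = None, math.inf
    for labels in product(range(k), repeat=len(names)):
        if len(set(labels)) != k:
            continue  # skip assignments that leave a cluster empty
        cost = within_cluster_cost(labels, dist)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    clusters = [
        sorted(name for name, lab in zip(names, best_labels) if lab == c)
        for c in range(k)
    ]
    return sorted(clusters)


if __name__ == "__main__":
    print(best_partition(objects, 2))
```

Each integer label plays the role of a decision variable assigning one object to one cluster, which is the sense in which such a partitioning problem is an integer program. For the twenty amino acids the number of assignments is far too large to enumerate, which is consistent with the abstract's reliance on an approximate solution.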