| Article ID: | iaor20003514 |
| Country: | United States |
| Volume: | 183 |
| Issue: | 2 |
| Start Page Number: | 195 |
| End Page Number: | 205 |
| Publication Date: | Nov 1996 |
| Journal: | Journal of Theoretical Biology |
| Authors: | Stanfel Larry E. |
| Keywords: | programming: integer, statistics: multivariate |
Each amino acid is represented by a vector of numerical measurements of seven attributes: volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. Inter-residue distances are then calculated using common metrics, and we introduce a new clustering objective function derived from information-theoretic considerations. The arguments of this function are the inter-object distances among the items to be clustered, in this case the amino acids. The residues are then partitioned into clusters by approximately solving an integer programming problem. The resulting clusters are compared with groupings obtained in substitution/mutation studies and found to be similar. This supplies what is probably the strongest and most objective evidence to date that physico-chemical properties account for the viability of substitutions, and that the important similarities and differences among residues are explained by a relatively small and simple set of properties.
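The general pipeline described above (attribute vectors, then a distance matrix, then a partition chosen to optimize an objective over those distances) can be illustrated with a minimal sketch. It makes several loud assumptions. The abstract does not give the information-theoretic objective, so this sketch substitutes a plain min-sum objective (total distance between objects placed in the same cluster). It also replaces the integer programming approximation with a simple greedy single-object move heuristic. Finally, it uses Euclidean distance on z-scored attributes, and the input vectors are hypothetical toy values, not real amino acid measurements. The function names `standardize`, `distance_matrix`, `within_cost`, and `local_search` are illustrative only.

```python
import math
from itertools import combinations


def standardize(vectors):
    """Z-score each attribute column so no single property dominates the distance."""
    n = len(vectors)
    cols = list(zip(*vectors))
    means = [sum(c) / n for c in cols]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(v[j] - means[j]) / sds[j] for j in range(len(cols))] for v in vectors]


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances (zero diagonal)."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(vectors[i], vectors[j])
    return d


def within_cost(d, labels):
    """Stand-in objective (NOT the paper's): sum of distances over same-cluster pairs."""
    return sum(d[i][j] for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def local_search(d, k, labels):
    """Greedily move single objects between k clusters while the objective decreases."""
    labels = list(labels)
    best = within_cost(d, labels)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for c in range(k):
                if c == labels[i]:
                    continue
                old = labels[i]
                labels[i] = c
                cost = within_cost(d, labels)
                if cost < best - 1e-12:
                    best = cost      # keep the improving move
                    improved = True
                else:
                    labels[i] = old  # undo a non-improving move
    return labels, best


# Hypothetical two-attribute objects forming two obvious groups (toy data only).
objects = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
dist = distance_matrix(standardize(objects))
labels, cost = local_search(dist, k=2, labels=[0, 1, 0, 1, 0, 1])
print(labels, round(cost, 3))
```

A greedy move heuristic only reaches a local optimum. An exact or bounded approach would instead encode the pairwise co-membership variables as an integer program. The sketch keeps the heuristic purely for brevity.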