Identification and analysis of multigene families by comparison of exon fingerprints

Article ID: iaor1997136
Country: United States
Volume: 249
Issue: 2
Start Page Number: 342
End Page Number: 359
Publication Date: Apr 1995
Journal: Journal of Molecular Biology
Keywords: programming: dynamic
Abstract:

Gene families are often recognised by sequence homology, using similarity searching to find relationships; however, genomic sequence data also provide gene architectural information that conventional search methods do not use. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading-frame shifts should be selected against. A fast search technique for comparing spliceosomal genes and gene fragments is presented that can detect possible weak sequence homologies apparent at the intron/exon level of gene organization. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (exon fingerprints) using a global dynamic programming algorithm with a combined intron phase identity and exon size dissimilarity score. Exon fingerprints are typically two orders of magnitude smaller than their nucleic acid sequence counterparts, giving rise to fast search times: a ranked search of a typical three-exon fingerprint against a library of 6755 fingerprints completes in under 30 seconds on an ordinary workstation, while the worst case, the largest fingerprint of 53 exons, completes in just over one minute. The comparatively short ‘sequence’ length of exon fingerprints is compensated for by a large exon alphabet composed of intron phase types and a wide range of exon sizes, the latter contributing the most information to alignments. In some searches FINEX performs better than conventional methods, finding matches with similar exon organization but low sequence homology. A search using a human serum albumin query finds all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard-to-detect homologies where genomic data are available.
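The fingerprint comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes each exon is encoded as its length plus the phases of the introns on either side, and it aligns two fingerprints with a standard Needleman-Wunsch global alignment using a hypothetical per-exon score: one point per matching flanking intron phase, minus a weighted relative size difference. The constants `PHASE_MATCH`, `SIZE_WEIGHT` and `GAP`, and the helper names `fingerprint`, `pair_score`, `align_score` and `ranked_search`, are illustrative assumptions, not FINEX's published scoring scheme or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical scoring constants chosen for illustration only;
# they are NOT the published FINEX parameters.
PHASE_MATCH = 1.0  # bonus for each identical flanking intron phase
SIZE_WEIGHT = 2.0  # weight of the relative exon-size difference
GAP = -1.0         # penalty for leaving an exon unaligned


@dataclass(frozen=True)
class Exon:
    """An exon in a fingerprint: length (nt) plus the phases (0, 1, 2) of the
    introns on its 5' and 3' sides; None marks a side with no intron."""
    length: int
    left_phase: Optional[int]
    right_phase: Optional[int]


def fingerprint(exon_lengths: Sequence[int], intron_phases: Sequence[int]) -> List[Exon]:
    """Turn exon lengths and the phases of the introns between them into a fingerprint."""
    if len(intron_phases) != len(exon_lengths) - 1:
        raise ValueError("expected one intron phase between each pair of adjacent exons")
    if any(n <= 0 for n in exon_lengths):
        raise ValueError("exon lengths must be positive")
    left = [None, *intron_phases]   # first exon has no upstream intron
    right = [*intron_phases, None]  # last exon has no downstream intron
    return [Exon(n, l, r) for n, l, r in zip(exon_lengths, left, right)]


def pair_score(a: Exon, b: Exon) -> float:
    """Intron-phase identity bonus minus exon-size dissimilarity penalty.
    Two terminal sides (None == None) also count as a phase match."""
    phase = PHASE_MATCH * (int(a.left_phase == b.left_phase) + int(a.right_phase == b.right_phase))
    size = SIZE_WEIGHT * abs(a.length - b.length) / max(a.length, b.length)
    return phase - size


def align_score(x: Sequence[Exon], y: Sequence[Exon]) -> float:
    """Needleman-Wunsch global alignment score of two exon fingerprints."""
    n, m = len(x), len(y)
    # dp[i][j] holds the best score for aligning x[:i] with y[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]),  # align two exons
                dp[i - 1][j] + GAP,                                 # exon of x unmatched
                dp[i][j - 1] + GAP,                                 # exon of y unmatched
            )
    return dp[n][m]


def ranked_search(query: Sequence[Exon],
                  library: Dict[str, Sequence[Exon]]) -> List[Tuple[str, float]]:
    """Score the query against every library fingerprint, best match first."""
    hits = [(name, align_score(query, fp)) for name, fp in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up exon sizes and phases, for illustration only.
    query = fingerprint([120, 95, 210], [0, 2])
    library = {
        "similar": fingerprint([118, 97, 205], [0, 2]),
        "different": fingerprint([300, 40, 80, 500], [1, 1, 0]),
    }
    print([name for name, _ in ranked_search(query, library)])  # ['similar', 'different']
```

In this toy run the fingerprint with the same intron phases and near-identical exon sizes ranks first even though no nucleotide sequence is compared at all. Because a fingerprint holds only a handful of exons rather than thousands of nucleotides, the O(nm) dynamic programming table stays very small, which is consistent with the fast search times reported above.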
