Article ID: | iaor2007228 |
Country: | United States |
Volume: | 16 |
Issue: | 4 |
Start Page Number: | 360 |
End Page Number: | 370 |
Publication Date: | Sep 2004 |
Journal: | INFORMS Journal On Computing |
Authors: | Kimmel Gad, Sharan Roded, Shamir Rod |
Keywords: | heuristics |
The study of haplotypes and their diversity in a population is central to disease-association research. We study several problems arising in haplotype block partitioning. Our objective function is the total number of distinct haplotypes in blocks. We show that the problem is NP-hard when there are errors or missing data, and provide approximation algorithms for several of its variants. We also give an algorithm that solves the problem with high probability under a probabilistic model that allows noise and missing data. In addition, we study the multipopulation case, where one has to partition the haplotypes into populations and seek a different block partition in each one. We provide a heuristic for that problem and use it to analyze simulated and real data. On simulated data, our blocks resemble the true partition more than the blocks generated by the LD-based algorithm of Gabriel