An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants

Article ID: iaor2012420
Volume: 62
Issue: 3
Start Page Number: 951
End Page Number: 981
Publication Date: Apr 2012
Journal: Algorithmica
Authors: , ,
Keywords: combinatorial analysis, simulation: applications
Abstract:

Combinatorial (or rule‐based) methods for inferring haplotypes from genotypes on a pedigree have been studied extensively in the recent literature. These methods generally try to reconstruct the haplotypes of each individual so that the total number of recombinants in the pedigree is minimized. The problem is NP‐hard, although the number of recombinants in a practical dataset is known to be usually very small. In this paper, we consider how to efficiently infer haplotypes on a large pedigree when the number of recombinants is bounded by a small constant k, i.e. the so-called k‐recombinant haplotype configuration (k‐RHC) problem. We introduce a simple probabilistic model for k‐RHC in which the prior haplotype probability of a founder and the haplotype transmission probability from a parent to a child are both assumed to follow the uniform distribution, and k random recombination events are assumed to have taken place uniformly and independently in the pedigree. We present an $O(mn \log^{k+1} n)$ time algorithm for k‐RHC on tree pedigrees without mating loops, where m is the number of loci and n is the size of the input pedigree, and prove that when $90 \log n < m < n^3$, the algorithm can correctly find a feasible haplotype configuration that obeys the Mendelian law of inheritance and requires no more than k recombinants with probability $1 - O\left(\frac{k^2 \log^2 n}{mn} + \frac{1}{n^2}\right)$. The algorithm is efficient when k is moderate and could thus be used in practice to infer haplotypes from genotypes on large tree pedigrees. We have implemented the algorithm as a C++ program named Treek‐RHC. The implementation incorporates several ideas for dealing effectively with missing data and with data containing a large number of recombinants. Our experimental results on both simulated and real datasets show that Treek‐RHC can reconstruct haplotypes with high accuracy and is much faster than the best combinatorial method in the literature.
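To make the k‐RHC objective concrete, the sketch below counts the recombinants implied by a *given* haplotype configuration and checks it against the bound k. It is not the paper's O(mn log^{k+1} n) algorithm, which searches for such a configuration. It only evaluates one. It assumes biallelic loci coded 0/1, and the names `min_recombinants`, `total_recombinants`, `satisfies_k_rhc` and the dictionary layout are hypothetical. For each parent‐to‐child transmission, a small dynamic program tracks which of the parent's two haplotypes each locus was copied from and counts the fewest switches. A locus where the child's allele appears on neither parental haplotype is a Mendelian violation. Consistency of haplotypes with the observed genotypes is not checked here.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Haplotype = Sequence[int]  # assumption: biallelic loci coded 0/1


def min_recombinants(parent_h0: Haplotype, parent_h1: Haplotype,
                     transmitted: Haplotype) -> Optional[int]:
    """Fewest recombinations for a parent (parent_h0, parent_h1) to pass
    `transmitted` to a child, or None on a Mendelian violation."""
    if not (len(parent_h0) == len(parent_h1) == len(transmitted)):
        raise ValueError("all haplotypes must cover the same m loci")
    inf = math.inf
    # cost[s] = fewest source switches so far, given the current locus
    # was copied from parental haplotype s (0 or 1).
    cost = [0.0, 0.0]
    for a0, a1, c in zip(parent_h0, parent_h1, transmitted):
        allowed = (a0 == c, a1 == c)
        cost = [min(cost[s], cost[1 - s] + 1) if allowed[s] else inf
                for s in (0, 1)]
        if cost[0] == inf and cost[1] == inf:
            return None  # child's allele is on neither parental haplotype
    return int(min(cost))


def total_recombinants(
        haplotypes: Dict[str, Tuple[Haplotype, Haplotype]],
        parents: Dict[str, Tuple[str, str]]) -> Optional[int]:
    """Sum of per-transmission minima over the pedigree (hypothetical layout:
    haplotypes[id] = (paternal, maternal); parents[child] = (father, mother);
    founders have no entry in `parents`)."""
    total = 0
    for child, (father, mother) in parents.items():
        paternal, maternal = haplotypes[child]
        for parent, inherited in ((father, paternal), (mother, maternal)):
            h0, h1 = haplotypes[parent]
            r = min_recombinants(h0, h1, inherited)
            if r is None:
                return None
            total += r
    return total


def satisfies_k_rhc(haplotypes, parents, k: int) -> bool:
    """True if the configuration is Mendelian and needs at most k recombinants."""
    total = total_recombinants(haplotypes, parents)
    return total is not None and total <= k
```

For example, a father with haplotypes `[0, 0, 0]` and `[1, 1, 1]` who transmits `[0, 1, 1]` contributes one recombinant, while a homozygous mother contributes none. Such a trio therefore satisfies the bound for k = 1 but not for k = 0. Because the haplotypes are fixed, each transmission's source pattern can be optimized independently, which is why the total is a plain sum.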
