Article ID: | iaor2017510 |
Volume: | 58 |
Issue: | 4 |
Start Page Number: | 473 |
End Page Number: | 491 |
Publication Date: | Dec 2016 |
Journal: | Australian & New Zealand Journal of Statistics |
Authors: | Chen Jie, Ji Tieming |
Keywords: | statistics: general, simulation, datamining, statistics: inference |
In this ‘Big Data’ era, statisticians inevitably encounter data generated from various disciplines. In particular, advances in bio‐technology have enabled scientists to produce enormous datasets in various biological experiments. In the last two decades, we have seen high‐throughput microarray data resulting from various genomic studies. Recently, next generation sequencing (NGS) technology has been playing an important role in the study of genomic features, resulting in vast amount of NGS data. One frequent application of NGS technology is in the study of DNA copy number variants (CNVs). The resulting NGS read count data are then used by researchers to formulate their various scientific approaches to accurately detect CNVs. Computational and statistical approaches to the detection of CNVs using NGS data are, however, very limited at present. In this review paper, we will focus on read‐depth analysis in CNV detection and give a brief summary of currently used statistical analysis methods in searching for CNVs using NGS data. In addition, based on the review, we discuss the challenges we face and future research directions. The ultimate goal of this review paper is to give a timely exposition of the surveyed statistical methods to researchers in related fields.