A Recursive Formulation for a Rank Sum Statistic Used to Detect Genomic Copy Number Variation


Nam Tran Hoang


Copy number variation (CNV) results from duplications and deletions of genomic DNA. Because CNVs have been found to correlate with a number of genetic diseases, detecting and characterizing CNVs is a major goal of genetic research. Recently, a rank-based method was developed to analyze raw CNV data.

This method ranks a DNA sample against multiple controls across multiple DNA sections. The overall CNV of the sample is then determined by a statistical comparison of the sample's rank sum against the discrete null distribution. The accuracy of the method therefore depends, to a large degree, on an accurate representation of that null distribution. So far, the exact null distribution has only been approximated by the continuous Irwin-Hall distribution.

This study presents rigorous proofs of several recursive formulations for the weights of the random rank sum statistic. Unexpectedly, these recursive formulae yield a generalized form of the binomial coefficients. The descriptive statistics of the exact null distribution are also derived.
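
As a rough illustration of what such a recursion can look like, the sketch below makes an assumption that is not stated in this abstract: under the null hypothesis, the sample's rank in each of n DNA sections is independent and uniform on 0, 1, ..., m, where m is the number of controls. Under that assumption, the weight for a rank sum s is the coefficient of x^s in (1 + x + ... + x^m)^n, and the case m = 1 reduces to the ordinary binomial coefficients. The function name `rank_sum_weights` is hypothetical and is not taken from the study.

```python
from math import comb


def rank_sum_weights(n, m):
    """Count the ways n ranks, each in {0, ..., m}, can add up to each sum s.

    Assumed null model (not from the abstract): ranks are independent and
    uniform on {0, ..., m}. Returns a list w with w[s] = number of rank
    configurations whose sum is s, for s = 0, ..., n * m.
    """
    weights = [1]  # zero sections: only the empty sum s = 0
    for _ in range(n):
        # Recursive step: w_k(s) = sum over j in 0..m of w_{k-1}(s - j)
        new = [0] * (len(weights) + m)
        for s, w in enumerate(weights):
            for j in range(m + 1):
                new[s + j] += w
        weights = new
    return weights


# With a single control (m = 1) the weights are ordinary binomial coefficients.
assert rank_sum_weights(4, 1) == [comb(4, k) for k in range(5)]
```

Dividing each weight by (m + 1)^n would give the exact null probabilities under the same assumed model.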

The Irwin-Hall approximation is compared with the exact null distribution and is shown to underestimate the standard deviation and overestimate the kurtosis. Data simulations show that the Irwin-Hall approximation also increases the likelihood of type I error (false positives) and overstates the power of the test. Hence, using these recursive formulae improves the ability of this rank-based method to detect CNV.
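
The direction of these two discrepancies can be checked with a small exact calculation. The sketch below is an illustration only and rests on assumptions not given in this abstract: each rank is uniform on 0, 1, ..., m and is divided by m so that it lies in [0, 1], and the Irwin-Hall distribution is taken as the sum of n continuous Uniform(0, 1) variables, with variance n/12 and excess kurtosis -6/(5n). The function names are hypothetical.

```python
from fractions import Fraction


def scaled_rank_moments(m):
    """Variance and excess kurtosis of one rank uniform on {0..m}, divided by m."""
    values = [Fraction(j, m) for j in range(m + 1)]
    p = Fraction(1, m + 1)  # each rank equally likely (assumed null model)
    mean = sum(values) * p
    var = sum((v - mean) ** 2 for v in values) * p
    fourth = sum((v - mean) ** 4 for v in values) * p
    return var, fourth / var ** 2 - 3


def compare(n, m):
    """Return (variance, excess kurtosis) pairs: exact sum vs. Irwin-Hall."""
    var1, kurt1 = scaled_rank_moments(m)
    # For a sum of n independent copies, variance scales by n and
    # excess kurtosis scales by 1/n.
    exact = (n * var1, kurt1 / n)
    irwin_hall = (Fraction(n, 12), Fraction(-6, 5 * n))
    return exact, irwin_hall


exact, irwin_hall = compare(n=10, m=4)
assert exact[0] > irwin_hall[0]  # Irwin-Hall has the smaller variance
assert exact[1] < irwin_hall[1]  # Irwin-Hall has the larger kurtosis
```

Under these assumptions the gap shrinks as m grows, since the scaled discrete ranks approach a continuous uniform variable.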

Faculty Mentor

Craig Jackson
