Title

A Recursive Formulation for a Rank Sum Statistic Used to Detect Genomic Copy Number Variation

Authors

Nam Tran Hoang

Document Type

Article

Publication Date

4-20-2016

Abstract

Copy number variation (CNV) results from duplications and deletions of genomic DNA. Because CNVs have been found to correlate with a number of genetic diseases, detecting and characterizing CNV is a major goal of genetic research. Recently, a rank-based method has been developed to analyze raw CNV data.

This method ranks a DNA sample against multiple controls across multiple DNA sections. The overall CNV of the sample is then determined by a statistical comparison of the sample's Rank-Sum against the discrete null distribution. The accuracy of the method therefore depends, to a large degree, on an accurate representation of that null distribution. So far, the exact null distribution has only been approximated using the continuous Irwin-Hall distribution.
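As a rough illustration of the Rank-Sum idea described above, the following Python sketch ranks a sample against its controls within each DNA section and sums the ranks. The function name `rank_sum`, the input layout, the ranking direction (rank 1 is the smallest value), and the handling of ties are all assumptions made for illustration; the abstract does not specify them.

```python
def rank_sum(sample, controls):
    """Sum of the sample's per-section ranks among the controls.

    sample   : list of measurements, one per DNA section (hypothetical layout)
    controls : list of control measurement lists, each the same length as sample
    """
    total = 0
    for j, value in enumerate(sample):
        # Assumed ranking rule: rank = 1 + number of controls strictly below
        # the sample in section j, so ranks run from 1 to len(controls) + 1.
        rank = 1 + sum(1 for control in controls if control[j] < value)
        total += rank
    return total


if __name__ == "__main__":
    sample = [5.0, 1.0, 9.0]
    controls = [[4.0, 2.0, 3.5], [6.0, 0.0, 2.0]]
    print(rank_sum(sample, controls))
```

With k controls and n sections, a statistic built this way takes integer values between n and n(k + 1), which is why its null distribution is discrete.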

This study presents rigorous proofs of several recursive formulations for the weights of the random Rank-Sum statistic. Unexpectedly, these recursive formulae yield a generalized form of the binomial coefficients. The descriptive statistics of the exact null distribution are also derived.
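One way such weights could be computed recursively is sketched below in Python. It assumes that, under the null hypothesis, the rank in each section is independent and uniformly distributed on 1, ..., m; this assumption is not stated in the abstract, and the recursion shown is a standard convolution, not necessarily one of the formulations proved in the study. The names `rank_sum_weights`, `n_sections`, and `n_ranks` are hypothetical.

```python
def rank_sum_weights(n_sections, n_ranks):
    """Count the ways n_sections ranks, each in 1..n_ranks, reach each sum.

    Returns a list w where w[i] is the number of rank combinations whose
    sum equals n_sections + i (the smallest possible sum is n_sections).
    """
    weights = [1]  # zero sections: exactly one (empty) combination
    for _ in range(n_sections):
        # Adding one section shifts every existing sum by 0..n_ranks-1
        # (ranks are stored offset by 1), which is a discrete convolution.
        extended = [0] * (len(weights) + n_ranks - 1)
        for i, w in enumerate(weights):
            for shift in range(n_ranks):
                extended[i + shift] += w
        weights = extended
    return weights


if __name__ == "__main__":
    print(rank_sum_weights(3, 2))  # two possible ranks: binomial coefficients
    print(rank_sum_weights(2, 3))  # three possible ranks
```

Under the stated assumption, dividing each weight by `n_ranks ** n_sections` gives the exact null probability of the corresponding sum. With two possible ranks the weights reduce to the ordinary binomial coefficients, which is consistent with the generalization noted above.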

The Irwin-Hall approximation is compared to the exact null distribution and is shown to underestimate the standard deviation and overestimate the kurtosis. Data simulations show that the Irwin-Hall approximation also increases the likelihood of type I error (false positive) and overstates the test power. Hence, the use of these recursive formulae improves the ability of this rank-based method to detect CNV.

Faculty Mentor

Craig Jackson

This document is currently not available here.
