Event Title

A Recursive Formulation for a Rank Sum Statistic Used to Detect Genomic Copy Number Variation

Presentation Type

Presentation

Location

Schimmel/Conrades Science Center 180

Start Date

20-4-2016 4:15 PM

End Date

20-4-2016 4:35 PM

Disciplines

Genomics

Abstract

Copy number variation (CNV) results from duplications and deletions of genomic DNA. Because CNVs have been found to correlate with a number of genetic diseases, detecting and characterizing CNV is a major goal of genetic research. Recently, a rank-based method has been developed to analyze raw CNV data.

This method ranks a DNA sample against multiple controls across multiple DNA sections. The overall CNV of the sample is then determined by a statistical comparison of the sample's Rank-Sum against the discrete null distribution. The accuracy of the method therefore depends, to a large degree, on an accurate representation of the null distribution. So far, the exact null distribution has only been approximated using the continuous Irwin-Hall distribution.

This study presents rigorous proofs of several recursive formulations for the weights of the random Rank-Sum statistic. Unexpectedly, these recursive formulae yield a generalized form of the binomial coefficients. The descriptive statistics of the exact null distribution are also derived.
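
As an illustration only, the link between a recursive count of rank sums and generalized binomial coefficients can be sketched in a few lines of Python. This is not the study's own formulation. It assumes, hypothetically, that under the null hypothesis the sample's rank in each DNA section is independent and uniform on 1..m (with m taken as the number of controls plus one) and that there are no ties. The function names rank_sum_counts and null_pmf are invented for this sketch. Under those assumptions, the number of ways to reach a given rank sum over n sections satisfies W(n, s) = W(n-1, s-1) + ... + W(n-1, s-m), and the resulting counts are the coefficients of (1 + x + ... + x^(m-1))^n, which reduce to the ordinary binomial coefficients when m = 2.

```python
from math import comb


def rank_sum_counts(n_sections, n_ranks):
    """Count the ways to obtain each rank sum (illustrative sketch).

    Assumes each section contributes one rank in 1..n_ranks.
    Entry k of the result is the number of rank tuples whose sum
    equals k + n_sections (the smallest possible sum is n_sections).
    """
    counts = [1]  # zero sections: exactly one way to have offset 0
    for _ in range(n_sections):
        # Adding one section shifts every existing offset by 0..n_ranks-1.
        new_counts = [0] * (len(counts) + n_ranks - 1)
        for offset, ways in enumerate(counts):
            for shift in range(n_ranks):
                new_counts[offset + shift] += ways
        counts = new_counts
    return counts


def null_pmf(n_sections, n_ranks):
    """Exact null probabilities keyed by rank sum, assuming uniform ranks."""
    counts = rank_sum_counts(n_sections, n_ranks)
    total = n_ranks ** n_sections  # all rank tuples are equally likely
    return {k + n_sections: ways / total for k, ways in enumerate(counts)}


if __name__ == "__main__":
    # With two possible ranks the counts are ordinary binomial coefficients.
    print(rank_sum_counts(4, 2), [comb(4, k) for k in range(5)])
    # With three possible ranks they are trinomial-type coefficients.
    print(rank_sum_counts(2, 3))
```

Working with integer counts and dividing by m^n only at the end keeps the recursion exact; a tail probability for an observed rank sum would then be a sum of entries of null_pmf over the relevant range.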

The Irwin-Hall approximation is compared to the exact null distribution and is shown to underestimate the standard deviation and overestimate the kurtosis. Data simulations also show that the Irwin-Hall approximation increases the likelihood of type I error (false positive) and overstates the test power. Hence, the use of these recursive formulae improves the ability of this rank-based method to detect CNV.

Faculty Mentor

Craig Jackson

 