Identification of Signal, Noise, and Indistinguishable Subsets in High-Dimensional Data Analysis

Jeng, X. Jessie

Abstract: Motivated by applications in high-dimensional data analysis where strong signals often stand out easily and weak ones may be indistinguishable from the noise, we develop a statistical framework that provides a novel categorization of the data into signal, noise, and indistinguishable subsets. The three-subset categorization is especially relevant under high dimensionality, as a large proportion of signals can be obscured by the large amount of noise. Understanding the three-subset phenomenon is important for researchers in real applications who need to design efficient follow-up studies. For example, candidates belonging to the signal subset may have priority for more focused study, while those in the noise subset can be removed; for candidates in the indistinguishable subset, additional data may be collected to further separate weak signals from the noise. We develop an efficient data-driven procedure to identify the three subsets. Theoretical study shows that, under certain conditions and with high probability, only signals are included in the identified signal subset, while the remaining signals are included in the identified indistinguishable subset. Moreover, the proposed procedure adapts to the unknown signal intensity, so that the identified indistinguishable subset shrinks with the true indistinguishable subset as signals become stronger. The procedure is examined and compared with methods based on FDR control using Monte Carlo simulation. Further, it is applied successfully in a real-data application to identify genomic variants with different signal intensities.

Comments: 30 pages
Subjects: Methodology (stat.ME)
Cite as: arXiv:1305.0220 [stat.ME]
(or arXiv:1305.0220v1 [stat.ME] for this version)
https://doi.org/10.48550/arXiv.1305.0220
