Finite sample approximation results for principal component analysis: a matrix perturbation approach

Nadler, Boaz

doi:10.1214/08-AOS618

arXiv:0901.3245 (math)

[Submitted on 21 Jan 2009]

Abstract: Principal component analysis (PCA) is a standard tool for dimensionality reduction of a set of $n$ observations (samples), each with $p$ variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size $n$ and those of the limiting population PCA as $n\to\infty$. As in machine learning, we present a finite sample theorem, which holds with high probability, for the closeness between the leading eigenvalue and eigenvector of sample PCA and those of population PCA under a spiked covariance model. In addition, we consider the relation between finite sample PCA and the asymptotic results in the joint limit $p,n\to\infty$, with $p/n=c$. We present a matrix perturbation view of the "phase transition phenomenon" and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies for finite $p,n$, where we show that although there is no sharp phase transition as in the infinite case, either as a function of noise level or as a function of sample size $n$, the eigenvector of sample PCA may exhibit a sharp "loss of tracking," suddenly losing its relation to the (true) eigenvector of the population PCA matrix. This occurs due to a crossover between the eigenvalue due to the signal and the largest eigenvalue due to noise, whose eigenvector points in a random direction.
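The "loss of tracking" described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a single-spike model in which each sample is a Gaussian signal along a fixed unit vector plus isotropic Gaussian noise, and the names `spiked_sample`, `leading_pca`, `squared_overlap` and `asymptotic_overlap` are hypothetical helpers chosen for illustration. The function `asymptotic_overlap` encodes the limit commonly quoted in the random-matrix literature for this model as $p,n\to\infty$ with $p/n=c$; its notation (signal variance, noise variance) is an assumption of this sketch, not the paper's.

```python
import numpy as np


def spiked_sample(n, p, signal_var, noise_var, rng):
    """Draw n samples x = u * v + noise with v a fixed unit vector (assumed model)."""
    v = np.zeros(p)
    v[0] = 1.0  # population leading eigenvector, placed on the first axis
    u = rng.normal(scale=np.sqrt(signal_var), size=(n, 1))
    noise = rng.normal(scale=np.sqrt(noise_var), size=(n, p))
    return u * v + noise, v


def leading_pca(X):
    """Leading eigenvalue/eigenvector of the sample covariance (zero mean assumed)."""
    S = X.T @ X / X.shape[0]
    evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    return evals[-1], evecs[:, -1]


def squared_overlap(n, p, signal_var, noise_var=1.0, seed=0):
    """Sample leading eigenvalue and squared overlap with the population eigenvector."""
    rng = np.random.default_rng(seed)
    X, v = spiked_sample(n, p, signal_var, noise_var, rng)
    lam, v_hat = leading_pca(X)
    return lam, float(np.dot(v_hat, v) ** 2)


def asymptotic_overlap(c, signal_var, noise_var=1.0):
    """Commonly quoted limit of the squared overlap for p/n = c (assumed notation)."""
    ratio = noise_var / signal_var
    if signal_var <= noise_var * np.sqrt(c):
        return 0.0  # below the phase-transition threshold
    return (1.0 - c * ratio**2) / (1.0 + c * ratio)


if __name__ == "__main__":
    n, p = 200, 50
    for s in (10.0, 0.05):  # well above and well below the threshold sqrt(p/n)
        lam, r2 = squared_overlap(n, p, s)
        print(f"signal_var={s}: lambda={lam:.3f}, overlap^2={r2:.3f}, "
              f"asymptotic={asymptotic_overlap(p / n, s):.3f}")
```

With a strong signal the sample eigenvector stays close to the population one; with a signal well below the threshold the leading sample eigenvalue comes from noise and its eigenvector points in an essentially random direction.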

Comments:	Published in the Annals of Statistics by the Institute of Mathematical Statistics
Subjects:	Statistics Theory (math.ST)
MSC classes:	62H25, 62E17 (Primary) 15A42 (Secondary)
Report number:	IMS-AOS-AOS618
Cite as:	arXiv:0901.3245 [math.ST]
	(or arXiv:0901.3245v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.0901.3245
Journal reference:	Annals of Statistics 2008, Vol. 36, No. 6, 2791-2817
Related DOI:	https://doi.org/10.1214/08-AOS618

Submission history

From: Boaz Nadler [via VTEX proxy]
[v1] Wed, 21 Jan 2009 12:05:10 UTC (226 KB)
