Sparse Generalized Principal Component Analysis for Large-scale Applications beyond Gaussianity

Zhang, Qiaoya; She, Yiyuan

arXiv:1512.03883 (stat)

[Submitted on 12 Dec 2015 (v1), last revised 28 Jan 2016 (this version, v2)]

Abstract: Principal Component Analysis (PCA) is a dimension reduction technique. It produces inconsistent estimators when the dimensionality is moderate to high, which is often the problem in modern large-scale applications where algorithm scalability and model interpretability are difficult to achieve, not to mention the prevalence of missing values. While existing sparse PCA methods alleviate inconsistency, they are constrained to the Gaussian assumption of classical PCA and fail to address algorithm scalability issues. We generalize sparse PCA to the broad family of exponential family distributions in the high-dimensional setup, with built-in treatment for missing values. We also propose a family of iterative sparse generalized PCA (SG-PCA) algorithms such that, despite the non-convexity and non-smoothness of the optimization task, the loss function decreases in every iteration. In terms of ease and intuitiveness of parameter tuning, our sparsity-inducing regularization is far superior to the popular Lasso. Furthermore, to promote overall scalability, accelerated gradient is integrated for fast convergence, while a progressive screening technique gradually squeezes out nuisance dimensions of a large-scale problem for feasible optimization. High-dimensional simulation and real data experiments demonstrate the efficiency and efficacy of SG-PCA.

Subjects: Computation (stat.CO); Machine Learning (stat.ML)
Cite as: arXiv:1512.03883 [stat.CO] (or arXiv:1512.03883v2 [stat.CO] for this version)
DOI: https://doi.org/10.48550/arXiv.1512.03883

Submission history

From: Yiyuan She
[v1] Sat, 12 Dec 2015 06:45:05 UTC (95 KB)
[v2] Thu, 28 Jan 2016 02:36:12 UTC (95 KB)
