Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

Zhang, Thomas T. C. K.; Toso, Leonardo F.; Anderson, James; Matni, Nikolai

arXiv:2308.04428 (stat)

[Submitted on 8 Aug 2023 (v1), last revised 12 Oct 2024 (this version, v4)]

Abstract: A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic representation learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias \& Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent scheme proposed independently in Collins et al. (2021) and Nayer and Vaswani (2022), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ in various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.
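
To make the setting concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract, not the authors' implementation. It assumes each task's operator factors as $M = F\Phi$ with a shared row-orthonormal representation $\Phi$ (the abstract only says "common representation"), interprets "de-bias" as computing the task weights and the representation gradient on disjoint halves of each task's data, and interprets "feature-whiten" as right-multiplying that gradient by the inverse sample covariance. All function names, dimensions, the step size, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def make_tasks(rng, n_tasks=20, n_samples=200, d_x=10, d_y=3, r=2, noise=0.1):
    """Synthetic tasks y = F_t Phi x + w with a shared row-orthonormal Phi (r x d_x)."""
    Phi = np.linalg.qr(rng.standard_normal((d_x, r)))[0].T
    # Mildly non-isotropic covariates: diagonal covariance from 0.5 to 2.0.
    scales = np.sqrt(np.linspace(0.5, 2.0, d_x))
    tasks = []
    for _ in range(n_tasks):
        F = rng.standard_normal((d_y, r))
        X = rng.standard_normal((n_samples, d_x)) * scales
        Y = X @ (F @ Phi).T + noise * rng.standard_normal((n_samples, d_y))
        tasks.append((X, Y))
    return Phi, tasks


def subspace_dist(Phi_hat, Phi):
    """Spectral norm of Phi_hat projected onto the complement of rowspace(Phi)."""
    P_perp = np.eye(Phi.shape[1]) - Phi.T @ Phi
    return np.linalg.norm(Phi_hat @ P_perp, 2)


def dfw_style_step(Phi_hat, tasks, step=0.05):
    """One alternating minimization-descent step with data splitting and whitening."""
    update = np.zeros_like(Phi_hat)
    for X, Y in tasks:
        half = X.shape[0] // 2
        X1, Y1, X2, Y2 = X[:half], Y[:half], X[half:], Y[half:]
        # Minimization: least-squares task weights on the first half.
        F_hat = np.linalg.lstsq(X1 @ Phi_hat.T, Y1, rcond=None)[0].T  # d_y x r
        # Descent: gradient of the squared loss w.r.t. Phi on the second half,
        # so the weights and the gradient use independent samples (assumed "de-bias").
        resid = X2 @ Phi_hat.T @ F_hat.T - Y2                          # n x d_y
        grad = F_hat.T @ resid.T @ X2 / X2.shape[0]                    # r x d_x
        # Whitening: right-multiply by the inverse sample covariance.
        cov = X2.T @ X2 / X2.shape[0]
        update += np.linalg.solve(cov, grad.T).T
    Phi_new = Phi_hat - step * update / len(tasks)
    # Re-orthonormalize the rows of the representation.
    return np.linalg.qr(Phi_new.T)[0].T


def run_demo(seed=0, n_iters=200):
    """Return (initial, final) subspace distance to the true representation."""
    rng = np.random.default_rng(seed)
    Phi, tasks = make_tasks(rng)
    Phi_hat = np.linalg.qr(rng.standard_normal((Phi.shape[1], Phi.shape[0])))[0].T
    initial = subspace_dist(Phi_hat, Phi)
    for _ in range(n_iters):
        Phi_hat = dfw_style_step(Phi_hat, tasks)
    return initial, subspace_dist(Phi_hat, Phi)
```

Calling `run_demo()` returns the subspace distance before and after the iterations. In this sketch the whitening step cancels the covariate covariance in the signal part of the gradient, so the averaged update behaves as it would for isotropic data; omitting it corresponds to the vanilla scheme the abstract reports as failing on non-isotropic data.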

Comments:	Appeared at ICLR 2024 (spotlight presentation)
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2308.04428 [stat.ML]
	(or arXiv:2308.04428v4 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2308.04428

Submission history

From: Thomas Zhang
[v1] Tue, 8 Aug 2023 17:56:20 UTC (613 KB)
[v2] Mon, 22 Jul 2024 13:36:50 UTC (877 KB)
[v3] Sat, 27 Jul 2024 13:23:20 UTC (877 KB)
[v4] Sat, 12 Oct 2024 20:17:36 UTC (877 KB)
