Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

Dong, Yijun; Phan, Hoang; Pan, Xiang; Lei, Qi

arXiv:2407.06120 (cs)

[Submitted on 8 Jul 2024]

Abstract: We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
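
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' SkMM implementation: it assumes per-sample gradients are already available as a matrix `G`, uses a plain Gaussian random projection as the sketch, and swaps in a simple greedy heuristic for the moment-matching step. The function names `sketch_gradients` and `greedy_moment_match` are hypothetical and exist only for this example.

```python
import numpy as np


def sketch_gradients(G, m, rng):
    """Stage (i), illustrative: project per-sample gradients G (N x r)
    onto m dimensions with a Gaussian random sketch (an assumption here)."""
    r = G.shape[1]
    S = rng.standard_normal((r, m)) / np.sqrt(m)
    return G @ S


def greedy_moment_match(Z, n):
    """Stage (ii), illustrative: greedily pick n distinct rows of Z whose
    empirical second moment stays close (Frobenius norm) to that of all rows.
    This greedy rule is a stand-in, not the selection rule from the paper."""
    N, m = Z.shape
    target = Z.T @ Z / N                      # second moment of the full set
    outer = np.einsum("ij,ik->ijk", Z, Z)     # per-sample outer products, N x m x m
    running = np.zeros((m, m))                # sum of outer products of chosen rows
    available = np.ones(N, dtype=bool)
    chosen = []
    for k in range(1, n + 1):
        # Second moment of the subset if each candidate were added next.
        candidates = (running[None, :, :] + outer) / k
        err = np.linalg.norm(candidates - target[None, :, :], axis=(1, 2))
        err[~available] = np.inf              # never pick the same sample twice
        i = int(np.argmin(err))
        chosen.append(i)
        available[i] = False
        running += outer[i]
    return np.array(chosen)
```

A possible usage, with synthetic data standing in for real gradients: draw `G = rng.standard_normal((200, 500))` from `rng = np.random.default_rng(0)`, compute `Z = sketch_gradients(G, 8, rng)`, then call `greedy_moment_match(Z, 20)` to obtain 20 sample indices. The sketch stores all per-sample outer products in memory, so it is only meant for small `N` and `m`.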

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2407.06120 [cs.LG]
	(or arXiv:2407.06120v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.06120

Submission history

From: Yijun Dong
[v1] Mon, 8 Jul 2024 16:57:26 UTC (1,441 KB)
