Efficient Augmentation via Data Subsampling

Kuchnik, Michael; Smith, Virginia

arXiv:1810.05222 (cs)

[Submitted on 11 Oct 2018 (v1), last revised 1 Mar 2019 (this version, v2)]

Abstract: Data augmentation is commonly used to encode invariances in learning methods. However, this process is often performed in an inefficient manner, as artificial examples are created by applying a number of transformations to all points in the training set. The resulting explosion of the dataset size can be an issue in terms of storage and training costs, as well as in selecting and tuning the optimal set of transformations to apply. In this work, we demonstrate that it is possible to significantly reduce the number of data points included in data augmentation while realizing the same accuracy and invariance benefits of augmenting the entire dataset. We propose a novel set of subsampling policies, based on model influence and loss, that can achieve a 90% reduction in augmentation set size while maintaining the accuracy gains of standard data augmentation.
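
The idea of augmenting only a subset of the training points can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `select_augmentation_subset` and `augment_subset`, the toy data, the random per-example losses, and the feature-reversal transform are all assumptions made for illustration. The sketch ranks points by a per-example loss from some already-trained model, keeps the top 10%, and augments only those, which corresponds to a 90% smaller augmentation set than augmenting every point.

```python
import numpy as np

def select_augmentation_subset(losses, fraction=0.1):
    """Hypothetical helper: indices of the highest-loss points, highest first."""
    losses = np.asarray(losses, dtype=float)
    k = max(1, int(round(fraction * losses.size)))
    # argsort is ascending, so the last k entries are the k largest losses
    return np.argsort(losses, kind="stable")[-k:][::-1]

def augment_subset(X, y, indices, transform):
    """Hypothetical helper: transform only the selected points and append them."""
    X_aug = np.stack([transform(X[i]) for i in indices])
    return np.concatenate([X, X_aug]), np.concatenate([y, y[indices]])

# Toy data; in practice the losses would come from an initial trained model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)
losses = rng.random(100)  # stand-in for per-example training loss

idx = select_augmentation_subset(losses, fraction=0.1)
# Reversing the feature order is a stand-in for a real transformation
# such as a flip, crop, or rotation.
X_new, y_new = augment_subset(X, y, idx, transform=lambda x: x[::-1])
```

An influence-based policy would replace `losses` with a per-example influence score; the selection and augmentation steps in this sketch would stay the same.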

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1810.05222 [cs.LG]
	(or arXiv:1810.05222v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1810.05222

Submission history

From: Michael Kuchnik
[v1] Thu, 11 Oct 2018 19:50:08 UTC (505 KB)
[v2] Fri, 1 Mar 2019 13:23:42 UTC (638 KB)
