Active Learning Over Multiple Domains in Natural Language Tasks

Longpre, Shayne; Reisler, Julia; Huang, Edward Greg; Lu, Yi; Frank, Andrew; Ramesh, Nikhil; DuBois, Chris

arXiv:2202.00254v1 (cs)

[Submitted on 1 Feb 2022 (this version), latest version 8 Feb 2022 (v2)]

Abstract: Studies of active learning traditionally assume the target and source data stem from a single domain. However, in realistic applications, practitioners often require active learning with multiple sources of out-of-distribution data, where it is unclear a priori which data sources will help or hurt the target domain. We survey a wide variety of techniques in active learning (AL), domain shift detection (DS), and multi-domain sampling to examine this challenging setting for question answering and sentiment analysis. We ask (1) which families of methods are effective for this task, and (2) which properties of selected examples and domains achieve strong results. Among 18 acquisition functions from 4 families of methods, we find that H-Divergence methods, and particularly our proposed variant DAL-E, yield effective results, averaging 2-3% improvements over the random baseline. We also show the importance of a diverse allocation of domains, as well as room for improvement in existing methods on both domain and example selection. Our findings yield the first comprehensive analysis of both existing and novel methods for practitioners faced with multi-domain active learning for natural language tasks.
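
To make the idea of a divergence-style acquisition function concrete, the following is a minimal sketch, not the paper's DAL-E method or any of its 18 acquisition functions. It assumes examples are already encoded as fixed-length feature vectors, trains a small logistic-regression domain classifier to separate target examples from a mixed multi-domain source pool, and selects the pool examples the classifier finds most target-like. All names (`train_domain_classifier`, `select_target_like`, `toy_cluster`) and the toy 2-D data are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_domain_classifier(target, pool, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent.

    Label 1 marks target-domain vectors, label 0 marks source-pool vectors.
    Returns the learned weights and bias.
    """
    data = [(x, 1.0) for x in target] + [(x, 0.0) for x in pool]
    dim = len(data[0][0])
    n = len(data)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_target_like(target, pool, k):
    """Return indices of the k pool examples scored as most target-like."""
    w, b = train_domain_classifier(target, pool)
    scores = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def toy_cluster(rng, center, n, noise=0.3):
    """Hypothetical stand-in for encoded examples from one domain."""
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    target = toy_cluster(rng, (2.0, 2.0), 10)
    # Pool mixes a source domain near the target with one far from it.
    pool = toy_cluster(rng, (2.0, 2.0), 10) + toy_cluster(rng, (-2.0, -2.0), 10)
    print(select_target_like(target, pool, k=5))
```

In this toy setup the selection should favour the source domain that lies close to the target. A real pipeline would replace the 2-D vectors with model embeddings and would likely combine such a score with an uncertainty or diversity criterion, which this sketch does not attempt.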

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2202.00254 [cs.CL]
	(or arXiv:2202.00254v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2202.00254

Submission history

From: Edward Greg Huang
[v1] Tue, 1 Feb 2022 07:27:18 UTC (2,881 KB)
[v2] Tue, 8 Feb 2022 08:01:59 UTC (2,881 KB)
