An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

Arumae, Kristjan; Sun, Qing; Bhatia, Parminder

arXiv:2010.00784 (cs)

[Submitted on 1 Oct 2020]

Abstract: Pre-training large language models has become standard practice in the natural language processing community. Such models are pre-trained on generic data (e.g., BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, achieving state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction requires additional in-domain pre-training. In practice, staged multi-domain pre-training causes performance deterioration in the form of catastrophic forgetting (CF) when the model is evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods for mitigating CF. We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive on biomedical tasks. Furthermore, we explore gradient-based and latent-clustering-based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.
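
The abstract names elastic weight consolidation (EWC) but does not state its objective. As a rough illustration only, the sketch below assumes the commonly used formulation (Kirkpatrick et al., 2017): the in-domain loss plus a quadratic penalty that anchors each parameter to its generic-domain value, weighted by a diagonal Fisher information estimate. The function names, the plain-list parameter representation, and the strength `lam` are hypothetical choices for this example and are not taken from the paper.

```python
from typing import List, Sequence


def diagonal_fisher(per_example_grads: Sequence[Sequence[float]]) -> List[float]:
    """Estimate the diagonal Fisher information as the mean of squared
    per-example log-likelihood gradients (assumed to come from generic data)."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(
    theta: Sequence[float],
    theta_star: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> float:
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    theta_star holds the parameters after generic pre-training; parameters
    with a large Fisher value are pulled back toward it more strongly.
    """
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def ewc_loss(
    task_loss: float,
    theta: Sequence[float],
    theta_star: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> float:
    """Total objective: in-domain task loss plus the EWC penalty."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy usage with two parameters and two generic-domain examples.
fisher = diagonal_fisher([[1.0, 0.0], [3.0, 2.0]])
total = ewc_loss(1.5, theta=[1.0, 2.0], theta_star=[0.0, 0.0], fisher=fisher, lam=2.0)
```

In a real setup the gradients would come from an autodiff framework and the penalty would be added to the in-domain masked language modeling loss at each step; how the paper configures these details is not stated in the abstract.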

Comments:	arXiv admin note: text overlap with arXiv:2004.03794
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.00784 [cs.CL]
	(or arXiv:2010.00784v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.00784

Submission history

From: Kristjan Arumae
[v1] Thu, 1 Oct 2020 09:20:18 UTC (1,666 KB)
