We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing

Olsson, Fredrik; Sahlgren, Magnus

arXiv:2110.05464 (cs)

[Submitted on 11 Oct 2021]

Abstract: In this paper, we identify the state of data as an important reason for failure in applied Natural Language Processing (NLP) projects. We argue that there is a gap between academic research in NLP and its application to problems outside academia, and that this gap is rooted in poor mutual understanding between academic researchers and their non-academic peers who seek to apply research results to their operations. To foster the transfer of research results from academia to non-academic settings, and the corresponding flow of requirements back to academia, we propose a method, based on Data Readiness Levels (Lawrence, 2017), for improving communication between researchers and external stakeholders regarding the accessibility, validity, and utility of data. While still in its infancy, the method has been iterated on and applied in multiple innovation and research projects carried out with stakeholders in both the private and public sectors. Finally, we invite researchers and practitioners to share their experiences and thereby contribute to a body of work aimed at raising awareness of the importance of data readiness for NLP.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2110.05464 [cs.CL]
	(or arXiv:2110.05464v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.05464

Submission history

From: Fredrik Olsson
[v1] Mon, 11 Oct 2021 17:55:07 UTC (1,754 KB)
