The Evolution of LLM Adoption in Industry Data Curation Practices

Qian, Crystal; Liu, Michael Xieyang; Reif, Emily; Simon, Grady; Hussein, Nada; Clement, Nathan; Wexler, James; Cai, Carrie J.; Terry, Michael; Kahng, Minsuk

Abstract:As large language models (LLMs) grow increasingly adept at processing unstructured text data, they offer new opportunities to enhance data curation workflows. This paper explores the evolution of LLM adoption among practitioners at a large technology company, evaluating the impact of LLMs in data curation tasks through participants' perceptions, integration strategies, and reported usage scenarios. Through a series of surveys, interviews, and user studies, we provide a timely snapshot of how organizations are navigating a pivotal moment in LLM evolution. In Q2 2023, we conducted a survey to assess LLM adoption in industry for development tasks (N=84), and facilitated expert interviews to assess evolving data needs (N=10) in Q3 2023. In Q2 2024, we explored practitioners' current and anticipated LLM usage through a user study involving two LLM-based prototypes (N=12). While each study addressed distinct research goals, they revealed a broader narrative about evolving LLM usage in aggregate. We discovered an emerging shift in data understanding from heuristic-first, bottom-up approaches to insights-first, top-down workflows supported by LLMs. Furthermore, to respond to a more complex data landscape, data practitioners now supplement traditional subject-expert-created 'golden datasets' with LLM-generated 'silver' datasets and rigorously validated 'super golden' datasets curated by diverse experts. This research sheds light on the transformative role of LLMs in large-scale analysis of unstructured data and highlights opportunities for further tool development.

Comments:	19 pages, 4 tables, 3 figures
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.16089 [cs.HC]
	(or arXiv:2412.16089v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2412.16089

Computer Science > Human-Computer Interaction

Title:The Evolution of LLM Adoption in Industry Data Curation Practices

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators