Automated Annotation with Generative AI Requires Validation

Pangakis, Nicholas; Wolken, Samuel; Fasching, Neil

Computer Science > Computation and Language

arXiv:2306.00176 (cs)

[Submitted on 31 May 2023]

Abstract: Generative large language models (LLMs) can be a powerful tool for augmenting text annotation procedures, but their performance varies across annotation tasks due to prompt quality, text data idiosyncrasies, and conceptual difficulty. Because these challenges will persist even as LLM technology improves, we argue that any automated annotation process using an LLM must validate the LLM's performance against labels generated by humans. To this end, we outline a workflow to harness the annotation potential of LLMs in a principled, efficient way. Using GPT-4, we validate this approach by replicating 27 annotation tasks across 11 datasets from recent social science articles in high-impact journals. We find that LLM performance for text annotation is promising but highly contingent on both the dataset and the type of annotation task, which reinforces the need to validate on a task-by-task basis. We make available easy-to-use software designed to implement our workflow and streamline the deployment of LLMs for automated annotation.
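The abstract's central recommendation, checking LLM labels against human labels separately for each annotation task, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released software: it assumes binary 0/1 labels, treats the human labels as ground truth, and uses accuracy, precision, recall, F1 and Cohen's kappa as agreement measures. The names `ValidationReport`, `validate_task` and `validate_all`, the 0.7 F1 cutoff, and the toy labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class ValidationReport:
    """Agreement between LLM labels and human labels for one annotation task."""
    task: str
    n: int
    accuracy: float
    precision: float
    recall: float
    f1: float
    cohen_kappa: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when the denominator is zero
    # (e.g. the LLM never predicts the positive class).
    return num / den if den else 0.0


def validate_task(task: str, human: Sequence[int], llm: Sequence[int]) -> ValidationReport:
    """Score binary (0/1) LLM labels against human labels, treating humans as ground truth."""
    if len(human) != len(llm) or len(human) == 0:
        raise ValueError("human and LLM labels must be non-empty and of equal length")
    if any(x not in (0, 1) for x in (*human, *llm)):
        raise ValueError("this sketch assumes binary 0/1 labels")
    pairs = list(zip(human, llm))
    n = len(pairs)
    tp = sum(1 for h, m in pairs if h == 1 and m == 1)
    tn = sum(1 for h, m in pairs if h == 0 and m == 0)
    fp = sum(1 for h, m in pairs if h == 0 and m == 1)
    fn = sum(1 for h, m in pairs if h == 1 and m == 0)

    accuracy = (tp + tn) / n
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance given each annotator's rate of positive labels. If chance
    # agreement is 1 (both label every item the same class), kappa is
    # undefined and this sketch reports 0.0.
    p_human = (tp + fn) / n
    p_llm = (tp + fp) / n
    expected = p_human * p_llm + (1 - p_human) * (1 - p_llm)
    kappa = _safe_div(accuracy - expected, 1 - expected)

    return ValidationReport(task, n, accuracy, precision, recall, f1, kappa)


def validate_all(
    labels: Dict[str, Tuple[Sequence[int], Sequence[int]]], min_f1: float = 0.7
) -> Dict[str, bool]:
    """Validate each task separately; True means the LLM met the (hypothetical) F1 bar."""
    return {
        task: validate_task(task, human, llm).f1 >= min_f1
        for task, (human, llm) in labels.items()
    }


if __name__ == "__main__":
    # Toy labels for two hypothetical tasks: (human labels, LLM labels).
    labels = {
        "task_a": ([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0]),
        "task_b": ([1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]),
    }
    for task, (human, llm) in labels.items():
        print(validate_task(task, human, llm))
    print(validate_all(labels))  # {'task_a': True, 'task_b': False}
```

Keeping the accept/reject decision per task, rather than pooling all labels into one score, mirrors the abstract's finding that performance depends on the dataset and task type: a pooled score could hide a task on which the LLM disagrees with human coders.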

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.00176 [cs.CL]
	(or arXiv:2306.00176v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.00176

Submission history

From: Nicholas Pangakis
[v1] Wed, 31 May 2023 20:50:45 UTC (395 KB)
