Prompting in the Dark: Assessing Human Performance in Prompt Engineering for Data Labeling When Gold Labels Are Absent

He, Zeyu; Naphade, Saniya; Huang, Ting-Hao 'Kenneth'

doi:10.1145/3706598.3714319

arXiv:2502.11267 (cs)

[Submitted on 16 Feb 2025]

Abstract: Millions of users prompt large language models (LLMs) for various tasks, but how good are people at prompt engineering? Do users actually get closer to their desired outcome over multiple iterations of their prompts? These questions are crucial when no gold-standard labels are available to measure progress. This paper investigates a scenario in LLM-powered data labeling, "prompting in the dark," where users iteratively prompt LLMs to label data without using manually labeled benchmarks. We developed PromptingSheet, a Google Sheets add-on that enables users to compose, revise, and iteratively label data through spreadsheets. Through a study with 20 participants, we found that prompting in the dark was highly unreliable: only 9 participants improved labeling accuracy after four or more iterations. Automated prompt optimization tools like DSPy also struggled when few gold labels were available. Our findings highlight the importance of gold labels and the needs, as well as the risks, of automated support in human prompt engineering, providing insights for future tool design.

Comments:	Accepted by CHI 2025
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2502.11267 [cs.HC]
	(or arXiv:2502.11267v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2502.11267
Related DOI:	https://doi.org/10.1145/3706598.3714319

Submission history

From: Zeyu He
[v1] Sun, 16 Feb 2025 20:54:26 UTC (16,747 KB)
