Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems

Harvey, Emma; Sheng, Emily; Blodgett, Su Lin; Chouldechova, Alexandra; Garcia-Gathright, Jean; Olteanu, Alexandra; Wallach, Hanna


arXiv:2411.15662 (cs)

[Submitted on 23 Nov 2024]



Abstract: To facilitate the measurement of representational harms caused by large language model (LLM)-based systems, the NLP research community has produced and made publicly available numerous measurement instruments, including tools, datasets, metrics, benchmarks, annotation instructions, and other techniques. However, the research community lacks clarity about whether and to what extent these instruments meet the needs of practitioners tasked with developing and deploying LLM-based systems in the real world, and how these instruments could be improved. Via a series of semi-structured interviews with practitioners in a variety of roles in different organizations, we identify four types of challenges that prevent practitioners from effectively using publicly available instruments for measuring representational harms caused by LLM-based systems: (1) challenges related to using publicly available measurement instruments; (2) challenges related to doing measurement in practice; (3) challenges arising from measurement tasks involving LLM-based systems; and (4) challenges specific to measuring representational harms. Our goal is to advance the development of instruments for measuring representational harms that are well-suited to practitioner needs, thus better facilitating the responsible development and deployment of LLM-based systems.

Comments:	NeurIPS 2024 Workshop on Evaluating Evaluations (EvalEval)
Subjects:	Computers and Society (cs.CY)
Cite as:	arXiv:2411.15662 [cs.CY]
	(or arXiv:2411.15662v1 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2411.15662

Submission history

From: Emma Harvey
[v1] Sat, 23 Nov 2024 22:13:38 UTC (46 KB)
