Beyond Text: Characterizing Domain Expert Needs in Document Research

Gururaja, Sireesh; Gandhi, Nupoor; Milbauer, Jeremiah; Strubell, Emma

arXiv:2504.12495 (cs)

[Submitted on 16 Apr 2025]

Abstract: Working with documents is a key part of almost any knowledge work, from contextualizing research in a literature review to reviewing legal precedent. Recently, as their capabilities have expanded, primarily text-based NLP systems have often been billed as able to assist or even automate this kind of work. But to what extent are these systems able to model these tasks as experts conceptualize and perform them now? In this study, we interview sixteen domain experts across two domains to understand their processes of document research, and compare those processes to the current state of NLP systems. We find that our participants' processes are idiosyncratic and iterative, and that they rely extensively on the social context of a document in addition to its content; existing approaches in NLP and adjacent fields that explicitly center the document as an object, rather than as merely a container for text, tend to better reflect our participants' priorities, though they are often less accessible outside their research communities. We call on the NLP community to more carefully consider the role of the document in building useful tools that are accessible, personalizable, iterative, and socially aware.

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:2504.12495 [cs.CL]
	(or arXiv:2504.12495v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.12495

Submission history

From: Sireesh Gururaja
[v1] Wed, 16 Apr 2025 21:24:41 UTC (8,980 KB)
