Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?

Soni, Nikita; Balasubramanian, Niranjan; Schwartz, H. Andrew; Hovy, Dirk

arXiv:2401.12492 (cs)

[Submitted on 23 Jan 2024 (v1), last revised 18 Jul 2024 (this version, v3)]

Abstract: Pre-trained language models consider the context of neighboring words and documents but lack any author context of the human generating the text. However, language depends on the author's states, traits, social, situational, and environmental attributes, collectively referred to as human context (Soni et al., 2024). Human-centered natural language processing requires incorporating human context into language models. Currently, two methods exist: pre-training with 1) group-wise attributes (e.g., over-45-year-olds) or 2) individual traits. Group attributes are simple but coarse -- not all 45-year-olds write the same way -- while individual traits allow for more personalized representations, but require more complex modeling and data. It is unclear which approach benefits which tasks. We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on five user- and document-level tasks. Our results show that there is no best approach, but that human-centered language modeling holds avenues for different methods.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2401.12492 [cs.CL]
	(or arXiv:2401.12492v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.12492

Submission history

From: Nikita Soni
[v1] Tue, 23 Jan 2024 05:20:35 UTC (7,780 KB)
[v2] Tue, 26 Mar 2024 19:28:15 UTC (7,901 KB)
[v3] Thu, 18 Jul 2024 21:57:20 UTC (7,901 KB)
