Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs

Yang, Xiulin; Aoyama, Tatsuya; Yao, Yuekun; Wilcox, Ethan

Abstract: Do LLMs offer insights into human language learning? A common argument against this idea is that because their architecture and training paradigm are so vastly different from those of humans, LLMs can learn arbitrary inputs as easily as natural languages. In this paper, we test this claim by training LMs to model impossible and typologically unattested languages. Unlike previous work, which has focused exclusively on English, we conduct experiments on 12 natural languages from 4 language families. Our results show that while GPT-2 small can primarily distinguish attested languages from their impossible counterparts, it does not achieve perfect separation between all the attested languages and all the impossible ones. We further test whether GPT-2 small distinguishes typologically attested from unattested languages with different NP orders by manipulating word order based on Greenberg's Universal 20. We find that the model's perplexity scores do not distinguish attested vs. unattested word orders, as long as the unattested variants maintain constituency structure. These findings suggest that language models exhibit some human-like inductive biases, though these biases are weaker than those found in human learners.
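
The abstract's two technical ingredients, perplexity as the comparison metric and NP word-order manipulation following Greenberg's Universal 20, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `perplexity`, `np_orders`, and `reorder_np` are hypothetical, the category labels `Dem`, `Num`, `Adj`, `N` are an assumed shorthand for demonstrative, numeral, adjective, and noun, and the sketch assumes per-token natural-log probabilities have already been obtained from some language model.

```python
import math
from itertools import permutations

# Assumed shorthand for the four NP elements that Universal 20 concerns:
# demonstrative, numeral, adjective, noun.
NP_ELEMENTS = ("Dem", "Num", "Adj", "N")


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities a model assigned to each token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def np_orders():
    """Enumerate every logically possible ordering of the four NP elements."""
    return list(permutations(NP_ELEMENTS))


def reorder_np(np_words, order):
    """Linearize an NP according to a given category order.

    np_words: dict mapping category label -> word (hypothetical input format).
    order: tuple of category labels, e.g. one element of np_orders().
    """
    return [np_words[category] for category in order]


if __name__ == "__main__":
    example = {"Dem": "these", "Num": "three", "Adj": "red", "N": "books"}
    for order in np_orders()[:3]:
        print(order, " ".join(reorder_np(example, order)))
```

A lower perplexity on held-out text means the model found that variant of the language easier to predict; comparing perplexities across reordered corpora is the kind of contrast the abstract describes.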

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.18795 [cs.CL]
	(or arXiv:2502.18795v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.18795
