Oddballness: universal anomaly detection with language models

Graliński, Filip; Staruch, Ryszard; Jurkiewicz, Krzysztof

arXiv:2409.03046 (cs)

[Submitted on 4 Sep 2024]

Abstract: We present a new method for detecting anomalies in text (and, more generally, in sequences of any kind of data) using language models in a fully unsupervised manner. The method uses the probabilities (likelihoods) generated by a language model, but instead of simply flagging low-likelihood tokens, it relies on a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the language model. On grammatical error detection tasks (a specific case of text anomaly detection), we demonstrate that oddballness outperforms simply considering low-likelihood events when a fully unsupervised setup is assumed.
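
To illustrate why a raw low-likelihood threshold can be a weak anomaly signal, here is a minimal sketch. The abstract does not state the oddballness formula, so the scoring function below is only an assumed, oddballness-style stand-in: it sums the probability mass by which other outcomes exceed the observed one. The function names, the threshold of 0.1, and the toy distributions are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def excess_mass_score(probs: Sequence[float], index: int) -> float:
    """Assumed oddballness-style score (NOT taken from the abstract).

    Sums, over all outcomes, how much each outcome's probability
    exceeds the probability of the observed outcome at `index`.
    """
    p_obs = probs[index]
    return sum(max(0.0, p - p_obs) for p in probs)


def low_likelihood_flag(probs: Sequence[float], index: int,
                        threshold: float = 0.1) -> bool:
    """Baseline: flag the observed outcome if its probability is below
    a fixed threshold (the threshold value is an arbitrary assumption)."""
    return probs[index] < threshold


# Toy next-token distributions; in practice these would come from a
# language model's softmax output.
peaked = [0.90, 0.05, 0.05]   # the model strongly expects outcome 0
flat = [0.05] * 20            # the model has no strong expectation

for name, dist, observed in [("peaked", peaked, 1), ("flat", flat, 0)]:
    print(name,
          "likelihood:", dist[observed],
          "flagged by threshold:", low_likelihood_flag(dist, observed),
          "score:", round(excess_mass_score(dist, observed), 3))
```

In both cases the observed outcome has probability 0.05, so the threshold baseline flags both. The assumed score separates them: it is high for the peaked distribution, where a much more likely alternative existed, and zero for the flat one, where every outcome was equally unlikely.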

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2409.03046 [cs.CL]
(or arXiv:2409.03046v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2409.03046

Submission history

From: Filip Graliński
[v1] Wed, 4 Sep 2024 19:31:20 UTC (10 KB)
