Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight

Ding, Xi; Wang, Lei

arXiv:2412.18298 (cs)

[Submitted on 24 Dec 2024]

Abstract: Video anomaly detection (VAD) has witnessed significant advancements through the integration of large language models (LLMs) and vision-language models (VLMs), addressing critical challenges such as interpretability, temporal reasoning, and generalization in dynamic, open-world scenarios. This paper presents an in-depth review of cutting-edge LLM-/VLM-based methods from 2024, focusing on four key aspects: (i) enhancing interpretability through semantic insights and textual explanations, making visual anomalies more understandable; (ii) capturing intricate temporal relationships to detect and localize dynamic anomalies across video frames; (iii) enabling few-shot and zero-shot detection to minimize reliance on large, annotated datasets; and (iv) addressing open-world and class-agnostic anomalies by using semantic understanding and motion features for spatiotemporal coherence. We highlight the potential of these methods to redefine the landscape of VAD. Additionally, we explore the synergy between the visual and textual modalities offered by LLMs and VLMs, highlighting their combined strengths and proposing future directions to fully exploit their potential for enhancing video anomaly detection.

Comments:	Research report
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.18298 [cs.CV]
	(or arXiv:2412.18298v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.18298

Submission history

From: Xi Ding
[v1] Tue, 24 Dec 2024 09:05:37 UTC (13,095 KB)
