Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Zhang, Guosheng; Wang, Keyao; Yue, Haixiao; Liu, Ajian; Zhang, Gang; Yao, Kun; Ding, Errui; Wang, Jingdong

arXiv:2501.01720 (cs)

[Submitted on 3 Jan 2025]

Abstract: Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems. Most existing FAS methods are formulated as binary classification tasks, providing confidence scores without interpretation. They exhibit limited generalization in out-of-domain scenarios, such as new environments or unseen spoofing types. In this work, we introduce a multimodal large language model (MLLM) framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS), which transforms the FAS task into an interpretable visual question answering (VQA) paradigm. Specifically, we propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images, enriching the model's supervision with natural language interpretations. To mitigate the impact of noisy captions during training, we develop a Lopsided Language Model (L-LM) loss function that separates loss calculations for judgment and interpretation, prioritizing the optimization of the former. Furthermore, to enhance the model's perception of global visual features, we design a Globally Aware Connector (GAC) to align multi-level visual representations with the language model. Extensive experiments on standard and newly devised One to Eleven cross-domain benchmarks, comprising 12 public datasets, demonstrate that our method significantly outperforms state-of-the-art methods.
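The idea behind the L-LM loss, as the abstract describes it, can be illustrated with a small sketch. This is not the authors' implementation: the function name `lopsided_lm_loss`, the `interp_weight` parameter, and the choice to average each token group separately before a weighted sum are all assumptions made here for illustration. The abstract states only that the judgment and interpretation losses are computed separately and that the judgment is prioritized.

```python
import math
from typing import Sequence


def lopsided_lm_loss(
    target_probs: Sequence[float],
    is_judgment: Sequence[bool],
    interp_weight: float = 0.5,  # hypothetical down-weighting factor, assumed < 1
) -> float:
    """Illustrative loss that treats judgment and interpretation tokens separately.

    target_probs[i] is the probability the model assigned to the i-th
    ground-truth answer token; is_judgment[i] marks tokens that belong to the
    real/spoof judgment rather than to the free-text interpretation.
    """
    if len(target_probs) != len(is_judgment):
        raise ValueError("target_probs and is_judgment must have the same length")

    # Per-token negative log-likelihood, split by token role.
    judgment_nll = [-math.log(p) for p, j in zip(target_probs, is_judgment) if j]
    interp_nll = [-math.log(p) for p, j in zip(target_probs, is_judgment) if not j]
    if not judgment_nll:
        raise ValueError("at least one judgment token is required")

    # Average each group on its own, so a long (possibly noisy) caption
    # cannot dilute the signal from the few judgment tokens.
    judgment_loss = sum(judgment_nll) / len(judgment_nll)
    interp_loss = sum(interp_nll) / len(interp_nll) if interp_nll else 0.0

    # Assumed combination: the judgment term keeps full weight.
    return judgment_loss + interp_weight * interp_loss


# One judgment token followed by two interpretation tokens.
loss = lopsided_lm_loss([0.5, 0.5, 0.25], [True, False, False])
print(round(loss, 4))
```

Under these assumptions, a standard language-model loss would average all three tokens together, whereas this sketch keeps the judgment token at full weight and halves the contribution of the interpretation tokens.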

Comments:	Accepted to AAAI2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.01720 [cs.CV]
	(or arXiv:2501.01720v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.01720

Submission history

From: Guosheng Zhang
[v1] Fri, 3 Jan 2025 09:25:04 UTC (2,798 KB)
