Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection

Das, Sayantan; Kolahdouzi, Mojtaba; Özparlak, Levent; Hickie, Will; Etemad, Ali

arXiv:2306.06881 (cs)

[Submitted on 12 Jun 2023 (v1), last revised 9 Feb 2024 (this version, v2)]

Abstract: We present a novel approach for the detection of deepfake videos using a pair of vision transformers pre-trained by a self-supervised masked autoencoding setup. Our method consists of two distinct components, one of which focuses on learning spatial information from individual RGB frames of the video, while the other learns temporal consistency information from optical flow fields generated from consecutive frames. Unlike most approaches where pre-training is performed on a generic large corpus of images, we show that by pre-training on smaller face-related datasets, namely Celeb-A (for the spatial learning component) and YouTube Faces (for the temporal learning component), strong results can be obtained. We perform various experiments to evaluate the performance of our method on commonly used datasets, namely FaceForensics++ (Low Quality and High Quality, along with a new highly compressed version named Very Low Quality) and Celeb-DFv2. Our experiments show that our method sets a new state-of-the-art on FaceForensics++ (LQ, HQ, and VLQ), and obtains competitive results on Celeb-DFv2. Moreover, our method outperforms other methods in the area in a cross-dataset setup where we fine-tune our model on FaceForensics++ and test on Celeb-DFv2, pointing to its strong cross-dataset generalization ability.
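
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of (a) randomly hiding a fraction of image patches, as a masked autoencoding setup does before reconstruction, and (b) enumerating the consecutive frame pairs from which optical flow fields would be computed. The input resolution (224x224), patch size (16), mask ratio (0.75), clip length (8 frames), and both function names are assumptions chosen for illustration; the abstract does not state any of them, and this is not the authors' implementation.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists.

    A masked autoencoder encodes only the visible patches and is trained
    to reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    # Sample without replacement so each patch is masked at most once.
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked


def consecutive_frame_pairs(num_frames):
    """Return index pairs (t, t + 1); one optical flow field per pair."""
    return [(t, t + 1) for t in range(num_frames - 1)]


# Hypothetical settings, not taken from the paper.
num_patches = (224 // 16) ** 2  # 14 x 14 grid of patches
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
pairs = consecutive_frame_pairs(num_frames=8)
```

Under these assumed settings, a clip of 8 RGB frames yields 7 flow fields, so the spatial and temporal components would see inputs of different lengths for the same clip.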

Comments:	This paper has been accepted by IEEE International Joint Conference on Biometrics (IJCB 2023)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2306.06881 [cs.CV]
	(or arXiv:2306.06881v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2306.06881

Submission history

From: Sayantan Das
[v1] Mon, 12 Jun 2023 05:49:23 UTC (4,490 KB)
[v2] Fri, 9 Feb 2024 12:25:03 UTC (15,600 KB)
