GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer

Deressa, Deressa Wodajo; Mareen, Hannes; Lambert, Peter; Atnafu, Solomon; Akhtar, Zahid; Van Wallendael, Glenn

arXiv:2307.07036 (cs)

[Submitted on 13 Jul 2023 (v1), last revised 4 Mar 2025 (this version, v2)]

Authors: Deressa Wodajo Deressa, Hannes Mareen, Peter Lambert, Solomon Atnafu, Zahid Akhtar, Glenn Van Wallendael

Abstract: Deepfakes have raised significant concerns due to their potential to spread false information and compromise digital media integrity. Current deepfake detection models often struggle to generalize across diverse deepfake generation techniques and video content. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it uses an Autoencoder and a Variational Autoencoder to learn from the latent data distribution. By learning from both visual artifacts and the latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on the DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. The proposed GenConViT model achieves high accuracy across the tested datasets. Although the model shows promising results by leveraging visual and latent features, we demonstrate that further work is needed to improve its generalizability, i.e., its performance on out-of-distribution data. Our model provides an effective solution for identifying a wide range of fake videos while preserving media integrity. The open-source code for GenConViT is available at this https URL.
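
To illustrate what "learning from the latent data distribution" with a Variational Autoencoder usually involves, the following is a minimal pure-Python sketch of the standard VAE training terms: the reparameterized latent sample and the KL divergence to a standard normal prior. It is not taken from the GenConViT code. The function names, the mean-squared-error reconstruction term, and the `beta` weight are all assumptions made for illustration, and the abstract does not state which loss the authors use.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def mse(x, x_hat):
    """Mean squared reconstruction error (an assumed choice of reconstruction term)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus beta-weighted KL term (beta is a hypothetical knob)."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a full model, `mu` and `log_var` would come from an encoder network and `x_hat` from a decoder applied to `z`; here they are plain lists so the arithmetic can be checked by hand.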

Comments:	11 pages, 4 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.07036 [cs.CV]
	(or arXiv:2307.07036v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.07036

Submission history

From: Deressa Wodajo
[v1] Thu, 13 Jul 2023 19:27:40 UTC (4,932 KB)
[v2] Tue, 4 Mar 2025 10:43:51 UTC (5,177 KB)
