A Multimodal Framework for Deepfake Detection

Gandhi, Kashish; Kulkarni, Prutha; Shah, Taran; Chaudhari, Piyush; Narvekar, Meera; Ghag, Kranti

doi:10.53555/jes.v20i10s.6126


arXiv:2410.03487 (cs)

[Submitted on 4 Oct 2024]


Authors:Kashish Gandhi, Prutha Kulkarni, Taran Shah, Piyush Chaudhari, Meera Narvekar, Kranti Ghag


Abstract: The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter video and audio to misrepresent reality, creating risks of misinformation and fraud, with severe implications for personal privacy and security. Our research addresses this problem through a multimodal approach that targets both visual and auditory elements, reflecting the fact that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, we developed a model that extracts nine distinct facial characteristics and then applies various machine learning and deep learning models. For auditory analysis, our model uses mel-spectrograms for feature extraction and then applies various machine learning and deep learning models. To enable a combined analysis, real and deepfake audio in the original dataset were swapped for testing purposes, and balanced samples were ensured. Using our proposed models for video and audio classification, i.e., an Artificial Neural Network and VGG19, the overall sample is classified as a deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%.
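
The decision rule stated in the abstract (a sample is a deepfake if either the video or the audio component is flagged) can be illustrated with a minimal sketch. This is not the authors' code: the names `ModalityScores` and `classify_sample`, the use of per-modality probabilities, and the 0.5 threshold are all assumptions made for illustration, since the abstract specifies only the "either component" rule.

```python
from dataclasses import dataclass

# Assumed cut-off for calling a modality "fake"; the abstract does not state one.
FAKE_THRESHOLD = 0.5


@dataclass
class ModalityScores:
    """Hypothetical container for the two per-modality classifier outputs."""
    video_fake_prob: float  # e.g. output of a video classifier
    audio_fake_prob: float  # e.g. output of an audio classifier


def classify_sample(scores: ModalityScores, threshold: float = FAKE_THRESHOLD) -> str:
    """Label a sample 'deepfake' if either modality is flagged, else 'real'."""
    video_fake = scores.video_fake_prob >= threshold
    audio_fake = scores.audio_fake_prob >= threshold
    # Logical OR over the two per-modality decisions.
    return "deepfake" if (video_fake or audio_fake) else "real"


if __name__ == "__main__":
    # Real video paired with fake audio is still labeled a deepfake.
    print(classify_sample(ModalityScores(video_fake_prob=0.1, audio_fake_prob=0.9)))
    # Both modalities look real.
    print(classify_sample(ModalityScores(video_fake_prob=0.2, audio_fake_prob=0.3)))
```

An OR rule of this kind favors catching manipulated samples: a false alarm from either classifier is enough to mark the whole sample as a deepfake, so its false-positive rate can be no lower than that of either modality alone.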

Comments:	22 pages, 14 figures, Accepted in Journal of Electrical Systems
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Cite as:	arXiv:2410.03487 [cs.CV]
	(or arXiv:2410.03487v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.03487
Related DOI:	https://doi.org/10.53555/jes.v20i10s.6126

Submission history

From: Kashish Gandhi [view email]
[v1] Fri, 4 Oct 2024 14:59:10 UTC (8,647 KB)
