Unified Framework for Deepfake Detection in Images, Videos, and Audio
DOI: https://doi.org/10.32628/CSEIT251117239

Keywords: Deepfake Detection, Artificial Intelligence, Machine Learning, Convolutional Neural Networks (CNN), Audio-Visual Forensics, Multimodal Learning, Real-Time Detection, TensorFlow, Flask, Cybersecurity

Abstract
The Unified Framework for Deepfake Detection in Images, Videos, and Audio is a comprehensive system designed to identify manipulated multimedia content across multiple modalities. The project applies state-of-the-art deep learning techniques, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and spectrogram-based analysis, to detect synthetic media generated by advanced AI tools. By integrating visual and auditory feature extraction pipelines, the framework provides robust, reliable identification of fake images, manipulated video frames, and voice-synthesis deepfakes. The proposed unified approach eliminates the need for separate detection systems by combining multimodal analysis within a single architecture. Built with Python, TensorFlow, Flask, and React.js, the framework supports real-time detection, visual analytics, and alert mechanisms for suspected deepfake content. Experimental results demonstrate high detection accuracy and adaptability to emerging deepfake generation techniques, confirming the system's potential in digital forensics, social media verification, and cybersecurity. This work underscores the importance of unified, AI-driven tools for combating misinformation and safeguarding the authenticity of digital content in modern communication networks.
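The abstract names the building blocks (CNN branches per modality, spectrogram analysis, fusion in one architecture) but not their exact wiring. Purely as an illustrative sketch, and not the authors' reported architecture, the multimodal late-fusion idea can be expressed in TensorFlow/Keras as two convolutional branches joined before a single real/fake classifier. All input shapes, layer widths, and input names (frame, spectrogram) here are assumptions made for the example.

```python
from tensorflow.keras import layers, Model

# Visual branch: a small CNN over a single RGB video frame or image
# (224x224 assumed), intended to pick up spatial manipulation artifacts.
frame_in = layers.Input(shape=(224, 224, 3), name="frame")
x = layers.Conv2D(32, 3, activation="relu")(frame_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Audio branch: a CNN over a log-mel spectrogram (128 mel bands x 256
# time frames assumed), treating the spectrogram as a one-channel image.
spec_in = layers.Input(shape=(128, 256, 1), name="spectrogram")
y = layers.Conv2D(32, 3, activation="relu")(spec_in)
y = layers.MaxPooling2D()(y)
y = layers.Conv2D(64, 3, activation="relu")(y)
y = layers.GlobalAveragePooling2D()(y)

# Late fusion: concatenate the modality embeddings and predict a single
# probability that the input pair is a deepfake.
z = layers.Concatenate()([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid", name="fake_probability")(z)

model = Model(inputs=[frame_in, spec_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

In a deployment like the one described, a model of this shape would sit behind a Flask endpoint that extracts frames and log-mel spectrograms from uploaded media and passes them to model.predict; the late-fusion design is one common way to avoid maintaining separate per-modality detectors.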
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.