The rapid advancement of AI-generated content has made deepfakes increasingly realistic, posing serious risks to identity security, social trust, and public and democratic institutions. Existing detection systems, typically focused on a single modality such as video or audio, often fail to generalize to new manipulation techniques and cannot effectively detect hybrid or low-effort deepfakes. In this perspective letter, we advocate for a new paradigm in deepfake detection that emphasizes the integration of audio, video, and textual content. We examine the limitations of current systems, including their over-reliance on outdated datasets and their limited adversarial robustness. We outline the technical motivations for integrating these modalities and highlight emerging research directions. By aligning detection strategies with the multimodal nature of AI-driven manipulation, we call for a new generation of systems that are more generalizable and trustworthy.
Future-proofing deepfake detection by integrating audio, video, and text
ACM AI Letters, February 2026
Type: Journal
Date: 2026-02-17
Department: Data Science
Eurecom Ref: 8639
Copyright:
© ACM, 2026. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM AI Letters, February 2026 https://doi.org/10.1145/3797958
PERMALINK: https://www.eurecom.fr/publication/8639