MDD: a mask diffusion detector to protect speaker verification systems from adversarial perturbations

Bai, Yibo; Chen, Sizhou; Panariello, Michele; Zhang, Xiao-Lei; Todisco, Massimiliano; Evans, Nicholas

APSIPA ASC 2025, 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 22-24 October 2025, Shangri-la, Singapore

Speaker verification systems are increasingly deployed in security-sensitive applications but remain highly vulnerable to adversarial perturbations. In this work, we propose the Mask Diffusion Detector (MDD), a novel adversarial detection and purification framework based on a text-conditioned masked diffusion model. During training, MDD applies partial masking to Mel-spectrograms and progressively adds noise through a forward diffusion process, simulating the degradation of clean speech features. A reverse process then reconstructs the clean representation conditioned on the input transcription. Unlike prior approaches, MDD does not require adversarial examples or large-scale pretraining. Experimental results show that MDD achieves strong adversarial detection performance and outperforms prior state-of-the-art methods, including both diffusion-based and neural codec-based approaches. Furthermore, MDD effectively purifies adversarially manipulated speech, restoring speaker verification performance to levels close to those observed under clean conditions. These findings demonstrate the potential of diffusion-based masking strategies for secure and reliable speaker verification systems.
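
The training-time degradation described in the abstract (partial masking of a Mel-spectrogram followed by forward diffusion noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame-wise masking scheme, the zero fill value, the linear beta schedule, the closed-form Gaussian forward step, and all function names and parameter values below are assumptions chosen for illustration. The text-conditioned reverse process is not shown.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def mask_frames(mel, mask_ratio, rng, fill=0.0):
    """Replace a random subset of time frames with a fill value (assumed scheme).

    Returns the masked copy and a per-frame boolean mask.
    """
    num_frames = len(mel)
    num_masked = int(round(mask_ratio * num_frames))
    masked_idx = set(rng.sample(range(num_frames), num_masked))
    masked = [[fill] * len(frame) if i in masked_idx else list(frame)
              for i, frame in enumerate(mel)]
    mask = [i in masked_idx for i in range(num_frames)]
    return masked, mask


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    a = alpha_bar[t]
    return [[math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
             for v in frame]
            for frame in x0]


rng = random.Random(0)
# Toy stand-in for a Mel-spectrogram: 20 time frames x 8 Mel bins (hypothetical sizes).
mel = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(20)]
alpha_bar = linear_alpha_bar(100)
masked, mask = mask_frames(mel, mask_ratio=0.3, rng=rng)
noisy = forward_diffuse(masked, t=50, alpha_bar=alpha_bar, rng=rng)
```

In this sketch, `noisy` plays the role of the degraded feature that a reverse model would be trained to restore toward `mel`, given the transcription as conditioning.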

Type: Conference

City: Shangri-la

Date: 2025-08-26

Department: Digital Security

Eurecom Ref: 8372

© 2025 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.