One detector fits all: Robust and adaptive detection of malicious packages from PyPI to enterprises

Montaruli, Biagio; Compagna, Luca; Ponta, Serena Elisa; Balzarotti, Davide

ACSAC 2025, Annual Computer Security Applications Conference, 8-12 December 2025, Honolulu, Hawaii, USA

The rise of supply chain attacks via malicious Python packages calls for robust and adaptable detection solutions. However, current approaches overlook two critical challenges: (i) robustness to adversarial source code transformations, and (ii) the lack of adaptability to different actors in the software supply chain with different false positive rate (FPR) requirements, from repository maintainers (very low FPR) to enterprise security teams (higher FPR tolerance). To address these challenges, in this work we introduce a new robust detector that can be seamlessly integrated into both public repositories like PyPI and enterprise ecosystems. To thoroughly evaluate the robustness of our detector, we propose a novel methodology to generate adversarial packages by leveraging a new set of fine-grained code transformations based on code obfuscation techniques. By combining these adversarial packages with adversarial training (AT), we enhance the robustness of our detector by 2.5×. We comprehensively evaluate the effectiveness of AT by testing our detector against a large dataset of 122,398 packages collected daily from PyPI over 80 days, showing that AT needs to be applied carefully: on the one hand, it makes the detector more robust to obfuscations and allows finding 10% more obfuscated packages, but on the other hand it introduces a negative effect by slightly decreasing the performance on nonobfuscated packages. To demonstrate its adaptability in production, we conduct two vetting case studies by tuning the detector to different FPR thresholds: (i) one for PyPI maintainers with a low FPR (0.1%) and (ii) one for enterprise security teams with a higher FPR (10%). In the first case study, we evaluate our final detector on 91,949 packages collected over 37 days, achieving an average daily detection rate of 2.48 malicious packages with only 2.18 false positives per day. In the second one, we analyze 1,596 packages adopted by a multinational software company, achieving only 1.24 false positives on average per day. These results show that our detector can be seamlessly integrated into both public repositories like PyPI and enterprise ecosystems, ensuring a very low time budget of a few minutes to review the false positives. Overall, our detector uncovered a total of 346 malicious packages, now reported to the community.
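The abstract does not say how the detector is tuned to a given FPR. One common approach, sketched here as an assumption rather than as the authors' procedure, is to choose the decision threshold from the scores the detector assigns to a held-out set of known-benign packages. The function names and the synthetic scores below are hypothetical and serve only as illustration.

```python
import numpy as np


def threshold_for_fpr(benign_scores, target_fpr):
    """Return a threshold t such that at most target_fpr of the benign
    scores are strictly greater than t (a package is flagged if score > t)."""
    scores = np.sort(np.asarray(benign_scores, dtype=float))
    n = scores.size
    if n == 0:
        raise ValueError("need at least one benign score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    # Number of benign packages we are allowed to flag by mistake.
    allowed = int(np.floor(target_fpr * n))
    if allowed >= n:
        return float("-inf")  # every package may be flagged
    # Everything strictly above the (n - allowed)-th smallest score is
    # flagged, which is at most `allowed` benign packages, even with ties.
    return float(scores[n - allowed - 1])


def empirical_fpr(benign_scores, threshold):
    """Fraction of benign scores that would be flagged at this threshold."""
    scores = np.asarray(benign_scores, dtype=float)
    return float(np.mean(scores > threshold))


# Synthetic maliciousness scores for benign packages (illustration only).
rng = np.random.default_rng(0)
benign = rng.beta(2, 8, size=10_000)

# The two operating points named in the abstract: 0.1% and 10% FPR.
for name, fpr in [("PyPI maintainers", 0.001), ("enterprise team", 0.10)]:
    t = threshold_for_fpr(benign, fpr)
    print(f"{name}: threshold={t:.3f}, measured FPR={empirical_fpr(benign, t):.4f}")
```

A stricter FPR target yields a higher threshold, so the same model can serve both settings without retraining. The measured FPR is only guaranteed on the calibration set, so the threshold would need re-checking on fresh data.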

Detail

Type:

Conference

City:

Honolulu

Date:

2025-12-08

Department:

Digital Security

Eurecom Ref:

8532

© 2025 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.