Can we trust the judges? Validation of factuality evaluation methods via answer perturbation

Gharsallah, Sarra; Robaldo, Adele; Tokareva, Mariia; Gatti Pinheiro, Giovanni; Guendouz, Ilyana; Troncy, Raphaël; Papotti, Paolo; Michiardi, Pietro
EvalLLM 2025, Workshop on Evaluation Generative Models and Challenges, colocated with TALN, 30 June 2025, Marseille, France


Type: Poster / Demo
City: Marseille
Date: 2025-06-30
Department: Data Science
Eurecom Ref: 8291

PERMALINK: https://www.eurecom.fr/publication/8291