LLMs instead of human judges? A large scale empirical study across 20 NLP evaluation tasks
Bavaresco, A., Bernardi, R., Bertolazzi, L., Elliott, D., Fernández, R., Gatt, A., Ghaleb, E., Giulianelli, M., Hanna, M., Koller, A., Martins, A. F. T., Mondorf, P., Neplenbroek, V., Pezzelle, S., Plank, B., Schlangen, D., Suglia, A., Surikuchi, A. K., Takmaz, E., & Testoni, A. (in press). LLMs instead of human judges? A large scale empirical study across 20 NLP evaluation tasks. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
Publication type: Proceedings paper