Double Hundred Policy
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models
LLMEval-Fair introduces a dynamic, contamination-resistant evaluation framework for LLMs based on a large question bank and validates it via a 30-month study of nearly 60 models showing performance ceilings and hidden contamination issues.