Double Hundred Policy

The models we selected in the main paper were evaluated across three prompting paradigms: Zero-Shot (ZS), Few-Shot (FS), and Chain-of-Thought (CoT) · 2000

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

cs.CL · 2025-08-07 · conditional · novelty 6.0 · ref 13

LLMEval-Fair introduces a dynamic, contamination-resistant evaluation framework for LLMs based on a large question bank and validates it via a 30-month study of nearly 60 models showing performance ceilings and hidden contamination issues.
