T.; and Yadav, V

Ben Etzine, Hanna Mazzawi, Lior Wolf · 2025 · arXiv 2503.05551

2 Pith papers cite this work.

representative citing papers

SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

SEAL revives saturated benchmarks via adaptive LLM meta-judging in elimination matches, matching full pairwise accuracy with roughly half the calls across code, math, QA, and agent tasks.

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

cs.AI · 2025-07-30 · unverdicted · novelty 6.0

League of LLMs organizes LLMs into a self-governed mutual evaluation league using dynamic, transparent, objective, and professional criteria to distinguish model capabilities with 70.7% top-k ranking stability.

citing papers explorer

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

cs.AI · 2025-07-30 · unverdicted · none · ref 16
