pith. sign in

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , booktitle =

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

years

2026 2 2025 1

verdicts

UNVERDICTED 3

representative citing papers

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.

citing papers explorer

Showing 3 of 3 citing papers.

  • When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias cs.AI · 2026-04-20 · unverdicted · none · ref 7

    VLMs as judges exhibit informativeness bias by favoring detailed but image-inconsistent answers; BIRCH mitigates it by first correcting answers against the image, reducing bias up to 17% and improving performance up to 9.8%.

  • MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment cs.LG · 2026-04-22 · unverdicted · none · ref 64

    MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.

  • LIMO: Less is More for Reasoning cs.CL · 2025-02-05 · unverdicted · none · ref 34

    LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.