From generation to judgment: Opportunities and challenges of LLM-as-a-judge

22 Pith papers cite this work, alongside 44 external citations (Crossref). Polarity classification is still indexing.

citation-role summary

background: 3 · method: 1

years

2026: 22

representative citing papers

Trip+: Benchmarking Agents in Personalized Interactive Travel Planning

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Trip+ benchmark evaluates language model agents on generating and revising personalized minute-level travel itineraries under dynamic interactions, finding consistent gaps where models produce feasible but exhausting plans that ignore traveler profiles.

When AI Says It Feels

cs.AI · 2026-06-04 · unverdicted · novelty 5.0

LLMs trained via rubric-based self-rewarding RL with GRPO showed improved feeling expression and sycophancy robustness but degraded truthful QA performance.

POLARIS: Guiding Small Models to Write Long Stories

cs.CL · 2026-06-02 · unverdicted · novelty 5.0

POLARIS trains Qwen3.5-9B via GRPO with LLM-as-judge rewards and human-reference injection, yielding a model competitive with larger open-weight models on length adherence and quality, including generalization to 3x training length.
