pith. machine review for the scientific record.

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

4 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (4)

representative citing papers

Security in LLM-as-a-Judge: A Comprehensive SoK

cs.CR · 2026-03-31 · accept · novelty 8.0

The first SoK on LLM-as-a-Judge security organizes attacks targeting judges, attacks using judges, defenses leveraging judges, and security-domain applications while flagging vulnerabilities.

Hint Tuning: Less Data Makes Better Reasoners

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models without accuracy loss.

citing papers explorer

Showing 4 of 4 citing papers.