pith. machine review for the scientific record.

hub

JudgeBench: A Benchmark for Evaluating LLM-based Judges

14 Pith papers cite this work. Polarity classification is still indexing.

hub tools

citation-role summary

background 1

citation-polarity summary

years

2026 (13) · 2024 (1)

roles

background 1

polarities

background 1

representative citing papers

Green Shielding: A User-Centric Approach Towards Trustworthy AI

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

Green Shielding introduces CUE criteria and the HCM-Dx benchmark to demonstrate that routine prompt variations systematically alter LLM diagnostic behavior along clinically relevant dimensions, producing Pareto-like tradeoffs in plausibility versus coverage.

citing papers explorer

Showing 2 of 2 citing papers after filters.