Reference-guided verdict: Llms-as-judges in automatic evaluation of free-form text.arXiv preprint arXiv:2408.09235, 2024

Sher Badshah, Hassan Sajjad · 2024 · arXiv 2408.09235

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

cs.CL · 2026-03-20 · conditional · novelty 7.0

Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer

Showing 3 of 3 citing papers.

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis cs.CL · 2026-03-20 · conditional · none · ref 7
Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
A Survey on LLM-as-a-Judge cs.CL · 2024-11-23 · unverdicted · none · ref 7
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods cs.CL · 2024-12-07 · accept · none · ref 8
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

Reference-guided verdict: Llms-as-judges in automatic evaluation of free-form text.arXiv preprint arXiv:2408.09235, 2024

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer