HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition, February 2024

Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang · 2024 · arXiv 2402.15754

3 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers

cs.CR · 2026-01-06 · unverdicted · novelty 6.0

MESSALA is a new LLM framework whose report evaluations are closer to those of veteran SOC practitioners than prior LLM methods. It achieves this by combining a custom checklist with granularization guidelines and multi-perspective scoring.

Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences

cs.LG · 2026-05-15 · unverdicted · novelty 5.0 · 2 refs

Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer


LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers
cs.CR · 2026-01-06 · unverdicted · none · ref 51
Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
cs.LG · 2026-05-15 · unverdicted · none · ref 2 · 2 links
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
cs.CL · 2024-12-07 · accept · none · ref 152