HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition, February 2024
3 Pith papers cite this work.
Citing papers
- LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers
  MESSALA is a new LLM framework that produces report evaluations closer to veteran SOC practitioners than prior LLM methods by combining a custom checklist with granularization guidelines and multi-perspective scoring.
- Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
  Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
  A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.