Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

From generation to judgment: Opportunities, challenges of llm-as-a-judge , author= · 2025

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

representative citing papers

FlowEval: Reference-based Evaluation of Generated User Interfaces

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

FlowEval evaluates generated UIs by measuring how closely their navigation flows match real websites via reference-based similarity metrics and shows strong correlation with human expert judgments.

Evaluating Multi-turn Human-AI Interaction

cs.HC · 2026-05-18 · unverdicted · novelty 6.0

Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.

Optimal Transport for LLM Reward Modeling from Noisy Preference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.

From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

StraTA improves LLM agent success rates to 93.1% on ALFWorld and 84.2% on WebShop by sampling a compact initial strategy and training it jointly with action execution via hierarchical GRPO-style rollouts.

Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG

cs.AI · 2026-05-07 · unverdicted · novelty 4.0

TGS-RAG adds graph-to-text re-ranking with global voting and text-to-graph orphan path bridging to improve precision and efficiency in multi-hop RAG over prior baselines.

citing papers explorer

Showing 6 of 6 citing papers.

FlowEval: Reference-based Evaluation of Generated User Interfaces cs.MA · 2026-05-05 · unverdicted · none · ref 33
FlowEval evaluates generated UIs by measuring how closely their navigation flows match real websites via reference-based similarity metrics and shows strong correlation with human expert judgments.
Evaluating Multi-turn Human-AI Interaction cs.HC · 2026-05-18 · unverdicted · none · ref 49
Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.
Optimal Transport for LLM Reward Modeling from Noisy Preference cs.LG · 2026-05-07 · unverdicted · none · ref 209
SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.
From Fallback to Frontline: When Can LLMs be Superior Annotators of Human Perspectives? cs.AI · 2026-04-20 · unverdicted · none · ref 17
LLMs can be statistically superior to humans at estimating group-level judgments on subjective tasks because of their low variance and decoupled representation-processing biases.
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction cs.CL · 2026-05-07 · unverdicted · none · ref 65
StraTA improves LLM agent success rates to 93.1% on ALFWorld and 84.2% on WebShop by sampling a compact initial strategy and training it jointly with action execution via hierarchical GRPO-style rollouts.
Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG cs.AI · 2026-05-07 · unverdicted · none · ref 7
TGS-RAG adds graph-to-text re-ranking with global voting and text-to-graph orphan path bridging to improve precision and efficiency in multi-hop RAG over prior baselines.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

fields

years

verdicts

representative citing papers

citing papers explorer