LegalAgentBench: Evaluating LLM agents in legal domain
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
-
Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, although they rank the markers in a somewhat consistent order.
-
LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks
LexRubric is a rubric-based benchmark containing 649 instances and 12,337 atomic criteria for diagnostic evaluation of LLMs on open-ended Chinese legal consultation and judicial examination tasks across 14 scenarios.