LegalAgentBench: Evaluating LLM agents in legal domain

Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, Minlie Huang · 2025 · DOI 10.18653/v1/2025.acl-long.116

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

LexRubric is a rubric-based benchmark containing 649 instances and 12,337 atomic criteria for diagnostic evaluation of LLMs on open-ended Chinese legal consultation and judicial examination tasks across 14 scenarios.

citing papers explorer

Showing 2 of 2 citing papers.

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence? · cs.CL · 2026-05-27 · unverdicted · none · ref 48
LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks · cs.CL · 2026-06-08 · unverdicted · none · ref 16
