Prompting from the bench: Large-scale pretraining is not sufficient to prepare LLMs for ordinary meaning analysis

Abhishek Purushothama, Junghyun Min, Brandon Waldon, Nathan Schneider


classification cs.CL

keywords models, legal, interpretation, LLMs, ordinary, questions, evaluation, language


In the U.S. judicial system, a widespread approach to legal interpretation entails assessing how a legal text would be understood by an 'ordinary' speaker of the language. Recent scholarship has proposed that legal practitioners leverage large language models (LLMs) to ascertain a text's ordinary meaning. But are LLMs up to the task? As textual interpretation questions arise in spheres ranging from criminal law to civil rights, we argue it is crucial that models not be taken as authoritative without rigorous evaluation. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges, who reasoned that the large amount of data that models see in training would enable models to illuminate how people ordinarily use certain words or phrases. In controlled experiments, we find failures in robustness which cast doubt on this assumption and raise serious questions about the utility of these models in practice. For the models in our evaluation, slight changes to the format of a question can lead to wildly different conclusions, a vulnerability that parties with an interest in the outcome could exploit. Comparing with a dataset in which people were asked similar legal interpretation questions, we see that these models are at best moderately correlated with human judgments, which is not strong enough given the stakes in this domain.
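
The two checks the abstract describes, sensitivity to question format and agreement with human judgments, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names (`pearson`, `format_spread`), the choice of Pearson correlation, the 0-to-1 score scale, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def format_spread(scores):
    """Gap between the highest and lowest score a model gives the same
    question when only the prompt format changes."""
    return max(scores) - min(scores)


# Hypothetical toy data (NOT from the paper): for each question, the model's
# "yes" score under three prompt formats, and the mean human rating.
model_by_format = {
    "q1": [0.9, 0.2, 0.6],
    "q2": [0.1, 0.3, 0.2],
    "q3": [0.8, 0.7, 0.9],
}
human = {"q1": 0.7, "q2": 0.2, "q3": 0.4}

questions = sorted(human)

# Robustness: a large spread means the answer depends on the format.
spreads = {q: format_spread(model_by_format[q]) for q in questions}

# Agreement: correlate format-averaged model scores with human ratings.
model_mean = [mean(model_by_format[q]) for q in questions]
r = pearson(model_mean, [human[q] for q in questions])

print(spreads)
print(round(r, 3))
```

A large per-question spread would correspond to the format vulnerability described above, while a middling `r` would correspond to the "at best moderately correlated" finding; the thresholds for either are a judgment call that this sketch does not settle.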



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Speaking of Language: Reflections on Metalanguage Research in NLP
cs.CL 2026-04 unverdicted novelty 3.0

This reflection paper highlights metalanguage in NLP, links it to LLMs, and lists understudied future directions.