Natural Language Processing in the Legal Domain
Pith reviewed 2026-05-24 09:27 UTC · model grok-4.3
The pith
Legal NLP research has expanded in volume, tasks, languages, and methodological sophistication from 2013 to 2024, now aligning with general NLP standards on data sharing and reproducibility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analysis of a nearly complete corpus of roughly one thousand NLP and law papers shows steady growth in publication count, task diversity, and language coverage. That growth is accompanied by rising use of advanced methods, which are beginning to match those of general NLP, and by rising rates of data and code availability, which are beginning to match broader scientific norms.
What carries the argument
A constructed corpus of nearly 1000 papers from 2013-2024, used to track trends in publication volume, tasks, languages, method sophistication, and reproducibility practices.
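The kind of trend tracking described here can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the record fields (`year`, `language`, `task`, `code_released`), the function name `yearly_trends`, and the sample rows are all hypothetical and stand in for whatever annotations the real corpus carries.

```python
from collections import defaultdict

# Hypothetical annotated records; values are illustrative, not drawn from the paper's corpus.
papers = [
    {"year": 2013, "language": "en", "task": "classification", "code_released": False},
    {"year": 2013, "language": "en", "task": "retrieval", "code_released": False},
    {"year": 2020, "language": "en", "task": "classification", "code_released": True},
    {"year": 2020, "language": "de", "task": "summarization", "code_released": False},
    {"year": 2020, "language": "zh", "task": "retrieval", "code_released": True},
    {"year": 2024, "language": "en", "task": "question_answering", "code_released": True},
    {"year": 2024, "language": "pt", "task": "classification", "code_released": True},
]


def yearly_trends(records):
    """Summarize paper count, distinct tasks, distinct languages, and code-release share per year."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record)

    rows = []
    for year in sorted(by_year):
        group = by_year[year]
        rows.append({
            "year": year,
            "papers": len(group),
            "tasks": len({r["task"] for r in group}),
            "languages": len({r["language"] for r in group}),
            # Fraction of that year's papers that released code.
            "code_share": sum(r["code_released"] for r in group) / len(group),
        })
    return rows


for row in yearly_trends(papers):
    print(row)
```

Per-year shares rather than raw counts keep the reproducibility trend from being an artifact of rising publication volume alone.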
If this is right
- Publication volume in legal NLP will keep rising.
- A broader set of tasks and languages will be addressed.
- Methods will continue to converge with those used in general NLP.
- Data and code availability will become the norm, raising overall reliability.
Where Pith is reading between the lines
- Practical legal tools may emerge more rapidly as methods align with general NLP.
- Non-English legal systems could see accelerated coverage as language diversity grows.
- Reproducibility gains may draw in researchers from adjacent fields like computational social science.
- The next phase could involve direct integration of legal NLP outputs into court or firm workflows.
Load-bearing premise
The collected papers form a nearly complete and representative sample of all legal NLP work in the period, so the observed trends accurately describe the field.
What would settle it
Discovery of a large set of omitted legal NLP papers from 2013-2024 whose methods or data practices show no increase in sophistication or reproducibility.
Original abstract
We summarize the current state of the field of NLP & Law with a specific focus on recent technical and substantive developments. To support our analysis, we construct and analyze a nearly complete corpus of nearly one thousand NLP & Law related papers published between 2013-2024. Our analysis highlights several major trends. Namely, we document an increasing number of papers written, tasks undertaken, and languages covered over the course of the past decade. We observe an increase in the sophistication of the methods which researchers deployed in this applied context. Legal NLP is beginning to match not only the methodological sophistication of general NLP but also the professional standards of data availability and code reproducibility observed within the broader scientific community. We believe all of these trends bode well for the future of the field and point to an exciting next phase for the Legal NLP community.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript summarizes the current state of NLP & Law by constructing a corpus of nearly 1000 papers (2013-2024) and documenting trends of increasing publication volume, task diversity, language coverage, methodological sophistication, and adherence to data/code reproducibility standards, concluding that the field is aligning with general NLP practices.
Significance. If the corpus construction is transparent and representative, the work supplies a useful field overview that could help prioritize research directions and community standards in legal NLP.
major comments (1)
- [Abstract / Corpus construction] Abstract and methods (corpus construction section): the central claim that observed trends in method sophistication and reproducibility reflect field-wide developments rests on the corpus being 'nearly complete' and representative, yet no explicit search protocol, queried databases, keyword sets, inclusion/exclusion criteria, or external validation (e.g., against known legal NLP venue lists) is described. Without these, systematic under-sampling of non-English, workshop, or non-standard terminology papers cannot be ruled out, rendering the trend claims unverifiable.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the transparency of our corpus construction. We address the concern in detail below and will revise the manuscript to incorporate the requested information.
Point-by-point responses
- Referee: [Abstract / Corpus construction] Abstract and methods (corpus construction section): the central claim that observed trends in method sophistication and reproducibility reflect field-wide developments rests on the corpus being 'nearly complete' and representative, yet no explicit search protocol, queried databases, keyword sets, inclusion/exclusion criteria, or external validation (e.g., against known legal NLP venue lists) is described. Without these, systematic under-sampling of non-English, workshop, or non-standard terminology papers cannot be ruled out, rendering the trend claims unverifiable.
Authors: We agree that the current description of corpus construction lacks the level of detail needed to fully substantiate claims of representativeness. In the revised manuscript we will add a dedicated subsection under Methods that explicitly documents: the databases and repositories searched (ACL Anthology, arXiv, Semantic Scholar, Google Scholar, and others); the complete keyword sets and Boolean queries used; the inclusion/exclusion criteria applied (covering language, venue type, and terminology); and any validation steps performed against external lists of legal NLP venues or prior surveys. These additions will allow readers to assess potential sampling biases and will strengthen the verifiability of the reported trends.
Revision: yes
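One way the promised validation against external lists could look is a simple recall check: what fraction of an independent reference list also appears in the corpus. The sketch below is an assumption about how such a check might be done, not a description of the authors' procedure. The function names `normalize` and `coverage` and all titles are hypothetical, and matching on normalized titles is a deliberately crude stand-in for proper record linkage (DOIs, fuzzy matching).

```python
import re


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical titles match."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def coverage(corpus_titles, reference_titles):
    """Return (recall, missing): share of the reference list found in the corpus, plus the misses."""
    corpus = {normalize(t) for t in corpus_titles}
    reference = {normalize(t) for t in reference_titles}
    if not reference:
        raise ValueError("reference list is empty")
    missing = sorted(reference - corpus)
    recall = 1 - len(missing) / len(reference)
    return recall, missing


# Hypothetical titles, used only to exercise the function.
corpus_titles = [
    "Contract Clause Classification",
    "Statute Retrieval: A Benchmark",
    "Judgment Summarization",
]
reference_titles = [
    "contract clause classification.",
    "Statute retrieval - a benchmark",
    "Legal Entity Linking",
    "Judgment Summarization",
]

recall, missing = coverage(corpus_titles, reference_titles)
print(f"recall={recall:.2f}, missing={missing}")
```

Breaking the missing titles down by language or venue type would show whether the gaps cluster in the categories the referee flags (non-English, workshop, or non-standard terminology papers).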
Circularity Check
No circularity: purely descriptive survey without derivations or fitted predictions
The paper constructs a corpus of ~1000 papers and reports observed trends in publication volume, task diversity, language coverage, method sophistication, and data/code release rates. No equations, predictions, or parameters are fitted; the central claims are direct empirical summaries of the collected papers. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear. The representativeness assumption is an external validity concern, not a circular reduction of the reported observations to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A literature search combined with manual review can identify nearly all relevant papers in the NLP & Law domain between 2013-2024.
Forward citations
Cited by 1 Pith paper
- MAP-Law: Coverage-Driven Retrieval Control for Multi-Turn Legal Consultation
MAP-Law dynamically controls retrieval depth in legal AI by computing element coverage, evidence coverage, and marginal gain on a joint node graph, reaching 0.86 element coverage with 58% fewer rounds than fixed basel...