pith. machine review for the scientific record.

arxiv: 2605.12012 · v1 · submitted 2026-05-12 · 💻 cs.AI


LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters


Pith reviewed 2026-05-13 04:58 UTC · model grok-4.3

classification 💻 cs.AI
keywords LegalCheck · retrieval-augmented generation · legal drafting · municipal law · objection letters · expert-in-the-loop · AI in public sector · context-augmented generation

The pith

A retrieval- and context-augmented system generates near-final municipal legal advice letters in minutes rather than hours.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LegalCheck, a system that automates the drafting of objection response letters for public-sector legal departments facing staff shortages and high volumes. It combines retrieval of relevant laws and precedents from curated bases with controlled prompting to add case-specific details, followed by expert review to ensure soundness. In a real deployment at the Municipality of Amsterdam, the system produced drafts that captured 80 to 100 percent of essential legal reasoning while maintaining consistency and factual accuracy. Legal professionals reported reduced workload and consistent application of standards without replacing their judgment. A sympathetic reader would care because the approach shows how domain knowledge and human oversight can make AI practical for regulated, high-stakes tasks like municipal law.

Core claim

LegalCheck automates the drafting of objection response letters by retrieving relevant laws and precedents, using controlled prompting to incorporate external knowledge and case-specific details into a coherent draft, and relying on expert-in-the-loop review to confirm legal soundness. In the Amsterdam deployment, it produced near-final advice letters in minutes rather than hours, with outputs grounded in actual regulations and prior cases that included the vast majority of required legal reasoning.

What carries the argument

The LegalCheck pipeline, which pairs Retrieval-Augmented Generation to pull laws and precedents with Context-Augmented Generation to tailor content to each case, all wrapped in an expert-in-the-loop review.
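The three stages named above (retrieval, context-augmented prompting, expert review) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the word-overlap retriever, the prompt wording, the `llm` callable, and every name in the block (`Passage`, `Draft`, `retrieve`, `build_prompt`, `generate_draft`, `expert_review`) are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Passage:
    source_id: str  # hypothetical identifier for a regulation or prior case
    text: str


@dataclass
class Draft:
    text: str
    cited_sources: list[str]
    approved: bool = False  # stays False until a human expert signs off
    reviewer_notes: list[str] = field(default_factory=list)


def retrieve(query: str, knowledge_base: list[Passage], k: int = 2) -> list[Passage]:
    """Toy lexical retrieval: rank passages by word overlap with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(p.text.lower().split())), p) for p in knowledge_base]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in order
    return [p for _, p in scored[:k]]


def build_prompt(case_facts: str, passages: list[Passage]) -> str:
    """Controlled prompt: retrieved sources first, then the case-specific details."""
    sources = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return (
        "Draft an advice letter using ONLY the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Case details:\n{case_facts}\n"
    )


def generate_draft(
    case_facts: str, knowledge_base: list[Passage], llm: Callable[[str], str]
) -> Draft:
    """Retrieval plus prompting; the result is a draft, never a final letter."""
    passages = retrieve(case_facts, knowledge_base)
    prompt = build_prompt(case_facts, passages)
    return Draft(text=llm(prompt), cited_sources=[p.source_id for p in passages])


def expert_review(draft: Draft, approve: bool, note: str = "") -> Draft:
    """Expert-in-the-loop gate: only a human decision flips `approved`."""
    draft.approved = approve
    if note:
        draft.reviewer_notes.append(note)
    return draft


if __name__ == "__main__":
    kb = [
        Passage("reg-parking-1", "a parking permit may be refused when the applicant has no registered vehicle"),
        Passage("case-2021-17", "objection upheld because the permit refusal lacked reasons"),
        Passage("reg-waste-4", "household waste must be offered in approved containers"),
    ]
    stub_llm = lambda prompt: "DRAFT\n" + prompt  # stand-in for a real model call
    draft = generate_draft("objection against refused parking permit", kb, stub_llm)
    print(draft.cited_sources, draft.approved)
```

The design point the sketch tries to capture is that the draft carries its source identifiers with it and cannot become approved without an explicit reviewer action.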

If this is right

  • Substantial reduction in drafting time for objection letters.
  • Improved consistency in how legal standards are applied across cases.
  • Positive acceptance by legal professionals who retain final judgment.
  • Explainable outputs that cite the underlying regulations and precedents.
  • Demonstration that responsible AI deployment is possible in the legal domain through augmentation rather than full automation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval-plus-context approach could extend to other administrative law tasks such as permit reviews or compliance checks.
  • Curated knowledge bases would need regular updates to track changes in regulations or new case law.
  • If expert review becomes a bottleneck at higher volumes, the system might require additional automated checks for common error patterns.
  • Deployment in other municipalities would depend on creating equivalent curated legal databases tailored to local rules.
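One of the extensions listed above mentions automated checks for common error patterns. A hedged example of such a check, in Python, flags citations in a draft that do not correspond to any retrieved source. The bracketed citation format (`[source-id]`), the identifiers, and the function name `unsupported_citations` are assumptions made for illustration; the paper does not describe such a check.

```python
import re

# Assumed citation format: a source identifier in square brackets, e.g. [reg-parking-1].
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_.:-]+)\]")


def unsupported_citations(draft_text: str, retrieved_ids: set[str]) -> list[str]:
    """Return cited identifiers that are absent from the retrieved sources.

    Each unsupported identifier is reported once, in order of first appearance.
    """
    flagged: list[str] = []
    seen: set[str] = set()
    for cited in CITATION_PATTERN.findall(draft_text):
        if cited not in retrieved_ids and cited not in seen:
            flagged.append(cited)
            seen.add(cited)
    return flagged
```

A check like this would only pre-screen drafts for the expert reviewer; it says nothing about whether a supported citation is applied correctly.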

Load-bearing premise

Expert-in-the-loop review combined with retrieval from curated legal knowledge bases is enough to prevent legally significant errors or omissions in the generated drafts.

What would settle it

A generated letter that omits or misstates a key legal requirement or precedent, is approved by the expert reviewer, and later leads to an incorrect decision that is successfully challenged on appeal.

Figures

Figures reproduced from arXiv: 2605.12012 by Julien Rossi, Virgill van der Meer.

Figure 1: Overview of the LegalCheck pipeline, combining RAG with multi-stage CAG.

Figure 2: Screenshot of the LegalCheck user interface during …

Original abstract

Public-sector legal departments in the Netherlands face acute staff shortages, increased case volumes, and increased pressure to meet regulatory compliance. This paper presents LegalCheck, a novel system that addresses these challenges by automating the drafting of objection response letters through a combination of Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG). Using a large language model (LLM) alongside curated legal knowledge bases, LegalCheck performs retrieval of relevant laws and precedents, and uses controlled prompting to incorporate both external knowledge and case-specific details into a coherent draft. An expert-in-the-loop review ensures that each generated letter is legally sound and contextually appropriate. In a real-world deployment within the Municipality of Amsterdam, LegalCheck produced near-final advice letters in minutes rather than hours, while maintaining high legal consistency and factual accuracy. The output is based on actual regulations and prior cases, providing explainable outputs that captured the vast majority of required legal reasoning (often 80% to 100% of essential content). Legal professionals found that the system reduced their workload and ensured a consistent application of legal standards, without replacing human judgment. These results demonstrate substantial efficiency gains, improved legal consistency, and positive user acceptance. More broadly, this work illustrates how responsible AI can be deployed in the legal domain by augmenting LLMs with domain knowledge and governance mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents LegalCheck, a system that combines Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG) with LLMs and curated legal knowledge bases to draft municipal objection response letters. It incorporates controlled prompting for external knowledge and case details, followed by expert-in-the-loop review. The central claims concern a real-world deployment in the Municipality of Amsterdam, where the system produced near-final letters in minutes rather than hours, captured 80-100% of essential legal reasoning, maintained high consistency and accuracy, reduced workload, and ensured consistent legal standards without replacing human judgment.

Significance. If the deployment outcomes hold under rigorous scrutiny, the work would demonstrate a practical, governed application of LLMs in a high-stakes public-sector legal setting, addressing staff shortages through knowledge-augmented generation and human oversight. The emphasis on explainable, regulation-grounded outputs and retention of expert review offers a template for responsible AI deployment; however, the current lack of supporting data limits its immediate contribution to the literature on AI-assisted legal drafting.

major comments (2)
  1. [Abstract] The performance claims (near-final letters in minutes vs. hours; 80-100% capture of essential legal reasoning; high consistency and factual accuracy) are asserted without any reported sample size, quantitative metrics, baselines, error analysis, definition of 'essential content,' or inter-rater reliability measures. This absence directly undermines verification of the central efficiency and reliability assertions.
  2. [Deployment description] The expert-in-the-loop review is described as ensuring legal soundness, yet no protocol details, measurement of residual errors/omissions, comparison against unaided drafting, or assessment of whether retrieval/prompting failures are systematically caught are supplied. This leaves the sufficiency of the human safeguard untested and the workload-reduction claim unsupported.
minor comments (1)
  1. [Abstract] The acronym 'CAG' is introduced without an explicit definition or citation distinguishing it from standard RAG techniques.
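To make the first major comment concrete, the following Python sketch shows how two of the missing quantities could be computed: per-letter coverage of essential content, and Cohen's kappa as one common inter-rater reliability measure. The paper reports neither. The checklist items, the rater labels, and the function names `coverage` and `cohen_kappa` are hypothetical.

```python
from collections import Counter


def coverage(essential: set[str], found: set[str]) -> float:
    """Fraction of the essential checklist items that appear in a draft."""
    if not essential:
        raise ValueError("essential checklist must not be empty")
    return len(essential & found) / len(essential)


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, a draft that contains four of five checklist items has a coverage of 0.8, the lower end of the range quoted in the abstract. Reporting this per letter, together with the number of letters and the agreement between independent raters, would let a reader verify the claim.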

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript describing the LegalCheck system. We appreciate the emphasis on the need for greater transparency in reporting the deployment outcomes. Below, we provide point-by-point responses to the major comments, outlining how we will revise the paper to address the concerns while maintaining the integrity of our reported experiences.

  1. Referee: [Abstract] The performance claims (near-final letters in minutes vs. hours; 80-100% capture of essential legal reasoning; high consistency and factual accuracy) are asserted without any reported sample size, quantitative metrics, baselines, error analysis, definition of 'essential content,' or inter-rater reliability measures. This absence directly undermines verification of the central efficiency and reliability assertions.

    Authors: The referee correctly identifies that the abstract makes strong claims without accompanying quantitative details. This is because the paper reports on a practical deployment rather than a laboratory-style evaluation with predefined metrics. The 80-100% figure and time savings were observed by the legal team during use, but the sample size was not formally recorded and no error analysis was performed. We will revise the abstract to qualify these claims as 'observed in deployment' and add a new subsection in the paper detailing the evaluation approach used by the experts, including how 'essential content' was defined (as the key legal arguments and references required for a complete response). We will also explicitly state the lack of baselines and inter-rater measures as a limitation of the current study.

    Revision: yes

  2. Referee: [Deployment description] The expert-in-the-loop review is described as ensuring legal soundness, yet no protocol details, measurement of residual errors/omissions, comparison against unaided drafting, or assessment of whether retrieval/prompting failures are systematically caught are supplied. This leaves the sufficiency of the human safeguard untested and the workload-reduction claim unsupported.

    Authors: We will provide more protocol details in the revised manuscript, describing the steps the experts followed in reviewing the drafts, such as verifying citations to laws and precedents, checking for logical consistency, and ensuring the response addresses the objection points. However, we did not measure residual errors after review or perform a comparison to unaided drafting, as the deployment was not designed as an A/B test. The workload reduction is supported by anecdotal reports from the users who noted faster turnaround. We will add this information and acknowledge that the human safeguard's effectiveness is assumed based on the experts' expertise rather than quantified. This is a genuine limitation we will highlight.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: system description with no derivations or fitted parameters


The paper is a descriptive account of a RAG/CAG system for drafting legal letters, augmented by expert review, with a high-level report of an Amsterdam deployment. No equations, parameters, or predictive derivations exist that could reduce to their own inputs by construction. The central claims rest on qualitative deployment outcomes rather than any self-referential fitting, uniqueness theorem, or ansatz smuggling. Absence of mathematical structure means none of the enumerated circularity patterns apply; the evaluation limitations noted in the referee report concern evidence strength, not internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no equations, fitted parameters, or new postulated entities; the system relies on pre-existing LLMs, curated legal databases, and standard prompting techniques.

pith-pipeline@v0.9.0 · 5538 in / 1154 out tokens · 53142 ms · 2026-05-13T04:58:09.276141+00:00 · methodology


LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters. ICAIL 2026, June 08–12, 2026, Singapore.