NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System
Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3
The pith
NyayaMind combines retrieval of statutes and precedents with fine-tuned language models to output structured court judgments including issues, arguments, rationale, and decision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NyayaMind pairs a Retrieval Module, a RAG pipeline that identifies legally relevant statutes and precedent cases in large-scale legal corpora, with a Prediction Module of reasoning-oriented LLMs fine-tuned for the Indian legal domain that generates structured outputs (issues, arguments, rationale, and the final decision), yielding better explanation quality and evidence alignment than existing CJPE approaches.
What carries the argument
A two-component architecture: a RAG-based Retrieval Module that surfaces statutes and precedents, paired with a Prediction Module of fine-tuned LLMs that emits the four-part judicial structure of issues, arguments, rationale, and decision.
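The two-module split can be sketched as a minimal pipeline. The `retriever` and `llm` interfaces below are assumptions for illustration, not the paper's actual APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Judgment:
    # The four-part structure the Prediction Module is said to emit.
    issues: list[str] = field(default_factory=list)
    arguments: list[str] = field(default_factory=list)
    rationale: str = ""
    decision: str = ""

def predict_judgment(facts: str, retriever, llm) -> Judgment:
    """Two-stage pipeline: retrieve evidence, then generate a structured judgment.

    `retriever` and `llm` are stand-ins for the paper's RAG pipeline and
    fine-tuned model; their call signatures here are assumed for illustration.
    """
    evidence = retriever(facts)               # statutes + precedents (list of strings)
    context = "\n".join(evidence)
    raw = llm(facts=facts, context=context)   # assumed to return a dict of the four parts
    return Judgment(
        issues=raw.get("issues", []),
        arguments=raw.get("arguments", []),
        rationale=raw.get("rationale", ""),
        decision=raw.get("decision", ""),
    )
```

Keeping retrieval and generation behind separate interfaces is what makes the evidence step independently auditable, which is the property the review emphasizes.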
If this is right
- Legal AI outputs become checkable against the same statutes and precedents that human judges cite.
- The separation of retrieval and reasoning allows independent verification of evidence sources.
- Structured outputs enable automated comparison against real court judgments for alignment metrics.
- The framework scales to large Indian legal corpora without requiring hand-crafted rules for each case type.
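The last two bullets are mechanically checkable. A minimal sketch of such checks follows; the four-part schema comes from the paper, but these specific functions are illustrative, not NyayaMind's evaluation code:

```python
REQUIRED_PARTS = ("issues", "arguments", "rationale", "decision")

def missing_parts(output: dict) -> list[str]:
    """Return which of the four judicial elements are absent or empty."""
    return [p for p in REQUIRED_PARTS if not output.get(p)]

def rationale_overlap(generated: str, reference: str) -> float:
    """Crude alignment score between a generated rationale and the real
    judgment's reasoning: Jaccard overlap on lowercased tokens."""
    g, r = set(generated.lower().split()), set(reference.lower().split())
    return len(g & r) / len(g | r) if (g | r) else 0.0
```

A real evaluation would use stronger alignment metrics than token overlap, but even this crude form only works because the output is structured.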
Where Pith is reading between the lines
- The same split-retrieval-and-reason design could be tested on legal systems outside India that also publish structured judgments.
- If retrieval errors remain low, the framework might reduce hallucinations in legal AI by grounding every generated sentence in returned documents.
- Future work could measure whether the generated rationales actually predict the final decision more reliably than black-box models.
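The grounding idea in the second bullet can be approximated with a simple lexical-support check. This heuristic is an illustration of the concept, not the paper's verification mechanism:

```python
def grounded_fraction(sentences, retrieved_docs, threshold=0.5):
    """Fraction of generated sentences whose content words (length > 3)
    mostly appear in at least one retrieved document -- a crude proxy
    for grounding, useful only as a first-pass hallucination screen."""
    def supported(sent):
        words = {w for w in sent.lower().split() if len(w) > 3}
        if not words:
            return True  # nothing substantive to check
        return any(
            len(words & set(doc.lower().split())) / len(words) >= threshold
            for doc in retrieved_docs
        )
    return sum(supported(s) for s in sentences) / max(len(sentences), 1)
```

In practice an entailment model would replace the lexical test, but the interface is the same: every generated sentence is scored against the returned documents.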
Load-bearing premise
Fine-tuned language models must reliably produce the exact four-part structure of issues, arguments, rationale, and decision used in Indian courts, and the retrieval step must return accurate and complete legal references without omissions or errors.
What would settle it
Expert legal reviewers finding that a substantial portion of NyayaMind outputs either omit required judicial elements, cite non-existent or irrelevant statutes, or produce rationales that contradict the actual reasoning in the retrieved precedents.
Figures
read the original abstract
Court Judgment Prediction and Explanation (CJPE) aims to predict a judicial decision and provide a legally grounded explanation for a given case based on the facts, legal issues, arguments, cited statutes, and relevant precedents. For such systems to be practically useful in judicial or legal research settings, they must not only achieve high predictive performance but also generate transparent and structured legal reasoning that aligns with established judicial practices. In this work, we present NyayaMind, an open-source framework designed to enable transparent and scalable legal reasoning for the Indian judiciary. The proposed framework integrates retrieval, reasoning, and verification mechanisms to emulate the structured decision-making process typically followed in courts. Specifically, NyayaMind consists of two main components: a Retrieval Module and a Prediction Module. The Retrieval Module employs a RAG pipeline to identify legally relevant statutes and precedent cases from large-scale legal corpora, while the Prediction Module utilizes reasoning-oriented LLMs fine-tuned for the Indian legal domain to generate structured outputs including issues, arguments, rationale, and the final decision. Our extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment compared to existing CJPE approaches, providing a promising step toward trustworthy AI-assisted legal decision support systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NyayaMind, an open-source framework for Court Judgment Prediction and Explanation (CJPE) in the Indian legal system. It consists of a Retrieval Module employing a RAG pipeline to surface relevant statutes and precedents, and a Prediction Module that uses reasoning-oriented LLMs fine-tuned on the Indian legal domain to produce structured outputs (issues, arguments, rationale, and final decision). The authors claim that extensive experimental results and expert evaluation demonstrate significant improvements in explanation quality and evidence alignment over existing CJPE approaches.
Significance. If the empirical claims hold with proper validation, the work could advance transparent AI-assisted legal tools by aligning generated reasoning with established judicial structures in a large jurisdiction. The open-source release and emphasis on RAG plus domain-specific fine-tuning offer potential for reproducibility and extension, though the current lack of supporting metrics limits immediate assessment of practical utility.
major comments (2)
- [Abstract] The assertion that 'extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment' is presented without any quantitative metrics, baselines, dataset statistics, or error analysis, yet it is load-bearing for the central claim of improvement over prior CJPE methods.
- [Expert Evaluation] No details are supplied on the number of experts, their legal qualifications, inter-rater reliability (e.g., Cohen's kappa), or rubric-based scoring, either for alignment of generated issues/arguments/rationale with judicial practice or for precision of RAG-retrieved statutes/precedents. This prevents verification that outputs match established practices without omissions or errors.
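The inter-rater reliability statistic the referee asks for is standard and easy to report. As an illustration, a minimal Cohen's kappa for two raters (the textbook formula, not anything from the paper):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's marginal label rates."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
```

For more than two experts, Fleiss' kappa or Krippendorff's alpha would be the usual substitutes.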
minor comments (1)
- [Title] The title uses an unspaced hyphen ('NyayaMind- A Framework'); standard academic formatting would use a colon or spaced em-dash.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below. We agree that greater transparency is needed to substantiate the claims and will revise the manuscript to address these points.
read point-by-point responses
- Referee: [Abstract] The assertion that 'extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment' is presented without any quantitative metrics, baselines, dataset statistics, or error analysis, yet it is load-bearing for the central claim of improvement over prior CJPE methods.
  Authors: We agree that the abstract would be strengthened by supporting quantitative information. The full manuscript already contains these details in the Experiments and Results sections, including metrics on explanation quality, evidence alignment, baselines, dataset statistics, and error analysis. We will revise the abstract to concisely incorporate key metrics and a brief reference to the evaluation setup, grounding the central claim while respecting length limits. revision: yes
- Referee: [Expert Evaluation] No details are supplied on the number of experts, their legal qualifications, inter-rater reliability (e.g., Cohen's kappa), or rubric-based scoring, either for alignment of generated issues/arguments/rationale with judicial practice or for precision of RAG-retrieved statutes/precedents. This prevents verification that outputs match established practices without omissions or errors.
  Authors: We acknowledge this gap in the presentation of our evaluation methodology. We will expand the Expert Evaluation section to specify the number of experts, their legal qualifications and experience with Indian case law, inter-rater reliability measures (including Cohen's kappa), and the rubric used to score alignment of generated issues/arguments/rationale with judicial practice, as well as precision of retrieved statutes and precedents. This will enable independent verification of the process. revision: yes
Circularity Check
No circularity: framework uses standard components with external evaluation
full rationale
The paper presents NyayaMind as a modular system combining RAG-based retrieval and fine-tuned LLMs for structured legal output generation. No equations, derivations, or first-principles predictions appear that reduce to quantities defined by the model's own fitted outputs or self-referential definitions. Claims of improved explanation quality rest on external expert evaluation and comparisons to prior CJPE methods rather than any internal self-definition or fitted-input renaming. The derivation chain is therefore self-contained and non-circular.