NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System
Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3
The pith
NyayaMind combines retrieval of statutes and precedents with fine-tuned language models to output structured court judgments including issues, arguments, rationale, and decision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NyayaMind pairs a Retrieval Module, a RAG pipeline that identifies legally relevant statutes and precedent cases in large-scale legal corpora, with a Prediction Module of reasoning-oriented LLMs fine-tuned for the Indian legal domain that generates structured outputs (issues, arguments, rationale, and the final decision), yielding better explanation quality and evidence alignment than existing CJPE approaches.
What carries the argument
A two-component architecture: a RAG-based Retrieval Module that surfaces statutes and precedents, paired with a Prediction Module of fine-tuned LLMs that emits the four-part judicial structure of issues, arguments, rationale, and decision.
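The two-module split can be sketched as a minimal pipeline. The `retriever` and `llm` interfaces below are assumptions for illustration, not the paper's actual APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Judgment:
    # The four-part structure the Prediction Module is said to emit.
    issues: list[str] = field(default_factory=list)
    arguments: list[str] = field(default_factory=list)
    rationale: str = ""
    decision: str = ""

def predict_judgment(facts: str, retriever, llm) -> Judgment:
    """Two-stage pipeline: retrieve evidence, then generate a structured judgment.

    `retriever` and `llm` are stand-ins for the paper's RAG pipeline and
    fine-tuned model; their call signatures here are assumed for illustration.
    """
    evidence = retriever(facts)               # statutes + precedents (list of strings)
    context = "\n".join(evidence)
    raw = llm(facts=facts, context=context)   # assumed to return a dict of the four parts
    return Judgment(
        issues=raw.get("issues", []),
        arguments=raw.get("arguments", []),
        rationale=raw.get("rationale", ""),
        decision=raw.get("decision", ""),
    )
```

Keeping retrieval and generation behind separate interfaces is what makes the evidence step independently auditable, which is the property the review emphasizes.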
If this is right
- Legal AI outputs become checkable against the same statutes and precedents that human judges cite.
- The separation of retrieval and reasoning allows independent verification of evidence sources.
- Structured outputs enable automated comparison against real court judgments for alignment metrics.
- The framework scales to large Indian legal corpora without requiring hand-crafted rules for each case type.
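The last two bullets are mechanically checkable. A minimal sketch of such checks follows; the four-part schema comes from the paper, but these specific functions are illustrative, not NyayaMind's evaluation code:

```python
REQUIRED_PARTS = ("issues", "arguments", "rationale", "decision")

def missing_parts(output: dict) -> list[str]:
    """Return which of the four judicial elements are absent or empty."""
    return [p for p in REQUIRED_PARTS if not output.get(p)]

def rationale_overlap(generated: str, reference: str) -> float:
    """Crude alignment score between a generated rationale and the real
    judgment's reasoning: Jaccard overlap on lowercased tokens."""
    g, r = set(generated.lower().split()), set(reference.lower().split())
    return len(g & r) / len(g | r) if (g | r) else 0.0
```

A real evaluation would use stronger alignment metrics than token overlap, but even this crude form only works because the output is structured.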
Where Pith is reading between the lines
- The same split-retrieval-and-reason design could be tested on legal systems outside India that also publish structured judgments.
- If retrieval errors remain low, the framework might reduce hallucinations in legal AI by grounding every generated sentence in returned documents.
- Future work could measure whether the generated rationales actually predict the final decision more reliably than black-box models.
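The grounding idea in the second bullet can be approximated with a simple lexical-support check. This heuristic is an illustration of the concept, not the paper's verification mechanism:

```python
def grounded_fraction(sentences, retrieved_docs, threshold=0.5):
    """Fraction of generated sentences whose content words (length > 3)
    mostly appear in at least one retrieved document -- a crude proxy
    for grounding, useful only as a first-pass hallucination screen."""
    def supported(sent):
        words = {w for w in sent.lower().split() if len(w) > 3}
        if not words:
            return True  # nothing substantive to check
        return any(
            len(words & set(doc.lower().split())) / len(words) >= threshold
            for doc in retrieved_docs
        )
    return sum(supported(s) for s in sentences) / max(len(sentences), 1)
```

In practice an entailment model would replace the lexical test, but the interface is the same: every generated sentence is scored against the returned documents.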
Load-bearing premise
Fine-tuned language models must reliably produce the exact four-part structure of issues, arguments, rationale, and decision used in Indian courts, and the retrieval step must return accurate and complete legal references without omissions or errors.
What would settle it
Expert legal reviewers finding that a substantial portion of NyayaMind outputs either omit required judicial elements, cite non-existent or irrelevant statutes, or produce rationales that contradict the actual reasoning in the retrieved precedents.
Figures
read the original abstract
Court Judgment Prediction and Explanation (CJPE) aims to predict a judicial decision and provide a legally grounded explanation for a given case based on the facts, legal issues, arguments, cited statutes, and relevant precedents. For such systems to be practically useful in judicial or legal research settings, they must not only achieve high predictive performance but also generate transparent and structured legal reasoning that aligns with established judicial practices. In this work, we present NyayaMind, an open-source framework designed to enable transparent and scalable legal reasoning for the Indian judiciary. The proposed framework integrates retrieval, reasoning, and verification mechanisms to emulate the structured decision-making process typically followed in courts. Specifically, NyayaMind consists of two main components: a Retrieval Module and a Prediction Module. The Retrieval Module employs a RAG pipeline to identify legally relevant statutes and precedent cases from large-scale legal corpora, while the Prediction Module utilizes reasoning-oriented LLMs fine-tuned for the Indian legal domain to generate structured outputs including issues, arguments, rationale, and the final decision. Our extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment compared to existing CJPE approaches, providing a promising step toward trustworthy AI-assisted legal decision support systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NyayaMind, an open-source framework for Court Judgment Prediction and Explanation (CJPE) in the Indian legal system. It consists of a Retrieval Module employing a RAG pipeline to surface relevant statutes and precedents, and a Prediction Module that uses reasoning-oriented LLMs fine-tuned on the Indian legal domain to produce structured outputs (issues, arguments, rationale, and final decision). The authors claim that extensive experimental results and expert evaluation demonstrate significant improvements in explanation quality and evidence alignment over existing CJPE approaches.
Significance. If the empirical claims hold with proper validation, the work could advance transparent AI-assisted legal tools by aligning generated reasoning with established judicial structures in a large jurisdiction. The open-source release and emphasis on RAG plus domain-specific fine-tuning offer potential for reproducibility and extension, though the current lack of supporting metrics limits immediate assessment of practical utility.
major comments (2)
- [Abstract] The assertion that 'extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment' is presented without any quantitative metrics, baselines, dataset statistics, or error analysis, yet it is load-bearing for the central claim of improvement over prior CJPE methods.
- [Expert Evaluation] No details are supplied on the number of experts, their legal qualifications, inter-rater reliability (e.g., Cohen's kappa), or rubric-based scoring, either for alignment of generated issues/arguments/rationale with judicial practice or for precision of RAG-retrieved statutes/precedents. This prevents verification that outputs match established practices without omissions or errors.
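The inter-rater reliability statistic the referee asks for is standard and easy to report. As an illustration, a minimal Cohen's kappa for two raters (the textbook formula, not anything from the paper):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's marginal label rates."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
```

For more than two experts, Fleiss' kappa or Krippendorff's alpha would be the usual substitutes.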
minor comments (1)
- [Title] The title uses an unspaced hyphen ('NyayaMind- A Framework'); standard academic formatting would use a colon or spaced em-dash.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below. We agree that greater transparency is needed to substantiate the claims and will revise the manuscript to address these points.
read point-by-point responses
- Referee: [Abstract] The assertion that 'extensive results and expert evaluation demonstrate that NyayaMind significantly improves the quality of explanation and evidence alignment' is presented without any quantitative metrics, baselines, dataset statistics, or error analysis, yet it is load-bearing for the central claim of improvement over prior CJPE methods.
  Authors: We agree that the abstract would be strengthened by supporting quantitative information. The full manuscript already contains these details in the Experiments and Results sections, including metrics on explanation quality, evidence alignment, baselines, dataset statistics, and error analysis. We will revise the abstract to concisely incorporate key metrics and a brief reference to the evaluation setup, grounding the central claim while respecting length limits. revision: yes
- Referee: [Expert Evaluation] No details are supplied on the number of experts, their legal qualifications, inter-rater reliability (e.g., Cohen's kappa), or rubric-based scoring, either for alignment of generated issues/arguments/rationale with judicial practice or for precision of RAG-retrieved statutes/precedents. This prevents verification that outputs match established practices without omissions or errors.
  Authors: We acknowledge this gap in the presentation of our evaluation methodology. We will expand the Expert Evaluation section to specify the number of experts, their legal qualifications and experience with Indian case law, inter-rater reliability measures (including Cohen's kappa), and the rubric used to score alignment of generated issues/arguments/rationale with judicial practice, as well as precision of retrieved statutes and precedents. This will enable independent verification of the process. revision: yes
Circularity Check
No circularity: framework uses standard components with external evaluation
full rationale
The paper presents NyayaMind as a modular system combining RAG-based retrieval and fine-tuned LLMs for structured legal output generation. No equations, derivations, or first-principles predictions appear that reduce to quantities defined by the model's own fitted outputs or self-referential definitions. Claims of improved explanation quality rest on external expert evaluation and comparisons to prior CJPE methods rather than any internal self-definition or fitted-input renaming. The derivation chain is therefore self-contained and non-circular.