PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs
Pith reviewed 2026-05-10 05:59 UTC · model grok-4.3
The pith
PoliLegalLM combines continued pretraining on a large legal corpus with staged supervised fine-tuning and preference-based reinforcement learning, outperforming similar-sized models on legal tasks while matching much larger general models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PoliLegalLM is trained with a unified framework that integrates continued pretraining on a large-scale, high-quality legal corpus, progressive supervised fine-tuning, and preference-based reinforcement learning, enabling the model to learn domain-specific knowledge and adapt to diverse legal tasks. Across three benchmarks (LawBench, LexEval, and the real-world PoliLegal dataset), the model achieves strong and consistent performance: it outperforms competitive models of similar scale, remains highly competitive with significantly larger models, and records the best results on real-world legal scenarios.
What carries the argument
Unified training framework of continued pretraining on a legal corpus, followed by progressive supervised fine-tuning and preference-based reinforcement learning.
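The report describes this pipeline in prose only. As a rough sketch of how the three stages could compose, assuming a generic stage-dispatching trainer (every dataset name and the trainer stub below are illustrative placeholders, not the authors' code):

```python
# Minimal sketch of the three-stage pipeline described above.
# All dataset identifiers and the trainer stub are hypothetical.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str       # human-readable stage label
    data: str       # corpus/dataset identifier (hypothetical)
    objective: str  # training objective for the stage

PIPELINE = [
    # Stage 1: continued pretraining on the legal corpus (next-token loss).
    Stage("continued_pretraining", "legal_corpus", "causal_lm"),
    # Stage 2: progressive SFT, ordered from general to task-specific data.
    Stage("sft_general", "general_instructions", "sft"),
    Stage("sft_legal", "legal_task_instructions", "sft"),
    # Stage 3: preference-based RL on ranked response pairs.
    Stage("preference_rl", "legal_preference_pairs", "preference_rl"),
]

def train_one_stage(model, stage):
    # Placeholder: in practice this would dispatch to a causal-LM trainer,
    # an SFT trainer, or a preference-optimization trainer (e.g. DPO/PPO).
    print(f"[{stage.name}] data={stage.data} objective={stage.objective}")
    return model

def run_pipeline(model, stages=PIPELINE):
    """Each stage initializes from the previous stage's checkpoint."""
    for stage in stages:
        model = train_one_stage(model, stage)
    return model

run_pipeline(model=object())  # stand-in for an actual checkpoint
```

The structural point the sketch captures is that each stage resumes from the previous stage's weights, so knowledge grounding, task alignment, and preference tuning accumulate rather than compete.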
If this is right
- Smaller models can reach competitive accuracy on legal reasoning without matching the parameter count of general models.
- The staged training approach improves handling of real-world legal scenarios more than standard fine-tuning alone.
- Better knowledge grounding from the corpus reduces hallucinated legal citations and fills gaps in knowledge coverage.
- Progressive stages allow incremental gains in task alignment across political and legal applications.
- Domain-specific training pipelines offer practical value for deploying LLMs in professional legal settings.
Where Pith is reading between the lines
- Similar corpus-plus-staged-training methods could be applied to other high-precision domains such as medicine or regulatory compliance to test whether scale can be traded for specialization.
- The focus on real-world datasets implies that future legal AI evaluations should prioritize live cases over static benchmarks to measure actual utility.
- Combining this model with external retrieval systems for statutes and precedents might cut citation errors further than training alone achieves (a minimal retrieval sketch follows this list).
- If the training sequence generalizes, organizations could fine-tune smaller base models on their own private legal archives instead of relying on ever-larger public models.
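A minimal sketch of the retrieval pairing suggested above, using the rank_bm25 package; the statute snippets and prompt template are invented for illustration, not taken from the paper:

```python
# Sketch: ground the model's answer in retrieved provisions so that any
# cited statute can be checked against actual retrieved text.
from rank_bm25 import BM25Okapi

statutes = [
    "Article 10: A contract is formed when the parties reach agreement.",
    "Article 52: A contract is void if it violates mandatory provisions.",
    "Article 107: A breaching party shall bear liability for breach.",
]
bm25 = BM25Okapi([s.lower().split() for s in statutes])

def grounded_prompt(question: str, k: int = 2) -> str:
    """Prepend the top-k retrieved provisions to the model's prompt."""
    hits = bm25.get_top_n(question.lower().split(), statutes, n=k)
    context = "\n".join(f"- {h}" for h in hits)
    return f"Relevant provisions:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("When is a contract void?"))
```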
Load-bearing premise
The performance improvements come directly from the legal corpus and the three-stage training sequence rather than from data overlap with the test sets or selective evaluation choices.
What would settle it
Evaluating the model on a fresh collection of legal questions and cases created after the corpus was assembled and confirmed to have no overlap with any training data.
Original abstract
Large language models (LLMs) have achieved remarkable success in general-domain tasks, yet their direct application to the legal domain remains challenging due to hallucinated legal citations, incomplete knowledge coverage, and weak structured reasoning. To address these issues, we propose PoliLegalLM, a domain-specific large language model tailored for political and legal applications. Our approach adopts a unified training framework that integrates continued pretraining, progressive supervised fine-tuning, and preference-based reinforcement learning to jointly enhance legal knowledge grounding, task alignment, and reasoning capability. We construct a large-scale, high-quality legal corpus and design a structured post-training pipeline, enabling the model to effectively learn domain-specific knowledge and adapt to diverse legal tasks. We evaluate PoliLegalLM on three representative benchmarks, including LawBench, LexEval, and a real-world dataset, PoliLegal. Experimental results demonstrate that PoliLegalLM achieves strong and consistent performance, outperforming competitive models of similar scale and remaining highly competitive with significantly larger models, while achieving the best results on real-world legal scenarios. These results highlight the effectiveness of our training paradigm and the practical value of domain-specific LLMs for real-world legal applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PoliLegalLM, a domain-specific LLM for political and legal affairs. It describes construction of a large-scale legal corpus from statutes, cases, and regulations, followed by a unified training pipeline of continued pretraining, progressive supervised fine-tuning (SFT), and preference-based reinforcement learning (RL). The model is evaluated on LawBench, LexEval, and a custom real-world PoliLegal dataset, with claims of outperforming similar-scale models, remaining competitive with larger models, and achieving the best results on real-world legal scenarios. Ablation studies and n-gram decontamination against public benchmarks are included to support the contributions of each training stage.
Significance. If the reported gains hold under scrutiny, the work provides a useful technical report on domain adaptation for legal tasks, with practical value for real-world applications. Credit is due for the explicit decontamination steps, ablation results isolating progressive SFT and preference RL, and evaluation on a custom real-world dataset alongside standard benchmarks. This strengthens the case for structured post-training pipelines in specialized domains.
major comments (1)
- [Section 4] Section 4 (Experiments): While ablation tables isolate the incremental gains from progressive SFT and preference RL, the central claim of 'genuine improvements in legal knowledge grounding' would be strengthened by reporting the absolute performance deltas with standard deviations across at least three random seeds, as single-run results leave open the possibility that observed gains fall within noise.
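For concreteness, the statistic this comment asks for is simply a mean and sample standard deviation over per-seed scores; a minimal sketch with invented numbers, not results from the paper:

```python
# Aggregate one benchmark metric over independent training seeds.
from statistics import mean, stdev

scores_by_seed = {0: 61.2, 1: 60.7, 2: 61.9}  # hypothetical LawBench averages

values = list(scores_by_seed.values())
mu, sigma = mean(values), stdev(values)  # stdev uses the n - 1 denominator
print(f"{mu:.1f} ± {sigma:.1f} across {len(values)} seeds")
```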
minor comments (3)
- [Abstract] Abstract: The summary paragraph asserts performance gains without any numerical values or error bars; adding one or two key metrics (e.g., average accuracy on LawBench) would improve the standalone readability of the abstract.
- [Section 3.2] Section 3.2 (Corpus Construction): The n-gram filtering procedure for decontamination is described only at a high level; specifying the exact n-gram length, overlap threshold, and fraction of tokens removed would aid reproducibility (a toy version of such a filter is sketched after this list).
- [Figure 2] Figure 2 (Training Pipeline): The diagram of the progressive SFT stages is clear, but the axis labels on the accompanying loss curves are too small to read the epoch numbers and loss scales without magnification.
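To make the decontamination comment concrete, here is a toy n-gram overlap filter; the 13-gram length and zero-tolerance threshold are illustrative choices, not the paper's (unreported) settings:

```python
# Toy n-gram decontamination filter, as asked about in Section 3.2.

def ngrams(text: str, n: int = 13) -> set:
    """All whitespace-tokenized, lowercased n-grams of a text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(doc: str, benchmark_ngrams: set, n: int = 13,
                    threshold: float = 0.0) -> bool:
    """Flag a training document if more than `threshold` of its n-grams
    also appear in the benchmark's n-gram index."""
    doc_ngrams = ngrams(doc, n)
    if not doc_ngrams:
        return False
    overlap = len(doc_ngrams & benchmark_ngrams) / len(doc_ngrams)
    return overlap > threshold

# Usage: drop any corpus document flagged against the benchmark index.
bench = ngrams("the defendant shall bear liability for breach of contract "
               "under article one hundred and seven of the contract law")
doc = ("the court held that breach of contract under article one hundred "
       "and seven of the contract law applies")
print(is_contaminated(doc, bench))  # True: shares a 13-gram with the benchmark
```

Reporting exactly these three quantities (n, threshold, and the fraction of the corpus removed) would make the paper's procedure reproducible.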
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The suggestion to strengthen the experimental claims is noted, and we address it in detail below.
Point-by-point responses
Referee: [Section 4] Section 4 (Experiments): While ablation tables isolate the incremental gains from progressive SFT and preference RL, the central claim of 'genuine improvements in legal knowledge grounding' would be strengthened by reporting the absolute performance deltas with standard deviations across at least three random seeds, as single-run results leave open the possibility that observed gains fall within noise.
Authors: We appreciate the referee's point that reporting standard deviations across multiple random seeds would further bolster confidence in the observed gains. However, the continued pretraining stage on our 50B+ token legal corpus requires substantial computational resources (hundreds of GPU-hours per run), rendering multiple independent seeds impractical under our available infrastructure. To mitigate concerns about run-to-run variability, we have designed the evaluation protocol to emphasize consistency: (1) all models are evaluated on the same fixed test splits with identical decoding parameters; (2) improvements appear uniformly across three independent benchmarks with different task distributions (LawBench, LexEval, and the real-world PoliLegal dataset); and (3) the ablation study isolates the contribution of each post-training stage while controlling for data and model size. We believe these controls, together with the explicit n-gram decontamination, provide sufficient evidence that the gains are attributable to the training pipeline rather than stochastic noise. We will add a brief limitations paragraph in Section 4 acknowledging the single-run nature of the results.
Revision: partial
Circularity Check
No significant circularity; empirical evaluation with decontamination
Full rationale
The paper reports an empirical LLM training pipeline (continued pretraining on a constructed legal corpus, progressive SFT, preference RL) followed by evaluation on LawBench, LexEval, and PoliLegal. No mathematical derivation, first-principles claim, or fitted parameter is presented as a 'prediction' that reduces to its own inputs by construction. The methods explicitly include n-gram decontamination against public benchmarks and ablation studies isolating training stages. These steps are independent of the final benchmark scores and do not rely on self-citation chains or renaming of known results. The central claim therefore rests on externally falsifiable experimental outcomes rather than tautological reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- hyperparameters for continued pretraining, SFT, and RL stages
Reference graph
Works this paper leans on
- [1] A Survey of Large Language Models. 2026.
- [2] Language Models are Few-Shot Learners. 2020.
- [3] GPT-4 Technical Report. 2024.
- [4] A Survey of Reinforcement Learning for Large Reasoning Models. 2025.
- [5] The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. 2026.
- [6] Large legal fictions: Profiling legal hallucinations in large language models. Journal of Legal Analysis, 2024.
- [7] LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. 2023.
- [8] Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model. 2024.
- [9] An Explicit Syllogistic Legal Reasoning Framework for Large Language Models. 2025.
- [10] HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications. 2025.
- [11] LegalAI research in LLM Era: data, modeling and evaluation. Artificial Intelligence Review.
- [12] SaulLM-7B: A pioneering Large Language Model for Law. 2024.
- [13] DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services. 2023.
- [14] Legal Syllogism Prompting: Teaching Large Language Models for Legal Judgment Prediction. 2023.
- [15] Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun. Findings of the Association for Computational Linguistics, 2022. doi:10.18653/v1/2022.findings-acl.17.
- [16] The probabilistic relevance framework: BM25 and beyond. 2009.
- [17] A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 1972.
- [18] SAILER: Structure-aware pre-trained language model for legal case retrieval. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval.
- [19] Lawformer: A pre-trained language model for Chinese legal long documents. AI Open, 2021.
- [20] UniLR: Unleashing the power of LLMs on multiple legal tasks with a unified legal retriever. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
- [21] Query2doc: Query expansion with large language models. arXiv preprint arXiv:2303.07678.
- [22] Context aware query rewriting for text rankers using LLM. arXiv preprint arXiv:2308.16753.
- [23] CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
- [24] Xiaofeng Shi, Lulu Zhao, Hua Zhou, Donglin Hao. doi:10.57967/hf/3488.
- [25] Qwen3 Technical Report. 2025.
- [26] Wanwei He, Jiabao Wen, Lei Zhang, Hao Cheng, Bowen Qin, Yunshui Li, Feng Jiang, Junying Chen, Benyou Wang, Min Yang. GitHub repository, 2023.
- [27] Yiquan Wu, Yuhang Liu, Yifei Liu, Ang Li, Siying Zhou, Kun Kuang.
- [28] LawLLM: Law large language model for the US legal system. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management.
- [29] InternLM-Law: An Open Source Chinese Legal Large Language Model. 2024.
- [30] GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum. November 2025.
- [31] Qwen3-235B-A22B-Thinking-2507. August 2025.
- [32] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.
- [33] Gemini 2.5: Our most intelligent models are getting even better. 2025.
- [34] Yinghao Hu, Yaoyao Yu, Leilei Gan, Bin Wei, Kun Kuang, Fei Wu. Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond. Findings of EMNLP 2025. doi:10.18653/v1/2025.findings-emnlp.742.
- [35] Legal Mathematical Reasoning with LLMs: Procedural Alignment through Two-Stage Reinforcement Learning. 2025.
- [36] LexPro-1.0 Technical Report. 2025.
- [37] Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference. 2025.
- [38] LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model. 2024.
- [39] Self-Rewarding Language Models. arXiv preprint arXiv:2401.10020.
- [40] Some things are more cringe than others: Preference optimization with the pairwise cringe loss. arXiv preprint arXiv:2312.16682.
- [41] Iterative Reasoning Preference Optimization. 2024.
- [42] RRHF: Rank Responses to Align Language Models with Human Feedback without tears. 2023.
- [43] Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation. 2024.
- [44] ORPO: Monolithic preference optimization without reference model. arXiv preprint arXiv:2403.07691.
- [45] ZeJun Wang. HuggingFace repository, 2022.
- [46] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
- [47] Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- [48] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe. 2022.
- [49] Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi, Lusheng Wang. Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism. Proceedings of the 28th International Conference on Computational Linguistics, 2020. doi:10.18653/v1/2020.coling-main.9.
- [50] Fine-tuning large language models for improving factuality in legal question answering. Proceedings of the 31st International Conference on Computational Linguistics.
- [51] Qingquan Li, Qifan Zhang, Junjie Yao, Yingjie Zhang. Event Extraction for Criminal Legal Text.
- [52] Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models. 2025.
- [53] OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training. 2025.
- [54] Cong Liu, Zhong Wang, Shengyu Shen, Jialiang Peng, Xiaoli Zhang, Zhendong Du, Yafang Wang. 2025.
- [55] Chinese-Qwen3-235B-2507-Distill-data-110k-SFT. 2025.
- [56] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models. 2023.
- [57] LawBench: Benchmarking Legal Knowledge of Large Language Models. 2023.
- [58] LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models. 2024.
- [59] WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report. arXiv preprint arXiv:2604.06737.