PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs
Pith reviewed 2026-05-10 05:59 UTC · model grok-4.3
The pith
PoliLegalLM combines continued pretraining on a large legal corpus with staged supervised fine-tuning and preference-based reinforcement learning, outperforming similar-sized models on legal tasks while matching much larger general models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PoliLegalLM is trained with a unified framework that integrates continued pretraining on a large-scale, high-quality legal corpus, progressive supervised fine-tuning, and preference-based reinforcement learning, enabling the model to learn domain-specific knowledge and adapt to diverse legal tasks. Across three benchmarks (LawBench, LexEval, and the real-world PoliLegal dataset), the model achieves strong and consistent performance: it outperforms competitive models of similar scale, remains highly competitive with significantly larger models, and records the best results on real-world legal scenarios.
What carries the argument
Unified training framework of continued pretraining on a legal corpus, followed by progressive supervised fine-tuning and preference-based reinforcement learning.
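The report describes this pipeline in prose only. As a rough sketch of how the three stages could compose, assuming a generic stage-dispatching trainer (every dataset name and the trainer stub below are illustrative placeholders, not the authors' code):

```python
# Minimal sketch of the three-stage pipeline described above.
# All dataset identifiers and the trainer stub are hypothetical.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str       # human-readable stage label
    data: str       # corpus/dataset identifier (hypothetical)
    objective: str  # training objective for the stage

PIPELINE = [
    # Stage 1: continued pretraining on the legal corpus (next-token loss).
    Stage("continued_pretraining", "legal_corpus", "causal_lm"),
    # Stage 2: progressive SFT, ordered from general to task-specific data.
    Stage("sft_general", "general_instructions", "sft"),
    Stage("sft_legal", "legal_task_instructions", "sft"),
    # Stage 3: preference-based RL on ranked response pairs.
    Stage("preference_rl", "legal_preference_pairs", "preference_rl"),
]

def train_one_stage(model, stage):
    # Placeholder: in practice this would dispatch to a causal-LM trainer,
    # an SFT trainer, or a preference-optimization trainer (e.g. DPO/PPO).
    print(f"[{stage.name}] data={stage.data} objective={stage.objective}")
    return model

def run_pipeline(model, stages=PIPELINE):
    """Each stage initializes from the previous stage's checkpoint."""
    for stage in stages:
        model = train_one_stage(model, stage)
    return model

run_pipeline(model=object())  # stand-in for an actual checkpoint
```

The structural point the sketch captures is that each stage resumes from the previous stage's weights, so knowledge grounding, task alignment, and preference tuning accumulate rather than compete.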
If this is right
- Smaller models can reach competitive accuracy on legal reasoning without matching the parameter count of general models.
- The staged training approach improves handling of real-world legal scenarios more than standard fine-tuning alone.
- Better knowledge grounding from the corpus reduces hallucinated legal citations and fills gaps in knowledge coverage.
- Progressive stages allow incremental gains in task alignment across political and legal applications.
- Domain-specific training pipelines offer practical value for deploying LLMs in professional legal settings.
Where Pith is reading between the lines
- Similar corpus-plus-staged-training methods could be applied to other high-precision domains such as medicine or regulatory compliance to test whether scale can be traded for specialization.
- The focus on real-world datasets implies that future legal AI evaluations should prioritize live cases over static benchmarks to measure actual utility.
- Combining this model with external retrieval systems for statutes and precedents might cut citation errors further than training alone achieves (a minimal retrieval sketch follows this list).
- If the training sequence generalizes, organizations could fine-tune smaller base models on their own private legal archives instead of relying on ever-larger public models.
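A minimal sketch of the retrieval pairing suggested above, using the rank_bm25 package; the statute snippets and prompt template are invented for illustration, not taken from the paper:

```python
# Sketch: ground the model's answer in retrieved provisions so that any
# cited statute can be checked against actual retrieved text.
from rank_bm25 import BM25Okapi

statutes = [
    "Article 10: A contract is formed when the parties reach agreement.",
    "Article 52: A contract is void if it violates mandatory provisions.",
    "Article 107: A breaching party shall bear liability for breach.",
]
bm25 = BM25Okapi([s.lower().split() for s in statutes])

def grounded_prompt(question: str, k: int = 2) -> str:
    """Prepend the top-k retrieved provisions to the model's prompt."""
    hits = bm25.get_top_n(question.lower().split(), statutes, n=k)
    context = "\n".join(f"- {h}" for h in hits)
    return f"Relevant provisions:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("When is a contract void?"))
```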
Load-bearing premise
The performance improvements come directly from the legal corpus and the three-stage training sequence rather than from data overlap with the test sets or selective evaluation choices.
What would settle it
Evaluating the model on a fresh collection of legal questions and cases created after the corpus was assembled and confirmed to have no overlap with any training data.
Original abstract
Large language models (LLMs) have achieved remarkable success in general-domain tasks, yet their direct application to the legal domain remains challenging due to hallucinated legal citations, incomplete knowledge coverage, and weak structured reasoning. To address these issues, we propose PoliLegalLM, a domain-specific large language model tailored for political and legal applications. Our approach adopts a unified training framework that integrates continued pretraining, progressive supervised fine-tuning, and preference-based reinforcement learning to jointly enhance legal knowledge grounding, task alignment, and reasoning capability. We construct a large-scale, high-quality legal corpus and design a structured post-training pipeline, enabling the model to effectively learn domain-specific knowledge and adapt to diverse legal tasks. We evaluate PoliLegalLM on three representative benchmarks, including LawBench, LexEval, and a real-world dataset, PoliLegal. Experimental results demonstrate that PoliLegalLM achieves strong and consistent performance, outperforming competitive models of similar scale and remaining highly competitive with significantly larger models, while achieving the best results on real-world legal scenarios. These results highlight the effectiveness of our training paradigm and the practical value of domain-specific LLMs for real-world legal applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PoliLegalLM, a domain-specific LLM for political and legal affairs. It describes construction of a large-scale legal corpus from statutes, cases, and regulations, followed by a unified training pipeline of continued pretraining, progressive supervised fine-tuning (SFT), and preference-based reinforcement learning (RL). The model is evaluated on LawBench, LexEval, and a custom real-world PoliLegal dataset, with claims of outperforming similar-scale models, remaining competitive with larger models, and achieving the best results on real-world legal scenarios. Ablation studies and n-gram decontamination against public benchmarks are included to support the contributions of each training stage.
Significance. If the reported gains hold under scrutiny, the work provides a useful technical report on domain adaptation for legal tasks, with practical value for real-world applications. Credit is due for the explicit decontamination steps, ablation results isolating progressive SFT and preference RL, and evaluation on a custom real-world dataset alongside standard benchmarks. This strengthens the case for structured post-training pipelines in specialized domains.
major comments (1)
- [Section 4] Section 4 (Experiments): While ablation tables isolate the incremental gains from progressive SFT and preference RL, the central claim of 'genuine improvements in legal knowledge grounding' would be strengthened by reporting the absolute performance deltas with standard deviations across at least three random seeds, as single-run results leave open the possibility that observed gains fall within noise.
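For concreteness, the statistic this comment asks for is simply a mean and sample standard deviation over per-seed scores; a minimal sketch with invented numbers, not results from the paper:

```python
# Aggregate one benchmark metric over independent training seeds.
from statistics import mean, stdev

scores_by_seed = {0: 61.2, 1: 60.7, 2: 61.9}  # hypothetical LawBench averages

values = list(scores_by_seed.values())
mu, sigma = mean(values), stdev(values)  # stdev uses the n - 1 denominator
print(f"{mu:.1f} ± {sigma:.1f} across {len(values)} seeds")
```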
minor comments (3)
- [Abstract] Abstract: The summary paragraph asserts performance gains without any numerical values or error bars; adding one or two key metrics (e.g., average accuracy on LawBench) would improve the standalone readability of the abstract.
- [Section 3.2] Section 3.2 (Corpus Construction): The n-gram filtering procedure for decontamination is described only at a high level; specifying the exact n-gram length, overlap threshold, and fraction of tokens removed would aid reproducibility (a toy version of such a filter is sketched after this list).
- [Figure 2] Figure 2 (Training Pipeline): The diagram of the progressive SFT stages is clear, but the axis labels on the accompanying loss curves are too small to read the epoch numbers and loss scales without magnification.
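To make the decontamination comment concrete, here is a toy n-gram overlap filter; the 13-gram length and zero-tolerance threshold are illustrative choices, not the paper's (unreported) settings:

```python
# Toy n-gram decontamination filter, as asked about in Section 3.2.

def ngrams(text: str, n: int = 13) -> set:
    """All whitespace-tokenized, lowercased n-grams of a text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(doc: str, benchmark_ngrams: set, n: int = 13,
                    threshold: float = 0.0) -> bool:
    """Flag a training document if more than `threshold` of its n-grams
    also appear in the benchmark's n-gram index."""
    doc_ngrams = ngrams(doc, n)
    if not doc_ngrams:
        return False
    overlap = len(doc_ngrams & benchmark_ngrams) / len(doc_ngrams)
    return overlap > threshold

# Usage: drop any corpus document flagged against the benchmark index.
bench = ngrams("the defendant shall bear liability for breach of contract "
               "under article one hundred and seven of the contract law")
doc = ("the court held that breach of contract under article one hundred "
       "and seven of the contract law applies")
print(is_contaminated(doc, bench))  # True: shares a 13-gram with the benchmark
```

Reporting exactly these three quantities (n, threshold, and the fraction of the corpus removed) would make the paper's procedure reproducible.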
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The suggestion to strengthen the experimental claims is noted, and we address it in detail below.
Point-by-point responses
Referee: [Section 4] Section 4 (Experiments): While ablation tables isolate the incremental gains from progressive SFT and preference RL, the central claim of 'genuine improvements in legal knowledge grounding' would be strengthened by reporting the absolute performance deltas with standard deviations across at least three random seeds, as single-run results leave open the possibility that observed gains fall within noise.
Authors: We appreciate the referee's point that reporting standard deviations across multiple random seeds would further bolster confidence in the observed gains. However, the continued pretraining stage on our 50B+ token legal corpus requires substantial computational resources (hundreds of GPU-hours per run), rendering multiple independent seeds impractical under our available infrastructure. To mitigate concerns about run-to-run variability, we have designed the evaluation protocol to emphasize consistency: (1) all models are evaluated on the same fixed test splits with identical decoding parameters; (2) improvements appear uniformly across three independent benchmarks with different task distributions (LawBench, LexEval, and the real-world PoliLegal dataset); and (3) the ablation study isolates the contribution of each post-training stage while controlling for data and model size. We believe these controls, together with the explicit n-gram decontamination, provide sufficient evidence that the gains are attributable to the training pipeline rather than stochastic noise. We will add a brief limitations paragraph in Section 4 acknowledging the single-run nature of the results.
Revision: partial
Circularity Check
No significant circularity; empirical evaluation with decontamination
Full rationale
The paper reports an empirical LLM training pipeline (continued pretraining on a constructed legal corpus, progressive SFT, preference RL) followed by evaluation on LawBench, LexEval, and PoliLegal. No mathematical derivation, first-principles claim, or fitted parameter is presented as a 'prediction' that reduces to its own inputs by construction. The methods explicitly include n-gram decontamination against public benchmarks and ablation studies isolating training stages. These steps are independent of the final benchmark scores and do not rely on self-citation chains or renaming of known results. The central claim therefore rests on externally falsifiable experimental outcomes rather than tautological reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- hyperparameters for continued pretraining, SFT, and RL stages
Reference graph
Works this paper leans on
- [1] A Survey of Large Language Models. 2026.
- [2] Language Models are Few-Shot Learners. 2020.
- [3] GPT-4 Technical Report. 2024.
- [4] A Survey of Reinforcement Learning for Large Reasoning Models. 2025.
- [5] The Landscape of Agentic Reinforcement Learning for LLMs: A Survey. 2026.
- [6] Large legal fictions: Profiling legal hallucinations in large language models. Journal of Legal Analysis, 2024.
- [7] LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. 2023.
- [8] Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model. 2024.
- [9] An Explicit Syllogistic Legal Reasoning Framework for Large Language Models. 2025.
- [10] HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications. 2025.
- [11] LegalAI research in LLM Era: data, modeling and evaluation. Artificial Intelligence Review.
- [12] SaulLM-7B: A pioneering Large Language Model for Law. 2024.
- [13] DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services. 2023.
- [14] Legal Syllogism Prompting: Teaching Large Language Models for Legal Judgment Prediction. 2023.
- [15] Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun. Findings of the Association for Computational Linguistics, 2022. doi:10.18653/v1/2022.findings-acl.17.
- [16] The probabilistic relevance framework: BM25 and beyond. 2009.
- [17] A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 1972.
- [18] SAILER: Structure-aware pre-trained language model for legal case retrieval. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval.
- [19] Lawformer: A pre-trained language model for Chinese legal long documents. AI Open, 2021.
- [20] UniLR: Unleashing the power of LLMs on multiple legal tasks with a unified legal retriever. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
- [21] Query2doc: Query expansion with large language models. arXiv preprint arXiv:2303.07678.
- [22] Context aware query rewriting for text rankers using LLM. arXiv preprint arXiv:2308.16753.
- [23] CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
- [24] Xiaofeng Shi, Lulu Zhao, Hua Zhou, Donglin Hao. doi:10.57967/hf/3488.
- [25] Qwen3 Technical Report. 2025.
- [26] Wanwei He, Jiabao Wen, Lei Zhang, Hao Cheng, Bowen Qin, Yunshui Li, Feng Jiang, Junying Chen, Benyou Wang, Min Yang. GitHub repository, 2023.
- [27] Yiquan Wu, Yuhang Liu, Yifei Liu, Ang Li, Siying Zhou, Kun Kuang.
- [28] LawLLM: Law large language model for the US legal system. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management.
- [29] InternLM-Law: An Open Source Chinese Legal Large Language Model. 2024.
- [30] GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum. November 2025.
- [31] Qwen3-235B-A22B-Thinking-2507. August 2025.
- [32] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.
- [33] Gemini 2.5: Our most intelligent models are getting even better. 2025.
- [34] Yinghao Hu, Yaoyao Yu, Leilei Gan, Bin Wei, Kun Kuang, Fei Wu. Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond. Findings of EMNLP 2025. doi:10.18653/v1/2025.findings-emnlp.742.
- [35] Legal Mathematical Reasoning with LLMs: Procedural Alignment through Two-Stage Reinforcement Learning. 2025.
- [36] LexPro-1.0 Technical Report. 2025.
- [37] Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference. 2025.
- [38] LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model. 2024.
- [39] Self-Rewarding Language Models. arXiv preprint arXiv:2401.10020.
- [40] Some things are more cringe than others: Preference optimization with the pairwise cringe loss. arXiv preprint arXiv:2312.16682.
- [41] Iterative Reasoning Preference Optimization. 2024.
- [42] RRHF: Rank Responses to Align Language Models with Human Feedback without tears. 2023.
- [43] Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation. 2024.
- [44] ORPO: Monolithic preference optimization without reference model. arXiv preprint arXiv:2403.07691.
- [45] ZeJun Wang. HuggingFace repository, 2022.
- [46] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
- [47] Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- [48] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe. 2022.
- [49] Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi, Lusheng Wang. Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism. Proceedings of the 28th International Conference on Computational Linguistics, 2020. doi:10.18653/v1/2020.coling-main.9.
- [50] Fine-tuning large language models for improving factuality in legal question answering. Proceedings of the 31st International Conference on Computational Linguistics.
- [51] Qingquan Li, Qifan Zhang, Junjie Yao, Yingjie Zhang. Event Extraction for Criminal Legal Text.
- [52] Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models. 2025.
- [53] OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training. 2025.
- [54] Cong Liu, Zhong Wang, Shengyu Shen, Jialiang Peng, Xiaoli Zhang, Zhendong Du, Yafang Wang. 2025.
- [55] Chinese-Qwen3-235B-2507-Distill-data-110k-SFT. 2025.
- [56] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models. 2023.
- [57] LawBench: Benchmarking Legal Knowledge of Large Language Models. 2023.
- [58] LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models. 2024.
- [59] WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report. arXiv preprint arXiv:2604.06737.