An External Knowledge Enhanced Multi-label Charge Prediction Approach with Label Number Learning
Pith reviewed 2026-05-25 09:50 UTC · model grok-4.3
The pith
External knowledge from law provisions plus a number learning network lets models automatically set the right number of charges per case.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results.
What carries the argument
The number learning network (NLN) that learns label cardinality from external knowledge extracted from law provisions and then adjusts prediction thresholds.
If this is right
- Attaching the approach to existing deep learning models raises both macro-F1 and micro-F1 on multi-label legal cases.
- The gains hold on the largest published Chinese legal dataset and are larger on the multi-label subset.
- Manual threshold tuning for label count is replaced by automatic adjustment driven by the learned label numbers.
- The method produces final predictions by merging per-label probabilities with the predicted label cardinality.
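One way to read the last point, offered as a minimal sketch and not as the paper's actual procedure: if the number learning network predicts k labels for a case, keep the k labels with the highest probability. The function name `select_labels`, the floor of one label per case, and the tie-breaking rule are assumptions made for illustration; the abstract does not specify the merging rule.

```python
from typing import List, Sequence


def select_labels(probs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k most probable labels, in ascending index order.

    Hypothetical merging rule: the abstract only says that probabilities and
    label numbers are combined, not how.
    """
    # Assumption: every case carries at least one charge, and k cannot
    # exceed the number of candidate labels.
    k = max(1, min(k, len(probs)))
    # Rank label indices by descending probability; ties keep the lower index.
    ranked = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy probabilities, not taken from the paper.
    case_probs = [0.10, 0.62, 0.05, 0.48, 0.31]
    print(select_labels(case_probs, k=2))  # [1, 3]
```

Under this reading, the effective threshold for a case is its k-th largest probability, so the cut-off varies per case instead of being fixed by hand.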
Where Pith is reading between the lines
- The same two-phase structure could be tested on multi-label tasks outside law where external text sources encode cardinality rules.
- If the NLN learns stable cardinality signals, the approach might reduce post-hoc calibration needs in other multi-label NLP settings.
- Comparing label-number accuracy before and after adding law-provision knowledge would isolate how much the external source contributes.
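The comparison in the last point could be scored along these lines. This is a sketch under stated assumptions: `label_number_accuracy` and `count_above_threshold` are hypothetical helper names, the 0.5 default threshold is an arbitrary stand-in for a hand-set value, and exact-match accuracy on the count is only one possible metric.

```python
from typing import Sequence


def count_above_threshold(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Label count implied by a fixed, hand-set threshold (assumed baseline)."""
    return sum(1 for p in probs if p >= threshold)


def label_number_accuracy(gold_counts: Sequence[int], pred_counts: Sequence[int]) -> float:
    """Fraction of cases whose predicted label count equals the gold count."""
    if len(gold_counts) != len(pred_counts):
        raise ValueError("gold and predicted counts must have the same length")
    if not gold_counts:
        return 0.0
    hits = sum(1 for g, p in zip(gold_counts, pred_counts) if g == p)
    return hits / len(gold_counts)


if __name__ == "__main__":
    # Toy values for illustration only.
    gold = [1, 2, 1, 3]
    predicted = [1, 2, 2, 3]
    print(label_number_accuracy(gold, predicted))
```

Running the same metric on counts from a fixed threshold, from an NLN without law-provision features, and from the full NLN would give the three numbers the comparison needs.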
Load-bearing premise
The external knowledge extracted from law provisions supplies independent information about the correct number of labels that the base model’s probability outputs do not already capture.
What would settle it
Reproduce the experiments on the same Chinese law dataset after disabling the number learning network and check whether the reported 3-15 percent F1 gains disappear.
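For such an ablation, both metrics can be computed from per-case label sets. The sketch below assumes labels are integer ids and that a label with no true or predicted instances contributes an F1 of 0 to the macro average; that convention is an assumption, and other toolkits handle it differently. The data are toy values, not results from the paper.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[int]], pred: List[Set[int]], n_labels: int
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) over cases with multi-label gold sets."""
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for g, p in zip(gold, pred):
        for label in range(n_labels):
            if label in p and label in g:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1
    # Micro: pool counts over all labels, then compute one F1.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[i], fp[i], fn[i]) for i in range(n_labels)) / n_labels
    return micro, macro


if __name__ == "__main__":
    gold = [{0, 1}, {2}, {0}]
    without_nln = [{0}, {2}, {0, 1}]  # hypothetical ablated predictions
    print(micro_macro_f1(gold, without_nln, n_labels=3))
```

Scoring the full system and the ablated system with the same function on the same test split makes the difference attributable to the number learning network alone.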
Original abstract
Multi-label charge prediction is a task to predict the corresponding accusations for legal cases, and recently becomes a hot topic. However, current studies use rough methods to deal with the label number. These methods manually set parameters to select label numbers, which has an effect in final prediction quality. We propose an external knowledge enhanced multi-label charge prediction approach that has two phases. One is charge label prediction phase with external knowledge from law provisions, the other one is number learning phase with a number learning network (NLN) designed. Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results. In experiments, our approach is connected to some state-of-the-art deep learning models. By testing on the biggest published Chinese law dataset, we find that our approach has improvements on these models. We future conduct experiments on multi-label samples from the dataset. In items of macro-F1, the improvement of baselines with our approach is 3%-5%; In items of micro-F1, the significant improvement of our approach is 5%-15%. The experiment results show the effectiveness our approach for multi-label charge prediction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-phase external-knowledge-enhanced approach for multi-label charge prediction on legal cases. Phase one performs charge label prediction using external knowledge extracted from law provisions; phase two uses a dedicated Number Learning Network (NLN) to learn the appropriate label cardinality and automatically adjust the decision threshold. Final predictions are obtained by combining the base model's output probabilities with the NLN-derived label counts. When attached to existing deep models and evaluated on the largest published Chinese legal dataset, the method reportedly yields 3-5% macro-F1 and 5-15% micro-F1 gains over the baselines, with additional experiments on multi-label subsets.
Significance. If the external-knowledge component supplies cardinality information orthogonal to the base model's probability outputs, the work would address a practically important limitation in multi-label legal NLP (manual threshold tuning) and could generalize to other domains where label cardinality is variable. The reported F1 deltas are large enough to be noteworthy if reproducible and statistically supported.
major comments (2)
- [Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.
- [Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.
minor comments (2)
- [Abstract] Typos and phrasing: 'future conduct' should be 'further conduct'; 'In items of' should be 'In terms of'; 'the effectiveness our approach' is missing 'of'.
- [§3] Notation: the distinction between the 'charge label prediction phase' and the 'number learning phase' is clear in the abstract but should be reinforced with a diagram or explicit input/output equations in the method section.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and commit to revisions that strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.
Authors: We agree that §3 does not explicitly enumerate the inputs to the NLN. The two-phase design separates charge prediction (enhanced by law-provision matching) from cardinality learning; the NLN is intended to receive both the base-model probability vector and auxiliary features derived from the external-knowledge phase (e.g., count and embedding of matched provisions). This supplies the claimed orthogonal signal. Because the current text leaves this implicit, we will revise §3 to state the precise input features and how external knowledge enters the NLN. revision: yes
-
Referee: [Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.
Authors: The referee correctly notes the absence of experimental protocol details. We will expand the Experiments section to specify: the exact baseline models and their implementations, the number of independent runs with different random seeds, the statistical tests performed (e.g., paired t-tests), and error bars (standard deviation across runs). These additions will allow readers to reproduce and attribute the reported macro-F1 (3-5%) and micro-F1 (5-15%) gains. revision: yes
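The paired test mentioned in the second response can be sketched with the standard library alone. This is an illustration under assumptions: the function name `paired_t_statistic` is hypothetical, the scores are toy values and not results from the paper, and the sketch stops at the t statistic because turning it into a p-value needs a t-distribution table or a statistics library.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(
    baseline: Sequence[float], proposed: Sequence[float]
) -> Tuple[float, float, int]:
    """Return (mean difference, t statistic, degrees of freedom).

    Scores are paired by run, e.g. the same random seed for both systems.
    """
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equally long score lists with at least 2 runs")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    d_mean = mean(diffs)
    d_std = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_std == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = d_mean / (d_std / math.sqrt(len(diffs)))
    return d_mean, t, len(diffs) - 1


if __name__ == "__main__":
    # Toy macro-F1 scores over three seeds, for illustration only.
    base_scores = [0.70, 0.72, 0.71]
    nln_scores = [0.75, 0.76, 0.78]
    print(paired_t_statistic(base_scores, nln_scores))
```

Reporting the mean difference with its standard deviation alongside the test statistic would cover both the error bars and the significance test the referee asks for.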
Circularity Check
No circularity detected in derivation chain
Full rationale
The paper describes a two-phase method: charge label prediction using external knowledge from law provisions, followed by a separate number learning network (NLN) to adjust thresholds for label cardinality. The abstract and provided text present the NLN as an independent learned component whose outputs are combined with base probabilities, with reported F1 improvements coming from experimental evaluation on a held-out dataset rather than any algebraic reduction or self-referential fitting. No equations, self-citations, or ansatzes are exhibited that would make the cardinality prediction or performance gains equivalent to the inputs by construction. The derivation remains self-contained against external benchmarks.