
arxiv: 1907.02205 · v1 · pith:ZPOD4HKSnew · submitted 2019-07-04 · 💻 cs.CL · cs.LG

An External Knowledge Enhanced Multi-label Charge Prediction Approach with Label Number Learning

Pith reviewed 2026-05-25 09:50 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords multi-label charge prediction · external knowledge · number learning network · legal case classification · Chinese law dataset · label cardinality · threshold adjustment · deep learning models

The pith

External knowledge from law provisions plus a number learning network lets models automatically set the right number of charges per case.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a two-phase method for predicting multiple charges in legal cases. The first phase incorporates external knowledge drawn from law provisions to generate label probabilities. The second phase trains a number learning network (NLN) to predict how many labels each case should receive. The final prediction keeps as many of the highest-probability labels as the predicted count, which amounts to setting the threshold automatically for each case. The abstract reports that, when the method is added to existing deep models and evaluated on the largest published Chinese legal dataset, macro-F1 improves by 3-5% and micro-F1 by 5-15%; it states these figures alongside the experiments on multi-label samples.

Core claim

Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results.

What carries the argument

The number learning network (NLN) that learns label cardinality from external knowledge extracted from law provisions and then adjusts prediction thresholds.

If this is right

  • Attaching the approach to existing deep learning models raises both macro-F1 and micro-F1 on multi-label legal cases.
  • The abstract reports gains on the largest published Chinese legal dataset, with further experiments on the dataset's multi-label samples.
  • Manual threshold tuning for label count is replaced by automatic adjustment driven by the learned label numbers.
  • The method produces final predictions by merging per-label probabilities with the predicted label cardinality.
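The contrast between a hand-set threshold and a threshold derived from a predicted label count can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the 0.5 threshold, and the probability values are hypothetical.

```python
def predict_at_threshold(probs, threshold):
    """Return indices of labels whose probability is at least `threshold`."""
    return [i for i, p in enumerate(probs) if p >= threshold]


def threshold_for_count(probs, n_labels):
    """Per-case threshold that keeps the n_labels most probable labels.

    It is simply the n-th largest probability. With ties at the cutoff,
    more than n_labels labels can pass.
    """
    if not 1 <= n_labels <= len(probs):
        raise ValueError("n_labels must be between 1 and len(probs)")
    return sorted(probs, reverse=True)[n_labels - 1]


# Hypothetical per-charge probabilities for one case (illustrative values only).
probs = [0.62, 0.41, 0.08, 0.03]

# A fixed, manually chosen threshold keeps a single charge here.
fixed = predict_at_threshold(probs, 0.5)

# If a count predictor says the case has two charges, the threshold adapts.
adaptive = predict_at_threshold(probs, threshold_for_count(probs, 2))
```

In this toy case the fixed threshold misses the second charge, while the count-driven threshold drops to 0.41 and keeps both.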

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-phase structure could be tested on multi-label tasks outside law where external text sources encode cardinality rules.
  • If the NLN learns stable cardinality signals, the approach might reduce post-hoc calibration needs in other multi-label NLP settings.
  • Comparing label-number accuracy before and after adding law-provision knowledge would isolate how much the external source contributes.
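The label-number comparison suggested in the last bullet could be measured with a metric like the one below. This is a hedged sketch: the function name `label_count_accuracy`, the charge names, and the predicted counts are hypothetical and do not come from the paper.

```python
def label_count_accuracy(predicted_counts, gold_label_sets):
    """Fraction of cases whose predicted label count equals the gold count."""
    if len(predicted_counts) != len(gold_label_sets):
        raise ValueError("inputs must have the same length")
    if not gold_label_sets:
        return 0.0
    hits = sum(
        1 for n, gold in zip(predicted_counts, gold_label_sets) if n == len(gold)
    )
    return hits / len(gold_label_sets)


# Hypothetical gold charge sets for four cases (toy values, not from the dataset).
gold = [{"theft"}, {"theft", "robbery"}, {"fraud"}, {"fraud", "forgery"}]

# Hypothetical counts predicted without and with law-provision knowledge.
counts_without_knowledge = [1, 1, 1, 1]
counts_with_knowledge = [1, 2, 1, 1]

acc_without = label_count_accuracy(counts_without_knowledge, gold)
acc_with = label_count_accuracy(counts_with_knowledge, gold)
```

Reporting this number for both variants would separate the effect of the external knowledge on cardinality from its effect on the per-label probabilities.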

Load-bearing premise

The external knowledge extracted from law provisions supplies independent information about the correct number of labels that the base model’s probability outputs do not already capture.

What would settle it

Reproduce the experiments on the same Chinese law dataset after disabling the number learning network and check whether the reported 3-15 percent F1 gains disappear.
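For readers who want to run such an ablation, micro-F1 and macro-F1 over multi-label predictions can be computed as in the sketch below. The helper names and toy label sets are assumptions for illustration; the paper's own evaluation script and its zero-division convention are not given in the abstract.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold_sets, pred_sets, labels):
    """Return (micro_f1, macro_f1) for multi-label predictions."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for gold, pred in zip(gold_sets, pred_sets):
        for label in labels:
            if label in pred and label in gold:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in gold:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so rare labels weigh equally.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Toy example with three hypothetical labels.
labels = ["a", "b", "c"]
gold = [{"a"}, {"a", "b"}, {"c"}]
pred = [{"a"}, {"a"}, {"a", "c"}]
micro, macro = micro_macro_f1(gold, pred, labels)
```

Running the same function on predictions with and without the number learning network gives the two F1 pairs the ablation needs.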

Figures

Figures reproduced from arXiv: 1907.02205 by Duan Wei, Li Lin.

Figure 1
Figure 1: Example of charge prediction. Adjacent text from §3.2 (The Framework of Our Approach): For multi-label text classification tasks, our proposed prediction approach is: (1) use text preprocessing to obtain text vectors of input data and external knowledge; (2) use a deep learning model and attention mechanism to obtain vectors, and combine them to get output results; (3) By machine learning training with output probability of the ca…
Figure 2
Figure 2. Adjacent text: In this approach, any text vectorization method can be used (one-hot, word embedding, direct learning, etc.), as can any kind of deep learning model (TextCNN, Bi-GRU, etc.). Therefore, it is easy to experiment on the charge prediction task. We make use of the framework of the memory network to obtain the correlation between text and legal provisions by attention mech…
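The attention between case text and legal provisions mentioned in the Figure 2 text can be illustrated roughly as follows. This is a sketch under loud assumptions: dot-product scoring, softmax weighting, and concatenation of the two vectors are choices made here for illustration, and the vectors are toy values; the excerpt does not specify the paper's exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(text_vec, provision_vecs):
    """Weight each provision by its similarity to the case text.

    Returns the attention weights and the weighted sum of provision vectors.
    Dot-product similarity is an assumption, not the paper's stated choice.
    """
    scores = [sum(t * p for t, p in zip(text_vec, pv)) for pv in provision_vecs]
    weights = softmax(scores)
    summary = [
        sum(w * pv[d] for w, pv in zip(weights, provision_vecs))
        for d in range(len(text_vec))
    ]
    return weights, summary


# Toy 2-dimensional vectors (hypothetical): one case text, two provisions.
text_vec = [1.0, 0.0]
provision_vecs = [[1.0, 0.0], [0.0, 1.0]]

weights, summary = attend(text_vec, provision_vecs)

# One way to "combine" the two representations: concatenate them (assumed).
combined = text_vec + summary
```

The first provision points in the same direction as the case text, so it receives the larger attention weight.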
Figure 3
Figure 3: Number learning network (NLN). Adjacent text from §3.5 (Label Decision): Finally, we get the label number probability for each sample and choose the label number with the largest probability as the final label number. Denoting that label number by n, we select the n largest values in the sample's output probabilities as the final output Ri.
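The label decision step described in the Figure 3 text can be sketched as below. The function name `decide_labels`, the `min_count` parameter (which assumes the first entry of the count distribution corresponds to one label), and all probability values are hypothetical.

```python
def decide_labels(label_probs, count_probs, min_count=1):
    """Pick the most probable label count n, then keep the top-n labels.

    label_probs: per-charge probabilities from the prediction phase.
    count_probs: probabilities over label counts from the count predictor;
                 index k is assumed to mean a count of k + min_count.
    """
    best = max(range(len(count_probs)), key=lambda k: count_probs[k])
    n = best + min_count
    ranked = sorted(
        range(len(label_probs)), key=lambda i: label_probs[i], reverse=True
    )
    return n, sorted(ranked[:n])


# Hypothetical outputs for one case (illustrative values only).
label_probs = [0.10, 0.55, 0.30, 0.05]
count_probs = [0.2, 0.7, 0.1]  # P(n=1), P(n=2), P(n=3)

n, chosen = decide_labels(label_probs, count_probs)
```

Here the count distribution peaks at two labels, so the two most probable charges (indices 1 and 2) form the final prediction.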
Original abstract

Multi-label charge prediction is a task to predict the corresponding accusations for legal cases, and recently becomes a hot topic. However, current studies use rough methods to deal with the label number. These methods manually set parameters to select label numbers, which has an effect in final prediction quality. We propose an external knowledge enhanced multi-label charge prediction approach that has two phases. One is charge label prediction phase with external knowledge from law provisions, the other one is number learning phase with a number learning network (NLN) designed. Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results. In experiments, our approach is connected to some state of-the art deep learning models. By testing on the biggest published Chinese law dataset, we find that our approach has improvements on these models. We future conduct experiments on multi-label samples from the dataset. In items of macro-F1, the improvement of baselines with our approach is 3%-5%; In items of micro-F1, the significant improvement of our approach is 5%-15%. The experiment results show the effectiveness our approach for multi-label charge prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a two-phase external-knowledge-enhanced approach for multi-label charge prediction on legal cases. Phase one performs charge label prediction using external knowledge extracted from law provisions; phase two uses a dedicated Number Learning Network (NLN) to learn the appropriate label cardinality and automatically adjust the decision threshold. Final predictions are obtained by combining the base model's output probabilities with the NLN-derived label counts. When attached to existing deep models and evaluated on the largest published Chinese legal dataset, the method reportedly yields 3-5% macro-F1 and 5-15% micro-F1 gains over the baselines, with additional experiments on multi-label subsets.

Significance. If the external-knowledge component supplies cardinality information orthogonal to the base model's probability outputs, the work would address a practically important limitation in multi-label legal NLP (manual threshold tuning) and could generalize to other domains where label cardinality is variable. The reported F1 deltas are large enough to be noteworthy if reproducible and statistically supported.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.
  2. [Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.
minor comments (2)
  1. [Abstract] Typos and phrasing: 'future conduct' should be 'further conduct'; 'In items of' should be 'In terms of'; 'the effectiveness our approach' is missing 'of'.
  2. [§3] Notation: the distinction between the 'charge label prediction phase' and the 'number learning phase' is clear in the abstract but should be reinforced with a diagram or explicit input/output equations in the method section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions that strengthen the manuscript.

  1. Referee: [Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.

    Authors: We agree that §3 does not explicitly enumerate the inputs to the NLN. The two-phase design separates charge prediction (enhanced by law-provision matching) from cardinality learning; the NLN is intended to receive both the base-model probability vector and auxiliary features derived from the external-knowledge phase (e.g., count and embedding of matched provisions). This supplies the claimed orthogonal signal. Because the current text leaves this implicit, we will revise §3 to state the precise input features and how external knowledge enters the NLN. revision: yes

  2. Referee: [Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.

    Authors: The referee correctly notes the absence of experimental protocol details. We will expand the Experiments section to specify: the exact baseline models and their implementations, the number of independent runs with different random seeds, the statistical tests performed (e.g., paired t-tests), and error bars (standard deviation across runs). These additions will allow readers to reproduce and attribute the reported macro-F1 (3–5 %) and micro-F1 (5–15 %) gains. revision: yes
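The run-level reporting promised in the second response (mean, standard deviation, and a paired t-test across seeds) could look like the sketch below. All scores are placeholder numbers invented for illustration; they are not results from the paper, and the function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(baseline_scores, method_scores):
    """Paired t statistic for per-seed scores of two systems.

    Returns (mean difference, std of differences, t, degrees of freedom).
    A p-value would be read from a t distribution with that many degrees
    of freedom, for example via a statistics package.
    """
    if len(baseline_scores) != len(method_scores):
        raise ValueError("both systems need one score per seed")
    diffs = [m - b for b, m in zip(baseline_scores, method_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired runs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t, n - 1


# Placeholder macro-F1 scores over five seeds (NOT results from the paper).
baseline = [0.70, 0.72, 0.71, 0.69, 0.73]
with_nln = [0.74, 0.75, 0.76, 0.72, 0.77]

mean_diff, sd_diff, t, df = paired_t_statistic(baseline, with_nln)
```

Pairing by seed matters here: it removes run-to-run variance shared by both systems, so a consistent small gain is not hidden by noisy absolute scores.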

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The paper describes a two-phase method: charge label prediction using external knowledge from law provisions, followed by a separate number learning network (NLN) to adjust thresholds for label cardinality. The abstract and provided text present the NLN as an independent learned component whose outputs are combined with base probabilities, with reported F1 improvements coming from experimental evaluation on a held-out dataset rather than any algebraic reduction or self-referential fitting. No equations, self-citations, or ansatzes are exhibited that would make the cardinality prediction or performance gains equivalent to the inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The number learning network itself is treated as a learned module whose training objective is not detailed.

pith-pipeline@v0.9.0 · 5741 in / 1178 out tokens · 29945 ms · 2026-05-25T09:50:24.509093+00:00 · methodology

