An External Knowledge Enhanced Multi-label Charge Prediction Approach with Label Number Learning

Duan Wei; Li Lin

arxiv: 1907.02205 · v1 · pith:ZPOD4HKSnew · submitted 2019-07-04 · 💻 cs.CL · cs.LG

Pith reviewed 2026-05-25 09:50 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords multi-label charge prediction · external knowledge · number learning network · legal case classification · Chinese law dataset · label cardinality · threshold adjustment · deep learning models

The pith

External knowledge from law provisions plus a number learning network lets models automatically set the right number of charges per case.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a two-phase method for predicting multiple charges in legal cases. The first phase incorporates external knowledge drawn from law provisions to generate label probabilities. The second phase trains a number learning network to predict how many labels each case should receive. Final predictions combine the probabilities with the learned label count by adjusting thresholds automatically. On the largest published Chinese legal dataset the method raises macro-F1 by 3-5 percent and micro-F1 by 5-15 percent when added to existing deep models.

Core claim

Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results.

What carries the argument

The number learning network (NLN) that learns label cardinality from external knowledge extracted from law provisions and then adjusts prediction thresholds.

If this is right

Attaching the approach to existing deep learning models raises both macro-F1 and micro-F1 on multi-label legal cases.
The gains hold on the largest published Chinese legal dataset and are larger on the multi-label subset.
Manual threshold tuning for label count is replaced by automatic adjustment driven by the learned label numbers.
The method produces final predictions by merging per-label probabilities with the predicted label cardinality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-phase structure could be tested on multi-label tasks outside law where external text sources encode cardinality rules.
If the NLN learns stable cardinality signals, the approach might reduce post-hoc calibration needs in other multi-label NLP settings.
Comparing label-number accuracy before and after adding law-provision knowledge would isolate how much the external source contributes.

Load-bearing premise

The external knowledge extracted from law provisions supplies independent information about the correct number of labels that the base model’s probability outputs do not already capture.

What would settle it

Reproduce the experiments on the same Chinese law dataset after disabling the number learning network and check whether the reported 3-15 percent F1 gains disappear.

Figures

Figures reproduced from arXiv: 1907.02205 by Duan Wei, Li Lin.

**Figure 1.** Example of charge prediction.

Adjacent paper text (§3.2, The Framework of Our Approach): For multi-label text classification tasks, our purposed prediction approach is: (1) use text preprocessing to obtain text vectors of input data and external knowledge; (2) use deep learning model and attention mechanism to obtain vectors, and combine them to get output results; (3) By machine learning training with output probability of the ca…

**Figure 2.** In this approach, we can use any text vectorization methods, whether it is one-hot, word-embedding or direct learning, etc. You can also use any kinds of deep learning models, whether it is TextCNN, Bi-GRU, etc. Therefore, it is easy to experiment for the charge prediction task. We make use of the framework of the memory network to obtain the correlation between text and legal provisions by attention mech…

**Figure 3.** Number learning network (NLN).

Adjacent paper text (§3.5, Label Decision): Finally, we get the label number probability for each sample, and choose the label number with the largest value as the final label number. Then we set the value of the label number is n, and select the top n of the largest value in the corresponding sample output probability as the final output Ri
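The label decision step quoted with Figure 3 (take the most probable label number n, then keep the n most probable charges) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decide_labels`, the convention that position k of the NLN output stands for k + 1 charges, and the example numbers are all assumptions.

```python
from typing import List, Sequence


def decide_labels(label_probs: Sequence[float], count_probs: Sequence[float]) -> List[int]:
    """Pick the top-n labels, where n is the most probable label count.

    label_probs: per-charge probabilities from the prediction phase.
    count_probs: NLN output; count_probs[k] is ASSUMED to be the probability
                 that the case has k + 1 charges (indexing is our assumption).
    """
    # Label number with the largest probability (ties go to the smaller count).
    n = max(range(len(count_probs)), key=lambda k: count_probs[k]) + 1
    n = min(n, len(label_probs))
    # Rank charge indices by probability, keep the n best, return them sorted.
    ranked = sorted(range(len(label_probs)), key=lambda i: label_probs[i], reverse=True)
    return sorted(ranked[:n])


probs = [0.62, 0.05, 0.48, 0.11]   # hypothetical probabilities for 4 charges
counts = [0.30, 0.60, 0.10]        # hypothetical NLN output for 1, 2, or 3 charges
print(decide_labels(probs, counts))  # [0, 2]
```

With a fixed threshold of 0.5, the same probabilities would yield only charge 0; the predicted count of 2 also admits charge 2 at 0.48.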

Original abstract

Multi-label charge prediction is a task to predict the corresponding accusations for legal cases, and recently becomes a hot topic. However, current studies use rough methods to deal with the label number. These methods manually set parameters to select label numbers, which has an effect in final prediction quality. We propose an external knowledge enhanced multi-label charge prediction approach that has two phases. One is charge label prediction phase with external knowledge from law provisions, the other one is number learning phase with a number learning network (NLN) designed. Our approach enhanced by external knowledge can automatically adjust the threshold to get label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get final prediction results. In experiments, our approach is connected to some state of-the art deep learning models. By testing on the biggest published Chinese law dataset, we find that our approach has improvements on these models. We future conduct experiments on multi-label samples from the dataset. In items of macro-F1, the improvement of baselines with our approach is 3%-5%; In items of micro-F1, the significant improvement of our approach is 5%-15%. The experiment results show the effectiveness our approach for multi-label charge prediction.
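The abstract reports gains in both macro-F1 and micro-F1, which can diverge sharply on multi-label data. The sketch below uses the standard definitions (macro averages per-label F1; micro pools the counts across labels). It is not the paper's evaluation script, and the function names and toy data are illustrative assumptions.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; taken as 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def macro_micro_f1(gold: List[Set[int]], pred: List[Set[int]],
                   num_labels: int) -> Tuple[float, float]:
    """Return (macro-F1, micro-F1) for multi-label predictions."""
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for g, p in zip(gold, pred):
        for label in range(num_labels):
            if label in g and label in p:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1
    # Macro: unweighted mean of per-label F1, so rare labels count as much as frequent ones.
    macro = sum(f1(tp[k], fp[k], fn[k]) for k in range(num_labels)) / num_labels
    # Micro: a single F1 over the pooled counts, dominated by frequent labels.
    micro = f1(sum(tp), sum(fp), sum(fn))
    return macro, micro


# Toy data: label 0 is frequent and always right; labels 1 and 2 are rare and missed.
gold = [{0}, {0, 1}, {0}, {2}]
pred = [{0}, {0}, {0}, {1}]
macro, micro = macro_micro_f1(gold, pred, num_labels=3)
print(round(macro, 3), round(micro, 3))  # 0.333 0.667
```

In this toy case the predictor under-counts labels on the second sample, which is the kind of error a learned label number is meant to reduce.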

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a dedicated number learning network to automate label count in multi-label legal charge prediction but the abstract leaves open whether external statute knowledge supplies any signal beyond what the base probabilities already provide.

This paper splits multi-label charge prediction into a prediction phase that injects statute text as external knowledge and a separate number learning network (NLN) that outputs the label cardinality, then combines the two to set the final threshold. The concrete pairing of statute-derived knowledge with an explicit learned cardinality component is not in the prior work the abstract cites, so that pairing is the incremental step forward.

It does address a real practical pain point: manual or fixed thresholds for how many charges to predict hurt F1 in legal cases that vary in label count. The reported 3-5% macro-F1 and 5-15% micro-F1 lifts when the method is bolted onto existing deep models on the large Chinese dataset show the idea can move the needle on standard metrics.

The main soft spot is exactly the one the stress-test flags. The abstract says the approach “combines the output probabilities of samples and their corresponding label numbers” but never states what the NLN actually receives as input. If the NLN sees only the probability vector or features derived from it, then any threshold adjustment could be done by post-processing the base model alone and the external-knowledge phase would be doing no independent work on cardinality. No ablation, no input diagram, and no training separation details are given, so the central claim that the knowledge supplies orthogonal signal cannot be checked from the text. The experiments also lack protocol, baseline definitions, or significance tests, which keeps the numeric gains hard to interpret.

This is for people building legal-tech pipelines who need to handle variable numbers of charges and are willing to try a two-stage architecture. A reader already working on Chinese legal NLP or multi-label thresholding could extract the NLN idea and test it themselves.

The paper shows clear thinking about the problem even if the evidence is thin, so it deserves a serious referee who can ask for the missing input and ablation details.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a two-phase external-knowledge-enhanced approach for multi-label charge prediction on legal cases. Phase one performs charge label prediction using external knowledge extracted from law provisions; phase two uses a dedicated Number Learning Network (NLN) to learn the appropriate label cardinality and automatically adjust the decision threshold. Final predictions are obtained by combining the base model's output probabilities with the NLN-derived label counts. When attached to existing deep models and evaluated on the largest published Chinese legal dataset, the method reportedly yields 3-5% macro-F1 and 5-15% micro-F1 gains over the baselines, with additional experiments on multi-label subsets.

Significance. If the external-knowledge component supplies cardinality information orthogonal to the base model's probability outputs, the work would address a practically important limitation in multi-label legal NLP (manual threshold tuning) and could generalize to other domains where label cardinality is variable. The reported F1 deltas are large enough to be noteworthy if reproducible and statistically supported.

major comments (2)

[Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.
[Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.

minor comments (2)

[Abstract] Typos and phrasing: 'future conduct' should be 'further conduct'; 'In items of' should be 'In terms of'; 'the effectiveness our approach' is missing 'of'.
[§3] Notation: the distinction between the 'charge label prediction phase' and the 'number learning phase' is clear in the abstract but should be reinforced with a diagram or explicit input/output equations in the method section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions that strengthen the manuscript.

Referee: [Abstract / §3] Abstract and §3 (method description): the inputs to the Number Learning Network are never specified. If the NLN receives only the probability vector (or features derived from it) produced by the base model, then any threshold adjustment could be performed by post-processing the base model alone; the external-knowledge component would then be superfluous to the cardinality claim. This is load-bearing for the central assertion that external knowledge from law provisions supplies independent signal about label number.

Authors: We agree that §3 does not explicitly enumerate the inputs to the NLN. The two-phase design separates charge prediction (enhanced by law-provision matching) from cardinality learning; the NLN is intended to receive both the base-model probability vector and auxiliary features derived from the external-knowledge phase (e.g., count and embedding of matched provisions). This supplies the claimed orthogonal signal. Because the current text leaves this implicit, we will revise §3 to state the precise input features and how external knowledge enters the NLN. revision: yes
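
The input described in this response (the base-model probability vector plus features from the external-knowledge phase) could be assembled roughly as below. This is only a sketch under stated assumptions: the paper does not specify the NLN architecture, so the single linear layer, the helper names `build_nln_input` and `nln_forward`, and the untrained placeholder weights are all hypothetical.

```python
import math
from typing import List, Sequence


def build_nln_input(label_probs: Sequence[float],
                    matched_provision_count: int,
                    provision_embedding: Sequence[float]) -> List[float]:
    """Concatenate base-model probabilities with external-knowledge features."""
    return list(label_probs) + [float(matched_provision_count)] + list(provision_embedding)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nln_forward(features: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Hypothetical NLN head: one linear layer, then softmax over label counts."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


# Two charge probabilities, two matched provisions, a 2-d provision embedding.
features = build_nln_input([0.62, 0.48], 2, [0.1, -0.3])
# Placeholder (untrained) weights: one row per candidate label count.
weights = [[0.0] * 5, [1.0, 1.0, 0.5, 0.0, 0.0], [0.0] * 5]
count_probs = nln_forward(features, weights, [0.0, 0.0, 0.0])
print(len(features), len(count_probs))  # 5 3
```

Zeroing the external-knowledge columns of `weights` gives the probability-only variant the referee describes, which is one way to frame the requested ablation.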
Referee: [Experiments] Experiments section: the abstract states 3-15% F1 improvements but supplies no protocol details, baseline definitions, number of runs, statistical significance tests, or error bars. Without these, the quantitative claims cannot be verified and the reported gains cannot be attributed to the proposed method rather than implementation differences or post-hoc tuning.

Authors: The referee correctly notes the absence of experimental protocol details. We will expand the Experiments section to specify: the exact baseline models and their implementations, the number of independent runs with different random seeds, the statistical tests performed (e.g., paired t-tests), and error bars (standard deviation across runs). These additions will allow readers to reproduce and attribute the reported macro-F1 (3–5 %) and micro-F1 (5–15 %) gains. revision: yes
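
A paired comparison across seeds, of the kind promised in this response, can be sketched with the standard library alone. The scores below are placeholder values chosen for illustration, not results from the paper, and the function name `paired_t_statistic` is our own. The sketch returns only the t statistic and degrees of freedom; turning that into a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(baseline: Sequence[float],
                       enhanced: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and degrees of freedom for matched runs."""
    if len(baseline) != len(enhanced) or len(baseline) < 2:
        raise ValueError("need at least two matched runs")
    diffs = [e - b for b, e in zip(baseline, enhanced)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# PLACEHOLDER scores for five seeds (same seed order in both lists).
baseline_f1 = [0.70, 0.72, 0.71, 0.69, 0.73]
enhanced_f1 = [0.74, 0.75, 0.76, 0.73, 0.77]

t, df = paired_t_statistic(baseline_f1, enhanced_f1)
print(f"t = {t:.2f}, df = {df}")
# Error bars: mean and standard deviation across runs.
print(f"enhanced: {statistics.mean(enhanced_f1):.3f} +/- {statistics.stdev(enhanced_f1):.3f}")
```

The test is paired because each seed produces one baseline run and one enhanced run, so seed-to-seed variance cancels in the differences.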

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The paper describes a two-phase method: charge label prediction using external knowledge from law provisions, followed by a separate number learning network (NLN) to adjust thresholds for label cardinality. The abstract and provided text present the NLN as an independent learned component whose outputs are combined with base probabilities, with reported F1 improvements coming from experimental evaluation on a held-out dataset rather than any algebraic reduction or self-referential fitting. No equations, self-citations, or ansatzes are exhibited that would make the cardinality prediction or performance gains equivalent to the inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The number learning network itself is treated as a learned module whose training objective is not detailed.


[1] [1]

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[2] [2]

JCP 12(5), 451–461 (2017)

Bajwa, I.S., Karim, F., Naeem, M.A., ul Amin, R.: A semi supervised approach for catchphrase classiﬁcation in legal text documents. JCP 12(5), 451–461 (2017)

work page 2017

[3] [3]

Berger, M.J.: Large scale multi-label text classiﬁcation with semantic word vectors. Tech. rep., Technical Report. Stanford University (2015)

work page 2015

[4] [4]

In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL

Cho, K., van Merrienboer, B., G¨ ul¸ cehre, C ¸ ., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. pp. 1724– 1734 (2014)

work page 2014

[5] [5]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirec- tional transformers for language understanding. CoRR abs/1810.04805 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

In: Proceedings of the Workshop on Innovative Hybrid Ap- proaches to the Processing of Textual Data

Galgani, F., Compton, P., Hoﬀmann, A.: Combining diﬀerent summarization tech- niques for legal text. In: Proceedings of the Workshop on Innovative Hybrid Ap- proaches to the Processing of Textual Data. pp. 115–123. Association for Compu- tational Linguistics (2012)

work page 2012

[7] [7]

In: ICML 2017, Sydney, NSW, Australia, 6-11 August

Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional se- quence to sequence learning. In: ICML 2017, Sydney, NSW, Australia, 6-11 August

work page 2017

[8] [8]

1243–1252 (2017)

pp. 1243–1252 (2017)

work page 2017

[9] [9]

In: CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. pp. 770–778 (2016)

work page 2016

[10] [10]

Neural Computation 9(8), 1735–1780 (1997) 14 W.Duan et al

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997) 14 W.Duan et al

work page 1997

[11] [11]

In: ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers

Johnson, R., Zhang, T.: Deep pyramid convolutional neural networks for text cate- gorization. In: ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers. pp. 562–570 (2017)

work page 2017

[12] [12]

In: NAACL- HLT, New Orleans, Louisiana, June 5-6, 2018

Kim, Y., Lee, H., Jung, K.: Attnconvnet at semeval-2018 task 1: Attention-based convolutional neural networks for multi-label emotion classiﬁcation. In: NAACL- HLT, New Orleans, Louisiana, June 5-6, 2018. pp. 141–145 (2018)

work page 2018

[13] [13]

In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL

Kim, Y.: Convolutional neural networks for sentence classiﬁcation. In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. pp. 1746–1751 (2014)

work page 2014

[14] [14]

In: RANLP 2017, Varna, Bulgaria, September 2 - 8, 2017

Lenc, L., Kr´ al, P.: Word embeddings for multi-label document classiﬁcation. In: RANLP 2017, Varna, Bulgaria, September 2 - 8, 2017. pp. 431–437 (2017)

work page 2017

[15] [15]

In: EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017

Luo, B., Feng, Y., Xu, J., Zhang, X., Zhao, D.: Learning to predict charges for crim- inal cases with legal basis. In: EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017. pp. 2727–2736 (2017)

work page 2017

[16] [16]

In: EMNLP 2015, Lisbon, Portugal, September 17-21,

Luong, T., Pham, H., Manning, C.D.: Eﬀective approaches to attention-based neu- ral machine translation. In: EMNLP 2015, Lisbon, Portugal, September 17-21,

work page 2015

[17] [17]

1412–1421 (2015)

pp. 1412–1421 (2015)

work page 2015

[18] [18]

Efficient Estimation of Word Representations in Vector Space

Mikolov, T., Chen, K., Corrado, G., Dean, J.: Eﬃcient estimation of word repre- sentations in vector space. CoRR abs/1301.3781 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[19] [20]

In: EMNLP 2016, Austin, Texas, USA, November 1-4, 2016

Miller, A.H., Fisch, A., Dodge, J., Karimi, A., Bordes, A., Weston, J.: Key-value memory networks for directly reading documents. In: EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pp. 1400–1409 (2016)

work page 2016

[20] [21]

In: EMNLP 2016, Austin, TX, USA, November 5, 2016

Nay, J.J.: Gov2vec: Learning distributed representations of institutions and their legal text. In: EMNLP 2016, Austin, TX, USA, November 5, 2016. pp. 49–54 (2016)

work page 2016

[21] [22]

In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL

Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word rep- resentation. In: EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. pp. 1532–1543 (2014)

work page 2014

[22] [23]

In: NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers)

Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettle- moyer, L.: Deep contextualized word representations. In: NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers). pp. 2227–2237 (2018)

work page 2018

[23] [24]

In: NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers)

Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position represen- tations. In: NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers). pp. 464–468 (2018)

work page 2018

[24] [25]

IEEE Trans

Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2017)

work page 2017

[25] [26]

Sulea, O., Zampieri, M., Malmasi, S., Vela, M., Dinu, L.P., van Genabith, J.: Exploring the use of text classification in the legal domain. In: ICAIL 2017, London, UK, June 16, 2017. (2017)

[26] [27]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. pp. 6000–6010 (2017)

[27] [28]

Wang, Y., Lin, X., Wu, L., Zhang, W., Zhang, Q., Huang, X.: Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Processing 24(11), 3939–3949 (2015)

[28] [29]

Wang, Y., Wu, L.: Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Networks 103, 1–8 (2018)

[29] [30]

Wang, Y., Wu, L., Lin, X., Gao, J.: Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Netw. Learning Syst. 29(10), 4833–4843 (2018)

[30] [31]

Wang, Y., Zhang, W., Wu, L., Lin, X., Fang, M., Pan, S.: Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In: IJCAI 2016, New York, NY, USA, 9-15 July 2016. pp. 2153–2159 (2016)

[31] [32]

Wei, F., Qin, H., Ye, S., Zhao, H.: Empirical study of deep learning for text classification in legal document review. In: Big Data 2018, Seattle, WA, USA, December 10-13, 2018. pp. 3317–3320 (2018)

[32] [33]

Weston, J., Chopra, S., Bordes, A.: Memory networks. In: ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)

[33] [34]

Wu, L., Wang, Y., Gao, J., Li, X.: Where-and-when to look: Deep siamese attention networks for video-based person re-identification. IEEE Trans. Multimedia 21(6), 1412–1424 (2019)

[34] [35]

Wu, L., Wang, Y., Li, X., Gao, J.: Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE Trans. Cybernetics 49(5), 1791–1802 (2019)

[35] [36]

Xiao, C., Zhong, H., Guo, Z., Tu, C., Liu, Z., Sun, M., Feng, Y., Han, X., Hu, Z., Wang, H., Xu, J.: CAIL2018: A large-scale legal dataset for judgment prediction. CoRR abs/1807.02478 (2018)

[36] [37]

Yang, P., Sun, X., Li, W., Ma, S., Wu, W., Wang, H.: SGM: sequence generation model for multi-label classification. In: COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018. pp. 3915–3926 (2018)

[37] [38]

Yang, Y., Gopal, S.: Multilabel classification with meta-level features in a learning-to-rank framework. Machine Learning 88(1-2), 47–68 (2012)

[38] [39]

Zhang, H., Xiao, L., Wang, Y., Jin, Y.: A generalized recurrent neural architecture for text classification with multi-task learning. In: IJCAI 2017, Melbourne, Australia, August 19-25, 2017. pp. 3385–3391 (2017)

[39] [40]

Zhong, H., Guo, Z., Tu, C., Xiao, C., Liu, Z., Sun, M.: Legal judgment prediction via topological learning. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018. pp. 3540–3549 (2018)