Deep Ranking Based Cost-sensitive Multi-label Learning for Distant Supervision Relation Extraction

Hai Ye; Zhunchen Luo

arxiv: 1907.11521 · v1 · pith:M2RBSOGZnew · submitted 2019-07-25 · 💻 cs.CL

Pith reviewed 2026-05-24 16:36 UTC · model grok-4.3

classification 💻 cs.CL

keywords distant supervision · relation extraction · multi-label learning · ranking loss · cost-sensitive learning · convolutional neural networks · class ties · knowledge base population

The pith

A ranking-based multi-label framework with CNNs learns latent ties between relation classes and applies cost-sensitive rescaling to handle label imbalance in distant supervision relation extraction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve relation extraction under distant supervision, where the same entity pair can carry multiple overlapping relations. It builds a convolutional neural network inside a multi-label ranking setup whose loss functions, plus regularization, are meant to capture connections among relation types. Cost-sensitive weighting is added to adjust the relative penalties on positive and negative labels. A reader would care because distant supervision is a main route to populating knowledge bases, and any gain from modeling these inter-relation dependencies would directly raise the quality of extracted facts.

Core claim

To exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we further adopt cost-sensitive learning to rescale the costs from the positive and negative labels.

What carries the argument

Ranking-based loss functions with regularization inside a convolutional neural network multi-label learner, used to capture latent class ties among relations, together with cost-sensitive rescaling of positive versus negative label costs.
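To make the two ingredients concrete, here is a minimal sketch of a margin-based, ranking-style multi-label objective with separate costs on the positive and negative terms. It is a generic illustration, not the paper's loss: the function name, the softplus form, the margins, and the `pos_cost` / `neg_cost` weights are all assumptions chosen for readability, and the regularization term is omitted.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cost_sensitive_ranking_loss(scores, positives, margin_pos=1.0,
                                margin_neg=1.0, pos_cost=1.0, neg_cost=1.0):
    """Illustrative loss for one entity pair (hypothetical, not the paper's).

    scores    -- one real-valued score per relation class
    positives -- indices of the relation classes that hold for the pair
    """
    positives = set(positives)
    negatives = [c for c in range(len(scores)) if c not in positives]
    # Push every true relation's score above margin_pos.
    pos_term = sum(softplus(margin_pos - scores[c]) for c in positives)
    # Push every false relation's score below -margin_neg.
    neg_term = sum(softplus(margin_neg + scores[c]) for c in negatives)
    # Cost-sensitive rescaling: weight the two terms separately.
    return pos_cost * pos_term + neg_cost * neg_term
```

Setting `neg_cost` below `pos_cost` reduces the influence of the many negative labels, which is the general idea behind rescaling costs under class imbalance; how the paper sets its own weights is not stated in the material on this page.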

If this is right

The model learns latent connections between relation classes by optimizing ranking losses rather than independent binary decisions.
Cost-sensitive rescaling reduces the impact of the severe positive-negative imbalance typical in distant supervision data.
Experiments on a standard benchmark dataset demonstrate improved performance when both the ranking component and the cost adjustment are active.
The framework is presented as general, so the same ranking-plus-cost structure can be attached to other base neural extractors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ranking-loss structure could be tested on other multi-label sequence labeling tasks that exhibit label co-occurrence patterns.
If class ties prove stable across different knowledge bases, the learned ranking parameters might transfer without full retraining.
An extension could replace the CNN encoder with a transformer and measure whether the ranking component still adds value once contextual representations improve.

Load-bearing premise

Latent connections between relation classes exist in a form that ranking losses can usefully exploit to raise extraction accuracy.
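One hypothetical way to probe this premise before training anything is to count how often pairs of relation labels co-occur on the same entity pair in the distantly labeled data. The sketch below is not from the paper; `relation_cooccurrence` and the label names in the usage note are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def relation_cooccurrence(label_sets):
    """Count how often two relation labels appear on the same entity pair.

    label_sets -- iterable of label collections, one per entity pair
    Returns a Counter keyed by alphabetically ordered (label_a, label_b).
    """
    counts = Counter()
    for labels in label_sets:
        # Sort so that each unordered pair maps to a single key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts
```

For example, calling it on `[{"born_in", "president_of"}, {"born_in"}]` yields a single count for the pair `("born_in", "president_of")`. Frequent pairs would be consistent with the existence of class ties, though raw co-occurrence alone does not show that a ranking loss can exploit them.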

What would settle it

An ablation that removes the ranking-loss and regularization terms, retrains on the same dataset, and shows no drop in extraction metrics would indicate that the class ties are not being exploited as claimed.

Figures

Figures reproduced from arXiv: 1907.11521 by Hai Ye, Zhunchen Luo.

**Figure 2.** The main architecture of our model. The features of sentences are encoded by CNN model, and …

**Figure 3.** Performance comparison of our model and the baselines. “Rank+Cost” is using the loss function …

**Figure 4.** Results for impact of ranking based loss function with methods of Rank + AVE, Rank + ATT and …

**Figure 5.** Results for impact of regularization to model class ties.

**Figure 6.** Results for impact of cost-sensitive learning. “ATT” means the loss function of Variant-2; …

**Figure 7.** Effect of λ for model performance based on the loss function of Variant-3.

Excerpt from Section 4.5, Impact of Cost-sensitive Learning: In this section, we conduct experiments to reveal the effectiveness of cost-sensitive learning to relieve the impact of NR for model training and model performance. For the loss function of G[cost att], we have two parts for cost-sensitive learning: the first is the one penalized by γ, and the …

**Figure 8.** Impact of NR for model convergence. “+NR” means not relieving NR impact with …

Original abstract

Knowledge base provides a potential way to improve the intelligence of information retrieval (IR) systems, for that knowledge base has numerous relations between entities which can help the IR systems to conduct inference from one entity to another entity. Relation extraction is one of the fundamental techniques to construct a knowledge base. Distant supervision is a semi-supervised learning method for relation extraction which learns with labeled and unlabeled data. However, this approach suffers the problem of relation overlapping in which one entity tuple may have multiple relation facts. We believe that relation types can have latent connections, which we call class ties, and can be exploited to enhance relation extraction. However, this property between relation classes has not been fully explored before. In this paper, to exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we further adopt cost-sensitive learning to rescale the costs from the positive and negative labels. Extensive experiments on a widely used dataset show the effectiveness of our model to exploit class ties and to relieve class imbalance problem.
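The relation-overlap problem described in the abstract is easy to see in a toy version of the usual distant supervision heuristic, in which every sentence mentioning both entities of a knowledge-base fact is labeled with all relations the knowledge base holds for that pair. The sketch below is an illustration under stated assumptions: the function name, the relation names, and the naive substring matching are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_bags(kb_facts, sentences):
    """Group sentences into per-entity-pair bags with multi-label targets.

    kb_facts  -- iterable of (head, relation, tail) triples
    sentences -- iterable of raw sentence strings
    Returns {(head, tail): (sorted relation labels, matching sentences)}.
    """
    relations = defaultdict(set)
    for head, relation, tail in kb_facts:
        relations[(head, tail)].add(relation)

    bags = defaultdict(list)
    for sentence in sentences:
        for head, tail in relations:
            # Naive matching (assumption): both entity strings occur verbatim.
            if head in sentence and tail in sentence:
                bags[(head, tail)].append(sentence)

    return {pair: (sorted(relations[pair]), sents)
            for pair, sents in bags.items()}


if __name__ == "__main__":
    facts = [
        ("Barack Obama", "born_in", "United States"),
        ("Barack Obama", "president_of", "United States"),
    ]
    text = [
        "Barack Obama was born in the United States.",
        "Barack Obama served as president of the United States.",
    ]
    for pair, (labels, sents) in build_bags(facts, text).items():
        print(pair, labels, len(sents))
```

Both sentences land in the same bag and the bag carries both labels, so a learner has to handle several relations per entity pair and sentences that express only one of them.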

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds ranking-based multi-label losses and cost-sensitive weighting to a CNN for distant supervision relation extraction to handle class ties and imbalance, an incremental but targeted extension.

The core idea is to treat distant supervision relation extraction (DSRE) as a multi-label problem in which several relations can hold for the same entity pair. The authors use a CNN sentence encoder, apply ranking losses with regularization to learn latent connections between relation classes, and rescale costs to deal with positive-negative imbalance. Experiments on a standard dataset are reported as showing gains from both pieces. This directly targets the overlapping-relations issue that has been a known headache in DSRE for years, and the ranking-plus-cost combination is a reasonable way to operationalize it without inventing new encoders. The approach stays within existing components rather than claiming a fundamental shift.

The main limitation is that the summary material shows no loss equations, ablation tables, or detailed baseline comparisons, which makes it hard to isolate how much the class-tie modeling actually moves the needle versus the cost rescaling or the base CNN. If the full paper supplies those controls and shows the ranking term is not just acting as extra regularization, the claim holds up better. No circularity or unfalsifiable steps appear in the setup.

This is useful reading for groups already running DSRE pipelines or experimenting with multi-label losses in information extraction. It is not the kind of result that would change broader modeling choices in the field. A serious editor should send it to review because the problem is real, the method is concrete and testable, and the experiments are on the right dataset even if they need tighter controls.

Referee Report

2 major / 2 minor

Summary. The paper proposes a ranking-based multi-label learning framework integrated with CNNs for distant supervision relation extraction. It introduces ranking loss functions with regularization to capture latent connections (class ties) between relation types and adopts cost-sensitive rescaling to mitigate class imbalance. Experiments on a standard dataset are reported to demonstrate effectiveness in exploiting class ties and relieving imbalance.

Significance. If the empirical gains hold under rigorous controls, the work provides a concrete mechanism for modeling inter-relation dependencies via ranking objectives in a multi-label RE setting, which remains underexplored. The explicit combination of ranking regularization and cost-sensitive learning offers a reusable template for noisy, overlapping-label extraction tasks.

major comments (2)

[§3] §3 (Method), ranking loss formulation: the claim that the regularization term learns latent class ties is central to the contribution, yet no analysis (e.g., inspection of learned relation embeddings or correlation matrices before/after training) is provided to show that the improvement stems from tie exploitation rather than generic ranking optimization; an ablation removing the regularization term is required to support this.
[§4] §4 (Experiments), baseline and ablation tables: without an ablation that isolates the cost-sensitive rescaling from the ranking component, it is impossible to attribute performance gains to the two stated innovations; the current comparisons do not establish that the framework outperforms prior multi-label or cost-sensitive RE methods on the same dataset splits.

minor comments (2)

[§3.2] Notation for the multi-label ranking loss should be introduced with explicit definitions of positive/negative sets and the margin parameter before its first use in equations.
[Figures] Figure captions should state the exact dataset split (e.g., NYT train/test sizes) and the evaluation metric (P@N or AUC) used in each plot.
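For readers unfamiliar with the P@N metric mentioned in the last comment, the following is a minimal sketch of precision at N over a ranked list of predicted relation facts. The function name and the `(score, is_correct)` input format are assumptions made for illustration; the paper's evaluation code is not shown on this page.

```python
def precision_at_n(predictions, n):
    """Fraction of correct facts among the n highest-scoring predictions.

    predictions -- iterable of (score, is_correct) pairs
    n           -- cutoff; must be positive
    If fewer than n predictions exist, the available ones are used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank by model confidence, highest first.
    ranked = sorted(predictions, key=lambda item: item[0], reverse=True)
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for _, correct in top if correct) / len(top)
```

Reporting the cutoff alongside each plot matters because the value depends strongly on N: a model can be near-perfect on its top few predictions and much weaker further down the ranking.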

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address the two major comments point by point below and will revise the manuscript accordingly.

Referee: [§3] §3 (Method), ranking loss formulation: the claim that the regularization term learns latent class ties is central to the contribution, yet no analysis (e.g., inspection of learned relation embeddings or correlation matrices before/after training) is provided to show that the improvement stems from tie exploitation rather than generic ranking optimization; an ablation removing the regularization term is required to support this.

Authors: We agree that an ablation isolating the regularization term would strengthen the central claim. In the revised version we will add an ablation that removes the regularization term while retaining the base ranking loss, allowing direct attribution of gains to the class-tie modeling component. Space permitting, we will also include a short qualitative inspection of the learned relation embeddings.

Revision: yes

Referee: [§4] §4 (Experiments), baseline and ablation tables: without an ablation that isolates the cost-sensitive rescaling from the ranking component, it is impossible to attribute performance gains to the two stated innovations; the current comparisons do not establish that the framework outperforms prior multi-label or cost-sensitive RE methods on the same dataset splits.

Authors: We concur that separate ablations are needed to attribute gains to each innovation. We will add an ablation that disables cost-sensitive rescaling while keeping the ranking loss and regularization fixed. Our experiments already follow the standard NYT dataset splits used by prior work; we will expand the baseline tables to include additional published multi-label and cost-sensitive RE methods for direct comparison on those splits.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper proposes a ranking-based multi-label CNN framework with regularization to learn latent class ties between relations plus cost-sensitive rescaling for imbalance. No equations, derivations, or self-citations are visible that reduce the claimed improvements or the exploitation of class ties to fitted parameters, renamed inputs, or prior self-work by construction. The central premise is an empirical modeling choice tested on standard datasets rather than a tautological reduction; the derivation remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; the paper introduces a modeling framework whose load-bearing premises (existence and learnability of class ties, utility of ranking regularization) are stated motivationally rather than derived. No explicit free parameters, axioms, or invented entities are enumerated in the provided text.
