
arxiv: 1907.11521 · v1 · pith:M2RBSOGZnew · submitted 2019-07-25 · 💻 cs.CL

Deep Ranking Based Cost-sensitive Multi-label Learning for Distant Supervision Relation Extraction

Pith reviewed 2026-05-24 16:36 UTC · model grok-4.3

classification 💻 cs.CL
keywords distant supervision · relation extraction · multi-label learning · ranking loss · cost-sensitive learning · convolutional neural networks · class ties · knowledge base population

The pith

A ranking-based multi-label framework with CNNs learns latent ties between relation classes and applies cost-sensitive rescaling to handle label imbalance in distant supervision relation extraction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve relation extraction under distant supervision, where the same entity pair can carry multiple overlapping relations. It builds a convolutional neural network inside a multi-label ranking setup whose loss functions, plus regularization, are meant to capture connections among relation types. Cost-sensitive weighting is added to adjust the relative penalties on positive and negative labels. A reader would care because distant supervision is a main route to populating knowledge bases, and any gain from modeling these inter-relation dependencies would directly raise the quality of extracted facts.

Core claim

To exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we further adopt cost-sensitive learning to rescale the costs from the positive and negative labels.

What carries the argument

Ranking-based loss functions with regularization inside a convolutional neural network multi-label learner, used to capture latent class ties among relations, together with cost-sensitive rescaling of positive versus negative label costs.
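
To make the mechanism concrete, here is a minimal sketch of a margin-based ranking loss with separate costs on the positive and negative terms. It is not the paper's exact objective, which is not given on this page: the softplus form, the margins `m_pos` and `m_neg`, and the weights `pos_cost` and `neg_cost` are all assumptions chosen for illustration, and the class-tie regularization term is left out.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cost_sensitive_ranking_loss(scores, positives,
                                m_pos=2.5, m_neg=0.5,
                                pos_cost=1.0, neg_cost=1.0):
    """Ranking-style loss for one entity pair (illustrative, hypothetical form).

    scores    -- one real-valued score per relation class
    positives -- set of class indices that hold for this entity pair
    Positive classes are pushed above m_pos, negative classes below -m_neg.
    pos_cost and neg_cost rescale the two halves of the loss.
    """
    pos_terms = [softplus(m_pos - s) for i, s in enumerate(scores) if i in positives]
    neg_terms = [softplus(m_neg + s) for i, s in enumerate(scores) if i not in positives]
    pos_loss = sum(pos_terms) / len(pos_terms) if pos_terms else 0.0
    neg_loss = sum(neg_terms) / len(neg_terms) if neg_terms else 0.0
    return pos_cost * pos_loss + neg_cost * neg_loss


# Three relation classes; class 0 is the only true relation.
well_ranked = cost_sensitive_ranking_loss([3.0, -3.0, -3.0], {0})
badly_ranked = cost_sensitive_ranking_loss([-3.0, 3.0, 3.0], {0})
print(well_ranked < badly_ranked)
```

Under this sketch, lowering `neg_cost` below `pos_cost` shrinks the contribution of the many negative labels, which is one plausible reading of "rescaling the costs from the positive and negative labels".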

If this is right

  • The model learns latent connections between relation classes by optimizing ranking losses rather than independent binary decisions.
  • Cost-sensitive rescaling reduces the impact of the severe positive-negative imbalance typical in distant supervision data.
  • Experiments on a standard benchmark dataset demonstrate improved performance when both the ranking component and the cost adjustment are active.
  • The framework is presented as general, so the same ranking-plus-cost structure can be attached to other base neural extractors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same ranking-loss structure could be tested on other multi-label sequence labeling tasks that exhibit label co-occurrence patterns.
  • If class ties prove stable across different knowledge bases, the learned ranking parameters might transfer without full retraining.
  • An extension could replace the CNN encoder with a transformer and measure whether the ranking component still adds value once contextual representations improve.

Load-bearing premise

Latent connections between relation classes exist in a form that ranking losses can usefully exploit to raise extraction accuracy.
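
One cheap way to probe this premise before training anything is to measure how often relation labels co-occur on the same entity pair. The sketch below computes the lift P(a, b) / (P(a) P(b)) for each co-occurring pair of labels; values well above 1 would suggest a tie between the two classes. The function name and the toy relation names are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lift(label_sets):
    """Return {(a, b): lift} for every pair of labels seen together.

    label_sets -- iterable of label collections, one per entity pair
    lift       -- P(a, b) / (P(a) * P(b)), estimated from raw counts
    """
    label_sets = [sorted(set(labels)) for labels in label_sets]
    n = len(label_sets)
    single = Counter()
    pair = Counter()
    for labels in label_sets:
        single.update(labels)
        pair.update(combinations(labels, 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


# Toy data with made-up relation names.
toy = [
    {"born_in", "lived_in"},
    {"born_in", "lived_in"},
    {"lived_in"},
    {"capital_of"},
]
for pair_key, lift in cooccurrence_lift(toy).items():
    print(pair_key, round(lift, 3))
```

A lift near 1 for most pairs would weaken the premise, since there would be little dependency for a ranking loss to exploit.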

What would settle it

An ablation that removes the ranking-loss and regularization terms, retrains on the same dataset, and shows no drop in extraction metrics would indicate that the class ties are not being exploited as claimed.
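
The ablation described here can be laid out as a leave-one-out grid over the components. The sketch below only enumerates the configurations; the component names are hypothetical labels for the pieces named on this page, and what "off" means for each one (for example, replacing the ranking loss with independent binary decisions) is an assumption the experimenter would have to pin down.

```python
def leave_one_out(components):
    """Return the full configuration plus one variant per dropped component."""
    variants = {"full": {c: True for c in components}}
    for dropped in components:
        variants[f"without_{dropped}"] = {c: c != dropped for c in components}
    return variants


# Hypothetical switches for the three pieces discussed above.
COMPONENTS = ["ranking_loss", "class_tie_regularization", "cost_sensitive_rescaling"]

for name, config in leave_one_out(COMPONENTS).items():
    print(name, config)
```

Each variant would be retrained on the same data and split, and the metric gap between `full` and each `without_*` run is what attributes a gain to that component.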

Figures

Figures reproduced from arXiv: 1907.11521 by Hai Ye, Zhunchen Luo.

Figure 1: Training instances generated by freebase. The entity tuple is ( …
Figure 2: The main architecture of our model. The features of sentences are encoded by CNN model, and …
Figure 3: Performance comparison of our model and the baselines. “Rank+Cost” is using the loss function …
Figure 4: Results for impact of ranking based loss function with methods of Rank + AVE, Rank + ATT and …
Figure 5: Results for impact of regularization to model class ties.
Figure 6: Results for impact of cost-sensitive learning. “ATT” means the loss function of Variant-2; …
Figure 7: Effect of λ for model performance based on the loss function of Variant-3.
Figure 8: Impact of NR for model convergence. “+NR” means not relieving NR impact with …

Original abstract

Knowledge base provides a potential way to improve the intelligence of information retrieval (IR) systems, for that knowledge base has numerous relations between entities which can help the IR systems to conduct inference from one entity to another entity. Relation extraction is one of the fundamental techniques to construct a knowledge base. Distant supervision is a semi-supervised learning method for relation extraction which learns with labeled and unlabeled data. However, this approach suffers the problem of relation overlapping in which one entity tuple may have multiple relation facts. We believe that relation types can have latent connections, which we call class ties, and can be exploited to enhance relation extraction. However, this property between relation classes has not been fully explored before. In this paper, to exploit class ties between relations to improve relation extraction, we propose a general ranking based multi-label learning framework combined with convolutional neural networks, in which ranking based loss functions with regularization technique are introduced to learn the latent connections between relations. Furthermore, to deal with the problem of class imbalance in distant supervision relation extraction, we further adopt cost-sensitive learning to rescale the costs from the positive and negative labels. Extensive experiments on a widely used dataset show the effectiveness of our model to exploit class ties and to relieve class imbalance problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a ranking-based multi-label learning framework integrated with CNNs for distant supervision relation extraction. It introduces ranking loss functions with regularization to capture latent connections (class ties) between relation types and adopts cost-sensitive rescaling to mitigate class imbalance. Experiments on a standard dataset are reported to demonstrate effectiveness in exploiting class ties and relieving imbalance.

Significance. If the empirical gains hold under rigorous controls, the work provides a concrete mechanism for modeling inter-relation dependencies via ranking objectives in a multi-label RE setting, which remains underexplored. The explicit combination of ranking regularization and cost-sensitive learning offers a reusable template for noisy, overlapping-label extraction tasks.

major comments (2)
  1. [§3] §3 (Method), ranking loss formulation: the claim that the regularization term learns latent class ties is central to the contribution, yet no analysis (e.g., inspection of learned relation embeddings or correlation matrices before/after training) is provided to show that the improvement stems from tie exploitation rather than generic ranking optimization; an ablation removing the regularization term is required to support this.
  2. [§4] §4 (Experiments), baseline and ablation tables: without an ablation that isolates the cost-sensitive rescaling from the ranking component, it is impossible to attribute performance gains to the two stated innovations; the current comparisons do not establish that the framework outperforms prior multi-label or cost-sensitive RE methods on the same dataset splits.
minor comments (2)
  1. [§3.2] Notation for the multi-label ranking loss should be introduced with explicit definitions of positive/negative sets and the margin parameter before its first use in equations.
  2. [Figures] Figure captions should state the exact dataset split (e.g., NYT train/test sizes) and the evaluation metric (P@N or AUC) used in each plot.
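
For readers unfamiliar with the P@N metric mentioned in the second minor comment, the sketch below shows one common way to compute it: rank all predicted facts by score and take the fraction of the top N that appear in the gold set. The function name and the toy facts are hypothetical, and the paper's own evaluation protocol may differ in details such as how ties are handled.

```python
def precision_at_n(scored_predictions, gold_facts, n):
    """Fraction of the n highest-scoring predictions that are in gold_facts.

    scored_predictions -- list of (score, fact) pairs
    gold_facts         -- set of facts treated as correct
    """
    top = sorted(scored_predictions, key=lambda item: item[0], reverse=True)[:n]
    if not top:
        return 0.0
    hits = sum(1 for _, fact in top if fact in gold_facts)
    return hits / len(top)


# Toy predictions with placeholder fact identifiers.
predictions = [(0.9, "a"), (0.8, "b"), (0.4, "c"), (0.1, "d")]
gold = {"a", "c"}
for n in (1, 2, 4):
    print(n, precision_at_n(predictions, gold, n))
```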

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address the two major comments point by point below and will revise the manuscript accordingly.

  1. Referee: [§3] §3 (Method), ranking loss formulation: the claim that the regularization term learns latent class ties is central to the contribution, yet no analysis (e.g., inspection of learned relation embeddings or correlation matrices before/after training) is provided to show that the improvement stems from tie exploitation rather than generic ranking optimization; an ablation removing the regularization term is required to support this.

    Authors: We agree that an ablation isolating the regularization term would strengthen the central claim. In the revised version we will add an ablation that removes the regularization term while retaining the base ranking loss, allowing direct attribution of gains to the class-tie modeling component. Space permitting, we will also include a short qualitative inspection of the learned relation embeddings.

    revision: yes

  2. Referee: [§4] §4 (Experiments), baseline and ablation tables: without an ablation that isolates the cost-sensitive rescaling from the ranking component, it is impossible to attribute performance gains to the two stated innovations; the current comparisons do not establish that the framework outperforms prior multi-label or cost-sensitive RE methods on the same dataset splits.

    Authors: We concur that separate ablations are needed to attribute gains to each innovation. We will add an ablation that disables cost-sensitive rescaling while keeping the ranking loss and regularization fixed. Our experiments already follow the standard NYT dataset splits used by prior work; we will expand the baseline tables to include additional published multi-label and cost-sensitive RE methods for direct comparison on those splits.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper proposes a ranking-based multi-label CNN framework with regularization to learn latent class ties between relations, plus cost-sensitive rescaling for imbalance. No equations, derivations, or self-citations are visible that reduce the claimed improvements or the exploitation of class ties to fitted parameters, renamed inputs, or prior self-work by construction. The central premise is an empirical modeling choice tested on a widely used dataset rather than a tautological reduction; the derivation remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; the paper introduces a modeling framework whose load-bearing premises (existence and learnability of class ties, utility of ranking regularization) are stated motivationally rather than derived. No explicit free parameters, axioms, or invented entities are enumerated in the provided text.

pith-pipeline@v0.9.0 · 5745 in / 1189 out tokens · 21426 ms · 2026-05-24T16:36:42.305283+00:00 · methodology

