
arXiv: 2605.20628 · v1 · pith:VXYWTEWX · submitted 2026-05-20 · 💻 cs.CL

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

Pith reviewed 2026-05-21 05:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords: biomedical abstract generation · training-free summarization · zero-shot prompting · rhetorical structure · large language models · factuality evaluation · PMC-MAD dataset

The pith

Dividing full-text biomedical articles into rhetorical facets and using LLM prompts with refinement produces abstracts that are more novel while staying factually consistent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a training-free method for creating abstracts for biomedical articles that lack them. It splits the full text into the five BOMRC rhetorical facets (Background, Objective, Methods, Results, Conclusions), summarizes each facet separately with zero-shot LLM prompts, and then merges and refines the results into a single coherent abstract. This matters because articles without abstracts are less useful for search tools, biocuration, and knowledge discovery in biomedicine. On PMC-MAD, a dataset of 46,309 articles, the generated abstracts show higher abstractive novelty (less text copied verbatim from the source) than extractive and fine-tuned baselines, with no loss in factual consistency. The ablation study also finds that more complex prompts, or prompts that explicitly inject entity-level guidance, can degrade factual alignment. This argues for simpler, controlled prompting.
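Abstractive novelty is commonly measured as the share of summary n-grams that never appear in the source document. The sketch below assumes that definition, with lowercase whitespace tokenization and distinct bigrams. The paper's exact metric, n-gram order, and tokenizer are not stated on this page and may differ.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(summary: str, source: str, n: int = 2) -> float:
    """Fraction of distinct summary n-grams that do not occur in the source.

    Assumption: lowercase whitespace tokenization; a real evaluation
    would use a proper tokenizer.
    """
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = ngrams(source.lower().split(), n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


source = "the drug reduced blood pressure in adults with hypertension"
extractive = "the drug reduced blood pressure"
abstractive = "blood pressure fell among hypertensive adults"
print(novel_ngram_ratio(extractive, source))   # 0.0
print(novel_ngram_ratio(abstractive, source))  # 0.8
```

A purely extractive summary scores 0.0. The reworded one scores 0.8 because only "blood pressure" is copied. Novelty on its own says nothing about correctness, which is why the paper pairs it with factual-consistency metrics.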

Core claim

DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence, resulting in improved abstractive novelty over baselines while maintaining factual consistency on the PMC-MAD dataset.
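The three stages can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Sentences arrive already labeled with a BOMRC facet by some upstream classifier. `llm` is any hypothetical callable that maps a prompt string to a completion. The prompt wording is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# BOMRC facet order used when assembling the final abstract.
FACETS = ["Background", "Objective", "Methods", "Results", "Conclusions"]
LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def divide(labeled_sentences: List[Tuple[str, str]]) -> Dict[str, str]:
    """Divide: group (facet, sentence) pairs into BOMRC facets; drop unknown labels."""
    facets: Dict[str, List[str]] = {f: [] for f in FACETS}
    for label, sentence in labeled_sentences:
        if label in facets:
            facets[label].append(sentence)
    return {f: " ".join(s) for f, s in facets.items() if s}


def prompt_facets(facet_text: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Prompt: summarize each facet independently, in parallel threads."""
    def summarize(item: Tuple[str, str]) -> Tuple[str, str]:
        facet, text = item
        # Hypothetical zero-shot prompt, not the paper's wording.
        prompt = f"Summarize the {facet} of this biomedical article in 1-2 sentences:\n{text}"
        return facet, llm(prompt)

    with ThreadPoolExecutor(max_workers=len(FACETS)) as pool:
        return dict(pool.map(summarize, facet_text.items()))


def refine(facet_summaries: Dict[str, str], llm: LLM) -> str:
    """Refine: merge facet summaries, in BOMRC order, into one coherent abstract."""
    draft = "\n".join(f"{f}: {facet_summaries[f]}" for f in FACETS if f in facet_summaries)
    prompt = ("Rewrite these section summaries as one coherent abstract. "
              "Do not add facts that are not stated below.\n" + draft)
    return llm(prompt)


def dpr_bag(labeled_sentences: List[Tuple[str, str]], llm: LLM) -> str:
    return refine(prompt_facets(divide(labeled_sentences), llm), llm)


# Stub "LLM" that echoes everything after the first prompt line, for a dry run.
stub = lambda prompt: prompt.split("\n", 1)[1]
doc = [("Background", "Hypertension is common."), ("Methods", "We ran a trial."),
       ("Results", "BP fell."), ("Other", "Funding info.")]
print(dpr_bag(doc, stub))
```

The per-facet calls are independent, so they can run concurrently. Only the refinement call sees all facets together, which makes it the single place where cross-facet coherence, and any new unsupported content, can enter.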

What carries the argument

The divide-prompt-refine process that applies the Background-Objective-Methods-Results-Conclusions rhetorical schema to organize parallel zero-shot summarizations followed by coherence refinement.

Load-bearing premise

The refinement stage can restore global discourse coherence without introducing factual errors or hallucinations that were not present in the individual facet summaries.

What would settle it

If automated or human fact-checking on a held-out portion of the PMC-MAD dataset reveals more factual inconsistencies in the DPR-BAG outputs than in the unrefined facet summaries, this would indicate the refinement step fails to preserve accuracy.
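One cheap automated version of this check compares entity-level precision before and after refinement. The sketch below substitutes a crude regex (numbers and uppercase acronyms) for a real biomedical NER model. Both the regex and the example sentences are hypothetical. It reports any entity that appears only after refinement and is absent from the source.

```python
import re
from typing import Dict, List, Set, Union

# Crude stand-in for biomedical NER: numbers (120, 0.01) and acronyms (SBP, COVID-19).
ENTITY_PATTERN = re.compile(r"\b(?:\d+(?:\.\d+)?|[A-Z]{2,}[A-Za-z0-9-]*)\b")


def extract_entities(text: str) -> Set[str]:
    return set(ENTITY_PATTERN.findall(text))


def entity_precision(output: str, source: str) -> float:
    """Share of output entities that also occur in the source (1.0 if there are none)."""
    ents = extract_entities(output)
    if not ents:
        return 1.0
    return len(ents & extract_entities(source)) / len(ents)


def refinement_check(facet_summaries: List[str], refined: str,
                     source: str) -> Dict[str, Union[float, Set[str]]]:
    unrefined = " ".join(facet_summaries)
    source_ents = extract_entities(source)
    return {
        "unrefined_precision": entity_precision(unrefined, source),
        "refined_precision": entity_precision(refined, source),
        # Unsupported entities that appear after refinement but not before it.
        "introduced_by_refinement":
            (extract_entities(refined) - source_ents) - extract_entities(unrefined),
    }


source = "In 120 adults, drug X lowered SBP by 8 mmHg versus placebo (P = 0.01)."
facets = ["The trial enrolled 120 adults.", "SBP fell by 8 mmHg with drug X."]
refined = "Among 120 adults, drug X lowered SBP by 9 mmHg (P = 0.01)."
print(refinement_check(facets, refined, source))
```

Here precision drops from 1.0 to 0.75 and the check flags "9", a number the refinement step changed. Aggregated over a held-out set, a consistent drop of this kind is the failure signal described above. Entailment-based scorers would catch errors that entity overlap misses.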

Figures

Figures reproduced from arXiv: 2605.20628 by Dongin Nam, Halil Kilicoglu, Joe Menke, Neil Smalheiser, Shufan Ming, Sylvey Lin.

Figure 1: Overview of the DPR-BAG framework for biomedical abstract generation.
Figure 2: Distribution of document token lengths in the
Figure 3: Token Distribution Comparison
Figure 4: Publication type distribution of PMC-MAD,
Original abstract

Biomedical abstracts play a critical role in downstream NLP applications, such as information retrieval, biocuration, and biomedical knowledge discovery. However, a non-trivial number of biomedical articles do not have abstracts, diminishing the utility of these articles for downstream tasks. We propose DPR-BAG (Divide, Prompt, and Refine for Biomedical Abstract Generation), a training-free, zero-shot framework that generates coherent and factually grounded abstracts for biomedical articles with full text but no abstract. DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence. On PMC-MAD, a distribution-aligned dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines, while maintaining factual consistency. Our ablation study reveals a counterintuitive finding: increasing prompt complexity or explicitly injecting entity-level guidance can degrade factual alignment, highlighting the importance of controlled prompting strategies. These findings underscore the potential of training-free, structure-aware frameworks for scalable biomedical abstract generation in low-resource settings. Our data and code are available at https://huggingface.co/datasets/pmc-mad/PMC-MAD and https://github.com/ScienceNLP-Lab/MultiTagger-v2/tree/main/DPR-BAG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DPR-BAG, a training-free, zero-shot framework for generating abstracts for biomedical articles that lack them. The approach divides the full text into BOMRC (Background, Objective, Methods, Results, Conclusions) rhetorical facets, generates parallel LLM-based summaries for each, and uses a refinement stage to restore global discourse coherence. On the PMC-MAD dataset comprising 46,309 distribution-aligned biomedical articles, DPR-BAG is shown to improve abstractive novelty compared to strong extractive and fine-tuned baselines while maintaining factual consistency. The ablation study highlights that increasing prompt complexity or adding entity-level guidance can degrade factual alignment, emphasizing controlled prompting.

Significance. Should the empirical results prove robust, this framework offers a valuable contribution to biomedical NLP by providing a scalable method for abstract generation in low-resource scenarios without requiring training data or fine-tuning. The release of the PMC-MAD dataset and code supports reproducibility and further research. The finding regarding prompt complexity provides a useful cautionary insight for LLM-based summarization tasks. The significance is moderated by the need to more thoroughly validate the refinement stage's effect on factual consistency to fully support the central claims.

major comments (2)
  1. [Evaluation and Ablation Studies] The ablation study examines the effects of prompt complexity but does not isolate the contribution of the refinement stage to factual consistency. A direct comparison of factuality metrics (e.g., entity overlap or entailment scores) between the unrefined facet summaries and the final refined abstract is missing. Such a comparison is needed to confirm that refinement does not introduce new factual errors or hallucinations, which would undermine the central claim of maintained consistency.
  2. [Experimental Results] The reported improvements in abstractive novelty on the PMC-MAD dataset lack accompanying statistical significance tests, confidence intervals, or details on multiple LLM sampling runs. Given the inherent variability in LLM outputs, this omission makes it difficult to assess the reliability of the gains over baselines.
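A paired bootstrap over documents is one standard way to attach a confidence interval to a per-document metric gain. It is offered here as a swapped-in alternative to the paired t-test named in the rebuttal. The sketch assumes hypothetical per-document novelty scores for DPR-BAG and a baseline on the same articles. Scores from multiple LLM sampling runs would be averaged per document first, which is also an assumption.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_ci(system: List[float], baseline: List[float],
                        n_resamples: int = 10_000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Mean per-document difference (system - baseline) with a percentile bootstrap CI.

    Documents are resampled with replacement, keeping each system/baseline pair together.
    """
    if len(system) != len(baseline) or not system:
        raise ValueError("need equal-length, non-empty paired score lists")
    diffs = [s - b for s, b in zip(system, baseline)]
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(diffs)
    boot = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi


# Hypothetical per-document novelty scores, for illustration only.
system = [0.42, 0.38, 0.51, 0.47, 0.40]
baseline = [0.30, 0.33, 0.35, 0.41, 0.29]
print(paired_bootstrap_ci(system, baseline))
```

If the interval excludes zero, the gain is unlikely to be an artifact of which documents happened to be sampled. Run-to-run LLM variance still has to be reported separately.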
minor comments (2)
  1. [Introduction] The BOMRC schema is used without citing prior work on rhetorical structure in biomedical abstracts, which could strengthen the motivation.
  2. [Method] Notation for the refinement prompt could be clarified with an example in the main text rather than in the appendix.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and agree that the suggested additions will strengthen the empirical support for our claims. We plan to incorporate these changes in the revised manuscript.

Point-by-point responses
  1. Referee: [Evaluation and Ablation Studies] The ablation study examines the effects of prompt complexity but does not isolate the contribution of the refinement stage to factual consistency. A direct comparison of factuality metrics (e.g., entity overlap or entailment scores) between the unrefined facet summaries and the final refined abstract is missing. Such a comparison is needed to confirm that refinement does not introduce new factual errors or hallucinations, which would undermine the central claim of maintained consistency.

    Authors: We agree that isolating the refinement stage's impact is essential to substantiate our claim of maintained factual consistency. In the revised version, we will add a direct comparison using factuality metrics such as entity overlap and entailment scores between the unrefined parallel facet summaries and the final refined abstract. This analysis will clarify whether the refinement step preserves alignment or introduces errors. revision: yes

  2. Referee: [Experimental Results] The reported improvements in abstractive novelty on the PMC-MAD dataset lack accompanying statistical significance tests, confidence intervals, or details on multiple LLM sampling runs. Given the inherent variability in LLM outputs, this omission makes it difficult to assess the reliability of the gains over baselines.

    Authors: We acknowledge the need for statistical rigor given LLM output variability. We will include statistical significance tests (such as paired t-tests), confidence intervals, and results from multiple independent sampling runs in the updated experimental results to better demonstrate the reliability of the observed improvements over baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: framework evaluated on held-out external data with independent baselines


The paper describes a training-free Divide-Prompt-Refine framework that decomposes documents into BOMRC facets, generates parallel LLM summaries, and applies a refinement pass for coherence. Core claims rest on empirical results from the PMC-MAD dataset of 46,309 articles, compared against extractive and fine-tuned baselines. No equations, fitted parameters, or first-principles derivations appear; ablations test prompt variations but do not redefine outputs in terms of the framework itself. Evaluation uses held-out data and external metrics, keeping the result self-contained without reduction to author-defined inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that biomedical articles follow a consistent BOMRC rhetorical structure and that current LLMs can produce factually aligned summaries of individual facets when prompted simply.

axioms (1)
  • Domain assumption: Biomedical full-text articles can be reliably decomposed into the Background-Objective-Methods-Results-Conclusions (BOMRC) schema.
    The decomposition step is presented as the foundation for parallel summarization.
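This page does not describe how the paper performs the decomposition. The sketch below uses a hypothetical keyword heuristic over section headings, purely to show how the assumption can be probed. Articles whose sections match no facet, such as a case report's "Case Presentation", are surfaced as unassigned rather than silently dropped. The keyword table, the mapping of "Discussion" to Conclusions, and the example text are all illustrative.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword prefixes per BOMRC facet (illustrative only, not from the paper).
FACET_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Background": ("introduction", "background"),
    "Objective": ("objective", "aim", "purpose"),
    "Methods": ("method", "material", "patient", "design"),
    "Results": ("result", "finding"),
    "Conclusions": ("conclusion", "discussion", "summary"),
}


def facet_for_heading(heading: str) -> Optional[str]:
    """Return the first facet (in BOMRC order) with a keyword that prefixes a heading word."""
    words = re.findall(r"[a-z]+", heading.lower())
    for facet, keywords in FACET_KEYWORDS.items():
        if any(word.startswith(k) for word in words for k in keywords):
            return facet
    return None


def decompose(sections: List[Tuple[str, str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign (heading, text) sections to facets; collect unmatched headings separately."""
    assigned: Dict[str, List[str]] = {}
    unassigned: List[str] = []
    for heading, text in sections:
        facet = facet_for_heading(heading)
        if facet is None:
            unassigned.append(heading)
        else:
            assigned.setdefault(facet, []).append(text)
    return assigned, unassigned


case_report = [
    ("Introduction", "Drug X is widely prescribed."),
    ("Case Presentation", "A 54-year-old man presented with a rash."),
    ("Discussion", "The rash resolved after stopping drug X."),
]
assigned, unassigned = decompose(case_report)
print(sorted(assigned))  # ['Background', 'Conclusions']
print(unassigned)        # ['Case Presentation']
```

The share of unassigned text across publication types gives a rough, testable measure of how well the BOMRC assumption holds on a corpus.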

pith-pipeline@v0.9.0 · 5805 in / 1266 out tokens · 35108 ms · 2026-05-21T05:24:34.147999+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence.

  • IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

    Relation between the paper passage and the cited Recognition theorem.

    On PMC-MAD, a distribution-aligned dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines, while maintaining factual consistency.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
