pith. machine review for the scientific record.

arxiv: 1904.05342 · v3 · submitted 2019-04-10 · 💻 cs.CL · cs.LG

Recognition: 2 theorem links · Lean Theorem

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 09:22 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords ClinicalBERT · bidirectional transformers · clinical notes · hospital readmission · natural language processing · medical concept relationships · intensive care unit notes

The pith

ClinicalBERT applies bidirectional transformers to clinical notes to outperform baselines in predicting 30-day hospital readmission.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops ClinicalBERT by pre-training and fine-tuning bidirectional transformers on clinical notes to create representations that capture patient information beyond structured data. These representations are shown to reflect high-quality relationships between medical concepts when evaluated by humans. The model improves prediction accuracy for 30-day hospital readmission when applied to discharge summaries and to the first few days of notes from intensive care units. The work addresses the challenge of high-dimensional and sparse clinical text by demonstrating practical gains on an administrative outcome task.

Core claim

ClinicalBERT produces contextual embeddings from clinical notes that uncover medical concept relationships judged as high-quality by humans and that yield better performance than baselines on 30-day readmission prediction using both discharge summaries and early ICU notes.

What carries the argument

A bidirectional transformer architecture trained on clinical notes, which generates contextual word representations for downstream prediction tasks.
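A minimal sketch of how such a representation feeds the readmission prediction, assuming a logistic head over a note-level vector; the embedding values, weights, and bias below are invented for illustration, and the transformer encoder itself is elided:

```python
import math

def readmission_probability(note_embedding, weights, bias):
    """Sigmoid head over a contextual note representation.

    `note_embedding` stands in for the vector a BERT-style encoder
    would emit for a clinical note (e.g. the [CLS] state); the encoder
    itself is omitted. `weights` and `bias` are hypothetical fitted
    parameters, not values from the paper.
    """
    logit = sum(w * x for w, x in zip(weights, note_embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dimensional example with made-up numbers.
p = readmission_probability([0.2, -0.1, 0.5, 0.3], [1.0, 0.5, -0.3, 2.0], -0.4)
```

During fine-tuning, both the head and the encoder parameters would be updated against readmission labels; the point here is only that the downstream task reduces to binary classification on the learned representation.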

If this is right

  • Hospitals could use early notes to flag patients at higher risk of readmission and allocate resources for preventive care.
  • The same note representations could support other clinical prediction tasks that currently rely only on structured fields.
  • Analysis of concept relationships in notes becomes feasible without manual feature engineering.
  • Clinical decision support systems gain access to richer signals from unstructured text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the approach holds, hospitals with different documentation practices would need to retrain or adapt the model rather than deploy it off-the-shelf.
  • The method could be extended to longer time horizons or to predict other events such as mortality or complications by swapping the prediction head.
  • Combining ClinicalBERT embeddings with structured data might further improve performance, though the paper focuses on notes alone.

Load-bearing premise

That human judgments of medical concept relationships and administrative labels for readmission are reliable indicators of clinical usefulness, and that the learned representations generalize beyond a single hospital's note-writing style.

What would settle it

A controlled test on notes from a second hospital showing that ClinicalBERT embeddings produce no measurable gain over baseline methods on 30-day readmission prediction.

read the original abstract

Clinical notes contain information about patients that goes beyond structured data like lab values and medications. However, clinical notes have been underused relative to structured data, because notes are high-dimensional and sparse. This work develops and evaluates representations of clinical notes using bidirectional transformers (ClinicalBERT). ClinicalBERT uncovers high-quality relationships between medical concepts as judged by humans. ClinicalBERT outperforms baselines on 30-day hospital readmission prediction using both discharge summaries and the first few days of notes in the intensive care unit. Code and model parameters are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ClinicalBERT, a bidirectional transformer model fine-tuned on clinical notes from the MIMIC-III database. It claims that the model learns high-quality representations of medical concepts (as judged by human evaluators) and outperforms baselines on 30-day hospital readmission prediction when using either discharge summaries or the first few days of ICU notes. Code and model parameters are publicly released.

Significance. If the empirical results hold, the work provides a reusable domain-adapted model and evaluation framework for clinical NLP, with the public release of code and parameters serving as a clear strength for reproducibility. The dual assessment via human concept evaluation and a downstream prediction task offers a more comprehensive view than task performance alone.

major comments (2)
  1. [Results / Experiments] Results section on readmission prediction: the manuscript reports outperformance but omits details on train/validation/test splits, handling of class imbalance in the readmission labels, and any statistical significance testing of the gains over baselines. These elements are load-bearing for interpreting the central empirical claim.
  2. [Discussion] Discussion / Limitations: all reported results use notes from a single center (MIMIC-III, Beth Israel Deaconess). The paper should explicitly discuss risks to generalization arising from institution-specific documentation conventions and administrative label distributions, as this directly affects the strength of any claim to broader clinical utility.
minor comments (2)
  1. [Abstract] Abstract and methods: the phrase 'the first few days of notes' should be replaced with the precise time window (e.g., 48 hours) used for the early-prediction experiments.
  2. [Human Evaluation] Human evaluation subsection: report the number of raters, rating scale, number of concept pairs evaluated, and any inter-rater agreement statistic to support the 'high-quality relationships' claim.
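On the inter-rater agreement request in minor comment 2: Cohen's kappa is one standard statistic the revision could report. A self-contained sketch on invented ratings (toy labels, not the paper's data):

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' label lists."""
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independent marginal label frequencies.
    categories = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgments of 8 concept pairs as related/unrelated.
a = ["rel", "rel", "unrel", "rel", "unrel", "unrel", "rel", "rel"]
b = ["rel", "unrel", "unrel", "rel", "unrel", "rel", "rel", "rel"]
kappa = cohens_kappa(a, b)  # moderate agreement, well below 1.0
```

Reporting kappa alongside the number of raters, scale, and pair count would let readers judge how much weight the 'high-quality relationships' claim can bear.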

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major point below and will incorporate the suggested clarifications into the revised manuscript.

read point-by-point responses
  1. Referee: [Results / Experiments] Results section on readmission prediction: the manuscript reports outperformance but omits details on train/validation/test splits, handling of class imbalance in the readmission labels, and any statistical significance testing of the gains over baselines. These elements are load-bearing for interpreting the central empirical claim.

    Authors: We agree that these details strengthen interpretability. The original submission described patient-level partitioning to avoid leakage and noted the class distribution (approximately 10-15% readmission rate), but we will expand the Results section to explicitly state: (1) the exact train/validation/test split ratios and construction method, (2) the use of class-weighted loss to address imbalance, and (3) statistical significance testing via bootstrap resampling with 95% confidence intervals and paired tests against baselines. These additions will be included in the revision. revision: yes

  2. Referee: [Discussion] Discussion / Limitations: all reported results use notes from a single center (MIMIC-III, Beth Israel Deaconess). The paper should explicitly discuss risks to generalization arising from institution-specific documentation conventions and administrative label distributions, as this directly affects the strength of any claim to broader clinical utility.

    Authors: We concur that single-center data limits generalizability claims. The revised Discussion will add a dedicated Limitations paragraph addressing institution-specific documentation styles, variations in administrative coding practices, differences in patient populations and readmission label distributions, and the consequent risks to external validity. We will also note planned future work on multi-center evaluation. revision: yes
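The significance-testing plan in response 1 (bootstrap resampling with a paired comparison against baselines) can be sketched as follows; the metric, labels, and predictions here are toy stand-ins, not the paper's data or code:

```python
import random

def paired_bootstrap_ci(metric, labels, preds_a, preds_b,
                        n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap CI for the metric gap (model A minus model B),
    resampling patients with replacement and scoring both models
    on the same resample each time."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        diff = (metric(ys, [preds_a[i] for i in idx])
                - metric(ys, [preds_b[i] for i in idx]))
        diffs.append(diff)
    diffs.sort()
    return (diffs[int(alpha / 2 * n_boot)],
            diffs[int((1 - alpha / 2) * n_boot) - 1])

def accuracy(ys, ps):
    return sum(int(p >= 0.5) == y for y, p in zip(ys, ps)) / len(ys)

# Toy data: model A is perfect, model B always predicts "readmitted".
labels = [0, 1] * 20
preds_a = [float(y) for y in labels]
preds_b = [0.5] * 40
lo, hi = paired_bootstrap_ci(accuracy, labels, preds_a, preds_b)
# If the interval excludes zero, A's gain counts as significant.
```

The same skeleton works for AUROC or any other metric of the predicted probabilities; for patient-level splits, one would resample patients rather than individual notes.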

Circularity Check

0 steps flagged

No circularity: empirical fine-tuning and held-out evaluation on MIMIC-III

full rationale

The paper trains ClinicalBERT via standard masked language modeling and next-sentence prediction on clinical notes, then evaluates readmission prediction on temporally held-out discharge summaries and early ICU notes using external baselines. No equation or claim reduces a prediction to a fitted parameter by construction, no self-citation supplies a uniqueness theorem or ansatz that the current work depends on, and the central performance numbers are produced by direct comparison against non-self-referential models on the same data splits. The single-center limitation is a generalization concern, not a circularity in the derivation chain.
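The masked-language-modeling step referenced above can be illustrated generically (standard BERT-style token masking; the tokens and masking policy here are illustrative, not the authors' exact preprocessing, which also includes next-sentence prediction):

```python
import random

MASK_TOKEN = "[MASK]"
MASK_RATE = 0.15  # typical BERT masking rate

def mask_tokens(tokens, seed=0):
    """Hide a random subset of tokens; the pre-training objective is
    to recover the originals from bidirectional context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < MASK_RATE:
            targets[i] = tok          # what the model must predict
            masked.append(MASK_TOKEN)
        else:
            masked.append(tok)
    return masked, targets

note = "patient admitted with acute chest pain stable at discharge".split()
masked, targets = mask_tokens(note, seed=1)
```

Because the objective is computed only on the model's own held-out tokens and the readmission comparison uses external baselines, there is no point in this chain where a result is assumed in order to derive itself.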

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central results rest on standard transformer pre-training assumptions plus fine-tuning on MIMIC-style clinical notes; no new entities are postulated and the only free parameters are the usual BERT hyperparameters and fine-tuning schedule.

free parameters (1)
  • BERT fine-tuning hyperparameters
    Learning rate, batch size, and number of epochs chosen during adaptation to clinical notes; values not specified in abstract.
axioms (2)
  • domain assumption Clinical notes contain extractable predictive signal beyond structured data
    Invoked in the motivation and evaluation sections to justify the readmission task.
  • domain assumption Human raters provide a valid proxy for medical concept quality
    Used to judge the quality of uncovered relationships.

pith-pipeline@v0.9.0 · 5380 in / 1244 out tokens · 28234 ms · 2026-05-15T09:22:31.827636+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

    cs.CL 2026-05 unverdicted novelty 7.0

    CMR-EXTR extracts structured data from CMR reports at 99.65% variable-level accuracy using teacher-student LLM distillation and three-principle uncertainty estimation for quality control.

  2. Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning

    cs.CL 2026-03 unverdicted novelty 7.0

    RCT couples an LLM and Random Forest via RL feedback so each augments the other's features and rewards, producing consistent gains on three medical datasets.

  3. NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

    cs.AI 2026-05 unverdicted novelty 6.0

    NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.

  4. A renormalization-group inspired lattice-based framework for piecewise generalized linear models

    stat.ME 2026-05 unverdicted novelty 6.0

    RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generali...

  5. Deep Kernel Learning for Stratifying Glaucoma Trajectories

    cs.LG 2026-05 unverdicted novelty 6.0

    A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current ...

  6. REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

    cs.CR 2026-04 unverdicted novelty 6.0

    REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.

  7. CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction

    cs.CL 2026-04 unverdicted novelty 6.0

    CURA improves calibration of clinical LM risk predictions by combining individual error alignment with neighborhood-based soft labels without harming discrimination on MIMIC-IV tasks.

  8. Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making

    cs.AI 2026-04 unverdicted novelty 6.0

    CN-PR learns reward functions from LLM-derived preferences over clinical trajectories to improve RL policies for sequential treatment decisions, showing correlation with quality scores and better recovery outcomes.

  9. EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

    cs.CR 2026-04 unverdicted novelty 6.0

    EncFormer reduces online MPC communication by 1.4x-30.4x and end-to-end latency by 1.3x-9.8x versus prior hybrid FHE-MPC systems for private GPT- and BERT-style inference while preserving accuracy.

  10. Clinical Note Bloat Reduction for Efficient LLM Use

    cs.CY 2026-03 conditional novelty 6.0

    TRACE removes 47.3% of text from clinical notes by targeting bloat and preserves performance on information extraction and outcome prediction tasks.

  11. From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing

    cs.CY 2026-03 unverdicted novelty 6.0

    The paper introduces a new taxonomy that groups AI-driven psychological computing tasks by their underlying computational patterns into four categories and reviews over 300 works from the pre-trained model to LLM eras.

  12. BloombergGPT: A Large Language Model for Finance

    cs.LG 2023-03 conditional novelty 6.0

    BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

  13. Training Large Language Models to Predict Clinical Events

    cs.LG 2026-05 unverdicted novelty 5.0

    Training a LoRA adapter on 6,900 examples derived from MIMIC-III notes reduces expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145 for clinical event prediction.

  14. AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

    cs.AI 2026-05 unverdicted novelty 5.0

    Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

  15. Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction

    cs.AI 2026-05 unverdicted novelty 5.0

    LLMs match or beat supervised BERT models on detecting whether a discharge note contains an actionable clinical task but trail on classifying the exact type of action, pointing to the need for datasets that explain wh...

  16. Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

    cs.LG 2026-04 unverdicted novelty 5.0

    Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.

  17. From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

    cs.AI 2026-04 unverdicted novelty 5.0

    CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and...

  18. Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    cs.CV 2026-05 unverdicted novelty 4.0

    Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.

  19. Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    cs.CV 2026-05 unverdicted novelty 4.0

    Retina-RAG combines a DR classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade retinopathy, detect macular edema, and generate reports, reaching F1 0.731/0.948 and ROUGE-L 0.429 on a retinal dataset while runnin...

  20. A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

    cs.CL 2026-04 unverdicted novelty 4.0

    Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.

  21. A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation

    cs.IR 2026-05 unverdicted novelty 2.0

    A hybrid RAG system with retrieval, Cohere reranking, and claim-level LLM judgment achieves 100% grounding accuracy on 200 claims from 25 biomedical queries in a pilot study.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · cited by 20 Pith papers · 3 internal anchors

  1. [1]

    Publicly Available Clinical BERT Embeddings

    E. Alsentzer, J. R. Murphy, W. Boag, W.-H. Weng, D. Jin, T. Naumann, and M. B. A. McDermott. “Publicly Available Clinical BERT Embeddings”. In: arXiv:1904.03323 (2019)

  2. [2]

    Hospital readmissions in the Medicare population

    G. F. Anderson and E. P. Steinberg. “Hospital readmissions in the Medicare population”. In: New England Journal of Medicine 21 (1984)

  3. [3]

    An informatics-based approach to reducing heart failure all-cause readmissions: the Stanford heart failure dashboard

    D. Banerjee, C. Thompson, C. Kell, R. Shetty, Y. Vetteth, H. Grossman, A. DiBiase, and M. Fowler. “An informatics-based approach to reducing heart failure all-cause readmissions: the Stanford heart failure dashboard”. In: Journal of the American Medical Informatics Association 3 (2016)

  4. [4]

    Dynamic Hierarchical Classification for Patient Risk-of-Readmission

    S. Basu Roy, A. Teredesai, K. Zolfaghar, R. Liu, D. Hazel, S. Newman, and A. Marinez. “Dynamic Hierarchical Classification for Patient Risk-of-Readmission”. In: Knowledge Discovery and Data Mining (2015)

  5. [5]

    What’s in a Note? Unpacking Predictive Value in Clinical Note Representations

    W. Boag, D. Doss, T. Naumann, and P. Szolovits. “What’s in a Note? Unpacking Predictive Value in Clinical Note Representations”. In: AMIA Joint Summits on Translational Science (2018)

  6. [6]

    Enriching word vectors with subword information

    P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. “Enriching word vectors with subword information”. In: Transactions of the Association for Computational Linguistics (2017)

  7. [7]

    Real-time prediction of mortality, readmission, and length of stay using electronic health record data

    X. Cai, O. Perez-Concha, E. Coiera, F. Martin-Sanchez, R. Day, D. Roffe, and B. Gallego. “Real-time prediction of mortality, readmission, and length of stay using electronic health record data”. In: Journal of the American Medical Informatics Association 3 (2015)

  8. [8]

    Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission

    R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission”. In: Knowledge Discovery and Data Mining. 2015

  9. [9]

    Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions

    W. W. Chapman, P. M. Nadkarni, L. Hirschman, L. W. D’Avolio, G. K. Savova, and O. Uzuner. “Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions”. In: Journal of the American Medical Informatics Association 5 (2011)

  10. [10]

    How to Train good Word Embeddings for Biomedical NLP

    B. Chiu, G. Crichton, A. Korhonen, and S. Pyysalo. “How to Train good Word Embeddings for Biomedical NLP”. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing, ACL 2016

  11. [11]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: arXiv:1810.04805 (2018)

  12. [12]

    A comparison of models for predicting early hospital readmissions

    J. Futoma, J. Morris, and J. Lucas. “A comparison of models for predicting early hospital readmissions”. In: Journal of Biomedical Informatics (2015)

  13. [13]

    Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review

    B. A. Goldstein, A. M. Navar, M. J. Pencina, and J. P. A. Ioannidis. “Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review”. In: Journal of the American Medical Informatics Association (2017)

  14. [14]

    Long Short-Term Memory

    S. Hochreiter and J. Schmidhuber. “Long Short-Term Memory”. In: Neural Computation 8 (1997)

  15. [15]

    MIMIC-III, a freely accessible critical care database

    A. E. W. Johnson, T. J. Pollard, L. Shen, L.-w. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. Anthony Celi, and R. G. Mark. “MIMIC-III, a freely accessible critical care database”. In: Scientific Data (2016)

  16. [16]

    Documentation of mandated discharge summary components in transitions from acute to subacute care

    A. J. Kind and M. A. Smith. “Documentation of mandated discharge summary components in transitions from acute to subacute care”. In: Agency for Healthcare Research and Quality (2008)

  17. [17]

    BioBERT: a pre-trained biomedical language representation model for biomedical text mining

    J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang. “BioBERT: a pre-trained biomedical language representation model for biomedical text mining”. In: arXiv:1901.08746 (2019)

  18. [18]

    Deep EHR: Chronic Disease Prediction Using Medical Notes

    J. Liu, Z. Zhang, and N. Razavian. “Deep EHR: Chronic Disease Prediction Using Medical Notes”. In: Proceedings of the 3rd Machine Learning for Healthcare Conference. 2018

  19. [19]

    Visualizing data using t-SNE

    L. van der Maaten and G. Hinton. “Visualizing data using t-SNE”. In: Journal of Machine Learning Research (2008)

  20. [20]

    Distributed representations of words and phrases and their compositionality

    T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. “Distributed representations of words and phrases and their compositionality”. In: Advances in Neural Information Processing Systems. 2013

  21. [21]

    ASHP national survey of pharmacy practice in hospital settings: Prescribing and transcribing—2016

    C. A. Pedersen, P. J. Schneider, and D. J. Scheckelhoff. “ASHP national survey of pharmacy practice in hospital settings: Prescribing and transcribing—2016”. In: American Journal of Health-System Pharmacy 17 (2017)

  22. [22]

    Measures of semantic similarity and relatedness in the biomedical domain

    T. Pedersen, S. V. Pakhomov, S. Patwardhan, and C. G. Chute. “Measures of semantic similarity and relatedness in the biomedical domain”. In: Journal of Biomedical Informatics 3 (2007)

  23. [23]

    Glove: Global Vectors for Word Representation

    J. Pennington, R. Socher, and C. Manning. “Glove: Global Vectors for Word Representation”. In: EMNLP (2014)

  24. [24]

    Deep contextualized word representations

    M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. “Deep contextualized word representations”. In: arXiv:1802.05365 (2018)

  25. [25]

    Improving Language Understanding by Generative Pre-Training

    A. Radford. “Improving Language Understanding by Generative Pre-Training”. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf. 2018

  26. [26]

    Scalable and accurate deep learning with electronic health records

    A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, P. Sundberg, H. Yee, K. Zhang, Y. Zhang, G. Flores, G. E. Duggan, J. Irvine, Q. Le, K. Litsch, A. Mossin, J. Tansuwan, D. Wang, J. Wexler, J. Wilson, D. Ludwig, S. L. Volchenboum, K. Chou, M. Pearson, S. Madabushi, N. H. Shah, A. J. Butte, M. D. Howell, C. ...

  27. [27]

    Bidirectional recurrent neural networks

    M. Schuster and K. K. Paliwal. “Bidirectional recurrent neural networks”. In: IEEE Trans. Signal Processing (1997)

  28. [28]

    Alarm fatigue: a patient safety concern

    S. Sendelbach and M. Funk. “Alarm fatigue: a patient safety concern”. In: AACN Advanced Critical Care 4 (2013)

  29. [29]

    Neural Machine Translation of Rare Words with Subword Units

    R. Sennrich, B. Haddow, and A. Birch. “Neural Machine Translation of Rare Words with Subword Units”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016

  30. [30]

    Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis

    B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi. “Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis”. In: IEEE Journal of Biomedical and Health Informatics 5 (2018)

  31. [31]

    Enhancing clinical concept extraction with contextual embeddings

    Y. Si, J. Wang, H. Xu, and K. Roberts. “Enhancing clinical concept extraction with contextual embeddings”. In: Journal of the American Medical Informatics Association 11 (2019)

  32. [32]

    Effect of discharge summary availability during post-discharge visits on hospital readmission

    C. Van Walraven, R. Seth, P. C. Austin, and A. Laupacis. “Effect of discharge summary availability during post-discharge visits on hospital readmission”. In: Journal of General Internal Medicine 3 (2002)

  33. [33]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. “Attention is all you need”. In: Advances in Neural Information Processing Systems. 2017

  34. [34]

    A comparison of word embeddings for the biomedical natural language processing

    Y. Wang, S. Liu, N. Afzal, M. Rastegar-Mojarad, L. Wang, F. Shen, P. Kingsbury, and H. Liu. “A comparison of word embeddings for the biomedical natural language processing”. In: Journal of Biomedical Informatics (2018)

  35. [35]

    Medical Subdomain Classification of Clinical Notes Using a Machine Learning-Based Natural Language Processing Approach

    W.-H. Weng, K. B. Wagholikar, A. T. McCray, P. Szolovits, and H. C. Chueh. “Medical Subdomain Classification of Clinical Notes Using a Machine Learning-Based Natural Language Processing Approach”. In: BMC Medical Informatics and Decision Making 1 (2017)

  36. [36]

    Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review

    C. Xiao, E. Choi, and J. Sun. “Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review”. In: Journal of the American Medical Informatics Association 10 (2018)

  37. [37]

    Artificial intelligence in healthcare

    K.-H. Yu, A. L. Beam, and I. S. Kohane. “Artificial intelligence in healthcare”. In: Nature Biomedical Engineering 10 (2018)

  38. [38]

    Understanding bag-of-words model: a statistical framework

    Y. Zhang, R. Jin, and Z.-H. Zhou. “Understanding bag-of-words model: a statistical framework”. In: International Journal of Machine Learning and Cybernetics 1 (2010)

  39. [39]

    Multi-Label Learning from Medical Plain Text with Convolutional Residual Models

    Y. Zhang, R. Henao, Z. Gan, Y. Li, and L. Carin. “Multi-Label Learning from Medical Plain Text with Convolutional Residual Models”. In: Proceedings of the 3rd Machine Learning for Healthcare Conference. 2018

  40. [40]

    Readmissions, observation, and the hospital readmissions reduction program

    R. B. Zuckerman, S. H. Sheingold, E. J. Orav, J. Ruhter, and A. M. Epstein. “Readmissions, observation, and the hospital readmissions reduction program”. In: New England Journal of Medicine 16 (2016)