hub

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

Kexin Huang, Jaan Altosaar, Rajesh Ranganath · 2019 · cs.CL · arXiv 1904.05342

25 Pith papers cite this work. Polarity classification is still indexing.

25 Pith papers citing it

open full Pith review browse 25 citing papers arXiv PDF

abstract

Clinical notes contain information about patients that goes beyond structured data like lab values and medications. However, clinical notes have been underused relative to structured data, because notes are high-dimensional and sparse. This work develops and evaluates representations of clinical notes using bidirectional transformers (ClinicalBERT). ClinicalBERT uncovers high-quality relationships between medical concepts as judged by humans. ClinicalBert outperforms baselines on 30-day hospital readmission prediction using both discharge summaries and the first few days of notes in the intensive care unit. Code and model parameters are available.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CMR-EXTR extracts structured data from CMR reports at 99.65% variable-level accuracy using teacher-student LLM distillation and three-principle uncertainty estimation for quality control.

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

LLM-rephrased synthetic clinical notes preserve core information and utility for coarse prediction tasks but lose fine-grained details such as ICD codes, with chunk-wise rephrasing as a partial mitigation that trades off factual accuracy.

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.

A renormalization-group inspired lattice-based framework for piecewise generalized linear models

stat.ME · 2026-05-06 · unverdicted · novelty 6.0

RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.

Deep Kernel Learning for Stratifying Glaucoma Trajectories

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.

CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

CURA improves calibration of clinical LM risk predictions by combining individual error alignment with neighborhood-based soft labels without harming discrimination on MIMIC-IV tasks.

EncFormer: Secure and Efficient Transformer Inference over Encrypted Data

cs.CR · 2026-04-11 · unverdicted · novelty 6.0

EncFormer reduces online MPC communication by 1.4x-30.4x and end-to-end latency by 1.3x-9.8x versus prior hybrid FHE-MPC systems for private GPT- and BERT-style inference while preserving accuracy.

Clinical Note Bloat Reduction for Efficient LLM Use

cs.CY · 2026-03-21 · conditional · novelty 6.0

TRACE removes 47.3% of text from clinical notes by targeting bloat and preserves performance on information extraction and outcome prediction tasks.

From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing

cs.CY · 2026-03-12 · unverdicted · novelty 6.0

The paper introduces a new taxonomy that groups AI-driven psychological computing tasks by their underlying computational patterns into four categories and reviews over 300 works from the pre-trained model to LLM eras.

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

cs.CV · 2023-06-01 · unverdicted · novelty 6.0

LLaVA-Med is created via curriculum fine-tuning on PubMed figure-caption pairs and GPT-4 self-instructed data, achieving competitive or better results than prior supervised models on three biomedical VQA benchmarks.

BloombergGPT: A Large Language Model for Finance

cs.LG · 2023-03-30 · conditional · novelty 6.0

BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

Sequence models reveal diagnosis accumulation pathways beyond comorbidity burden in population-scale hospital data

physics.soc-ph · 2026-05-29 · unverdicted · novelty 5.0

Sequence embeddings from diagnosis histories improve prediction of 93 of 131 incident disease blocks and event-free survival beyond age, sex, and comorbidity burden in large-scale hospital data.

Training Large Language Models to Predict Clinical Events

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Training a LoRA adapter on 6,900 examples derived from MIMIC-III notes reduces expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145 for clinical event prediction.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

LLMs match or beat supervised BERT models on detecting whether a discharge note contains an actionable clinical task but trail on classifying the exact type of action, pointing to the need for datasets that explain why each span was labeled actionable.

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.

From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

cs.AI · 2026-04-13 · unverdicted · novelty 5.0

CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and efficiency.

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

cs.CV · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.

A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

cs.CL · 2026-04-08 · unverdicted · novelty 4.0

Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.

ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language

cs.CL · 2025-12-27 · unverdicted · novelty 4.0

Llama 3.1 annotates Polish medical texts to train DistilBERT classifiers achieving F1 scores above 0.80 that are 500 times smaller than the teacher model.

Towards the Anonymization of the Language Modeling

cs.CL · 2025-01-05 · unverdicted · novelty 4.0

Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.

A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation

cs.IR · 2026-05-03 · unverdicted · novelty 2.0

A hybrid RAG system with retrieval, Cohere reranking, and claim-level LLM judgment achieves 100% grounding accuracy on 200 claims from 25 biomedical queries in a pilot study.

Learning Preference-Based Objectives from Clinical Narratives for Dynamic Sepsis Treatment

cs.AI · 2026-04-12

citing papers explorer

Showing 25 of 25 citing papers.

Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs cs.CL · 2026-05-08 · unverdicted · none · ref 21 · internal anchor
CMR-EXTR extracts structured data from CMR reports at 99.65% variable-level accuracy using teacher-student LLM distillation and three-principle uncertainty estimation for quality control.
Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale cs.CL · 2026-05-18 · unverdicted · none · ref 40 · internal anchor
LLM-rephrased synthetic clinical notes preserve core information and utility for coarse prediction tasks but lose fine-grained details such as ICD codes, with chunk-wise rephrasing as a partial mitigation that trades off factual accuracy.
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation cs.AI · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.
A renormalization-group inspired lattice-based framework for piecewise generalized linear models stat.ME · 2026-05-06 · unverdicted · none · ref 284 · internal anchor
RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.
Deep Kernel Learning for Stratifying Glaucoma Trajectories cs.LG · 2026-05-01 · unverdicted · none · ref 15 · internal anchor
A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version) cs.CR · 2026-04-30 · unverdicted · none · ref 17 · internal anchor
REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.
CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction cs.CL · 2026-04-16 · unverdicted · none · ref 1 · internal anchor
CURA improves calibration of clinical LM risk predictions by combining individual error alignment with neighborhood-based soft labels without harming discrimination on MIMIC-IV tasks.
EncFormer: Secure and Efficient Transformer Inference over Encrypted Data cs.CR · 2026-04-11 · unverdicted · none · ref 3 · internal anchor
EncFormer reduces online MPC communication by 1.4x-30.4x and end-to-end latency by 1.3x-9.8x versus prior hybrid FHE-MPC systems for private GPT- and BERT-style inference while preserving accuracy.
Clinical Note Bloat Reduction for Efficient LLM Use cs.CY · 2026-03-21 · conditional · none · ref 22 · internal anchor
TRACE removes 47.3% of text from clinical notes by targeting bloat and preserves performance on information extraction and outcome prediction tasks.
From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing cs.CY · 2026-03-12 · unverdicted · none · ref 15 · internal anchor
The paper introduces a new taxonomy that groups AI-driven psychological computing tasks by their underlying computational patterns into four categories and reviews over 300 works from the pre-trained model to LLM eras.
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day cs.CV · 2023-06-01 · unverdicted · none · ref 14 · internal anchor
LLaVA-Med is created via curriculum fine-tuning on PubMed figure-caption pairs and GPT-4 self-instructed data, achieving competitive or better results than prior supervised models on three biomedical VQA benchmarks.
BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 47 · internal anchor
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
Sequence models reveal diagnosis accumulation pathways beyond comorbidity burden in population-scale hospital data physics.soc-ph · 2026-05-29 · unverdicted · none · ref 6 · internal anchor
Sequence embeddings from diagnosis histories improve prediction of 93 of 131 incident disease blocks and event-free survival beyond age, sex, and comorbidity burden in large-scale hospital data.
Training Large Language Models to Predict Clinical Events cs.LG · 2026-05-12 · unverdicted · none · ref 6 · internal anchor
Training a LoRA adapter on 6,900 examples derived from MIMIC-III notes reduces expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145 for clinical event prediction.
AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks cs.AI · 2026-05-11 · unverdicted · none · ref 54 · internal anchor
Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.
Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction cs.AI · 2026-05-07 · unverdicted · none · ref 7 · internal anchor
LLMs match or beat supervised BERT models on detecting whether a discharge note contains an actionable clinical task but trail on classifying the exact type of action, pointing to the need for datasets that explain why each span was labeled actionable.
Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling cs.LG · 2026-04-20 · unverdicted · none · ref 43 · internal anchor
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning cs.AI · 2026-04-13 · unverdicted · none · ref 3 · internal anchor
CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and efficiency.
Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation cs.CV · 2026-05-07 · unverdicted · none · ref 38 · 2 links · internal anchor
Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.
A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering cs.CL · 2026-04-08 · unverdicted · none · ref 5 · internal anchor
Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.
ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language cs.CL · 2025-12-27 · unverdicted · none · ref 21 · internal anchor
Llama 3.1 annotates Polish medical texts to train DistilBERT classifiers achieving F1 scores above 0.80 that are 500 times smaller than the teacher model.
Towards the Anonymization of the Language Modeling cs.CL · 2025-01-05 · unverdicted · none · ref 28 · internal anchor
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation cs.IR · 2026-05-03 · unverdicted · none · ref 23 · internal anchor
A hybrid RAG system with retrieval, Cohere reranking, and claim-level LLM judgment achieves 100% grounding accuracy on 200 claims from 25 biomedical queries in a pilot study.
Learning Preference-Based Objectives from Clinical Narratives for Dynamic Sepsis Treatment cs.AI · 2026-04-12 · unreviewed · ref 18 · internal anchor
Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning cs.CL · 2026-03-24 · unreviewed · ref 20 · internal anchor

ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer