ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
17 Pith papers cite this work.
citing papers explorer
- Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
CMR-EXTR extracts structured data from CMR reports at 99.65% variable-level accuracy using teacher-student LLM distillation and three-principle uncertainty estimation for quality control.
- NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.
- A renormalization-group inspired lattice-based framework for piecewise generalized linear models
RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.
- Deep Kernel Learning for Stratifying Glaucoma Trajectories
A deep kernel learning architecture with a transformer feature extractor over ClinicalBERT embeddings and a Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.
- REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.
- CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction
CURA improves calibration of clinical LM risk predictions by combining individual error alignment with neighborhood-based soft labels without harming discrimination on MIMIC-IV tasks.
- Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making
CN-PR learns reward functions from LLM-derived preferences over clinical trajectories to improve RL policies for sequential treatment decisions, showing correlation with quality scores and better recovery outcomes.
- EncFormer: Secure and Efficient Transformer Inference over Encrypted Data
EncFormer reduces online MPC communication by 1.4x-30.4x and end-to-end latency by 1.3x-9.8x versus prior hybrid FHE-MPC systems for private GPT- and BERT-style inference while preserving accuracy.
- BloombergGPT: A Large Language Model for Finance
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
- Training Large Language Models to Predict Clinical Events
Training a LoRA adapter on 6,900 examples derived from MIMIC-III notes reduces expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145 for clinical event prediction.
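The entry above reports expected calibration error (ECE) and Brier score. As a reference point, here is a minimal, dependency-free sketch of one common way these two calibration metrics are computed for binary event prediction (the equal-width-bin ECE variant that bins the positive-class probability; the original paper may use a different binning scheme):

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE: sum of bin-weight * |mean predicted prob - observed rate|."""
    n = len(y_true)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Last bin is closed on the right so p == 1.0 is counted.
        idx = [i for i, p in enumerate(y_prob)
               if (lo <= p < hi) or (b == n_bins - 1 and p == 1.0)]
        if idx:
            conf = sum(y_prob[i] for i in idx) / len(idx)
            rate = sum(y_true[i] for i in idx) / len(idx)
            ece += (len(idx) / n) * abs(conf - rate)
    return ece
```

A perfectly calibrated predictor (predicted probabilities match observed event rates in every bin) scores 0.0 on both metrics; the paper's reported drop from 0.1269 to 0.0398 ECE is measured on this kind of scale.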
- AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks
Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.
- Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction
LLMs match or beat supervised BERT models on detecting whether a discharge note contains an actionable clinical task but trail on classifying the exact type of action, pointing to the need for datasets that explain why each span was labeled actionable.
- Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
- From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning
CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and efficiency.
- Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation
Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.
- A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering
Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.
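The paper's actual pipeline (dense retrieval, LLM query reformulation, neural reranking) is not reproduced here; the sketch below only illustrates the retrieve → reformulate → rerank control flow, with toy lexical-overlap scoring and a hard-coded abbreviation expansion standing in for the dense retriever, reranker, and LLM reformulator. All names, corpus strings, and scoring functions are illustrative:

```python
def retrieve(query, corpus, k=3):
    """Toy retrieval stand-in: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def reformulate(query):
    """Placeholder for LLM query reformulation: here, just expand one abbreviation."""
    return query.replace("MI", "myocardial infarction")

def rerank(query, docs):
    """Placeholder reranker: re-order retrieved candidates against the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)

corpus = [
    "aspirin after myocardial infarction reduces mortality",
    "statin therapy in chronic kidney disease",
    "beta blockers in heart failure",
]
query = "treatment after MI"
expanded = reformulate(query)          # query reformulation step
candidates = retrieve(expanded, corpus, k=2)  # first-stage retrieval
top = rerank(expanded, candidates)[0]  # second-stage reranking
```

In a real system each stage would be a learned component (e.g. an embedding model for retrieval and a cross-encoder for reranking); the point of the sketch is only how the stages compose.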
- A Hybrid Retrieval and Reranking Framework for Evidence-Grounded Retrieval-Augmented Generation
A hybrid RAG system with retrieval, Cohere reranking, and claim-level LLM judgment achieves 100% grounding accuracy on 200 claims from 25 biomedical queries in a pilot study.