HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale · 2025 · arXiv preprint arXiv:2406.19280

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

cs.CV · 2026-05-10 · accept · novelty 8.0

DeepTumorVQA is a new stage-wise 3D CT VQA benchmark showing that quantitative measurement is the main failure point for current medical VLMs and that tool augmentation substantially improves later reasoning stages.

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

cs.CV · 2026-05-07 · unverdicted · novelty 8.0

The MedHorizon benchmark reveals that current multimodal LLMs achieve only 41.1% accuracy on long medical videos because of failures in sparse evidence retrieval and procedural reasoning.

MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows

cs.CV · 2026-03-25 · conditional · novelty 8.0

MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.

DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

DermAgent orchestrates seven vision-language tools in a Plan-Execute-Reflect loop with dual-modality retrieval from 413k cases and a critic module to outperform GPT-4o by 17.6% in zero-shot dermatological diagnosis accuracy.

X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

X-PCR is a new benchmark of 26,415 images and 177,868 expert VQA pairs that evaluates MLLMs on six-stage progressive reasoning and cross-modality integration in ophthalmology.

MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models

cs.CL · 2026-02-25 · conditional · novelty 7.0

The MEDSYN benchmark shows that MLLMs match experts on differential diagnosis lists but show a much larger gap than humans between differential diagnosis and final diagnosis selection, owing to text overreliance and cross-modal evidence gaps.

IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation

cs.CV · 2026-01-06 · conditional · novelty 7.0

IBISAgent enables MLLMs to perform iterative pixel-level visual reasoning for biomedical object referring and segmentation via text-based clicks and agentic RL, outperforming prior SOTA methods without model modifications.

Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

ViToS uses dual-stream RL with cross-feedback optimization to prune medical image tokens to 77% length while reporting 108.27% and 104.16% relative performance on two 7B VLMs across seven benchmarks.

ASAP: Advancing Medical Volumetric Representation Learning with Anatomy-aware Semantically-adaptive Pre-training

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

ASAP introduces an anatomy-aware semantically-adaptive pre-training method for medical volumetric vision-language models and reports state-of-the-art results on a new benchmark spanning 15 datasets and 22 tasks.

DIYHealth Suite: Dataset, Model, and Benchmark for Health Management at Home

cs.CY · 2026-05-01 · unverdicted · novelty 6.0

DIYHealth Suite introduces a large home-care dataset, the DIYHealthGPT model with Hybrid Hyper Low-Rank Adaptation, and DIYHealthBench, claiming SOTA results on 11 tasks over general and medical baselines.

Seeing Through Experts Eyes A Foundational Vision Language Model Trained on Radiologists Gaze and Reasoning

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

GazeX uses radiologist gaze trajectories as a behavioral prior during pretraining to generate more accurate and expert-consistent results in chest X-ray report generation, disease grounding, and visual question answering.

MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

MedRCube is a new fine-grained evaluation framework that benchmarks 33 MLLMs on medical imaging, ranks Lingshu-32B highest, and finds a significant positive link between shortcut behaviors and diagnostic performance.

HeartcareGPT: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding

cs.LG · 2025-06-06 · unverdicted · novelty 6.0

HeartcareGPT proposes Dual Stream Projection Alignment (DSPA) on a structure-aware tokenizer for unified ECG signal-image modeling, supported by the Heartcare-400K dataset and Heartcare-Bench.

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

cs.CL · 2024-12-25 · unverdicted · novelty 6.0

HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

VITAL adds visual-semantic dual supervision during training of medical MLLMs for latent reasoning, yielding SOTA results on 7 benchmarks with a new 61K multi-modality dataset while enabling post-hoc textual and visual explanations at zero inference overhead.

Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

TIF-GRPO uses integral feedback on pseudo-temporal trajectories to regulate anatomy-aware rewards in RL for clinical faithfulness in volumetric CT analysis.

RoiMAM: Region-of-Interest Medical Attention Model for Efficient Vision-Language Understanding

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

RoiMAM integrates a training-free ROI Generation Module with Semantic Selective Suppression and a Text Prompt Enhancer to produce a compact VLM that reports accuracy gains of 2 percent and 4.6 percent on SLAKE and PMC-VQA at less than 20 percent of the size of MedVInT-TD.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing

cs.CV · 2026-04-08 · unverdicted · novelty 5.0

The UPDP pipeline filters privacy terms and generates de-identified radiology images that preserve diagnostic pathology information, enabling models with competitive disease detection accuracy but reduced identity leakage and improved cross-hospital performance.

Thought Graph Traversal for Test-time Scaling in Chest X-ray VLLMs

cs.CV · 2025-06-13 · unverdicted · novelty 5.0

A new prompting framework called Thought Graph Traversal combined with reasoning budget forcing improves test-time performance of frozen chest X-ray VLLMs on report generation benchmarks.

citing papers explorer

Showing 20 of 20 citing papers.

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents · cs.CV · 2026-05-10 · accept · none · ref 13
MedHorizon: Towards Long-context Medical Video Understanding in the Wild · cs.CV · 2026-05-07 · unverdicted · none · ref 62
MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows · cs.CV · 2026-03-25 · conditional · none · ref 3
DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making · cs.CV · 2026-05-14 · unverdicted · none · ref 4
X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis · cs.CV · 2026-04-22 · unverdicted · none · ref 8
MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models · cs.CL · 2026-02-25 · conditional · none · ref 1
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation · cs.CV · 2026-01-06 · conditional · none · ref 4
Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning · cs.CV · 2026-06-30 · unverdicted · none · ref 5
ASAP: Advancing Medical Volumetric Representation Learning with Anatomy-aware Semantically-adaptive Pre-training · cs.CV · 2026-05-30 · unverdicted · none · ref 108
DIYHealth Suite: Dataset, Model, and Benchmark for Health Management at Home · cs.CY · 2026-05-01 · unverdicted · none · ref 84
Seeing Through Experts Eyes A Foundational Vision Language Model Trained on Radiologists Gaze and Reasoning · cs.AI · 2026-04-15 · unverdicted · none · ref 43
MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging · cs.CL · 2026-04-15 · unverdicted · none · ref 13
HeartcareGPT: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding · cs.LG · 2025-06-06 · unverdicted · none · ref 10
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs · cs.CL · 2024-12-25 · unverdicted · none · ref 55
VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs · cs.CV · 2026-05-27 · unverdicted · none · ref 1
Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis · cs.CV · 2026-05-19 · unverdicted · none · ref 40
RoiMAM: Region-of-Interest Medical Attention Model for Efficient Vision-Language Understanding · cs.CV · 2026-05-15 · unverdicted · none · ref 13
AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks · cs.AI · 2026-05-11 · unverdicted · none · ref 42 · 2 links
A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing · cs.CV · 2026-04-08 · unverdicted · none · ref 7
Thought Graph Traversal for Test-time Scaling in Chest X-ray VLLMs · cs.CV · 2025-06-13 · unverdicted · none · ref 30
