DeepTumorVQA is a new stage-wise 3D CT VQA benchmark showing that quantitative measurement is the main failure point for current medical VLMs and that tool augmentation substantially improves later reasoning stages.
arXiv preprint arXiv:2305.10415
17 Pith papers cite this work, all from 2026. Citation-role classification has so far labeled one citation (background); polarity classification is still indexing.
Citing papers explorer
- DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents
  DeepTumorVQA is a new stage-wise 3D CT VQA benchmark showing that quantitative measurement is the main failure point for current medical VLMs and that tool augmentation substantially improves later reasoning stages.
- MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
  MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence, such as key slices and regions of interest, revealing that answer-only scores overestimate performance.
- CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
  Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.
- CheXthought: A global multimodal dataset of clinical chain-of-thought reasoning and visual attention for chest X-ray interpretation
  CheXthought supplies large-scale expert chain-of-thought reasoning and synchronized visual attention data for chest X-rays to train more accurate and interpretable clinical vision-language models.
- X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis
  X-PCR is a new benchmark of 26,415 images and 177,868 expert VQA pairs that evaluates MLLMs on six-stage progressive reasoning and cross-modality integration in ophthalmology.
- Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA
  Self-verification in medical VQA creates a verification mirage: verifiers exhibit high error rates and a bias toward agreeing with wrong answers, with reliability strongly conditioned on task type.
- RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
  RadThinking releases a large longitudinal CT VQA dataset stratified into foundation perception questions, single-rule reasoning questions, and compositional multi-step chains grounded in clinical reporting standards for cancer screening.
- MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence
  MedVIGIL introduces a clinician-supervised benchmark showing that medical VLMs frequently give fluent answers despite broken visual evidence, with top models scoring 14 points below human radiologists on the composite score.
- Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
  MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.
- MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
  MedSynapse-V evolves latent diagnostic memories via meta queries, causal counterfactual refinement with RL, and dual-branch memory transition to outperform prior medical VLM methods in diagnostic accuracy.
- Dual Causal Inference: Integrating Backdoor Adjustment and Instrumental Variable Learning for Medical VQA
  DCI unifies backdoor adjustment and instrumental variable learning in MedVQA to extract deconfounded representations, yielding better out-of-distribution performance on SLAKE, VQA-RAD, and similar benchmarks.
- MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging
  MedRCube is a new fine-grained evaluation framework that benchmarks 33 MLLMs on medical imaging, ranks Lingshu-32B highest, and finds a significant positive link between shortcut behaviors and diagnostic performance.
- Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
  DCP-PD improves macro F1 scores on CT report generation benchmarks and introduces a hierarchical location-aware evaluation protocol that reveals ongoing challenges in pathology spatial grounding.
- Improving Medical VQA through Trajectory-Aware Process Supervision
  A trajectory-aware process reward using DTW on sentence embeddings, combined with an exact-match outcome reward in GRPO after SFT, raises mean medical VQA accuracy from 0.598 to 0.689 across six benchmarks.
- LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering
  LiteMedCoT-VL distills chain-of-thought from a 235B model to 2B VLMs via LoRA, reaching 64.9% accuracy on PMC-VQA and beating a 4B zero-shot baseline by 11 points.
- Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs
  A Medical Entity Tree organizes medical knowledge to engineer higher-quality training data that boosts general MLLMs on medical benchmarks.
- MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
  MedLVR interleaves latent visual reasoning segments in autoregressive decoding and uses two-stage training to raise average medical VQA accuracy from 48.3% to 53.4% over a Qwen2.5-VL-7B backbone on OmniMedVQA and five other benchmarks.
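For reference, the backdoor adjustment that DCI-style deconfounding builds on is the standard causal-inference identity (the paper's choice of confounder set $Z$ is its own; this is just the textbook form):

```latex
P(Y \mid do(X)) \;=\; \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)
```

Instrumental-variable learning complements it by exploiting a variable that affects $X$ but influences $Y$ only through $X$, which can handle confounders the backdoor set fails to block.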
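The server-side Mixture-of-Rewards in the federated-alignment entry above can be pictured as a learned convex blend of per-client reward-model scores feeding group-relative (GRPO-style) advantages. A minimal sketch under assumed shapes; the function names and the softmax parameterisation are illustrative, not the paper's code:

```python
import numpy as np

def mixture_reward(client_scores, logits):
    """Blend per-client reward-model scores with learned softmax weights.
    client_scores: (num_clients, num_samples); logits: (num_clients,).
    Hypothetical interface -- the paper's exact mixture is not reproduced here."""
    w = np.exp(logits - logits.max())   # numerically stable softmax
    w /= w.sum()
    return w @ client_scores            # convex combination per sample

def grpo_advantages(rewards):
    # Group-relative advantage: standardise rewards within the sampled group,
    # as GRPO does, so no learned value function is needed.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

With uniform logits the mixture reduces to a simple average of client scores; training the logits lets the server weight clients whose preferences best predict held-out agreement.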
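The trajectory-aware process reward entry pairs DTW over sentence embeddings with an exact-match reward in GRPO. The DTW part can be sketched as follows, assuming reasoning steps are already embedded as vectors; the cosine cost and the exp(-d/len) shaping are illustrative choices, not taken from the paper:

```python
import numpy as np

def cosine_dist(a, b):
    # 1 - cosine similarity between two embedding vectors
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dtw_distance(seq_a, seq_b):
    """Classic dynamic-programming DTW over two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def trajectory_reward(pred_steps, ref_steps):
    # Map alignment cost into (0, 1]; normalising by the longer sequence
    # is a hypothetical shaping choice, not the paper's.
    d = dtw_distance(pred_steps, ref_steps)
    return float(np.exp(-d / max(len(pred_steps), len(ref_steps))))
```

A reasoning trace identical to the reference gets reward 1.0; traces that drift in content or ordering pay an alignment cost, which is what makes the reward a process signal rather than an outcome check.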