arXiv preprint arXiv:2408.04187 (2024)
7 Pith papers cite this work.
Citing papers by year: 2026 (7)
citing papers explorer
- MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
  MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
- MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
  MedSynapse-V evolves latent diagnostic memories via meta queries, causal counterfactual refinement with RL, and dual-branch memory transition to outperform prior medical VLM methods in diagnostic accuracy.
- SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound
  SAMe grounds clinical complaints to target organs, builds a patient-specific anatomical map from a single external image, and outputs probe initialization poses, reaching 97.3% liver and 81.7% kidney hit rates on a real robot.
- The Provenance Gap in Clinical AI: Evidence-Traceable Temporal Knowledge Graphs for Rare Disease Reasoning
  HEG-TKG grounds LLM clinical reasoning in hierarchical evidence-based temporal knowledge graphs from 4,512 PubMed records, delivering 100% citation verifiability and error detectability where standard RAG and unprompted LLMs produce none.
- Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models
  OKH-RAG represents knowledge as ordered hyperedges and retrieves coherent interaction sequences via a learned transition model, outperforming permutation-invariant RAG baselines on order-sensitive QA tasks.
- Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
  A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
- Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa
  A domain-specific LLM for TB care in South Africa, created by fine-tuning BioMistral-7B with QLoRA and GraphRAG on local guidelines, shows improved contextual alignment over the base model.