arXiv preprint arXiv:2408.04187 (2024)
18 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
citing papers explorer
- MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
- Agents-K1: Towards Agent-native Knowledge Orchestration
Agents-K1 is an end-to-end pipeline with a multimodal parser, 4B GRPO-trained extractor, and agent CLI that builds scientific knowledge graphs from full papers and was run on 2.46 million documents to produce Scholar-KG.
- MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation
MoG uses hub graphs for shared context and sparsely activates expert graphs with a topology-aware router, reporting over 20% relative gains on MuSiQue.
- SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound
SAMe grounds complaints to organs, builds a lightweight patient anatomy model from one body image, and outputs probe initialization poses, outperforming keypoint baselines in real-robot liver and kidney trials.
- The Provenance Gap in Clinical AI: Evidence-Traceable Temporal Knowledge Graphs for Rare Disease Reasoning
HEG-TKG grounds LLM clinical reasoning in hierarchical evidence-based temporal knowledge graphs from 4,512 PubMed records, delivering 100% citation verifiability and error detectability where standard RAG and unprompted LLMs produce none.
- Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models
OKH-RAG represents knowledge as ordered hyperedges and retrieves coherent interaction sequences via a learned transition model, outperforming permutation-invariant RAG baselines on order-sensitive QA tasks.
- Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
- In-depth Analysis of Graph-based RAG in a Unified Framework
A unified framework and large-scale comparison of graph-based RAG methods on QA tasks yields new high-performing variants obtained by recombining existing components.
- ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.
- HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care
HoT-SSM combines hypergraph construction from domain knowledge with a dynamic state space model to jointly capture higher-order clinical interactions and long-range temporal dependencies, yielding improved predictions on MIMIC-III and MIMIC-IV.
- UniD³: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning
UniD³ applies KG-RAG with Llama 3.3-70B to build six knowledge graphs and generate large validated datasets for drug-disease matching, effectiveness assessment, and target analysis from biomedical literature.
- ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation
ReCellTy constructs a knowledge graph with 18,850 nodes and 48,944 edges, retrieves relevant entities for differential genes, and applies multi-task LLM reasoning to improve single-cell type annotation over standard LLMs by up to 0.21 in human scores and 6.1% in semantic similarity.
- MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization
MedRLM is a recursive agent-based multimodal framework for long-context clinical reasoning, sensor-guided screening, and referral optimization that uses a Clinical Evidence Graph Memory.
- UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA
UniReason-Med introduces a unified framework for 2D and 3D medical VQA with shared grounded reasoning, trained on a 220K dataset, claiming that joint 2D+3D supervision improves 3D performance over 3D-only training.
- VArify: A Visual Analytics System for Verifying Knowledge Enhanced Large Language Model Responses in Food Science
VArify introduces a tree visualization to support human verification of GraphRAG evidence for LLM responses in food science, evaluated in a study with six domain experts.
- MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
MedSynapse-V proposes a latent memory evolution framework with meta-query prior retrieval, causal counterfactual refinement via RL, and intrinsic memory transition to improve diagnostic accuracy over chain-of-thought baselines in medical VLMs.
- CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
CLIN-LLM combines uncertainty-calibrated BioBERT classification with retrieval-augmented FLAN-T5 generation and safety post-processing to reach 98% accuracy on clinical cases while cutting unsafe antibiotic suggestions by 67%.
- Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa
A domain-specific LLM for TB care in South Africa, created by fine-tuning BioMistral-7B with QLoRA and GraphRAG on local guidelines, shows improved contextual alignment over the base model.