HIMM: Human-Inspired Long-Term Memory Modeling for Embodied Exploration and Question Answering

Bo Wang; Ji Li; Jing Xia; Mingyi Li; Shiyan Hu

arxiv: 2602.15513 · v2 · pith:HPVOKD7Mnew · submitted 2026-02-17 · 💻 cs.RO · cs.AI

HIMM: Human-Inspired Long-Term Memory Modeling for Embodied Exploration and Question Answering

Ji Li , Bo Wang , Jing Xia , Mingyi Li , Shiyan Hu This is my paper

classification 💻 cs.RO cs.AI

keywords memoryembodiedexplorationsemanticansweringepisodicquestionagents

0 comments

read the original abstract

Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limited context budgets. Existing memory assisted methods often rely on textual summaries, which discard rich visual and spatial details and remain brittle in non-stationary environments. In this work, we propose a non-parametric memory framework that explicitly disentangles episodic and semantic memory for embodied exploration and question answering. Our retrieval-first, reasoning-assisted paradigm recalls episodic experiences via semantic similarity and verifies them through visual reasoning, enabling robust reuse of past observations without rigid geometric alignment. In parallel, we introduce a program-style rule extraction mechanism that converts experiences into structured, reusable semantic memory, facilitating cross-environment generalization. Extensive experiments demonstrate state-of-the-art performance on embodied question answering and exploration benchmarks, yielding a 7.3% gain in LLM-Match and an 11.4% gain in LLM MatchXSPL on A-EQA, as well as +7.7% success rate and +6.8% SPL on GOAT-Bench. Analyses reveal that our episodic memory primarily improves exploration efficiency, while semantic memory strengthens complex reasoning of embodied agents.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
cs.RO 2026-05 unverdicted novelty 6.0

RoboMemArena is a new large-scale robotic memory benchmark with real-world tasks, and PrediMem is a dual VLA system that outperforms baselines by managing memory buffers with predictive coding.