AURA-Mem uses an action-gated recurrent memory trained on closed-loop action error to deliver constant 4,224-byte state and 5-9x fewer writes than baselines while matching base policy success on LIBERO-Long.
Title resolution pending
21 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 21representative citing papers
ECHO organizes VLA experiences into a hierarchical memory tree in hyperbolic space via autoencoder and entailment constraints, delivering a 12.8% success-rate gain on LIBERO-Long over the pi0 baseline.
π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
PhysMem enables VLM-based robot planners to learn and verify physical properties through test-time interaction and hypothesis testing, raising success on a brick insertion task from 23% to 76%.
Freeform Preference Learning trains language-conditioned multi-axis reward models from human pairwise preferences to produce steerable and compositional robot policies that outperform sparse and binary-preference baselines by 38 percentage points.
DiM-WAM is a memory-augmented world-action model that integrates multi-scale historical events and global task progress to improve long-horizon robot manipulation performance.
EventVLA introduces foundational visual anchors and a Keyframe Evidence Memory module that predicts future keyframe probabilities from VLA embeddings to improve long-horizon task success by an average of 40% on 17 simulation and 4 real-world tasks.
SERF conditions VLA policies on online-updated neural point maps of environment and robot to improve long-horizon mobile manipulation on BEHAVIOR-1K.
DAM-VLA decouples per-modality temporal processing in vision-language-action models via latent buffers refreshed at sensor rates, achieving 95.2% average success versus 40.95% for synchronous baselines on seven real-world manipulation tasks while enabling 100 Hz control.
AEM pretrains compact history representations via masked modeling on interleaved vision-action sequences to boost downstream robot manipulation in simulation and real settings.
Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on held-out tasks while preserving performance under full observability.
VLA models with inference-time steering mitigate action leakage in implicit human-robot collaboration, supporting longer horizons and yielding faster, more reliable assembly than shorter-horizon baselines in a 16-person study.
Hide-and-Seek uses contrastive objectives on trajectories to localize failure signals in VLA models from trajectory-level supervision alone.
RoboMemArena is a new large-scale robotic memory benchmark with real-world tasks, and PrediMem is a dual VLA system that outperforms baselines by managing memory buffers with predictive coding.
ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.
CLWM with DINOv3 targets, O(1) TTT memory, SAI latency masking, and EmbodiChain training achieves SOTA dual-arm simulation performance and zero-shot sim-to-real transfer that beats real-data finetuned baselines.
Autoregressive VLA policies achieve real-time execution via tokenization horizon adjustment and constrained decoding, outperforming flow-matching policies in speed and performance across simulated and real environments.
Embodied-R1.5 is an 8B EFM achieving SOTA on 16 of 24 embodied VLM benchmarks, fine-tunable to outperform leading VLAs, with claimed zero-shot real-robot generalization.
MemoryVLA++ integrates a perceptual-cognitive memory bank and denoising world model into VLA models to enable temporal reasoning, yielding performance gains on manipulation benchmarks and real-robot tasks.
A dual VLM-VLA framework for long-horizon robot manipulation achieves 32.4% success on RMBench tasks versus 9.8% for the strongest baseline via structured memory and closed-loop adaptive replanning.
A structured literature survey of safety mechanisms in long-horizon robotic manipulation organized by intervention timing and strength of supporting evidence.
citing papers explorer
No citing papers match the current filters.