A Pragmatic VLA Foundation Model
15 Pith papers cite this work. Polarity classification is still indexing.
Citation breakdown: all 15 citing papers are from 2026 and are currently unverdicted; role and polarity classification each label one citation as background so far.
Citing papers
- RotVLA: Rotational Latent Action for Vision-Language-Action Model
RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model (a flow-matching sketch follows this list).
- RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs (an I/O-abstraction sketch follows this list).
- Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT
ConSFT prevents catastrophic forgetting when fine-tuning flow-matching VLAs by dynamically scaling gradients based on model confidence, retaining over 20% more pre-trained capability than standard SFT without prior data or reference networks (a confidence-scaling sketch follows this list).
- Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
- Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents
The Mini-BEHAVIOR-Gran benchmark reveals a U-shaped effect of instruction granularity on embodied-agent performance, with planning width correlating best with performance and coarse instructions linked to vision-dominant shallow policies.
- HumanNet: Scaling Human-centric Video Learning to One Million Hours
HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.
- PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
PRTS pretrains VLA models with contrastive goal-conditioned RL to embed goal-reachability probabilities from offline data, yielding SOTA results on robotic benchmarks, especially for long-horizon and novel instructions (a contrastive-objective sketch follows this list).
- Long-Horizon Manipulation via Trace-Conditioned VLA Planning
LoHo-Manip enables robust long-horizon robot manipulation by using a receding-horizon VLM manager to output progress-aware subtask sequences and 2D visual traces that condition a VLA executor for automatic replanning (a planning-loop sketch follows this list).
- Human Cognition in Machines: A Unified Perspective of World Models
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
- VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.
- SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models
SABER uses a trained ReAct agent to produce bounded adversarial edits to robot instructions, cutting task success by 20.6% and increasing execution length and violations on the LIBERO benchmark across six VLA models (an edit-budget sketch follows this list).
- Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection
A dual VLM-VLA framework for long-horizon robot manipulation achieves 32.4% success on RMBench tasks versus 9.8% for the strongest baseline via structured memory and closed-loop adaptive replanning.
- CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.
- JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.
- World Model for Robot Learning: A Comprehensive Survey
A comprehensive survey that organizes the literature on world models in robot learning and their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.
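The sketches below illustrate, under stated assumptions, mechanisms named in some of the entries above; none reproduces the cited paper's actual implementation.

The RotVLA entry describes flow matching over latent actions. Below is a minimal sketch of a standard conditional flow-matching training step on action vectors, assuming a generic velocity-prediction network; the SO(n) rotational parameterization and the triplet-frame supervision are not reproduced, and all network shapes are assumptions.

```python
import torch
import torch.nn as nn

# Minimal conditional flow-matching step on action vectors (sketch).
# Network sizes, the conditioning vector, and the action dimension are
# assumptions; RotVLA's rotational latents are not modeled here.

class VelocityNet(nn.Module):
    def __init__(self, action_dim=7, cond_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, a_t, cond, t):
        # a_t: noisy action (B, A); cond: observation/language context (B, C); t: time (B, 1)
        return self.net(torch.cat([a_t, cond, t], dim=-1))

def flow_matching_loss(model, expert_actions, cond):
    """Linear-interpolant conditional flow matching: regress the velocity a1 - a0."""
    a1 = expert_actions                      # data sample
    a0 = torch.randn_like(a1)                # noise sample
    t = torch.rand(a1.shape[0], 1)           # random time in [0, 1]
    a_t = (1 - t) * a0 + t * a1              # point on the straight-line path
    target_v = a1 - a0                       # constant velocity of that path
    return ((model(a_t, cond, t) - target_v) ** 2).mean()

model = VelocityNet()
loss = flow_matching_loss(model, torch.randn(8, 7), torch.randn(8, 64))
loss.backward()
```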
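The RIO entry describes abstracting real-time robot I/O so embodiments and platforms can be swapped. A minimal sketch of such an interface, with hypothetical method names that are not taken from RIO's actual API:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

# Hypothetical I/O abstraction in the spirit of the RIO entry: one interface,
# multiple embodiment backends. Method names and types are assumptions.

class RobotIO(ABC):
    @abstractmethod
    def read_observation(self) -> Dict[str, Any]:
        """Return the latest synchronized sensor frame (images, proprioception)."""

    @abstractmethod
    def send_action(self, action: Dict[str, Any]) -> None:
        """Dispatch one control command to the robot."""

class SimArmIO(RobotIO):
    def read_observation(self) -> Dict[str, Any]:
        return {"image": None, "joint_pos": [0.0] * 7}

    def send_action(self, action: Dict[str, Any]) -> None:
        print(f"sim arm executing {action}")

def control_step(io: RobotIO, policy) -> None:
    # Policy code is written once against RobotIO; the embodiment is swappable.
    obs = io.read_observation()
    io.send_action(policy(obs))

control_step(SimArmIO(), policy=lambda obs: {"joint_delta": [0.0] * 7})
```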
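The ConSFT entry describes scaling gradients by model confidence to limit forgetting during SFT. One plausible instantiation, assuming per-sample losses are down-weighted where the (detached) loss is already low; the actual scaling rule, and whether it acts on losses or raw gradients, is an assumption here:

```python
import torch

# One plausible instantiation of confidence-scaled SFT: per-sample losses are
# down-weighted where the model is already confident (low error), so fine-tuning
# perturbs well-learned behavior less. The scaling rule and temperature are
# assumptions, not the paper's recipe.

def confidence_scaled_loss(per_sample_loss: torch.Tensor, temperature: float = 1.0):
    # Treat low per-sample loss as high confidence; detach so the weights
    # themselves receive no gradient.
    confidence = torch.exp(-per_sample_loss.detach() / temperature)  # in (0, 1]
    weights = 1.0 - confidence                                       # confident -> small update
    return (weights * per_sample_loss).mean()

per_sample = torch.tensor([0.05, 0.8, 2.3], requires_grad=True)
loss = confidence_scaled_loss(per_sample)
loss.backward()
print(per_sample.grad)  # smallest gradient on the most confident (lowest-loss) sample
```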
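The PRTS entry describes contrastive goal-conditioned pretraining that embeds goal reachability. A minimal InfoNCE-style sketch in that spirit, pairing state-action embeddings with goal embeddings from the same batch; the architecture and loss form are assumptions, not PRTS's recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a contrastive goal-conditioned objective: state-action embeddings
# are pulled toward embeddings of goals reached in the same trajectory
# (positives) and pushed from other goals in the batch (negatives), so the
# learned similarity tracks goal reachability. Sizes are assumptions.

class Encoder(nn.Module):
    def __init__(self, in_dim, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_goal_loss(sa_enc, g_enc, states, actions, goals, temperature=0.1):
    phi = sa_enc(torch.cat([states, actions], dim=-1))  # (B, D) state-action embeddings
    psi = g_enc(goals)                                   # (B, D) goal embeddings
    logits = phi @ psi.T / temperature                   # (B, B) similarity matrix
    labels = torch.arange(states.shape[0])               # positive = the matching goal
    return F.cross_entropy(logits, labels)

B, s_dim, a_dim = 16, 10, 4
sa_enc, g_enc = Encoder(s_dim + a_dim), Encoder(s_dim)
loss = contrastive_goal_loss(sa_enc, g_enc,
                             torch.randn(B, s_dim), torch.randn(B, a_dim), torch.randn(B, s_dim))
loss.backward()
```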
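The LoHo-Manip entry describes a receding-horizon VLM manager that replans subtask sequences and 2D traces for a VLA executor. A hypothetical sketch of that control loop with toy stand-ins for the planner and executor; all names and the stopping rule are assumptions:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical receding-horizon manager/executor loop. The planner re-emits a
# subtask list (instruction plus a 2D trace) after every executed subtask, so
# stalled or failed steps are replanned automatically. Names and the stopping
# rule are assumptions, not LoHo-Manip's interface.

Subtask = Tuple[str, List[Tuple[float, float]]]  # (instruction, 2D trace waypoints)

def run_episode(plan: Callable[[Dict], List[Subtask]],
                execute: Callable[[Dict, Subtask], Dict],
                observe: Callable[[], Dict],
                max_rounds: int = 10) -> Dict:
    obs = observe()
    for _ in range(max_rounds):
        subtasks = plan(obs)              # manager: progress-aware plan from the current state
        if not subtasks:                  # an empty plan signals completion
            break
        obs = execute(obs, subtasks[0])   # executor runs only the first subtask, then control
                                          # returns to the manager (receding horizon)
    return obs

# Toy stand-ins to show the wiring.
state = {"steps_done": 0}

def toy_execute(obs: Dict, subtask: Subtask) -> Dict:
    state["steps_done"] += 1
    return dict(state)

final = run_episode(
    plan=lambda obs: [] if obs["steps_done"] >= 3 else [("reach the object", [(0.1, 0.2), (0.3, 0.4)])],
    execute=toy_execute,
    observe=lambda: dict(state),
)
print(final)  # {'steps_done': 3}
```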
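The SABER entry describes bounded adversarial edits to robot instructions evaluated against black-box VLA policies. The sketch below shows only the edit-budget and black-box-query structure, using a naive random-substitution search in place of SABER's trained ReAct agent; the synonym table and the toy scoring function are assumptions:

```python
import random
from typing import Callable, List

# Sketch of the edit-budget idea: candidate instruction rewrites are accepted
# only if they stay within a word-level edit bound, and the victim policy is
# queried as a black box. The random substitution search stands in for a
# learned attacker and is not SABER's method.

SYNONYMS = {"pick": ["grab", "lift"], "place": ["put", "set"], "red": ["crimson"]}

def word_edits(a: List[str], b: List[str]) -> int:
    # Same-length word lists: count positionwise substitutions.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def bounded_attack(instruction: str,
                   success_rate: Callable[[str], float],
                   max_edits: int = 2,
                   trials: int = 20) -> str:
    words = instruction.split()
    best, best_score = instruction, success_rate(instruction)
    for _ in range(trials):
        cand = [random.choice(SYNONYMS[w]) if w in SYNONYMS and random.random() < 0.5 else w
                for w in words]
        if word_edits(words, cand) > max_edits:   # enforce the edit budget
            continue
        score = success_rate(" ".join(cand))      # black-box query of the victim policy
        if score < best_score:                    # keep the most damaging bounded edit
            best, best_score = " ".join(cand), score
    return best

# Toy victim: success drops when "pick" is paraphrased.
print(bounded_attack("pick up the red block and place it",
                     success_rate=lambda s: 0.9 if "pick" in s else 0.6))
```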