Title resolution pending

· 2024 · arXiv 2409.01652

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

PaMoSplat: Part-Aware Motion-Guided Gaussian Splatting for Dynamic Scene Reconstruction

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

PaMoSplat reconstructs dynamic scenes by lifting 2D segmentations to coherent 3D Gaussian parts and estimating their motions via optical flow-guided differential evolution for higher quality rendering and faster training.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 6.0

AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.

TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

TriRelVLA introduces triadic object-hand-task relational representations and a task-grounded graph transformer with a relational bottleneck to improve generalization in robotic manipulation across scenes, objects, and tasks.

Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.

BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

BridgeACT learns robot manipulation from human videos alone by predicting task-relevant grasp regions and 3D motion affordances that map directly to robot controllers.

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

cs.RO · 2026-04-23 · unverdicted · novelty 6.0

CorridorVLA improves VLA models by using predicted sparse anchors to impose explicit spatial corridors on action trajectories, yielding 3.4-12.4% success rate gains on LIBERO-Plus with GR00T-Corr reaching 83.21%.

AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly

cs.RO · 2026-04-10 · unverdicted · novelty 6.0

AssemLM uses a specialized point cloud encoder inside a multimodal LLM to reach state-of-the-art 6D pose prediction for assembly tasks, backed by a new 900K-sample benchmark called AssemBench.

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

cs.RO · 2025-10-15 · unverdicted · novelty 6.0

InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.

FAST: Efficient Action Tokenization for Vision-Language-Action Models

cs.RO · 2025-01-16 · unverdicted · novelty 6.0

FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.

Forecast-aware Gaussian Splatting for Predictive 3D Representation in Language-Guided Pick-and-Place Manipulation

cs.RO · 2026-05-11 · unverdicted · novelty 5.0

Forecast-GS predicts task-completed 3D states via Gaussian splatting to achieve higher success rates than baselines in real-world language-conditioned manipulation tasks.

BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation

cs.RO · 2026-05-08 · unverdicted · novelty 5.0

BioProVLA-Agent integrates protocol parsing, visual state verification, and VLA-based execution in a closed-loop multi-agent framework with AugSmolVLA augmentation to improve robustness for biological lab tasks like tube handling and liquid pouring.

Synergizing Efficiency and Reliability for Continuous Mobile Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 5.0

A framework integrates anticipatory planning and real-time feedback via reliability-aware optimization and phase switching to achieve efficient, reliable continuous mobile manipulation under uncertainty.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04 · accept · novelty 5.0

A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

citing papers explorer

Showing 14 of 14 citing papers.

PaMoSplat: Part-Aware Motion-Guided Gaussian Splatting for Dynamic Scene Reconstruction cs.CV · 2026-05-11 · unverdicted · none · ref 2
PaMoSplat reconstructs dynamic scenes by lifting 2D segmentations to coherent 3D Gaussian parts and estimating their motions via optical flow-guided differential evolution for higher quality rendering and faster training.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 8
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 27
AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.
TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation cs.CV · 2026-05-07 · unverdicted · none · ref 34
TriRelVLA introduces triadic object-hand-task relational representations and a task-grounded graph transformer with a relational bottleneck to improve generalization in robotic manipulation across scenes, objects, and tasks.
Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation cs.RO · 2026-05-02 · unverdicted · none · ref 12
Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.
BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances cs.RO · 2026-04-25 · unverdicted · none · ref 10
BridgeACT learns robot manipulation from human videos alone by predicting task-relevant grasp regions and 3D motion affordances that map directly to robot controllers.
CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors cs.RO · 2026-04-23 · unverdicted · none · ref 22
CorridorVLA improves VLA models by using predicted sparse anchors to impose explicit spatial corridors on action trajectories, yielding 3.4-12.4% success rate gains on LIBERO-Plus with GR00T-Corr reaching 83.21%.
AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly cs.RO · 2026-04-10 · unverdicted · none · ref 20
AssemLM uses a specialized point cloud encoder inside a multimodal LLM to reach state-of-the-art 6D pose prediction for assembly tasks, backed by a new 900K-sample benchmark called AssemBench.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy cs.RO · 2025-10-15 · unverdicted · none · ref 14
InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.
FAST: Efficient Action Tokenization for Vision-Language-Action Models cs.RO · 2025-01-16 · unverdicted · none · ref 32
FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
Forecast-aware Gaussian Splatting for Predictive 3D Representation in Language-Guided Pick-and-Place Manipulation cs.RO · 2026-05-11 · unverdicted · none · ref 3
Forecast-GS predicts task-completed 3D states via Gaussian splatting to achieve higher success rates than baselines in real-world language-conditioned manipulation tasks.
BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation cs.RO · 2026-05-08 · unverdicted · none · ref 7
BioProVLA-Agent integrates protocol parsing, visual state verification, and VLA-based execution in a closed-loop multi-agent framework with AugSmolVLA augmentation to improve robustness for biological lab tasks like tube handling and liquid pouring.
Synergizing Efficiency and Reliability for Continuous Mobile Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 2
A framework integrates anticipatory planning and real-time feedback via reliability-aware optimization and phase switching to achieve efficient, reliable continuous mobile manipulation under uncertainty.
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data cs.RO · 2026-04-04 · accept · none · ref 44
A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer