citation dossier

//arxiv.org/abs/2210.00030

Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Os- bert Bastani, Vikash Kumar, and Amy Zhang · 2022 · arXiv 2210.00030

18Pith papers citing it

18reference links

cs.ROtop field · 12 papers

UNVERDICTEDtop verdict bucket · 15 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 18 reviewed papers. Its strongest current cluster is cs.RO (12 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

cs.RO · 2026-04-21 · unverdicted · novelty 7.0

RoboWM-Bench evaluates video world models by converting their outputs into executable robot actions and running them on manipulation tasks, showing that physical inconsistencies remain common.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

cs.RO · 2023-07-12 · unverdicted · novelty 7.0

VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

PriorZero: Bridging Language Priors and World Models for Decision Making

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

cs.RO · 2026-04-21 · unverdicted · novelty 6.0

UniT creates a unified physical language via visual anchoring and tri-branch reconstruction to enable scalable human-to-humanoid transfer for policy learning and world modeling.

WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.

Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment

cs.RO · 2026-04-07 · unverdicted · novelty 6.0

A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.

ARM: Advantage Reward Modeling for Long-Horizon Manipulation

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

cs.RO · 2025-02-27 · accept · novelty 6.0

OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

cs.CV · 2024-12-19 · unverdicted · novelty 6.0

Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

cs.RO · 2023-12-20 · conditional · novelty 6.0

A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.

Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04 · accept · novelty 5.0

A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

citing papers explorer

Showing 18 of 18 citing papers.

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation cs.RO · 2026-04-21 · unverdicted · none · ref 35
RoboWM-Bench evaluates video world models by converting their outputs into executable robot actions and running them on manipulation tasks, showing that physical inconsistencies remain common.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 39
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models cs.RO · 2023-10-13 · unverdicted · none · ref 55
A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models cs.RO · 2023-07-12 · unverdicted · none · ref 108
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
PriorZero: Bridging Language Priors and World Models for Decision Making cs.LG · 2026-05-12 · unverdicted · none · ref 5
PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.
Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 26
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning cs.LG · 2026-05-08 · unverdicted · none · ref 59
MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 131
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
GazeVLA: Learning Human Intention for Robotic Manipulation cs.RO · 2026-04-24 · unverdicted · none · ref 48
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling cs.RO · 2026-04-21 · unverdicted · none · ref 15
UniT creates a unified physical language via visual anchoring and tri-branch reconstruction to enable scalable human-to-humanoid transfer for policy learning and world modeling.
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations cs.RO · 2026-04-12 · unverdicted · none · ref 70
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment cs.RO · 2026-04-07 · unverdicted · none · ref 29
A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.
ARM: Advantage Reward Modeling for Long-Horizon Manipulation cs.RO · 2026-04-03 · unverdicted · none · ref 21
ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success cs.RO · 2025-02-27 · accept · none · ref 30
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations cs.CV · 2024-12-19 · unverdicted · none · ref 112
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation cs.RO · 2023-12-20 · conditional · none · ref 152
A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning cs.LG · 2026-04-10 · unverdicted · none · ref 36
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data cs.RO · 2026-04-04 · accept · none · ref 64
A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.

//arxiv.org/abs/2210.00030

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer