Vcot-grasp: Grasp foundation models with visual chain-of-thought reasoning for language-driven grasp generation

Haoran Zhang, Shuanghao Bai, Wanqi Zhou, Yuedi Zhang, Qi Zhang, Pengxiang Ding, Cheng Chi, Donglin Wang, Badong Chen · 2025 · arXiv 2510.05827

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Assistance Without Interruption: A Benchmark and LLM-based Framework for Non-Intrusive Human-Robot Assistance

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

The work creates NIABench and an LLM-plus-scoring-model framework that enables robots to deliver proactive assistance during human multi-step activities while avoiding interruptions and reducing human effort.

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.

AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

AVI-HT adaptively fuses vision and IMU data via attention to cut 3D hand keypoint error by 16.1% (24.2% wrist-aligned) on a new 100K+ sample DexGloveHOI dataset in occluded hand-object scenarios.

citing papers explorer

Showing 3 of 3 citing papers.

Assistance Without Interruption: A Benchmark and LLM-based Framework for Non-Intrusive Human-Robot Assistance

cs.RO · 2026-05-02 · unverdicted · none · ref 49

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

cs.RO · 2026-04-08 · unverdicted · none · ref 9

AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

cs.CV · 2026-05-20 · unverdicted · none · ref 39
