{"total":10,"items":[{"citing_arxiv_id":"2605.07308","ref_index":44,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-08T06:17:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AT-VLA proposes adaptive tactile injection and a dual-stream tactile reaction mechanism to enhance VLA models for contact-rich robotic manipulation with real-time responses.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"rich manipulation. arXiv preprint arXiv:2503.02881, 2025. [43] Shaobo Yang, Hongtong Li, Jiangyu Hu, Shixin Zhang, Guocai Yao, Ziqiang Ni, and Bin Fang. Bitla: A bimanual tactile-language-action model for contact-rich robotic manipulation. In Proceedings of the 1st International Workshop on Multi-Sensorial Media and Applications, pages 12-17, 2025. [44] Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, et al. Forcevla: Enhancing vla models with a force-aware moe for contact-rich manipulation. arXiv preprint arXiv:2505.22159, 2025. 
[45] Yang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, and Gao Huang."},{"citing_arxiv_id":"2605.03269","ref_index":114,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RLDX-1 Technical Report","primary_cat":"cs.RO","submitted_at":"2026-05-05T01:40:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Cogvideox: Text-to-video diffusion models with an expert transformer. In International Conference on Learning Representations, 2025c. [113] Seonghyeon Ye, Yunhao Ge, Kaiyuan Zheng, Shenyuan Gao, Sihyun Yu, George Kurian, Suneel Indupuru, You Liang Tan, Chuning Zhu, Jiannan Xiang, et al. World action models are zero-shot policies. arXiv preprint arXiv:2602.15922, 2026. [114] Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, et al. Forcevla: Enhancing vla models with a force-aware moe for contact-rich manipulation. arXiv preprint arXiv:2505.22159, 2025. 
[115] Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, and Jianye"},{"citing_arxiv_id":"2605.02600","ref_index":30,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-04T13:49:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CoRAL lets LLMs act as adaptive cost designers for motion planners while using VLM priors and online identification to handle unknown physics, achieving over 50% higher success rates than baselines in unseen contact-rich robotic scenarios.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"merely identifying goals, our LLM formulates the structure of the MPPI [28] cost function and symbolic contact strategies, grounding commonsense reasoning directly into the optimal control problem. Tackling Contact-Rich Manipulation. Contact-rich manipulation requires nuanced force control beyond simple trajectory generation. Recent works like ForceVLA [30], TLA [7], VLA-Touch [1], RDP [29], and FACTR [19] explicitly integrate force or tactile data into learned policies. While effective, this hardware-centric approach creates a data bottleneck, requiring difficult-to-collect specialized multimodal datasets [4]. 
CoRAL leverages real-time force feedback within the MPPI controller but eliminates the need for prior demon-"},{"citing_arxiv_id":"2604.10647","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction","primary_cat":"cs.RO","submitted_at":"2026-04-12T13:48:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"avoids the cost, wear, and operational constraints associated with robot-in-the-loop data collection. Handheld interfaces such as UMI [7] established a representative pipeline for portable robot-free data collection using RGB observations, relative motion reconstruction, and gripper state, enabling embodiment-agnostic imitation learning. Follow-up systems such as FastUMI [31] and DexWild [23] further improved scalability, hardware independence, and ease of deployment. 
More recently, large-scale studies such as RDT2 [19], along with industrial efforts including Generalist AI and Sunday Robotics, have demonstrated that robot-free data pipelines can support cross-embodiment generalization and large-scale robot learning."},{"citing_arxiv_id":"2604.10165","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks","primary_cat":"cs.RO","submitted_at":"2026-04-11T11:24:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoRI dynamically mixes RL and IL experts with variance-based switching and IL regularization to reach 97.5% success in four real-world robotic tasks while cutting human intervention by 85.8%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[28] and offline RL [29] utilize human demonstrations for direct deployment, yet these approaches often struggle with distribution shifts in real-world environments. Alternatively, simulation-based RL leverages domain randomization to bridge the sim-to-real gap, ensuring robustness during real-world deployment. While successful for highly dynamic locomotion tasks [30], [31], these methods often struggle with precision manipulation because of the difficulty in modeling complex robot-environment interactions. The third approach involves direct real-world training via IL [32] or RL [5], which learns physical dynamics without explicit modeling. 
However, this strategy carries risks of hardware damage and typically requires human intervention [10] to maintain safety"},{"citing_arxiv_id":"2604.03066","ref_index":176,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems","primary_cat":"eess.SY","submitted_at":"2026-04-03T14:40:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"are highly contact-rich and depend on subtle state transitions, including force and compliance, that may not be fully observable through vision alone. Therefore, purely vision-language-conditioned action generation may be insufficient for robust execution in such tasks. To support such applications, additional modalities have been incorporated into the VLA framework [176], [177]. ForceVLA [176] treats force as the first-class modality, leveraging a force-aware mixture-of-experts approach to fuse tactile feedback with visual-language embeddings, thereby greatly enhancing performance in contact-rich manipulation. 
Besides incorporating tactile sensing as an additional modality in the network, TACTILE-VLA [177] also integrates tactile feedback into the chain-of-"},{"citing_arxiv_id":"2603.25044","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ThermoAct:Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making","primary_cat":"cs.RO","submitted_at":"2026-03-26T05:26:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ThermoAct integrates thermal imaging into VLA models via a VLM planner to enable robots to perceive physical properties like heat and improve safety over vision-only systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.04038","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control","primary_cat":"cs.RO","submitted_at":"2026-03-04T13:18:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TER-DAgger improves robotic precision insertion success rates by over 37% via residual policies from edited trajectories and force-aware intervention triggers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.18085","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continually Evolving Skill Knowledge in Vision Language Action Model","primary_cat":"cs.RO","submitted_at":"2025-11-22T15:00:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Stellar VLA achieves continual learning in VLA models by maintaining a growing knowledge space and routing tasks to specialized experts 
conditioned on semantic relations, delivering strong LIBERO benchmark results with only 1% data replay and successful real-world transfer on dual-arm hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.13073","ref_index":138,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey","primary_cat":"cs.RO","submitted_at":"2025-08-18T16:45:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the incorporation of a true MoE or other specially designed architectures within the action expert. A typical example is π0 [29]. Its backbone weights are initialized from a pre-trained VLM. To handle robot-specific inputs and action generation, a second set of independent weights, the flow-matching-based action expert, is introduced and trained from scratch. ForceVLA [138] uses π0 [29] as the base model, the FVLMoE module with MoE is used to introduce the force modality into VLA. OneTwoVLA [139] is based on π0 [29] and can switch between two modes: explicitly reasoning and generating actions based on the most recent reasoning. This architecture makes it easier for the two systems to operate asynchronously and further improves efficiency."}],"limit":50,"offset":0}