{"total":12,"items":[{"citing_arxiv_id":"2606.30243","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"KYON: Semi-Modular Wheel-Legged Quadruped With Agile Bimanual Capability","primary_cat":"cs.RO","submitted_at":"2026-06-29T12:54:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"KYON is a semi-modular wheel-legged quadruped with reconfigurable lower legs, base-mounted actuators, and bimanual manipulation, using whole-body control plus RL policy for dynamic locomotion and tasks in unstructured environments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26392","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MPC-Injection: Biasing Off-Policy Locomotion RL Toward Controller-Induced Behavior Basins","primary_cat":"cs.RO","submitted_at":"2026-06-24T21:23:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MPC-Injection biases off-policy RL locomotion policies toward controller-induced behavior basins by injecting MPC transitions into the replay buffer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25179","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Learning Perceptive Platform Adaptive Locomotion Controllers for Quadrupedal Robots","primary_cat":"cs.RO","submitted_at":"2026-06-23T21:10:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Empirical comparison of blind, critic-perceptive, and fully perceptive variants of morphology-aware RL locomotion controllers shows critic-only perception improves robustness over blind baselines while remaining more stable under perception 
noise than full perception.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.21387","ref_index":12,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Long-Distance Real-World Navigation of the Legged-Wheeled Robot Go2-W Using Deep Reinforcement Learning","primary_cat":"cs.RO","submitted_at":"2026-06-19T12:53:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A DRL locomotion controller extended from prior quadruped work enabled the Go2-W robot to complete 2.8 km of autonomous real-world navigation including mixed terrain and stairs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06944","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"T-GMP: Terrain-conditioned Generative Motion Priors for Versatile and Natural Humanoid Locomotion","primary_cat":"cs.RO","submitted_at":"2026-06-05T06:15:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"T-GMP learns a terrain-conditioned latent motion manifold via CVAE from demonstrations and integrates it into an adversarial pipeline with a foothold penalty for versatile, natural humanoid locomotion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26478","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient","primary_cat":"cs.RO","submitted_at":"2026-05-26T02:35:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SDPG is a new on-policy visual RL algorithm that estimates gradients via stochastic perturbations of rollouts, achieving 
faster training and lower memory use than baselines on visual MuJoCo tasks while adding new robotics benchmarks and sim-to-real results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23847","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Instrumentation for Imitation Learning: Enhancing Training Datasets for Clothes Hanger Insertion","primary_cat":"cs.RO","submitted_at":"2026-05-22T16:59:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Instrumented objects boost diffusion policy success in robotic hanger insertion by 14-25 percentage points over vision-only baselines, and augmenting datasets with instrumented expert rollouts lets a vision-only student match the instrumented expert.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19503","ref_index":14,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders","primary_cat":"cs.RO","submitted_at":"2026-05-19T07:54:40+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ARC-RL is a new suite of four MuJoCo continuous-control environments featuring game-inspired hexapod and quadruped morphologies, a single closed-form multi-component reward function, CPG demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09595","ref_index":24,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven 
Terrain","primary_cat":"cs.NE","submitted_at":"2026-05-10T15:16:07+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"However, here the PPO algorithm requires its objective to be maximized. Therefore, we can apply the negative of the PPO objective gradient w.r.t. the network output, shown in Equation 4, directly to the EP output layer, which yields an output layer dynamics: dξt,out,i d[time] =−ξ t,out,i +ρ ′(ξt,out,i) X j wout,ijρ(ξt,j) +b out,iρ′(ξt,out,i)(9) −β· 1 |B| ( 1[0,1+ϵ)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σ2 i ˆAt if ˆAt ≥0 1(1−ϵ,∞)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σ2 i ˆAt if ˆAt <0 Here rt,nudging(ξt,out) is the nudging probability ratio, which is dedicated to the nudge phase per- relaxation-iteration probability ratio calculation: rt,nudging(ξt,out) = πnudging(at|st) πrollout(at|st) (10) = 1 πrollout(at|st) DactionY i \" 1p 2πσ 2 i exp \u0012 −(at,i −ξ t,out,i)2 2σ2 i \u0013# 6 Here πnudging(at|st) is the action probability in the nudge relaxation as a function ofξt,out,i. Daction is the dimensionality of action space. The reason for using the notation rt,nudging(ξt,out) instead of rt(ξt,out) is that the probability ratio keeps changing/oscillating in both nudge phase iterations because ξt,out keeps changing/oscillating, and this nudging probability ratio controls the gradient mask in each relaxation step. However, experiments show that this formulation fails to converge when using Equation 9 as the objective gradient. The per-update KL-divergence graph indicates excessively large update steps. Upon investigation of the cause of the large update steps, we conclude that, for positive-advantage samples, the positive nudge phase will drive the output neuron state toward the farther-away-from-targetdirection without bound. 
For a detailed discussion about the cause of the large update step using the original PPO objective gradient, see Appendix C. To constrain these large update steps, we propose the two-sided PPO ratio clip objective gradient: ∂LTwoSidedCLIP ∂ξt,out,i = 1 |B| ( 1(1−ϵrev,1+ϵ)(rt,nudging(ξt,out))· (at,i−ξt,out,i) σi ˆAt if ˆAt ≥0 1(1−ϵ,1+ϵrev)(rt,nudgin"},{"citing_arxiv_id":"2604.04539","ref_index":41,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control","primary_cat":"cs.LG","submitted_at":"2026-04-06T09:03:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Wurman, Jaegul Choo, Peter Stone, and Takuma Seno. Simba: Simplicity bias for scaling up parameters in deep reinforcement learning. arXiv preprint arXiv:2410.09754, 2024. [40] Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, and Jaegul Choo. Hyperspherical normalization for scalable deep reinforcement learning. arXiv preprint arXiv:2502.15280, 2025. [41] Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47), October 2020. ISSN 2470-9476. doi: 10.1126/scirobotics.abc5986. http://dx.doi.org/10.1126/scirobotics.abc5986. [42] Qiyang Li, Aviral Kumar, Ilya Kostrikov, and Sergey Levine. 
Efficient deep reinforcement learning requires"},{"citing_arxiv_id":"2604.02744","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Learning Locomotion on Complex Terrain for Quadrupedal Robots with Foot Position Maps and Stability Rewards","primary_cat":"cs.RO","submitted_at":"2026-04-03T05:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Integrating foot position maps into heightmaps and adding a locomotion-stability reward in an attention-based RL framework improves quadrupedal success rates on both trained and out-of-domain complex terrains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Haarnoja, S. Ha, A. Zhou, J. Tan, G. Tucker, and S. Levine, \"Learning to walk via deep reinforcement learning,\" arXiv preprint arXiv:1812.11103, 2018. [7] N. Rudin, D. Hoeller, P. Reist, and M. Hutter, \"Learning to walk in minutes using massively parallel deep reinforcement learning,\" in Proc. Conference on Robot Learning (CoRL), 2022, pp. 91-100. [8] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, \"Learning quadrupedal locomotion over challenging terrain,\" Science Robotics, vol. 5, no. 47, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1126/scirobotics.abc5986 [9] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. 
Hutter, \"Learning robust perceptive locomotion for quadrupedal robots in"},{"citing_arxiv_id":"2507.13662","ref_index":29,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion","primary_cat":"cs.RO","submitted_at":"2025-07-18T05:13:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Integrates iterative learning control with a torque library to enable high-precision adaptive locomotion on bipedal and quadrupedal robots, reducing tracking errors by up to 85% and achieving over 30x faster control rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}