arXiv: 2604.25897 · v1 · submitted 2026-04-28 · cs.RO · cs.LG · cs.SY · eess.SY


Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty

Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta


Pith reviewed 2026-05-07 15:32 UTC · model grok-4.3

classification cs.RO cs.LG cs.SY eess.SY

keywords robust grasping, variational inference, Gaussian mixture, CVaR, multimodal uncertainty, dexterous manipulation, contact parameters, risk-sensitive control


The pith

A variational Gaussian-mixture belief enables faster, more robust dexterous grasping by optimizing risk-sensitive objectives directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Grasping involves uncertainty from contacts, sensing, and disturbances, so planning for expected quality alone is prone to failure in adverse cases. The paper shows that formulating the belief over latent parameters as a variational inference problem with a Gaussian mixture allows gradient-based optimization of Conditional Value-at-Risk (CVaR) for robustness. This approach avoids the scalability issues of particle filters. In simulation it improves grasp success under perturbations, cuts planning time by roughly an order of magnitude, and yields better risk calibration; on a real robot it matches a Gaussian baseline's success while terminating in fewer steps.

Core claim

The method represents the belief over contact parameters and object pose with a differentiable Gaussian mixture, using Gumbel-Softmax for component selection and location-scale reparameterization for sampling. This yields pathwise gradients through a CVaR surrogate and allows direct optimization of tail-robust grasping policies, which in simulation achieve higher robust grasp success and roughly an order of magnitude faster planning than particle-filter MPC.

What carries the argument

The variational neural belief parameterization carries the argument: a Gaussian mixture model with Gumbel-Softmax selection and location-scale reparameterization makes samples differentiable functions of the belief parameters, which is what gradient-based CVaR optimization requires.
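The sampling mechanism described above can be sketched in a few lines. This is a minimal, pure-Python illustration under stated assumptions, not the paper's implementation: the belief is one-dimensional, a single Gaussian noise draw is shared across components, and the function names, the temperature value, and the friction-like numbers are all hypothetical.

```python
import math
import random


def gumbel_softmax_weights(logits, gumbel_noise, temperature):
    """Relaxed one-hot weights: softmax((logits + gumbel_noise) / temperature)."""
    scores = [(l + g) / temperature for l, g in zip(logits, gumbel_noise)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mixture(logits, means, stds, gumbel_noise, eps, temperature=0.5):
    """Relaxed sample from a 1-D Gaussian mixture.

    Each component sample is mean_k + std_k * eps (location-scale
    reparameterization); the relaxed weights blend the component samples.
    With the noise held fixed, the output is a smooth function of
    (logits, means, stds).
    """
    weights = gumbel_softmax_weights(logits, gumbel_noise, temperature)
    component_samples = [mu + sd * eps for mu, sd in zip(means, stds)]
    return sum(w * x for w, x in zip(weights, component_samples))


def draw_noise(rng, num_components):
    """Parameter-free noise: one Gumbel(0, 1) per component and one N(0, 1)."""
    gumbel = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
              for _ in range(num_components)]
    return gumbel, rng.gauss(0.0, 1.0)


# Hypothetical bimodal belief over a friction-like parameter.
logits, means, stds = [0.0, 0.0], [0.3, 0.9], [0.05, 0.05]
gumbel, eps = draw_noise(random.Random(0), 2)
sample = sample_mixture(logits, means, stds, gumbel, eps)

# With the noise held fixed, a finite difference recovers the pathwise
# sensitivity of the sample to the first component mean.
step = 1e-6
shifted = sample_mixture(logits, [means[0] + step, means[1]], stds, gumbel, eps)
sensitivity = (shifted - sample) / step
```

In this sketch the sensitivity to a component mean equals that component's relaxed weight, which is the sense in which the sample is a differentiable function of the belief parameters. Lowering the temperature pushes the weights toward a one-hot selection at the cost of sharper gradients.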

If this is right

Improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations in simulation.
Reduces planning time by roughly an order of magnitude relative to particle-filter model-predictive control.
Validates grasp-and-lift success on a real serial-chain robot arm with multifingered hand under object-pose uncertainty.
Achieves a higher tactile grasp-quality proxy on hardware and terminates in fewer steps with less wall-clock time than a Gaussian baseline.
Calibrates risk more accurately in simulation, with mean absolute calibration error below 0.14 compared with 0.58 for a Cross-Entropy Method planner.
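The page does not state how the paper defines mean absolute calibration error. As a rough illustration only, one common coverage-based reading compares each nominal risk level with the fraction of realized costs that fall at or below the predicted quantile for that level. The function name, the dictionary input format, and the integer costs below are assumptions made for this sketch.

```python
def mean_absolute_calibration_error(predicted_quantiles, realized_costs):
    """Average |empirical coverage - nominal level| over the given levels.

    predicted_quantiles maps a nominal level in (0, 1) to the cost threshold
    the belief predicts for that level (an assumed input format).
    """
    n = len(realized_costs)
    errors = []
    for level, threshold in predicted_quantiles.items():
        coverage = sum(cost <= threshold for cost in realized_costs) / n
        errors.append(abs(coverage - level))
    return sum(errors) / len(errors)


# Illustrative costs only, not data from the paper.
realized = list(range(1, 11))
well_calibrated = mean_absolute_calibration_error({0.5: 5, 0.9: 9}, realized)
overconfident = mean_absolute_calibration_error({0.5: 2, 0.9: 4}, realized)
```

Under this reading a perfectly calibrated belief scores zero, and a belief that underestimates tail costs scores higher.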

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may apply to other robotics tasks involving stochastic contact and sensing uncertainties.
Combining this with learned dynamics models could enhance performance in more complex environments.
The speed improvements suggest potential for online replanning in dynamic settings.

Load-bearing premise

A finite Gaussian mixture plus variational inference yields a sufficiently accurate and differentiable approximation to the true multimodal posterior over latent contact parameters and object pose for the CVaR objective to produce reliable robustness gains.

What would settle it

Observing whether the variational method maintains higher grasp success than particle filters when the posterior is highly multimodal and not well-approximated by a small number of Gaussians.

Figures

Figures reproduced from arXiv: 2604.25897 by Calin Belta, Clinton Enwerem, John S. Baras, Shreya Kalyanaraman.

**Figure 1.** This paper develops a variational neural belief (VNB) parameter…

**Figure 2.** Hardware Platform. Our platform comprises a FAIR Innovation FR3 cobot (6 DoF), RealHand L6 robotic hand (11 DoF), two calibrated RGB-D cameras (RealSense D435i and Orbbec Astra Pro Plus), and representative primitives and YCB objects. We compare VNB-MPC with a Gaussian baseline under object-pose uncertainty and report results in Section VI, with hardware grasps shown in…

**Figure 3.** Overview of Our Proposed Variational Neural Belief Grasping Framework. At each decision step, the neural belief dynamics update a latent embedding $h_t$ via a prediction network $f_{\mathrm{trans}}$ conditioned on actions and a correction network $f_{\mathrm{obs}}$ conditioned on observations (joint angles, pose estimate with covariance, and contact geometry). We then evaluate grasp robustness under multimodal uncertainty along separat…

**Figure 4.** Per-Regime Performance Comparison. Grouped bar chart comparing CEM and VNB across friction regimes. VNB matches or exceeds CEM in all three regimes, with the largest gains under nominal and bimodal friction.

…(3.0, 5.0, 8.0, and 12.0 N) to test resistance to disturbance-induced slip. We consider a grasp nominally successful if the object remains within 1 cm of the lifted pose and survives all shear pulses.…

**Figure 5.** Grasp Quality under Force Perturbations. Aggregate Ferrari–Canny ε over MPC steps for VNB and CEM. Shaded regions mark approach, consolidation, and grasp maintenance. VNB maintains higher ε and remains robust under lift perturbations (step 11), whereas expectation-driven CEM degrades during grasp maintenance.

…once a grasp fails, terminal quality collapses to zero and no longer distinguishes whether a meth…

**Figure 7.** Hardware Grasp Results under Vision Uncertainty. Post-lift frames of representative VNB grasps executed on our hardware platform after 6D pose estimation (RealSense D435i). Here, we quantify terminal grasp quality using an ε-metric-inspired [34] proxy $\hat{\varepsilon}_T$ ($\times 10^{-3}$) estimated from piezoresistive tactile observations (see Appendix D for details).

C. Summary. Together, our simulation and hardware results demon…

**Figure 8.** Pose Perturbation Protocol. A rigid 24×19-inch pegboard defines a discrete planar coordinate frame $\{F_B\}$, with origin $p_B$ axis-aligned to the robot base frame $\{F_R\}$. We place objects at fixed dot offsets from $p_B$, and apply controlled pose perturbations using the offset set in (25).

[Colorbar: Raw Pressure, 0 to 255; panels (a) to (d)]

**Figure 10.** Contact-Frame Assignment. For each active finger $i \in \widehat{\mathcal{C}}_t$, forward kinematics gives the fingertip position $p_{i,t}$. Dashed lines connect the inferred object center $c_t$ to active fingertip positions. Blue arrows denote inward contact normals $\hat{n}_{i,t}$, the red arrow denotes $\hat{t}^{(1)}_{i,t}$, and ⊙ denotes $\hat{t}^{(2)}_{i,t}$, pointing out of the page.

…($k_{\mathrm{thumb}} = 25.5$, $k_j = 204.0$ for $j \neq \mathrm{thumb}$). A finger is in contact when $\hat{f}_i$…

Original abstract

Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poorly, obstruct gradient-based optimization, and estimate Conditional Value-at-Risk (CVaR) with high-variance approximations. We instead formulate grasp acquisition as variational inference over latent contact parameters and object pose, representing the belief with a differentiable Gaussian mixture. We use Gumbel-Softmax component selection and location-scale reparameterization to express samples as smooth functions of the belief parameters, enabling pathwise gradients through a differentiable CVaR surrogate for direct optimization of tail robustness. In simulation, our variational neural belief improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations while reducing planning time by roughly an order of magnitude relative to particle-filter model-predictive control. On a serial-chain robot arm with a multifingered hand, we validate grasp-and-lift success under object-pose uncertainty against a Gaussian baseline. Both methods succeed on the tested perturbations, but our controller terminates in fewer steps and less wall-clock time while achieving a higher tactile grasp-quality proxy. Our learned belief also calibrates risk more accurately, keeping mean absolute calibration error below 0.14 across tested simulation regimes, compared with 0.58 for a Cross-Entropy Method planner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The variational neural belief with Gumbel-Softmax and differentiable CVaR gives faster sim planning for robust grasping, but hardware tests only show efficiency gains without proving the multimodal robustness edge.


This paper's main contribution is a variational neural belief over contact parameters and object pose that uses a Gaussian mixture with Gumbel-Softmax selection to enable differentiable CVaR optimization for robust grasping. It addresses the scaling problems of particle filters in risk-sensitive POMDPs. In simulation, it shows improved grasp success under uncertainty, much faster planning than particle-filter MPC, and tighter calibration. These results come from the pathwise gradients through the CVaR surrogate, which is a practical technical choice.

The hardware experiments are more limited. They test grasp-and-lift under pose uncertainty on a real arm with a multifingered hand, and both the variational method and a Gaussian baseline succeed, though the new one is quicker and has a better quality score. This means the key robustness benefit from handling multimodal uncertainty isn't demonstrated on the robot, only efficiency gains. There are some gaps in the reporting too: no error bars on the performance numbers, no ablations on the mixture components or CVaR level, and baselines that aren't described in enough detail to reproduce exactly. The central claim about robustness under multimodal uncertainty rests mostly on the simulation side.

This work is for researchers in dexterous robotics and planning under uncertainty. A reader who wants to see how variational methods can replace particle filters in grasping would get something out of the simulation results and the formulation. It should go to peer review. The approach is grounded and the simulation evidence is there, but the authors will likely need to strengthen the hardware validation and add statistical details.

Referee Report

3 major / 2 minor

Summary. The paper formulates dexterous grasp acquisition as variational inference over latent contact parameters and object pose, using a differentiable Gaussian-mixture belief with Gumbel-Softmax component selection and location-scale reparameterization. This enables pathwise gradients through a differentiable CVaR surrogate, allowing direct optimization of tail-robust objectives. In simulation the method reports higher robust grasp success under contact uncertainty and force perturbations, an order-of-magnitude reduction in planning time versus particle-filter MPC, and improved calibration (mean absolute calibration error below 0.14 vs. 0.58 for CEM). On hardware both methods succeed on the tested perturbations, and the proposed controller terminates faster with a higher tactile-quality proxy than a Gaussian baseline.

Significance. If the central claims hold, the work supplies a scalable, gradient-friendly belief representation that addresses the scalability and differentiability limitations of particle-filter approaches to risk-sensitive POMDPs in grasping. The technical device of combining Gumbel-Softmax with reparameterization to obtain low-variance CVaR gradients is a concrete contribution that could transfer to other contact-rich tasks. The reported calibration improvement and planning-time gains are practically relevant, though the absence of error bars and limited hardware evidence for multimodal robustness temper the strength of the overall result.

major comments (3)

[Experiments] Experiments section: performance deltas (success rates, planning time, calibration MAE) are presented without error bars, confidence intervals, or statistical significance tests, and without exact baseline implementations or ablation on mixture-component count and CVaR alpha. These omissions make it impossible to verify that the reported improvements are robust rather than artifacts of particular random seeds or hyper-parameter choices.
[Hardware Experiments] Hardware validation paragraph: both the variational method and the Gaussian baseline are reported to succeed on the tested object-pose perturbations, yet no failure cases for the baseline under multimodal contact uncertainty are shown. Consequently the hardware results demonstrate efficiency and proxy-quality gains but do not provide direct evidence that the multimodal posterior approximation translates into robustness advantages on the physical system.
[Belief Parameterization] Method section on belief parameterization: the claim that a finite Gaussian mixture plus variational inference yields a sufficiently accurate approximation to the true multimodal posterior for reliable CVaR optimization rests on the weakest assumption identified in the review; no diagnostic (e.g., posterior predictive checks or comparison against a high-fidelity sampler) is supplied to quantify approximation error under the contact-parameter regimes used in the CVaR objective.

minor comments (2)

[CVaR Optimization] Notation for the differentiable CVaR surrogate should be introduced with an explicit equation number and a short derivation showing how the Gumbel-Softmax and reparameterization yield an unbiased gradient estimator.
[Figures] Figure captions for simulation and hardware results should state the number of trials, random seeds, and exact values of alpha and mixture components used.
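A derivation of the kind the first minor comment asks for could start from the Rockafellar-Uryasev representation [31]. The following is a generic textbook-style sketch, not the paper's Theorem 1. It assumes that $\alpha$ denotes the confidence level (so the tail has mass $1-\alpha$), that $C$ is a differentiable cost with higher values being worse, and that $\tau$, $N$, and $\widehat{J}$ are notation introduced here; $b(\phi)$, $g$, and $\epsilon_i$ follow the appendix fragment quoted in the reference section.

```latex
\begin{align}
\mathrm{CVaR}_\alpha(C)
  &= \min_{\tau \in \mathbb{R}}
     \Big\{ \tau + \tfrac{1}{1-\alpha}\,
     \mathbb{E}_{\theta \sim b(\phi)}\big[(C(\theta) - \tau)_+\big] \Big\}, \\
\widehat{J}(\phi, \tau)
  &= \tau + \frac{1}{(1-\alpha) N} \sum_{i=1}^{N}
     \big(C(\theta_i) - \tau\big)_+,
     \qquad \theta_i = g(\epsilon_i, \phi),\ \epsilon_i \sim p(\epsilon), \\
\nabla_\phi \widehat{J}(\phi, \tau)
  &= \frac{1}{(1-\alpha) N} \sum_{i=1}^{N}
     \mathbf{1}\{C(\theta_i) > \tau\}\,
     \Big(\frac{\partial g(\epsilon_i, \phi)}{\partial \phi}\Big)^{\!\top}
     \nabla_\theta C(\theta_i), \\
\frac{\partial \widehat{J}}{\partial \tau}
  &= 1 - \frac{1}{(1-\alpha) N} \sum_{i=1}^{N} \mathbf{1}\{C(\theta_i) > \tau\}.
\end{align}
```

Only samples in the tail contribute to the gradient, and the $\tau$-derivative vanishes when roughly a $(1-\alpha)$ fraction of samples exceed $\tau$, that is, near the empirical $\alpha$-quantile. A smooth surrogate would replace $(\cdot)_+$ with a differentiable approximation such as a softplus. Any unbiasedness statement would need to specify whether it refers to the relaxed Gumbel-Softmax objective or to the discrete mixture.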

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the strengths and limitations of our work. We address each major comment point by point below and outline the revisions we will make to the manuscript.


Referee: [Experiments] Experiments section: performance deltas (success rates, planning time, calibration MAE) are presented without error bars, confidence intervals, or statistical significance tests, and without exact baseline implementations or ablation on mixture-component count and CVaR alpha. These omissions make it impossible to verify that the reported improvements are robust rather than artifacts of particular random seeds or hyper-parameter choices.

Authors: We agree that the lack of error bars, confidence intervals, statistical tests, and ablations weakens the ability to assess robustness. In the revised manuscript we will report means and standard deviations over at least five independent random seeds for all simulation metrics, include paired statistical significance tests against baselines, document the precise baseline implementations (including particle count and CEM parameters), and add an ablation study on the number of mixture components and CVaR alpha values to confirm that performance gains hold across reasonable hyper-parameter settings. revision: yes
Referee: [Hardware Experiments] Hardware validation paragraph: both the variational method and the Gaussian baseline are reported to succeed on the tested object-pose perturbations, yet no failure cases for the baseline under multimodal contact uncertainty are shown. Consequently the hardware results demonstrate efficiency and proxy-quality gains but do not provide direct evidence that the multimodal posterior approximation translates into robustness advantages on the physical system.

Authors: We acknowledge that the hardware trials were conducted under object-pose uncertainty and therefore do not directly exhibit failure modes of the Gaussian baseline under multimodal contact uncertainty. Inducing repeatable multimodal contact uncertainty on hardware is practically difficult. The multimodal robustness claims are supported by the simulation results under explicit contact-parameter uncertainty. In revision we will clarify the scope of the hardware experiments, discuss the challenges of hardware multimodal testing, and report any incidental observations of grasp quality differences that occurred during the physical trials. revision: partial
Referee: [Belief Parameterization] Method section on belief parameterization: the claim that a finite Gaussian mixture plus variational inference yields a sufficiently accurate approximation to the true multimodal posterior for reliable CVaR optimization rests on the weakest assumption identified in the review; no diagnostic (e.g., posterior predictive checks or comparison against a high-fidelity sampler) is supplied to quantify approximation error under the contact-parameter regimes used in the CVaR objective.

Authors: The referee is correct that direct quantification of approximation error would strengthen the justification for using the finite Gaussian mixture. While the reported calibration MAE already provides an indirect validation of belief quality for the CVaR objective, it does not fully characterize posterior fidelity. We will add posterior predictive checks and side-by-side comparisons of samples drawn from the variational Gaussian mixture versus a high-fidelity particle-filter sampler in representative contact-parameter regimes to better support the modeling assumption. revision: yes
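The reporting promised in the first response (means and standard deviations over seeds, plus paired significance tests) can be sketched with the standard library alone. The function names are hypothetical, and the per-seed success rates below are made-up placeholders, not results from the paper.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) across seeds."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(method_a, method_b):
    """Paired t statistic and degrees of freedom for per-seed differences.

    Both lists must be ordered by the same seeds so that entries pair up.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    spread = statistics.stdev(diffs)
    if spread == 0.0:
        raise ValueError("identical differences: t statistic is undefined")
    n = len(diffs)
    return statistics.mean(diffs) / (spread / math.sqrt(n)), n - 1


# Placeholder per-seed success rates for two planners (five seeds each).
proposed = [0.80, 0.85, 0.90, 0.75, 0.85]
baseline = [0.70, 0.80, 0.80, 0.70, 0.75]

proposed_mean, proposed_std = summarize(proposed)
t_stat, dof = paired_t_statistic(proposed, baseline)
```

The t statistic would then be compared against a Student-t distribution with `dof` degrees of freedom. With only five seeds, a nonparametric alternative such as a sign or permutation test may be a safer choice.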

Circularity Check

0 steps flagged

No circularity: standard VI tools applied to robotics domain with empirical validation


The derivation formulates grasp planning as variational inference over contact parameters and pose using a differentiable Gaussian mixture belief, Gumbel-Softmax selection, and location-scale reparameterization to enable pathwise gradients through a CVaR surrogate. These are standard techniques from the VI literature applied to the domain; the reported gains in robust success rate, planning time, and calibration error are measured directly from simulation rollouts against particle-filter MPC and hardware trials against a Gaussian baseline. No equation reduces a performance metric to a fitted parameter by construction, no self-citation chain bears the central claim, and no ansatz or uniqueness result is smuggled in. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard variational inference assumptions plus domain-specific modeling choices whose validity is not independently verified in the abstract.

free parameters (2)

Number of Gaussian mixture components
Chosen to represent multimodal contact uncertainty; value not stated but required for the belief parameterization.
CVaR risk level alpha
Controls the tail quantile being optimized; must be selected and affects reported robustness.
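To make the role of the CVaR level concrete, the sketch below computes an empirical CVaR as the mean of the worst fraction of sampled costs. It assumes that alpha is the confidence level (so the tail has mass 1 - alpha) and that higher cost is worse; the paper's convention may differ, and the function name and cost values are illustrative.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of costs (higher cost = worse)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # The small offset guards against floating-point overshoot in the product.
    tail_size = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-9))
    return sum(ordered[:tail_size]) / tail_size


# Illustrative costs only.
costs = [float(c) for c in range(1, 11)]
risk_neutral = empirical_cvar(costs, 0.0)   # plain mean of all samples
median_tail = empirical_cvar(costs, 0.5)    # mean of the worst half
extreme_tail = empirical_cvar(costs, 0.9)   # mean of the worst tenth
```

As alpha grows, the objective moves from the expected cost toward the single worst sample, which is why the selected level affects both the reported robustness and the variance of the estimate.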

axioms (2)

domain assumption: Contact variability and object pose uncertainty admit a multimodal distribution that a finite Gaussian mixture can approximate sufficiently well for planning.
Central modeling choice invoked when representing the belief.
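standard math: Gumbel-Softmax and location-scale reparameterization yield pathwise gradients for the CVaR surrogate. These are unbiased for the relaxed objective, though the Gumbel-Softmax relaxation is biased with respect to the discrete mixture at nonzero temperature.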
Required to enable direct optimization through the belief parameters.



Reference graph

Works this paper leans on

40 extracted references · 1 canonical work pages · 1 internal anchor

[1] J. Weisz and P. K. Allen, “Pose Error Robust Grasping From Contact Wrench Space Metrics,” in 2012 IEEE International Conference on Robotics and Automation, pp. 557–562, 2012. ISSN: 1050-4729.
[2] A. H. Li, P. Culbertson, and A. D. Ames, “Toward An Analytic Theory of Intrinsic Robustness for Dexterous Grasping,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2992–2999, 2024. ISSN: 2153-0866.
[3] A. T. Miller and P. K. Allen, “GraspIt! A Versatile Simulator for Robotic Grasping,” IEEE Robotics & Automation Magazine, vol. 11, no. 4, pp. 110–122, 2004.
[4] H. Li, Q. Ye, Y. Huo, Q. Liu, S. Jiang, T. Zhou, X. Li, Y. Zhou, and J. Chen, “TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 18106–18112, 2024.
[5] T. Liu, Z. Liu, Z. Jiao, Y. Zhu, and S.-C. Zhu, “Synthesizing Diverse and Physically Stable Grasps With Arbitrary Hand Structures Using Differentiable Force Closure Estimator,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 470–477, 2022.
[6] R. Newbury, M. Gu, L. Chumbley, A. Mousavian, C. Eppner, J. Leitner, J. Bohg, A. Morales, T. Asfour, D. Kragic, D. Fox, and A. Cosgun, “Deep Learning Approaches to Grasp Synthesis: A Review,” IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3994–4015, 2023.
[7] N. Khargonkar, N. Song, Z. Xu, B. Prabhakaran, and Y. Xiang, “NeuralGrasps: Learning Implicit Representations for Grasps of Multiple Robotic Hands,” 2022.
[8] R. Wang, J. Zhang, J. Chen, Y. Xu, P. Li, T. Liu, and H. Wang, “DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation,” 2023.
[9] M. Karl, M. Soelch, J. Bayer, and P. van der Smagt, “Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data,” in International Conference on Learning Representations (ICLR), 2017.
[10] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, “Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics,” Robotics: Science and Systems (RSS), 2017.
[11] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and Acting in Partially Observable Stochastic Domains,” Artificial Intelligence, vol. 101, no. 1, pp. 99–134, 1998.
[12] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[13] Y. Cheng, W. Ren, C. Xiu, and Y. Li, “Improved Particle Filter Algorithm for Multi-Target Detection and Tracking,” Sensors (Basel, Switzerland), vol. 24, no. 14, p. 4708, 2024.
[14] J. Vermaak, A. Doucet, and P. Pérez, “Maintaining Multimodality through Mixture Tracking,” in Proceedings Ninth IEEE International Conference on Computer Vision, pp. 1110–1116 vol. 2, 2003.
[15] E. Jang, S. Gu, and B. Poole, “Categorical Reparameterization with Gumbel-Softmax,” 2017.
[16] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[17] L. J. Hong, Z. Hu, and G. Liu, “Monte Carlo Methods for Value-at-Risk and Conditional Value-at-Risk: A Review,” ACM Transactions on Modeling and Computer Simulation, vol. 24, no. 4, pp. 22:1–22:37, 2014.
[18] R. Platt Jr, L. P. Kaelbling, T. Lozano-Perez, and R. Tedrake, “Belief-Space Planning Assuming Maximum Likelihood Observations,” in Robotics: Science and Systems, vol. 6, pp. 37–44, 2010.
[19] A. Petrovskaya and O. Khatib, “Global Localization of Objects via Touch,” IEEE Transactions on Robotics, vol. 27, no. 3, pp. 569–585, 2011.
[20] R. Dyro, J. Harrison, A. Sharma, and M. Pavone, “Particle MPC for Uncertain and Learning-Based Control,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7127–7134, 2021. ISSN: 2153-0866.
[21] J. van den Berg, P. Abbeel, and K. Goldberg, “LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 895–913, 2011.
[22] A. Hakobyan, G. C. Kim, and I. Yang, “Risk-Aware Motion Planning and Control Using CVaR-Constrained Optimization,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4538–4545, 2019.
[23] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, vol. 112, no. 518, pp. 859–877, 2017.
[24] S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, “Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks,” in International Conference on Learning Representations (ICLR), 2017.
[25] K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 4754–4765, 2018.
[26] C. Enwerem, A. G. Puranic, J. S. Baras, and C. Belta, “Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression,” in 2025 IEEE 64th Conference on Decision and Control (CDC), pp. 4890–4895, IEEE, 2025.
[27] W. Dabney, M. Rowland, M. G. Bellemare, and R. Munos, “Distributional Reinforcement Learning with Quantile Regression,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
[28] Y. Pan and E. A. Theodorou, “Probabilistic Differential Dynamic Programming,” in Advances in Neural Information Processing Systems, vol. 27, Curran Associates, Inc., 2014.
[29] R. M. Murray, Z. Li, and S. S. Sastry, A Mathematical Introduction to Robotic Manipulation. CRC Press, 1 ed., 2017.
[30] D. Prattichizzo and J. C. Trinkle, “Grasping,” in Springer Handbook of Robotics (B. Siciliano and O. Khatib, eds.), pp. 671–700, Springer Berlin Heidelberg, 2008.
[31] R. T. Rockafellar and S. Uryasev, “Optimization of Conditional Value-at-Risk,” Journal of Risk, vol. 2, no. 3, pp. 21–42, 2000.
[32] V. Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein, “Implicit Neural Representations with Periodic Activation Functions,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 7462–7473, 2020.
[33] P. Xu, J. Chen, D. Zou, and Q. Gu, “Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization,” in Advances in Neural Information Processing Systems, vol. 31, Curran Associates, Inc., 2018.
[34] C. Ferrari and J. Canny, “Planning Optimal Grasps,” IEEE International Conference on Robotics and Automation (ICRA), vol. 3, pp. 2290–2295, 1992.
[35] C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolinek, and G. Martius, “Sample-Efficient Cross-Entropy Method for Real-time Planning,” in Proceedings of the 2020 Conference on Robot Learning, pp. 1049–1065, PMLR, 2021.
[36] Schneider & Company, “Coefficient of Friction Reference Chart,”
[37] Accessed: Feb. 25, 2026.
[38] B. Calli, A. Walsman, A. Singh, S. Srinivasa, P. Abbeel, and A. M. Dollar, “The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research,” in 2015 International Conference on Advanced Robotics (ICAR), pp. 510–517, 2015.
[39] S. Ding, Z. Su, G. Zhu, and L. Wang, “Envelope Quantile Regression,” Statistica Sinica, 2019.
[40] J. T. LaFrance and L. D. Barney, “The Envelope Theorem in Dynamic Optimization,” Journal of Economic Dynamics and Control, vol. 15, no. 2, pp. 355–385, 1991.

Appendix A. Pathwise Risk Gradients via Action Optimization

Theorem 1 (Reparameterized CVaR Gradient). Let $b(\phi)$ be a reparameterizable belief distribution with samples $\theta_i = g(\epsilon_i, \phi)$, $\epsilon_i \sim p(\epsilon)$, and let $C : \mathbb{R}^d$...