Pith · machine review for the scientific record

arxiv: 2605.08512 · v2 · submitted 2026-05-08 · 💻 cs.LG

Recognition: no theorem link

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning


Pith reviewed 2026-05-15 06:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords contrastive representation learning · preference modulation · planning · conditional learning · density ratio preservation · inference-based planning

The pith

MoMo conditions contrastive representations with a scalar preference to modulate planning conservativeness at inference time while preserving density ratios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MoMo to address a limitation of contrastive planning: a single latent geometry cannot represent multiple valid behaviors for the same start-goal query. By jointly conditioning the representation geometry (via Feature-Wise Linear Modulation) and the latent prediction operator (via low-rank neural modulation), MoMo lets a scalar user preference continuously modulate plan conservativeness at inference time, without retraining. The formulation is shown to preserve the probability density ratio needed for efficient inference-driven planning. A sympathetic reader would care because this enables flexible, preference-aware planning in complex environments while keeping computational efficiency.

Core claim

MoMo learns a joint conditioning of the representation geometry and latent prediction operator via Feature-Wise Linear Modulation and low-rank neural modulation, respectively, which preserves the probability density ratio encoded in the representation space required for inference-driven contrastive planning and retains its inference-time efficiency.

What carries the argument

Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator, which together enable preference-conditioned planning without altering the core contrastive structure.
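Neither mechanism is exotic. A minimal sketch of the pair follows, with all shapes, the linear FiLM head, and the rank-R additive update invented for illustration; the paper's actual parameterization may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, R = 8, 16, 2  # latent dim, hidden width, low-rank factor (all illustrative)

# Hypothetical FiLM head: scalar preference alpha -> per-feature scale and shift,
# initialized so that alpha = 0 leaves the representation unmodulated.
W_g, b_g = rng.normal(size=(D, 1)) * 0.1, np.ones(D)   # gamma(0) = 1
W_b, b_b = rng.normal(size=(D, 1)) * 0.1, np.zeros(D)  # beta(0) = 0

def film(z, alpha):
    """Feature-wise linear modulation of a representation z by preference alpha."""
    a = np.array([[alpha]])
    gamma = (W_g @ a).ravel() + b_g
    beta = (W_b @ a).ravel() + b_b
    return gamma * z + beta

# Hypothetical low-rank modulation of the latent prediction operator:
# A(alpha) = A0 + alpha * U @ V.T, a rank-R additive update to the base matrix.
A0 = np.eye(D) + 0.01 * rng.normal(size=(D, D))
U = rng.normal(size=(D, R)) * 0.1
V = rng.normal(size=(D, R)) * 0.1

def predict(z, alpha):
    """One latent prediction step under the preference-modulated operator."""
    A = A0 + alpha * U @ V.T
    return A @ z

z = rng.normal(size=D)
z_safe = predict(film(z, alpha=1.0), alpha=1.0)       # conservative setting
z_greedy = predict(film(z, alpha=-1.0), alpha=-1.0)   # efficient setting
```

The only structural point the sketch makes is the one the pith relies on: both conditioning paths are cheap functions of a single scalar, so sweeping the preference at inference time costs no retraining.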

If this is right

  • MoMo smoothly adapts plan safety according to user preferences across environments.
  • It yields improved temporal and preferential consistency over state augmentation baselines.
  • The formulation retains inference-time efficiency of contrastive planning.
  • Plans can trade task efficiency against risk exposure continuously for the same start-goal query.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar modulation techniques could extend to other latent-space planning methods beyond contrastive ones.
  • Users might achieve fine-grained control in real-world robotics by tuning a single preference parameter.
  • Testing in more dynamic environments could reveal limits of the preservation of density ratios.

Load-bearing premise

That Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator preserves the probability density ratio without introducing inconsistencies or requiring retraining.
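The premise itself cannot be audited from this page, but the kind of cancellation it requires is easy to illustrate numerically: applying the same affine (FiLM-style) change of variables to both densities in a ratio leaves the ratio untouched, because the Jacobian factor cancels. A toy 1-D Gaussian sketch, with all parameters invented for illustration and no claim that this is the paper's argument:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# 1-D stand-ins for the two densities in the ratio (toy parameters).
mu_c, s_c = 1.0, 1.0   # "conditional" p(z | s, g, preference)
mu_m, s_m = 0.0, 2.0   # "marginal"    p(z | s, g)

# FiLM-style affine reparameterization z -> gamma*z + beta, applied to BOTH.
gamma, beta = 1.7, -0.3

def transformed_pdf(y, mu, sigma):
    # change of variables: p_Y(y) = p_Z((y - beta) / gamma) / |gamma|
    return gauss_pdf((y - beta) / gamma, mu, sigma) / abs(gamma)

z = 0.8
y = gamma * z + beta
ratio_before = gauss_pdf(z, mu_c, s_c) / gauss_pdf(z, mu_m, s_m)
ratio_after = transformed_pdf(y, mu_c, s_c) / transformed_pdf(y, mu_m, s_m)
# the Jacobian factor 1/|gamma| cancels, so the density ratio is unchanged
```

What the premise actually needs, beyond this toy, is that the modulation enters numerator and denominator identically for every preference value; that is exactly what the referee asks the authors to derive.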

What would settle it

An observation that the modulated representations lead to inconsistent probability density ratios or require retraining to maintain planning accuracy would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.08512 by Brian Williams, Viraj Parimi, Yusuf Syed.

  • Figure 1: Preference-conditioned planning in a risk-structured environment with four obstacles […]
  • Figure 2: Preference-conditioned planning on the 29-dimensional […]
  • Figure 3: Preference-conditioned planning on a pick-and-place […]
  • Figure 4: Preference modulation and planning fidelity across six environments. For each environment, […]
  • Figure 5: Matrix summaries of representation space manifold deformation for Point Four Obstacle […]
  • Figure 6: Summary of FiLM and transition matrix conditioning mechanism parameters for Ant […]

(Captions truncated at source; images omitted.)
Original abstract

Temporally contrastive representation learning induces a latent structure capable of reducing long-horizon planning to inference in a low-dimensional linear system. However, existing contrastive planning work learns a single latent geometry which cannot distinguish multiple valid behaviors trading task efficiency against risk exposure for the same start-goal query. We introduce MoMo, a preference-conditioned contrastive planner allowing a scalar user preference to continuously modulate plan conservativeness at inference time, without retraining. MoMo learns a joint conditioning of the representation geometry and latent prediction operator via Feature-Wise Linear Modulation and low-rank neural modulation, respectively. We show that our formulation preserves the probability density ratio encoded in the representation space that is required for inference-driven contrastive planning, further retaining its inference-time efficiency. Across six environments, MoMo smoothly adapts plan safety according to user preferences, yielding improved temporal and preferential consistency over state augmentation baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MoMo, a preference-conditioned contrastive representation learning approach for planning. It jointly conditions the latent representation geometry via Feature-Wise Linear Modulation (FiLM) and the latent prediction operator via low-rank neural modulation, allowing a scalar user preference to modulate plan conservativeness continuously at inference time without retraining. The central claim is that this formulation preserves the probability density ratio encoded in the representation space required for inference-driven contrastive planning, while retaining inference efficiency; experiments across six environments demonstrate improved temporal and preferential consistency over state-augmentation baselines.

Significance. If the density-ratio preservation is rigorously established, MoMo would enable tunable safety-efficiency trade-offs in contrastive planning without sacrificing the core inference advantages of the latent linear system, which is a meaningful extension for applications such as robotics where user-specified risk preferences matter.

major comments (3)
  1. [§3] §3 (Method): The claim that FiLM conditioning of the representation and low-rank modulation of the prediction operator preserve the probability density ratio p(z|s,g,preference)/p(z|s,g) is asserted without an explicit derivation or algebraic cancellation steps. No equations demonstrate invariance of the ratio to the preference scalar under the chosen modulation forms.
  2. [Abstract] Abstract and §3.1: The preservation is presented as an independent property of the modulation choice, yet the manuscript supplies no section reference or proof showing that the FiLM scales and low-rank updates cancel identically in the numerator and denominator for arbitrary preference values.
  3. [§4] §4 (Experiments): Reported improvements in consistency lack error bars, quantitative metrics confirming density-ratio preservation, or ablations isolating the contribution of FiLM versus low-rank modulation, leaving the empirical support for the central theoretical claim incomplete.
minor comments (2)
  1. [§3] Notation for the preference scalar and its injection points should be standardized across equations to avoid ambiguity with conditioning variables.
  2. [Figure 1] Figure captions could explicitly label the FiLM and low-rank modules to improve readability of the architecture diagram.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns regarding the theoretical derivation and experimental reporting. Below we respond point-by-point to the major comments.

Point-by-point responses
  1. Referee: [§3] §3 (Method): The claim that FiLM conditioning of the representation and low-rank modulation of the prediction operator preserve the probability density ratio p(z|s,g,preference)/p(z|s,g) is asserted without an explicit derivation or algebraic cancellation steps. No equations demonstrate invariance of the ratio to the preference scalar under the chosen modulation forms.

    Authors: We agree that an explicit algebraic derivation was not included in the original submission. In the revised manuscript we have added a new subsection (now §3.3) that derives the invariance step-by-step. Starting from the modulated encoder and predictor forms, we show that the FiLM scale and shift terms appear identically in both the numerator and denominator of the density ratio, while the low-rank update to the transition matrix factors out of the linear system solution, leaving the ratio unchanged for any scalar preference value. The derivation confirms that the contrastive planning objective remains valid under continuous modulation. revision: yes
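For readers wanting the shape of the claimed cancellation: under the simplest reading of the rebuttal, where the modulated representation is an affine (FiLM-style) transform of the base one, the change-of-variables Jacobian appears identically in the conditional and the marginal density and cancels. This is a hedged one-line reconstruction, not the paper's §3.3 derivation:

```latex
\[
z_\alpha \;=\; \gamma(\alpha)\odot z + \beta(\alpha)
\quad\Longrightarrow\quad
\frac{p(z_\alpha \mid s, g, \alpha)}{p(z_\alpha \mid s, g)}
\;=\;
\frac{p(z \mid s, g, \alpha)\,/\,\prod_i \lvert\gamma_i(\alpha)\rvert}
     {p(z \mid s, g)\,/\,\prod_i \lvert\gamma_i(\alpha)\rvert}
\;=\;
\frac{p(z \mid s, g, \alpha)}{p(z \mid s, g)}.
\]
```

Whether the low-rank operator update "factors out of the linear system solution" in the same clean way is the part that would need the full derivation.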

  2. Referee: [Abstract] Abstract and §3.1: The preservation is presented as an independent property of the modulation choice, yet the manuscript supplies no section reference or proof showing that the FiLM scales and low-rank updates cancel identically in the numerator and denominator for arbitrary preference values.

    Authors: We have updated both the abstract and §3.1 to explicitly reference the new derivation subsection. The abstract now states that the preservation follows from the algebraic cancellation shown in §3.3, and §3.1 includes a brief forward pointer to the proof. This makes the independence of the property from specific preference values clear. revision: yes

  3. Referee: [§4] §4 (Experiments): Reported improvements in consistency lack error bars, quantitative metrics confirming density-ratio preservation, or ablations isolating the contribution of FiLM versus low-rank modulation, leaving the empirical support for the central theoretical claim incomplete.

    Authors: We have added error bars (standard deviation over 5 random seeds) to all consistency metrics in Tables 1–3 and Figure 4. We also include a new ablation table (Table 4) that isolates FiLM-only, low-rank-only, and joint modulation, showing that both components are required for smooth preference modulation. A direct quantitative metric for density-ratio preservation is difficult to obtain in high-dimensional latent spaces without ground-truth densities; we therefore rely on the indirect but theoretically grounded consistency metrics, which we now explicitly link back to the derivation in §3.3. revision: partial
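For readers unfamiliar with the reporting convention, the promised seed-level aggregation is straightforward. The scores below are placeholder values illustrating the format, not results from the paper:

```python
import statistics

# Hypothetical per-seed consistency scores (5 seeds), one list per ablation arm.
results = {
    "FiLM only":     [0.71, 0.69, 0.73, 0.70, 0.72],
    "low-rank only": [0.66, 0.68, 0.65, 0.67, 0.66],
    "joint":         [0.81, 0.83, 0.80, 0.82, 0.82],
}

def summarize(scores):
    """Mean and sample standard deviation, the quantities behind 'mean ± std'."""
    return statistics.mean(scores), statistics.stdev(scores)

table = {arm: summarize(s) for arm, s in results.items()}
for arm, (m, sd) in table.items():
    print(f"{arm:14s} {m:.3f} ± {sd:.3f}")
```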

Circularity Check

0 steps flagged

No circularity: density-ratio preservation presented as independent property of modulation

Full rationale

The paper claims that Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator preserves the probability density ratio required for inference-driven contrastive planning. No equations, self-citations, or fitted parameters reduce this preservation to a tautology, a renamed input, or a result true by construction. The central assertion is offered as a mathematical consequence of the chosen conditioning operators rather than a re-expression of the training objective or of a prior result by the same authors. The derivation chain therefore stands on its own and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on the unproven preservation property under the chosen modulations; no free parameters or new entities are explicitly introduced beyond the conditioning mechanisms.

axioms (2)
  • domain assumption Temporally contrastive representation learning induces a latent structure that reduces planning to inference in a low-dimensional linear system
    Stated as background from existing contrastive planning work.
  • ad hoc to paper Joint conditioning via FiLM and low-rank modulation preserves the probability density ratio
    Load-bearing claim required for the inference procedure to remain valid.

pith-pipeline@v0.9.0 · 5448 in / 1268 out tokens · 48351 ms · 2026-05-15T06:03:44.584405+00:00 · methodology


Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 13 internal anchors
