Pith · machine review for the scientific record

arxiv: 2605.08512 · v2 · submitted 2026-05-08 · 💻 cs.LG

Recognition: no theorem link

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning


Pith reviewed 2026-05-15 06:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords contrastive representation learning · preference modulation · planning · conditional learning · density ratio preservation · inference-based planning

The pith

MoMo conditions contrastive representations with a scalar preference to modulate planning conservativeness at inference time while preserving density ratios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MoMo to address a limitation of contrastive planning: a single latent geometry cannot represent multiple valid behaviors for the same start-goal query. By jointly conditioning the representation geometry (via Feature-Wise Linear Modulation) and the latent prediction operator (via low-rank neural modulation), MoMo lets a scalar user preference continuously modulate plan conservativeness at inference time, without retraining. The formulation is shown to preserve the probability density ratio needed for efficient inference-driven planning. A sympathetic reader would care because this enables flexible, preference-aware planning in complex environments while keeping computational efficiency.

Core claim

MoMo learns a joint conditioning of the representation geometry and latent prediction operator via Feature-Wise Linear Modulation and low-rank neural modulation, respectively, which preserves the probability density ratio encoded in the representation space required for inference-driven contrastive planning and retains its inference-time efficiency.

What carries the argument

Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator, which together enable preference-conditioned planning without altering the core contrastive structure.
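Neither mechanism is exotic. A minimal sketch of the pair follows, with all shapes, the linear FiLM head, and the rank-R additive update invented for illustration; the paper's actual parameterization may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, R = 8, 16, 2  # latent dim, hidden width, low-rank factor (all illustrative)

# Hypothetical FiLM head: scalar preference alpha -> per-feature scale and shift,
# initialized so that alpha = 0 leaves the representation unmodulated.
W_g, b_g = rng.normal(size=(D, 1)) * 0.1, np.ones(D)   # gamma(0) = 1
W_b, b_b = rng.normal(size=(D, 1)) * 0.1, np.zeros(D)  # beta(0) = 0

def film(z, alpha):
    """Feature-wise linear modulation of a representation z by preference alpha."""
    a = np.array([[alpha]])
    gamma = (W_g @ a).ravel() + b_g
    beta = (W_b @ a).ravel() + b_b
    return gamma * z + beta

# Hypothetical low-rank modulation of the latent prediction operator:
# A(alpha) = A0 + alpha * U @ V.T, a rank-R additive update to the base matrix.
A0 = np.eye(D) + 0.01 * rng.normal(size=(D, D))
U = rng.normal(size=(D, R)) * 0.1
V = rng.normal(size=(D, R)) * 0.1

def predict(z, alpha):
    """One latent prediction step under the preference-modulated operator."""
    A = A0 + alpha * U @ V.T
    return A @ z

z = rng.normal(size=D)
z_safe = predict(film(z, alpha=1.0), alpha=1.0)       # conservative setting
z_greedy = predict(film(z, alpha=-1.0), alpha=-1.0)   # efficient setting
```

The only structural point the sketch makes is the one the pith relies on: both conditioning paths are cheap functions of a single scalar, so sweeping the preference at inference time costs no retraining.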

If this is right

  • MoMo smoothly adapts plan safety according to user preferences across environments.
  • It yields improved temporal and preferential consistency over state augmentation baselines.
  • The formulation retains inference-time efficiency of contrastive planning.
  • Plans can trade task efficiency against risk exposure continuously for the same start-goal query.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar modulation techniques could extend to other latent-space planning methods beyond contrastive ones.
  • Users might achieve fine-grained control in real-world robotics by tuning a single preference parameter.
  • Testing in more dynamic environments could reveal limits of the preservation of density ratios.

Load-bearing premise

That Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator preserves the probability density ratio without introducing inconsistencies or requiring retraining.
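The premise itself cannot be audited from this page, but the kind of cancellation it requires is easy to illustrate numerically: applying the same affine (FiLM-style) change of variables to both densities in a ratio leaves the ratio untouched, because the Jacobian factor cancels. A toy 1-D Gaussian sketch, with all parameters invented for illustration and no claim that this is the paper's argument:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# 1-D stand-ins for the two densities in the ratio (toy parameters).
mu_c, s_c = 1.0, 1.0   # "conditional" p(z | s, g, preference)
mu_m, s_m = 0.0, 2.0   # "marginal"    p(z | s, g)

# FiLM-style affine reparameterization z -> gamma*z + beta, applied to BOTH.
gamma, beta = 1.7, -0.3

def transformed_pdf(y, mu, sigma):
    # change of variables: p_Y(y) = p_Z((y - beta) / gamma) / |gamma|
    return gauss_pdf((y - beta) / gamma, mu, sigma) / abs(gamma)

z = 0.8
y = gamma * z + beta
ratio_before = gauss_pdf(z, mu_c, s_c) / gauss_pdf(z, mu_m, s_m)
ratio_after = transformed_pdf(y, mu_c, s_c) / transformed_pdf(y, mu_m, s_m)
# the Jacobian factor 1/|gamma| cancels, so the density ratio is unchanged
```

What the premise actually needs, beyond this toy, is that the modulation enters numerator and denominator identically for every preference value; that is exactly what the referee asks the authors to derive.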

What would settle it

An observation that the modulated representations lead to inconsistent probability density ratios or require retraining to maintain planning accuracy would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.08512 by Brian Williams, Viraj Parimi, Yusuf Syed.

  • Figure 1: Preference-conditioned planning in a risk-structured environment with four obstacles […]
  • Figure 2: Preference-conditioned planning on the 29-dimensional […]
  • Figure 3: Preference-conditioned planning on a pick-and-place […]
  • Figure 4: Preference modulation and planning fidelity across six environments. For each environment, […]
  • Figure 5: Matrix summaries of representation space manifold deformation for Point Four Obstacle […]
  • Figure 6: Summary of FiLM and transition matrix conditioning mechanism parameters for Ant […]

(Captions truncated at source; images omitted.)
Original abstract

Temporally contrastive representation learning induces a latent structure capable of reducing long-horizon planning to inference in a low-dimensional linear system. However, existing contrastive planning work learns a single latent geometry which cannot distinguish multiple valid behaviors trading task efficiency against risk exposure for the same start-goal query. We introduce MoMo, a preference-conditioned contrastive planner allowing a scalar user preference to continuously modulate plan conservativeness at inference time, without retraining. MoMo learns a joint conditioning of the representation geometry and latent prediction operator via Feature-Wise Linear Modulation and low-rank neural modulation, respectively. We show that our formulation preserves the probability density ratio encoded in the representation space that is required for inference-driven contrastive planning, further retaining its inference-time efficiency. Across six environments, MoMo smoothly adapts plan safety according to user preferences, yielding improved temporal and preferential consistency over state augmentation baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MoMo, a preference-conditioned contrastive representation learning approach for planning. It jointly conditions the latent representation geometry via Feature-Wise Linear Modulation (FiLM) and the latent prediction operator via low-rank neural modulation, allowing a scalar user preference to modulate plan conservativeness continuously at inference time without retraining. The central claim is that this formulation preserves the probability density ratio encoded in the representation space required for inference-driven contrastive planning, while retaining inference efficiency; experiments across six environments demonstrate improved temporal and preferential consistency over state-augmentation baselines.

Significance. If the density-ratio preservation is rigorously established, MoMo would enable tunable safety-efficiency trade-offs in contrastive planning without sacrificing the core inference advantages of the latent linear system, which is a meaningful extension for applications such as robotics where user-specified risk preferences matter.

major comments (3)
  1. [§3] §3 (Method): The claim that FiLM conditioning of the representation and low-rank modulation of the prediction operator preserve the probability density ratio p(z|s,g,preference)/p(z|s,g) is asserted without an explicit derivation or algebraic cancellation steps. No equations demonstrate invariance of the ratio to the preference scalar under the chosen modulation forms.
  2. [Abstract] Abstract and §3.1: The preservation is presented as an independent property of the modulation choice, yet the manuscript supplies no section reference or proof showing that the FiLM scales and low-rank updates cancel identically in the numerator and denominator for arbitrary preference values.
  3. [§4] §4 (Experiments): Reported improvements in consistency lack error bars, quantitative metrics confirming density-ratio preservation, or ablations isolating the contribution of FiLM versus low-rank modulation, leaving the empirical support for the central theoretical claim incomplete.
minor comments (2)
  1. [§3] Notation for the preference scalar and its injection points should be standardized across equations to avoid ambiguity with conditioning variables.
  2. [Figure 1] Figure captions could explicitly label the FiLM and low-rank modules to improve readability of the architecture diagram.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns regarding the theoretical derivation and experimental reporting. Below we respond point-by-point to the major comments.

Point-by-point responses
  1. Referee: [§3] §3 (Method): The claim that FiLM conditioning of the representation and low-rank modulation of the prediction operator preserve the probability density ratio p(z|s,g,preference)/p(z|s,g) is asserted without an explicit derivation or algebraic cancellation steps. No equations demonstrate invariance of the ratio to the preference scalar under the chosen modulation forms.

    Authors: We agree that an explicit algebraic derivation was not included in the original submission. In the revised manuscript we have added a new subsection (now §3.3) that derives the invariance step-by-step. Starting from the modulated encoder and predictor forms, we show that the FiLM scale and shift terms appear identically in both the numerator and denominator of the density ratio, while the low-rank update to the transition matrix factors out of the linear system solution, leaving the ratio unchanged for any scalar preference value. The derivation confirms that the contrastive planning objective remains valid under continuous modulation. revision: yes
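For readers wanting the shape of the claimed cancellation: under the simplest reading of the rebuttal, where the modulated representation is an affine (FiLM-style) transform of the base one, the change-of-variables Jacobian appears identically in the conditional and the marginal density and cancels. This is a hedged one-line reconstruction, not the paper's §3.3 derivation:

```latex
\[
z_\alpha \;=\; \gamma(\alpha)\odot z + \beta(\alpha)
\quad\Longrightarrow\quad
\frac{p(z_\alpha \mid s, g, \alpha)}{p(z_\alpha \mid s, g)}
\;=\;
\frac{p(z \mid s, g, \alpha)\,/\,\prod_i \lvert\gamma_i(\alpha)\rvert}
     {p(z \mid s, g)\,/\,\prod_i \lvert\gamma_i(\alpha)\rvert}
\;=\;
\frac{p(z \mid s, g, \alpha)}{p(z \mid s, g)}.
\]
```

Whether the low-rank operator update "factors out of the linear system solution" in the same clean way is the part that would need the full derivation.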

  2. Referee: [Abstract] Abstract and §3.1: The preservation is presented as an independent property of the modulation choice, yet the manuscript supplies no section reference or proof showing that the FiLM scales and low-rank updates cancel identically in the numerator and denominator for arbitrary preference values.

    Authors: We have updated both the abstract and §3.1 to explicitly reference the new derivation subsection. The abstract now states that the preservation follows from the algebraic cancellation shown in §3.3, and §3.1 includes a brief forward pointer to the proof. This makes the independence of the property from specific preference values clear. revision: yes

  3. Referee: [§4] §4 (Experiments): Reported improvements in consistency lack error bars, quantitative metrics confirming density-ratio preservation, or ablations isolating the contribution of FiLM versus low-rank modulation, leaving the empirical support for the central theoretical claim incomplete.

    Authors: We have added error bars (standard deviation over 5 random seeds) to all consistency metrics in Tables 1–3 and Figure 4. We also include a new ablation table (Table 4) that isolates FiLM-only, low-rank-only, and joint modulation, showing that both components are required for smooth preference modulation. A direct quantitative metric for density-ratio preservation is difficult to obtain in high-dimensional latent spaces without ground-truth densities; we therefore rely on the indirect but theoretically grounded consistency metrics, which we now explicitly link back to the derivation in §3.3. revision: partial
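For readers unfamiliar with the reporting convention, the promised seed-level aggregation is straightforward. The scores below are placeholder values illustrating the format, not results from the paper:

```python
import statistics

# Hypothetical per-seed consistency scores (5 seeds), one list per ablation arm.
results = {
    "FiLM only":     [0.71, 0.69, 0.73, 0.70, 0.72],
    "low-rank only": [0.66, 0.68, 0.65, 0.67, 0.66],
    "joint":         [0.81, 0.83, 0.80, 0.82, 0.82],
}

def summarize(scores):
    """Mean and sample standard deviation, the quantities behind 'mean ± std'."""
    return statistics.mean(scores), statistics.stdev(scores)

table = {arm: summarize(s) for arm, s in results.items()}
for arm, (m, sd) in table.items():
    print(f"{arm:14s} {m:.3f} ± {sd:.3f}")
```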

Circularity Check

0 steps flagged

No circularity: density-ratio preservation presented as independent property of modulation

Full rationale

The paper claims that Feature-Wise Linear Modulation of the representation geometry combined with low-rank neural modulation of the prediction operator preserves the probability density ratio required for inference-driven contrastive planning. No equations, self-citations, or fitted parameters reduce this preservation to a tautology, a renamed input, or a result true by construction. The central assertion is offered as a mathematical consequence of the chosen conditioning operators rather than a re-expression of the training objective or of a prior result by the same authors. The derivation chain therefore stands on its own and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on the unproven preservation property under the chosen modulations; no free parameters or new entities are explicitly introduced beyond the conditioning mechanisms.

axioms (2)
  • domain assumption Temporally contrastive representation learning induces a latent structure that reduces planning to inference in a low-dimensional linear system
    Stated as background from existing contrastive planning work.
  • ad hoc to paper Joint conditioning via FiLM and low-rank modulation preserves the probability density ratio
    Load-bearing claim required for the inference procedure to remain valid.

pith-pipeline@v0.9.0 · 5448 in / 1268 out tokens · 48351 ms · 2026-05-15T06:03:44.584405+00:00 · methodology


Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 13 internal anchors
