Mechanisms of Misgeneralization in Physical Sequence Modeling
Pith reviewed 2026-05-21 07:39 UTC · model grok-4.3
The pith
Generative sequence models produce individually plausible physical trajectories while distorting the aggregate distribution over quantities like distance or energy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When generative sequence models are trained on demonstrations curated to achieve specific distributions over physical quantities, the models can still generate trajectories that individually appear valid yet collectively produce an incorrect distribution over those quantities. This physical misgeneralization arises because local errors typical of the model class propagate through the physical measurement to shift the recovered distribution. The authors quantify the errors with a data deviation kernel that predicts which parts of the distribution gain or lose probability mass, as validated on synthetic tasks and on maze navigation and double-pendulum examples.
What carries the argument
The data deviation kernel, which estimates local sequence prediction errors to anticipate how they bias the aggregate distribution over a physical quantity when the errors are integrated along each trajectory.
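As a rough illustration of this mechanism (a toy sketch, not the paper's data deviation kernel), the NumPy snippet below adds zero-mean Gaussian error to every step of straight-line 2D trajectories and then measures travel distance. The noise model, the step count, and the helper names `path_length` and `rollout` are assumptions made for illustration only.

```python
import numpy as np

def path_length(traj):
    """Travel distance of trajectories with shape (N, T + 1, 2)."""
    steps = np.diff(traj, axis=1)                    # (N, T, 2) step increments
    return np.linalg.norm(steps, axis=-1).sum(axis=1)

def rollout(lengths, n_steps, step_noise, rng):
    """Straight-line paths along x, with Gaussian error added to every step."""
    n = lengths.shape[0]
    steps = np.zeros((n, n_steps, 2))
    steps[:, :, 0] = (lengths / n_steps)[:, None]    # ideal, evenly spaced steps
    steps += rng.normal(0.0, step_noise, size=steps.shape)  # assumed local error
    start = np.zeros((n, 1, 2))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)

rng = np.random.default_rng(0)
target = rng.uniform(1.0, 2.0, size=2000)            # curated: uniform distances
clean = path_length(rollout(target, 50, 0.0, rng))
noisy = path_length(rollout(target, 50, 0.02, rng))

print(f"mean distance, no step error:   {clean.mean():.3f}")
print(f"mean distance, with step error: {noisy.mean():.3f}")
```

Because the Euclidean norm is convex, zero-mean step errors can only lengthen the expected path, so in this toy setting the curated uniform distribution drifts toward longer distances even though no single step is biased.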
If this is right
- In maze navigation the distribution of travel distances will show systematic over- or under-representation of particular lengths.
- In double-pendulum motion the distribution of mechanical energies will be shifted away from the training distribution.
- The kernel can be used in advance to identify which physical quantities are most likely to be misgeneralized.
- A kernel-informed intervention can structurally reduce the distribution shift without requiring changes to the base model architecture.
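The two physical quantities named in the list above could be measured along the following lines. This is a minimal sketch under stated assumptions: trajectories are arrays of 2D positions, and the double pendulum is the idealized textbook one (point masses on massless rods, angles measured from the downward vertical, potential energy zero at the pivot). The function names and default parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def travel_distance(xy):
    """Total path length of one trajectory given as a (T, 2) array of positions."""
    return float(np.linalg.norm(np.diff(xy, axis=0), axis=1).sum())

def double_pendulum_energy(th1, th2, w1, w2, m1=1.0, m2=1.0, l1=1.0, l2=1.0, g=9.81):
    """Kinetic plus potential energy; angles are measured from the downward vertical."""
    kinetic = 0.5 * m1 * (l1 * w1) ** 2 + 0.5 * m2 * (
        (l1 * w1) ** 2 + (l2 * w2) ** 2 + 2.0 * l1 * l2 * w1 * w2 * np.cos(th1 - th2)
    )
    potential = -(m1 + m2) * g * l1 * np.cos(th1) - m2 * g * l2 * np.cos(th2)
    return kinetic + potential

# Usage: a closed unit square, and a pendulum hanging at rest.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
print(travel_distance(square))
print(double_pendulum_energy(0.0, 0.0, 0.0, 0.0))
```

Both measurements accumulate information over a whole state or trajectory, which is why small local prediction errors can plausibly surface as a shift in the aggregate distribution.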
Where Pith is reading between the lines
- The same error-propagation mechanism could appear in any setting where a model is trained to match aggregate statistics that are obtained by integrating local predictions, such as cumulative cost or total reward.
- Directly incorporating the data deviation kernel into the training objective might enforce distribution matching on the physical quantity rather than only on individual steps.
- The findings suggest that simply increasing model capacity or data volume may not eliminate the misgeneralization if the local error structure remains unchanged.
Load-bearing premise
Local errors made by the model when predicting the next step are systematic enough that, once integrated through the physical quantity calculation, they produce a consistent shift in the recovered distribution.
What would settle it
Train a model on a synthetic task while artificially suppressing the local errors identified by the kernel and check whether the predicted distribution shift over the physical quantity disappears or is substantially reduced.
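The shape of such a test can be sketched on a toy problem. In the snippet below, a scale factor shrinks an assumed per-step Gaussian error toward zero, and the distribution shift over travel distance is summarized as the total variation between histograms. The noise model stands in for the local errors the kernel would identify; it, the bin edges, and the helper names are illustrative assumptions rather than the authors' protocol.

```python
import numpy as np

def mass_shift(reference, generated, edges):
    """Per-bin change in probability mass (generated minus reference)."""
    ref, _ = np.histogram(reference, bins=edges)
    gen, _ = np.histogram(generated, bins=edges)
    return gen / gen.sum() - ref / ref.sum()

def generated_distances(target, n_steps, step_noise, rng):
    """Distances of straight-line paths whose steps carry Gaussian error."""
    steps = np.zeros((target.size, n_steps, 2))
    steps[:, :, 0] = (target / n_steps)[:, None]
    steps += rng.normal(0.0, step_noise, size=steps.shape)
    return np.linalg.norm(steps, axis=-1).sum(axis=1)

rng = np.random.default_rng(0)
target = rng.uniform(1.0, 2.0, size=5000)
edges = np.linspace(0.5, 4.0, 36)   # wide enough to hold the shifted samples

# Scale the assumed local error toward zero and record the resulting shift.
shifts = {}
for scale in (1.0, 0.5, 0.0):
    dist = generated_distances(target, 50, 0.02 * scale, rng)
    shifts[scale] = 0.5 * np.abs(mass_shift(target, dist, edges)).sum()
    print(f"error scale {scale:.1f}: total variation shift = {shifts[scale]:.3f}")
```

If the premise holds, the shift should fall as the local error is suppressed. A shift that persists at zero local error would point to some other cause.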
Original abstract
Generative sequence models are often trained to plan motion in physical domains, from robotics to mechanical simulations. When constructing a dataset to train such a model, engineers may curate demonstrations to specify how trajectories should be distributed over a physical quantity like travel distance or mechanical energy. For example, a roboticist building a maze navigation agent might choose demonstrations whose travel distances cover a fixed range uniformly, hoping to constrain the agent's expected power usage. We find that standard deep learning can violate this intent: each generated trajectory can seem plausible on its own, but the aggregate distribution over the physical quantity is wrong. We call this failure physical misgeneralization, and develop an account of its mechanism. Using controlled synthetic tasks, we show that physical misgeneralization arises when local errors typical of the model class propagate through the physical measurement to shift the recovered distribution. We estimate these errors with a data deviation kernel, and we use it to predict which physical quantities gain or lose mass in both our synthetic and more applied maze navigation and double-pendulum motion tasks. Finally, our mechanistic interpretation helps identify which mitigation strategies are structurally promising, and we use it to propose a kernel-informed intervention.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces 'physical misgeneralization' as a failure mode in generative sequence models for physical domains (e.g., robotics, mechanical simulation). While individual generated trajectories may appear plausible, the aggregate distribution over a physical quantity (travel distance, mechanical energy) deviates from the intended distribution encoded in the training demonstrations. The central claim is that this arises mechanistically when local errors typical of the model class propagate through the physical measurement function; the authors introduce a data deviation kernel to estimate these errors and predict which quantities gain or lose mass. They validate the account on controlled synthetic tasks and apply it to maze navigation and double-pendulum motion, then use the mechanistic view to propose a kernel-informed intervention.
Significance. If the data deviation kernel isolates causal propagation of local errors rather than post-hoc correlation with observed shifts, the work would be significant for understanding and mitigating unintended distribution shifts in learned physical models. Such shifts matter for downstream properties like power consumption or safety in robotics. The explicit link from model-class errors to aggregate statistics, together with the proposed intervention, could inform training practices beyond standard likelihood maximization.
major comments (3)
- [§3.2] (Data deviation kernel definition): the kernel is computed from model outputs on the same trajectories whose physical quantities are later measured to obtain the observed distribution shift. This raises the possibility that the kernel is fitted to the very quantity it is claimed to predict, undermining the claim that it isolates the propagation mechanism from independent error statistics.
- [§4.1–4.2] (Synthetic task results): the reported predictive accuracy of the kernel for mass shifts is shown after the full distributions have been measured; it is not demonstrated that the kernel produces accurate forecasts on held-out trajectories or before the aggregate statistics are inspected. This weakens the evidence that local errors propagate causally rather than the kernel simply capturing the observed aggregate effect.
- [§5] (Applied tasks: maze and double-pendulum): the match between kernel predictions and observed shifts is presented qualitatively. Quantitative metrics (e.g., correlation between predicted and actual mass shifts, or out-of-sample prediction error) are needed to establish that the mechanism generalizes beyond the synthetic setting where other factors such as optimization dynamics or sequence length could produce similar shifts.
minor comments (2)
- [Figure 3] The visualization of kernel-estimated versus observed distributions would benefit from an explicit legend distinguishing the two and from error bars on the kernel predictions.
- [Notation] The symbol for the physical measurement function is introduced without a clear forward reference to its definition in the methods; a single consolidated notation table would improve readability.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on the data deviation kernel and the empirical validation of our results. We have made revisions to address the concerns raised and provide point-by-point responses below.
- Referee: [§3.2] (Data deviation kernel definition): the kernel is computed from model outputs on the same trajectories whose physical quantities are later measured to obtain the observed distribution shift. This raises the possibility that the kernel is fitted to the very quantity it is claimed to predict, undermining the claim that it isolates the propagation mechanism from independent error statistics.
  Authors: We agree that the original presentation could be interpreted as using the same trajectories for both kernel computation and distribution measurement. In the revised manuscript, we clarify that the kernel is constructed from local per-step deviations, which are independent of the aggregate physical quantity. Furthermore, we now report results where the kernel is fit on a separate set of model-generated trajectories and then used to predict shifts on the evaluation trajectories, demonstrating that it captures the propagation mechanism without direct access to the target distribution.
  Revision: yes
- Referee: [§4.1–4.2] (Synthetic task results): the reported predictive accuracy of the kernel for mass shifts is shown after the full distributions have been measured; it is not demonstrated that the kernel produces accurate forecasts on held-out trajectories or before the aggregate statistics are inspected. This weakens the evidence that local errors propagate causally rather than the kernel simply capturing the observed aggregate effect.
  Authors: The current results in §4.1–4.2 do indeed present the kernel predictions in conjunction with the measured distributions. To strengthen the causal claim, we have added experiments in the revision showing the kernel's out-of-sample predictive performance: the kernel is estimated from error statistics on one set of trajectories and then applied to forecast the distribution shifts on completely held-out trajectories. These new results are now included in §4.1–4.2.
  Revision: yes
- Referee: [§5] (Applied tasks: maze and double-pendulum): the match between kernel predictions and observed shifts is presented qualitatively. Quantitative metrics (e.g., correlation between predicted and actual mass shifts, or out-of-sample prediction error) are needed to establish that the mechanism generalizes beyond the synthetic setting where other factors such as optimization dynamics or sequence length could produce similar shifts.
  Authors: We acknowledge that the applied results in §5 were presented qualitatively. In the revised version, we have added quantitative evaluations, including Pearson correlation coefficients between the kernel-predicted mass shifts and the observed shifts, as well as out-of-sample prediction errors for both the maze navigation and double-pendulum tasks. These metrics are reported in the updated §5 and support the generalization of the proposed mechanism.
  Revision: yes
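The kind of evaluation described in these responses could look roughly like the sketch below: split trajectories into disjoint sets for estimating and evaluating the kernel, then compare predicted and observed per-bin mass shifts with a Pearson correlation and a root-mean-square error. The helper names are hypothetical, the kernel itself is not implemented, and the numbers are placeholders for illustration, not results from the paper.

```python
import numpy as np

def shift_agreement(predicted, observed):
    """Pearson r and RMSE between predicted and observed per-bin mass shifts."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(predicted, observed)[0, 1]
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return float(r), float(rmse)

def split_indices(n_trajectories, fit_fraction, rng):
    """Disjoint index sets: one to estimate the kernel, one to evaluate it."""
    order = rng.permutation(n_trajectories)
    n_fit = int(round(fit_fraction * n_trajectories))
    return order[:n_fit], order[n_fit:]

rng = np.random.default_rng(0)
fit_idx, eval_idx = split_indices(1000, 0.5, rng)

# Placeholder numbers for illustration only; they are not results from the paper.
predicted = [-0.04, -0.01, 0.02, 0.03]
observed = [-0.05, 0.00, 0.02, 0.03]
r, rmse = shift_agreement(predicted, observed)
print(f"Pearson r = {r:.3f}, RMSE = {rmse:.4f}")
```

Keeping the two index sets disjoint is what lets agreement on the evaluation set count as a forecast rather than a fit to the observed shift.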
Circularity Check
No significant circularity in the derivation chain.
The paper develops its account of physical misgeneralization from controlled synthetic tasks demonstrating propagation of local model errors through physical measurement functions to produce aggregate distribution shifts. The data deviation kernel serves as an estimator for those errors and is applied to predict mass shifts across both the synthetic controls and separate applied tasks (maze navigation, double-pendulum). Because the synthetic tasks provide independent verification of the mechanism and the applied tasks function as external benchmarks, the central claim retains content independent of any fitted quantities. No self-citation chains, self-definitional reductions, or renamings of known results appear in the provided description, and the derivation remains self-contained against the stated experimental controls.