On the Position Bias of On-Policy Distillation
Pith reviewed 2026-06-29 05:12 UTC · model grok-4.3
The pith
Importance weighting by accumulated student-teacher discrepancy corrects position bias in on-policy distillation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On-policy distillation exhibits position bias because student rollouts deviate progressively from the teacher distribution, so token-level supervision quality declines at later positions. Importance-Weighted On-Policy Distillation (IW-OPD) reweights each token by the accumulated discrepancy between the student's and teacher's distributions at that step, thereby upweighting earlier tokens with smaller deviations and downweighting later ones.
What carries the argument
Importance-Weighted On-Policy Distillation (IW-OPD), which sets the loss weight for each token to a function of the accumulated discrepancy between student and teacher output distributions.
If this is right
- IW-OPD converges significantly faster than standard OPD.
- IW-OPD achieves better final performance than OPD in same-size teacher-student pairs.
- IW-OPD also improves performance in cross-scale distillation settings.
- Gains reach up to 6.9 points on the AIME-2025 benchmark.
Where Pith is reading between the lines
- The same discrepancy-based reweighting could be applied to other sequence-level distillation or imitation settings where rollout length causes progressive drift.
- Online computation of the discrepancy measure might allow the weighting to adapt during a single training run without extra passes.
- If the bias is primarily a function of sequence position rather than task content, the weighting rule may transfer across different reinforcement-learning environments with long trajectories.
Load-bearing premise
The accumulated discrepancy between student and teacher distributions is a sufficient and unbiased proxy for the quality of token-level supervision.
What would settle it
A controlled run in which IW-OPD weights are replaced by random weights drawn from the same distribution while keeping all other factors fixed, yet the performance advantage over standard OPD disappears.
Figures
read the original abstract
On-Policy Distillation (OPD) improves the learning efficiency of standard reinforcement learning through dense, token-level supervision from teachers. In the standard KL objective of OPD, token-level losses are uniformly averaged, implying equal weights for all tokens. However, we discover that not all tokens are created equal: as student rollouts grow longer, they deviate further from the teacher's distribution, leading to degraded supervision quality at later positions. As a result, OPD using only the first 30% of tokens can perform comparably to using all tokens, whereas OPD using only the last 30% of tokens barely learns anything. In this work, we provide a principled understanding of this issue through the lens of constrained optimization. Based on these insights, we derive Importance-Weighted On-Policy Distillation (IW-OPD), in which the weight assigned to each token depends on the accumulated discrepancy between the student's and teacher's distributions, naturally upweighting earlier tokens and downweighting later ones with larger deviations. We show that IW-OPD converges significantly faster than OPD, with better learning efficiency, and achieves better final performance than standard OPD in both same-size and cross-scale settings, improving performance up to 6.9 points on AIME-2025.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard On-Policy Distillation (OPD) exhibits position bias, with token-level supervision quality degrading for later positions in longer student rollouts due to increasing deviation from the teacher distribution. Evidence includes comparable performance when using only the first 30% of tokens versus all tokens, while the last 30% yields almost no learning. Through a constrained-optimization lens, the authors derive Importance-Weighted OPD (IW-OPD), where each token's weight depends on the accumulated discrepancy between student and teacher distributions (upweighting early tokens). They report that IW-OPD converges faster with better efficiency than OPD and achieves superior final performance in same-size and cross-scale settings, with gains up to 6.9 points on AIME-2025.
Significance. If the derivation is sound and the empirical gains are robust to the weighting assumption, this provides a principled correction for a practical bias in on-policy distillation, potentially improving learning efficiency in RL for language models. The constrained-optimization framing is a conceptual strength if it avoids embedding hidden biases or instabilities.
major comments (2)
- [Derivation of IW-OPD] The derivation of IW-OPD from the constrained-optimization lens (as described in the abstract): the reweighting by accumulated discrepancy is presented as correcting position bias, but this rests on the unverified assumption that the discrepancy is a sufficient and unbiased proxy for supervision quality. No explicit statement of the constraint set or Lagrange multiplier handling is given to confirm the weighting is free of correlations with rollout length, variance, or gradient scale that could introduce instability rather than resolve bias. This assumption is load-bearing for the faster convergence and +6.9 point claims.
- [Empirical evaluation] Empirical results section (implied by abstract claims): performance is reported on the external AIME-2025 benchmark rather than on quantities defined directly by the weighting function itself. This makes it difficult to confirm that the reported gains stem from the proposed mechanism without potential circularity or confounding factors.
minor comments (1)
- [Abstract] The abstract states gains 'up to 6.9 points on AIME-2025' but does not specify the exact baseline comparison or whether this is in the same-size or cross-scale setting.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the derivation and empirical evaluation of IW-OPD. We respond to each major comment below and indicate where revisions will be made.
read point-by-point responses
-
Referee: [Derivation of IW-OPD] The derivation of IW-OPD from the constrained-optimization lens (as described in the abstract): the reweighting by accumulated discrepancy is presented as correcting position bias, but this rests on the unverified assumption that the discrepancy is a sufficient and unbiased proxy for supervision quality. No explicit statement of the constraint set or Lagrange multiplier handling is given to confirm the weighting is free of correlations with rollout length, variance, or gradient scale that could introduce instability rather than resolve bias. This assumption is load-bearing for the faster convergence and +6.9 point claims.
Authors: Section 3 frames the token-level supervision as a constrained optimization problem where each token is subject to a quality constraint defined by its distributional discrepancy from the teacher. The accumulated discrepancy enters as the dual variable (Lagrange multiplier) for that constraint, yielding the importance weight. The manuscript's Section 2 analysis establishes the proxy validity via the observed degradation (first-30% vs. last-30% tokens). We agree the main text would benefit from an explicit Lagrangian statement and constraint set; we will add this formulation to Section 3.1. Appendix C already reports that weight-induced gradient variance remains comparable to OPD and does not grow with rollout length, mitigating the instability concern. revision: yes
-
Referee: [Empirical evaluation] Empirical results section (implied by abstract claims): performance is reported on the external AIME-2025 benchmark rather than on quantities defined directly by the weighting function itself. This makes it difficult to confirm that the reported gains stem from the proposed mechanism without potential circularity or confounding factors.
Authors: The AIME-2025 results demonstrate practical utility, but the paper already contains mechanism-specific diagnostics: Figure 2 quantifies position bias via token-subset ablations, Figure 3 shows the resulting weight distribution versus position, and Section 4.3 reports per-epoch convergence curves under the weighted objective. These are internal to the weighting function. To strengthen the link, we will add a supplementary plot of per-token weight versus loss reduction in the revision. revision: partial
Circularity Check
No significant circularity; derivation independent of reported gains
full rationale
The paper observes position bias empirically (earlier tokens better than later), then derives IW-OPD weights from a constrained-optimization formulation that treats accumulated discrepancy as the reweighting signal. This discrepancy is computed directly from student-teacher rollout distributions and is not defined in terms of the final performance metric or fitted to AIME-2025 scores. Reported improvements (+6.9 points) are measured on an external benchmark outside the weighting function itself. No self-citations, fitted-input-as-prediction, or ansatz-smuggling steps appear in the derivation chain. The central claim therefore remains falsifiable against independent data and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Supervision quality at a token degrades monotonically with the accumulated discrepancy between student and teacher distributions.
Reference graph
Works this paper leans on
-
[1]
On-policy distillation of language models: Learning from self-generated mistakes
Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. InThe Twelfth International Conference on Learning Representations,
-
[2]
URLhttps://openreview.net/forum?id=3zKtaqxLhW
-
[3]
Why exposure bias matters: An imitation learning perspective of error accumulation in language generation
Kushal Arora, Layla El Asri, Hareesh Bahuleyan, and Jackie Chi Kit Cheung. Why exposure bias matters: An imitation learning perspective of error accumulation in language generation. InFindings of the Association for Computational Linguistics: ACL 2022, pages 700–710, 2022
2022
-
[4]
Scheduled sampling for sequence prediction with recurrent neural networks
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. InAdvances in Neural Information Processing Systems, 2015
2015
-
[5]
Bigelow, Ari Holtzman, Hidenori Tanaka, and Tomer Ullman
Eric J. Bigelow, Ari Holtzman, Hidenori Tanaka, and Tomer Ullman. Forking paths in neural text generation. InInternational Conference on Learning Representations, 2025
2025
-
[6]
Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, et al. Internlm2 technical report.arXiv preprint arXiv:2403.17297, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[7]
Process Reinforcement through Implicit Rewards
Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wenjie Peng, Jianhao Chen, Ning Chen, Zhiyuan Liu, and Maosong Sun. Process reinforcement through implicit rewards.arXiv preprint arXiv:2502.01456, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[8]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[9]
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, and Tong Zhang. Rlhf workflow: From reward modeling to online rlhf. arXiv preprint arXiv:2405.07863, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[10]
In: Proceed- ings of the 2020 Conference on Empirical Methods in Natural Language Pro- cessing, pp
Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, and Dong Yu. Token-level adaptive training for neural machine translation. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1035–1046, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main
-
[11]
URLhttps://aclanthology.org/2020.emnlp-main.76/
2020
-
[12]
MiniLLM: Knowledge distillation of large language models
Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. MiniLLM: Knowledge distillation of large language models. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=5h0qf7IBZZ
2024
-
[13]
Where Hindsight Credit Can Reside: A Signed-Capacity View of Token Updates in RLVR
Yuhang He, Haodong Wu, Siyi Liu, Hongyu Ge, Hange Zhou, Keyi Wu, Zhuo Zheng, Qihong Lin, Zixin Zhong, and Yongqi Zhang. Rethinking token-level credit assignment in RLVR: A polarity-entropy analysis.arXiv preprint arXiv:2604.11056, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[14]
Distilling the Knowledge in a Neural Network
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015. URL https://arxiv. org/abs/1503.02531
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[15]
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Jingcheng Hu, Yinmin Zhang, Qi Han, Daxin Jiang, Xiangyu Zhang, and Heung-Yeung Shum. Open-reasoner-zero: An open source approach to scaling up reinforcement learning on the base model.arXiv preprint arXiv:2503.24290, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[16]
SelecTKD: Selective token-weighted knowledge distillation for LLMs, 2025
Haiduo Huang, Jiangcheng Song, Yadong Zhang, and Pengju Ren. SelecTKD: Selective token-weighted knowledge distillation for LLMs, 2025. URL https://arxiv.org/abs/ 2510.24021
-
[17]
Sham M. Kakade. A natural policy gradient. InAdvances in Neural Information Processing Systems, volume 14, pages 1531–1538, 2001
2001
-
[18]
Explain in your own words: Improving reasoning via token-selective dual knowledge distillation
Minsang Kim and Seung Jun Baek. Explain in your own words: Improving reasoning via token-selective dual knowledge distillation. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=zph7e5JaXc. 11
2026
-
[19]
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, and Ning Ding. Rethinking on-policy dis- tillation of large language models: Phenomenology, mechanism, and recipe.arXiv preprint arXiv:2604.13016, 2026. doi: 10.48550/arXiv.2604.13016. URL https://arxiv.org/abs/ 2604.13016
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.13016 2026
-
[20]
Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, and Tuo Zhao. Token-wise curriculum learning for neural machine translation. InFindings of the Association for Computational Linguistics: EMNLP 2021, pages 3658–3670, Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021. find...
-
[21]
Let’s verify step by step
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In International Conference on Learning Representations, 2024
2024
-
[22]
Critical tokens matter: Token-level contrastive estimation enhances LLM’s reasoning capability
Zicheng Lin, Tian Liang, Jiahao Xu, Qiuzhi Lin, Xing Wang, Ruilin Luo, Chufan Shi, Siheng Li, Yujiu Yang, and Zhaopeng Tu. Critical tokens matter: Token-level contrastive estimation enhances LLM’s reasoning capability. InInternational Conference on Machine Learning, 2025
2025
-
[23]
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
Chris Yuhao Liu, Liang Zeng, Yuzhen Xiao, Jujie He, Jiacai Liu, Chaojie Wang, Rui Yan, Wei Shen, Fuxiang Zhang, Jiacheng Xu, et al. Skywork-reward-v2: Scaling preference data curation via human-ai synergy.arXiv preprint arXiv:2507.01352, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[24]
Code-r1: Reproducing r1 for code with reliable rewards
Jiawei Liu and Lingming Zhang. Code-r1: Reproducing r1 for code with reliable rewards. 2025
2025
-
[25]
Lingyuan Liu and Mengxiang Zhang. Being strong progressively! enhancing knowledge distillation of large language models through a curriculum learning framework.arXiv preprint arXiv:2506.05695, 2025
-
[26]
MiMo-V2-Flash Technical Report
LLM-Core, Xiaomi. MiMo-V2-Flash technical report.arXiv preprint arXiv:2601.02780, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[27]
On-policy distillation
Kevin Lu. On-policy distillation. Thinking Machines Lab Blog, 2025. URL https:// thinkingmachines.ai/blog/on-policy-distillation/
2025
-
[28]
Manning, and Chelsea Finn
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. InAdvances in Neural Information Processing Systems, volume 36, 2023
2023
-
[29]
Gordon, and J
Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. InProceedings of the 14th International Conference on Artificial Intelligence and Statistics, pages 627–635, 2011
2011
-
[30]
Jordan, and Pieter Abbeel
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. InInternational Conference on Machine Learning, pages 1889– 1897, 2015
2015
-
[31]
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[32]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y .K. Li, Y . Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[33]
A Survey of On-Policy Distillation for Large Language Models
Mingyang Song and Mao Zheng. A survey of on-policy distillation for large language models. arXiv preprint arXiv:2604.00626, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[34]
Rényi divergence and kullback-leibler divergence.IEEE Transactions on Information Theory, 60(7):3797–3820, 2014
Tim van Erven and Peter Harremoës. Rényi divergence and kullback-leibler divergence.IEEE Transactions on Information Theory, 60(7):3797–3820, 2014
2014
-
[35]
Ignore the KL penalty! boosting exploration on critical tokens to enhance RL fine-tuning
Jean Vassoyan, Nathanaël Beau, and Roman Plaud. Ignore the KL penalty! boosting exploration on critical tokens to enhance RL fine-tuning. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 6123–6133, 2025. 12
2025
-
[36]
Xu, Damai Dai, Yifei Li, Deli Chen, Y
Peiyi Wang, Lei Li, Zhihong Shao, R.X. Xu, Damai Dai, Yifei Li, Deli Chen, Y . Wu, and Zhifang Sui. Math-shepherd: Verify and reinforce LLMs step-by-step without human annotations. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
2024
-
[37]
Beyond the 80/20 rule: High-entropy minority tokens drive effective reinforcement learning for llm reasoning
Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, and Junyang Lin. Beyond the 80/20 rule: High-entropy minority tokens drive effective reinforcement learning for llm reasoning. InAdvances in Neural Information...
2025
-
[38]
f-divergence minimization for sequence-level knowledge distillation
Yuqiao Wen, Zichao Li, Wenyu Du, and Lili Mou. f-divergence minimization for sequence-level knowledge distillation. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10817–10834, Toronto, Canada,
-
[39]
doi: 10.18653/v1/2023.acl-long.605
Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.605. URL https://aclanthology.org/2023.acl-long.605/
-
[40]
Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy, Corby Rosset, Ahmed Awadallah, and Alexander Rakhlin. Exploratory preference optimization: Harnessing implicit Q*-approximation for sample-efficient RLHF.arXiv preprint arXiv:2405.21046, 2024
-
[41]
LLM-oriented token-adaptive knowledge distillation, 2025
Xurong Xie, Zhucun Xue, Jiafu Wu, Jian Li, Yabiao Wang, Xiaobin Hu, Yong Liu, and Jiangning Zhang. LLM-oriented token-adaptive knowledge distillation, 2025. URL https: //arxiv.org/abs/2510.11615
-
[42]
PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, and Zhipeng Wang. PACED: Distilla- tion and on-policy self-distillation at the frontier of student competence.arXiv preprint arXiv:2603.11178, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[43]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[44]
arXiv preprint arXiv:2505.00662 , year =
Wenkai Yang, Jingwen Chen, Yankai Lin, and Ji-Rong Wen. Deepcritic: Deliberate critique with large language models.arXiv preprint arXiv:2505.00662, 2025
-
[45]
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
Wenkai Yang, Weijie Liu, Ruobing Xie, Kai Yang, Saiyong Yang, and Yankai Lin. Learning beyond teacher: Generalized on-policy distillation with reward extrapolation.arXiv preprint arXiv:2602.12125, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[46]
Disentangling reasoning tokens and boilerplate tokens for language model fine-tuning
Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, and Fuli Feng. Disentangling reasoning tokens and boilerplate tokens for language model fine-tuning. InFindings of the Association for Computational Linguistics: ACL 2025, pages 20939–20957. Association for Computational Linguistics, 2025. doi: 10.18653/v1/2025.findings-acl.1078
-
[47]
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. Self-distilled reasoner: On-policy self-distillation for large language models.arXiv preprint arXiv:2601.18734, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[48]
Geometric-mean policy optimization
Yuzhong Zhao, Yue Liu, Junpeng Liu, Jingye Chen, Xun Wu, Yaru Hao, Tengchao Lv, Shaohan Huang, Lei Cui, Qixiang Ye, Fang Wan, and Furu Wei. Geometric-mean policy optimization. InInternational Conference on Learning Representations, 2026
2026
-
[49]
Group Sequence Policy Optimization
Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, Jingren Zhou, and Junyang Lin. Group sequence policy optimization.arXiv preprint arXiv:2507.18071, 2025. doi: 10.48550/arXiv.2507.18071. URL https://arxiv.org/abs/2507.18071. 13 A Proofs and Derivations The derivations below fix a prompt x u...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2507.18071 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.