Recognition: 2 theorem links
PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay
Pith reviewed 2026-05-12 05:18 UTC · model grok-4.3
The pith
Folding Manhattan distance decay into attention kernels and RL rewards lets EDA models scale linearly while transferring across chip sizes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PhysEDA shows that a single Manhattan-distance exponential decay kernel can be reused both as a multiplicative bias inside Physics-Structured Linear Attention (PSLA), yielding linear-time attention, and as the generator of a physical potential inside Potential-Based Reward Shaping (PBRS), densifying sparse RL rewards while preserving policy invariance. Tested on decoupling-capacitor placement, macro placement, and IR-drop prediction, the framework produces a 56.8 percent gain in zero-shot cross-scale transfer, a 14x inference speedup, and a 98.5 percent memory reduction on 100-by-100 grids, with PBRS contributing a further 10.8 percent improvement under sparse rewards.
What carries the argument
The exponential decay kernel defined on Manhattan distance, inserted multiplicatively into the linear attention computation and used to construct a potential function for reward shaping.
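Why an exponential Manhattan decay admits linear-time attention can be illustrated with a minimal one-dimensional sketch (the paper's actual PSLA operates on 2-D grids, where the Manhattan kernel factorizes per axis; the scalar feature map, the parameter `alpha`, and both function names below are illustrative assumptions, not the authors' implementation). Because exp(-alpha*|i-j|) decomposes into per-step factors g = exp(-alpha), two running sums replace the full n-by-n bias matrix:

```python
import numpy as np

def decay_attention_quadratic(q, k, v, alpha):
    """O(n^2) reference: scores multiplied by the bias exp(-alpha * |i - j|)."""
    n = len(q)
    idx = np.arange(n)
    decay = np.exp(-alpha * np.abs(idx[:, None] - idx[None, :]))
    return (q[:, None] * k[None, :] * decay) @ v

def decay_attention_linear(q, k, v, alpha):
    """O(n): exp(-alpha*|i-j|) factorizes into per-step decays g = exp(-alpha),
    so one forward and one backward running sum replace the n x n matrix."""
    g = np.exp(-alpha)
    n = len(q)
    out = np.zeros(n)
    s = 0.0  # forward scan: sum over j <= i of g**(i-j) * k[j] * v[j]
    for i in range(n):
        s = g * s + k[i] * v[i]
        out[i] += q[i] * s
    t = 0.0  # backward scan: sum over j > i of g**(j-i) * k[j] * v[j]
    for i in range(n - 2, -1, -1):
        t = g * (k[i + 1] * v[i + 1] + t)
        out[i] += q[i] * t
    return out

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 64))
assert np.allclose(decay_attention_quadratic(q, k, v, 0.5),
                   decay_attention_linear(q, k, v, 0.5))
```

The same scan structure is what makes the claimed memory savings plausible: the n-by-n decay matrix is never materialized.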
If this is right
- Chip layouts an order of magnitude larger become computationally tractable for attention-based models.
- Models trained on small instances can be deployed on much larger instances without retraining or extra data.
- Reinforcement-learning agents for placement receive usable learning signals even when terminal rewards remain sparse.
- Memory and latency reductions make on-device or real-time design-space exploration feasible.
Where Pith is reading between the lines
- The same distance-decay construction could be tried in other grid-based engineering problems such as thermal or signal-integrity simulation.
- If the decay length scale turns out to be derivable from first principles rather than fitted, the framework could become entirely hyperparameter-free.
- The approach may extend to any sparse-reward RL setting on planar graphs where locality is governed by a known distance metric.
Load-bearing premise
Pairwise electrical and routing interactions decay exponentially with Manhattan distance in a sufficiently uniform manner across tasks and scales that the same fixed kernel can be used without per-task retuning.
What would settle it
A controlled experiment on a new grid size or EDA task in which adding the Manhattan decay bias lowers accuracy or transfer performance relative to an otherwise identical baseline model without the bias.
read the original abstract
Electronic design automation (EDA) addresses placement, routing, timing analysis, and power-integrity verification for integrated circuits. Learning methods -- attention (Transformer) and reinforcement learning (RL) -- have recently emerged on EDA tasks, yet face two common bottlenecks: vanilla attention's quadratic complexity limits scaling, and data-scarce models overfit statistical noise and amplify weak long-range correlations against the underlying physics. We observe that EDA tasks share a physical prior -- pairwise electrical and routing interactions decay exponentially along Manhattan distance -- and integrate it as a unified inductive bias into both architecture and training. We propose PhysEDA, comprising two components: Physics-Structured Linear Attention (PSLA) folds the separable Manhattan decay into the linear-attention kernel as a multiplicative bias, reducing complexity from quadratic to linear; Potential-Based Reward Shaping (PBRS) constructs a physical potential from the same kernel, providing a dense reward signal under sparse RL while preserving the optimal policy via the policy-invariance theorem. Across three EDA scenarios -- decoupling-capacitor placement, macro placement, and IR-drop prediction -- PhysEDA improves zero-shot cross-scale transfer by 56.8% and achieves 14x inference speedup with 98.5% memory savings on 100x100 grids; PBRS adds another 10.8% in sparse-reward DPP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PhysEDA, a framework that incorporates a physics-based prior of exponential decay in Manhattan distance for pairwise interactions in EDA tasks. This prior is integrated into Physics-Structured Linear Attention (PSLA) as a multiplicative bias to achieve linear complexity and into Potential-Based Reward Shaping (PBRS) to provide dense rewards in RL while maintaining policy invariance. The paper reports substantial improvements in zero-shot cross-scale transfer by 56.8%, 14x inference speedup, 98.5% memory savings on large grids, and additional 10.8% from PBRS in sparse-reward scenarios across three EDA tasks: decoupling-capacitor placement, macro placement, and IR-drop prediction.
Significance. If the results hold and the physical prior is shown to be effective without per-task retuning, this work could significantly advance efficient learning-based methods for EDA by providing a scalable inductive bias that addresses both computational complexity and data scarcity. The dual use of the prior in attention and reward shaping is a creative approach. However, the current presentation lacks sufficient evidence to confirm that the gains stem from the physics prior rather than other factors.
major comments (3)
- Abstract: The performance numbers (56.8% improvement, 14x speedup, 98.5% memory savings, 10.8% from PBRS) are presented without error bars, ablation studies, statistical tests, or derivation details, which is load-bearing for assessing the reliability of the central claims.
- PBRS section: The policy-invariance theorem for PBRS is asserted yet the actual proof steps are not shown, which is essential to validate that the dense reward signal preserves the optimal policy.
- PSLA and decay kernel: The Manhattan decay rate is introduced as a physical prior, but it is unclear whether this rate parameter is fixed from first principles or tuned on the evaluation benchmarks; this is critical because if tuned per task or scale, the reported gains in zero-shot transfer and speedups cannot be fully attributed to the inductive bias rather than regularization.
minor comments (1)
- Abstract: The acronym 'DPP' is used without expansion, reducing clarity for readers unfamiliar with the term.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and describe the revisions we will make to strengthen the manuscript.
read point-by-point responses
Referee: Abstract: The performance numbers (56.8% improvement, 14x speedup, 98.5% memory savings, 10.8% from PBRS) are presented without error bars, ablation studies, statistical tests, or derivation details, which is load-bearing for assessing the reliability of the central claims.
Authors: We agree that the abstract would be strengthened by additional qualifiers on the reported metrics. In the revised manuscript we will add error bars (computed over multiple random seeds) and a brief reference to the ablation studies and statistical tests already present in Sections 5.2–5.4. Derivation details for the metrics appear in the main text and appendix; we will insert one sentence in the abstract summarizing the robustness of the gains. Full ablation tables and p-values will remain in the body due to abstract length constraints. revision: partial
Referee: PBRS section: The policy-invariance theorem for PBRS is asserted yet the actual proof steps are not shown, which is essential to validate that the dense reward signal preserves the optimal policy.
Authors: We acknowledge that the explicit proof steps were omitted. The claim follows directly from the standard potential-based shaping result (Ng et al., 1999). In the revision we will insert the complete short proof in the main text or appendix, showing that the shaped reward R'(s,a,s') = R(s,a,s') + γΦ(s') − Φ(s) leaves the optimal policy unchanged because the added terms telescope in the return. revision: yes
Referee: PSLA and decay kernel: The Manhattan decay rate is introduced as a physical prior, but it is unclear whether this rate parameter is fixed from first principles or tuned on the evaluation benchmarks; this is critical because if tuned per task or scale, the reported gains in zero-shot transfer and speedups cannot be fully attributed to the inductive bias rather than regularization.
Authors: The decay rate λ is fixed from first principles and is not tuned. It is derived from the exponential attenuation of electric fields along Manhattan paths in on-chip interconnects, using standard values of dielectric constant and feature size from semiconductor physics (see Section 3.1). The identical λ is used for all three tasks and all scales; no per-benchmark or per-scale optimization was performed. We will add an explicit derivation subsection and a statement confirming the fixed parameter, which is further supported by the zero-shot cross-scale results. revision: yes
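The telescoping identity behind the rebuttal's PBRS argument can be checked numerically. This is a minimal sketch of the standard Ng et al. (1999) result, not the paper's implementation: the trajectory rewards, potentials, and discount below are arbitrary random values, and the identity holds for any of them:

```python
import numpy as np

def discounted_return(rewards, gamma):
    """G = sum_t gamma**t * r_t for one trajectory."""
    return float(sum(g * r for g, r in
                     zip(gamma ** np.arange(len(rewards)), rewards)))

def shape(rewards, potentials, gamma):
    """Shaped reward R'(s,a,s') = R(s,a,s') + gamma*Phi(s') - Phi(s);
    potentials[t] = Phi(s_t), including the terminal state."""
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]

rng = np.random.default_rng(1)
gamma = 0.9
R = rng.normal(size=10)    # arbitrary rewards along a length-10 trajectory
Phi = rng.normal(size=11)  # arbitrary potentials Phi(s_0), ..., Phi(s_10)

G_plain = discounted_return(R, gamma)
G_shaped = discounted_return(shape(R, Phi, gamma), gamma)

# The shaping terms telescope: G' = G - Phi(s_0) + gamma**T * Phi(s_T).
# The offset depends only on the start and terminal states, not on the
# actions taken, so the optimal policy is unchanged.
assert np.isclose(G_shaped, G_plain - Phi[0] + gamma**10 * Phi[10])
```

Because the offset is policy-independent, maximizing the shaped return and the original return select the same policies, which is the invariance claim the authors say they will prove explicitly in the revision.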
Circularity Check
No significant circularity; physical prior introduced as external observation and applied as inductive bias
full rationale
The paper states the exponential Manhattan decay as an observed physical prior shared across EDA tasks, then folds it into PSLA as a multiplicative bias on the linear attention kernel and into PBRS as a potential for reward shaping. These steps follow standard linear attention and policy-invariance constructions with an added separable bias term; the reported speedups and transfer gains are presented as empirical outcomes on the three scenarios rather than quantities forced by construction from fitted parameters or self-citations. No equations reduce the claimed improvements to the inputs by definition, and the prior is not shown to be tuned per-task on the evaluation benchmarks within the provided claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- Manhattan decay rate
axioms (1)
- domain assumption: Pairwise electrical and routing interactions decay exponentially with Manhattan distance.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "pairwise electrical and routing interactions decay exponentially along Manhattan distance... PSLA folds the separable Manhattan decay into the linear-attention kernel as a multiplicative bias"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: Φ_DPP(s) = Σ exp(−α d_M(i, p)) − λ Σ exp(−α d_M(i, j))
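One plausible reading of the quoted DPP potential can be sketched concretely: placed decoupling capacitors are attracted to an IR-drop hotspot p while pairs of capacitors repel each other. The grid coordinates, the hotspot, and the values of `alpha` and `lam` below are all hypothetical; the paper's exact index sets are not given in the quoted passage:

```python
import numpy as np

def manhattan(a, b):
    """Manhattan distance d_M between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def phi_dpp(placed, hotspot, alpha=0.5, lam=0.1):
    """Sketch of Phi_DPP(s): exponential attraction of each placed cell to
    the hotspot, minus a lam-weighted pairwise repulsion between cells."""
    attract = sum(np.exp(-alpha * manhattan(i, hotspot)) for i in placed)
    repel = sum(np.exp(-alpha * manhattan(i, j))
                for n, i in enumerate(placed) for j in placed[n + 1:])
    return attract - lam * repel

hot = (0, 0)
# A cell near the hotspot yields a higher potential than a distant one,
# so PBRS rewards moves toward the hotspot even before any terminal reward.
assert phi_dpp([(1, 1)], hot) > phi_dpp([(5, 5)], hot)
```

Under this reading, the shaped reward from such a potential densifies the sparse placement signal exactly as described in the PBRS component.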
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] A graph placement methodology for fast chip design. Nature.
- [2]
- [3] Kim, Haeyeon; Kim, Minsu; Berto, Federico; Kim, Joungho; Park, Jinkyoo.
- [4] Lai, Yao; Liu, Jinxin; Tang, Zhentao; Wang, Bin; Hao, Jianye; Luo, Ping.
- [5] Lai, Yao; Mu, Yao; Luo, Ping.
- [6] Geng, Zijie; Wang, Jie; Liu, Ziyan; Xu, Siyuan; Tang, Zhentao; Kai, Shixiong; Yuan, Mingxuan; Hao, Jianye; Wu, Feng.
- [7] Lin, Yibo; Dhar, Shounak; Li, Wuxi; Ren, Haoxing; Khailany, Brucek; Pan, David Z.
- [8] Chai, Zhuomin; Zhao, Yuxiang; Liu, Wei; Lin, Yibo; Wang, Runsheng; Huang, Ru.
- [9] Katharopoulos, Angelos; Vyas, Apoorv; Pappas, Nikolaos; Fleuret, François. "Transformers are…" ICML.
- [10] Choromanski, Krzysztof; Likhosherstov, Valerii; Dohan, David; Song, Xingyou; Gane, Andreea; Sarlos, Tamas; Hawkins, Peter; Davis, Jared; Mohiuddin, Afroz; Kaiser, Lukasz; et al. "Rethinking attention with…"
- [11] Linformer: Self-attention with linear complexity. arXiv:2006.04768.
- [12] cosFormer: Rethinking softmax in attention. ICLR.
- [13] Gated linear attention transformers with hardware-efficient training. arXiv:2312.06635.
- [14] Retentive network: A successor to Transformer for large language models. arXiv:2307.08621.
- [15] Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752.
- [16] Dao, Tri; Fu, Daniel Y.; Ermon, Stefano; Rudra, Atri; Ré. NeurIPS.
- [17] Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics.
- [18] Fourier neural operator for parametric partial differential equations. ICLR.
- [19] Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv:2104.13478.
- [20] Train short, test long: Attention with linear biases enables input length extrapolation. ICLR.
- [21] Su, Jianlin; Ahmed, Murtadha H. M.; Lu, Yu; Pan, Shengfeng; Bo, Wen; Liu, Yunfeng.
- [22] Policy invariance under reward transformations: Theory and application to reward shaping. ICML.
- [23] Graph optimal transport for cross-domain alignment. ICML.
- [24] Decision transformer: Reinforcement learning via sequence modeling. NeurIPS.
- [25] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. "What Does…"
- [26] Reinforcement learning policy as macro regulator rather than macro placer. NeurIPS.
- [27] Chip placement with diffusion models. ICML.
- [28] Reinforcement learning gradients as vitamin for online finetuning decision transformers. NeurIPS.
- [29] Xie, Zhiyao; Ren, Haoxing; Khailany, Brucek; Sheng, Ye; Santosh, Santosh; Hu, Jiang; Chen, Yiran. 2020.
- [30] Chhabria, Vidya A.; Ahuja, Vipul; Prabhu, Ashwath; Patil, Nikhil; Jain, Palkesh; Sapatnekar, Sachin S. "Thermal and…" 2021.
- [31] Zhao, Yuxiang; Chai, Zhuomin; Jiang, Xun; Lin, Yibo; Wang, Runsheng; Huang, Ru.
- [32] Xie, Zhiyao; Li, Hai; Xu, Xiaoqing; Hu, Jiang; Chen, Yiran. "Fast…"
- [33] Machine learning applications in physical design: Recent results and directions. ISPD.
- [34] Markov, Igor L.; Hu, Jin; Kim, Myung-Chul. "Progress and challenges in…"
- [35] Wang, Zhihai; Geng, Zijie; Tu, Zhaojie; Wang, Jie; Qian, Yuxi; Xu, Zhexuan; Liu, Ziyan; Xu, Siyuan; Tang, Zhentao; Kai, Shixiong; Yuan, Mingxuan; Hao, Jianye; Li, Bin; Zhang, Yongdong; Wu, Feng. "Benchmarking end-to-end performance of…"
- [36] Autoregressive image generation with linear complexity: A spatial-aware decay perspective. arXiv:2507.01652.
- [37] Power Integrity Modeling and Design for Semiconductors and Systems. 2007.
- [38] Chip-package hierarchical power distribution network modeling and analysis based on a segmentation method. IEEE Transactions on Advanced Packaging.
- [39] Quantifying decoupling capacitor location. IEEE International Symposium on Electromagnetic Compatibility.
- [40] IEEE Transactions on Circuits and Systems I.
- [41] Modeling of irregular shaped power distribution planes using transmission matrix method. IEEE Transactions on Advanced Packaging.
- [42] Koh, Cheng-Kok; Madden, Patrick H. "Manhattan or non-Manhattan? A study of alternative…"
- [43]
- [44] Berducci, Luigi; Aguilar, Edgar A.; Ni… Frontiers in Robotics and AI. 2024.
- [45]
- [46] Clevert, Djork-Arné. Fast and accurate deep network learning by exponential linear units (…). ICLR.
- [47] On a pin versus block relationship for partitions of logic graphs. IEEE Transactions on Computers.
- [48] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, … Attention Is All You Need.
- [49] Kipf, Thomas N.; Welling, Max. ICLR.
- [50]
- [51] Tibshirani, Robert. Journal of the Royal Statistical Society: Series B.
- [52] Cortes, Corinna; Vapnik, Vladimir. Machine Learning.
- [53] Lu, Jingwei; Chen, Pengwen; Chang, Chin-Chih; Sha, Lu; Huang, Dennis Jen-Hsin; Teng, Chin-Chi; Cheng, Chung-Kuan. ACM Transactions on Design Automation of Electronic Systems.