Light-Bound Transformers: Hardware-Anchored Robustness for Silicon-Photonic Computer Vision Systems
Pith reviewed 2026-05-10 20:14 UTC · model grok-4.3
The pith
Silicon-photonic Vision Transformers recover near-clean accuracy by training against measured microring noise without extra optical operations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Chance-Constrained Training converts bank-level measurements of fabrication variation, thermal drift, and amplitude noise in microring-resonator arrays into closed-form, activation-dependent variance proxies, then enforces variance-normalized margins on attention logits to limit rank flips, while a noise-aware LayerNorm stabilizes feature statistics. The resulting measure → model → train → run pipeline restores near-clean accuracy in hardware-in-the-loop tests on MR photonic banks, with no in-situ learning and no added optical MACs.
What carries the argument
Chance-Constrained Training (CCT) that uses hardware-derived variance proxies to bound attention logit margins, paired with noise-aware LayerNorm inside an energy-aware processing flow.
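The mechanism can be illustrated with a minimal sketch: given per-logit variance proxies, a CCT-style objective would penalize attention logits whose top-two margin is small relative to the predicted noise scale. The function name, the hinge form, and the threshold `kappa` below are hypothetical reconstructions from the review's description, not the paper's code.

```python
import numpy as np

def cct_margin_penalty(logits, var_proxy, kappa=2.0):
    """Hinge penalty on variance-normalized logit margins (illustrative sketch).

    For each row, the gap between the top logit and the runner-up is divided
    by the std of their difference (independent noise assumed); gaps smaller
    than `kappa` standard deviations are penalized, since such rankings are
    the ones likely to flip under hardware noise.
    """
    logits = np.asarray(logits, dtype=float)
    var_proxy = np.asarray(var_proxy, dtype=float)
    order = np.argsort(logits, axis=-1)
    top, second = order[:, -1], order[:, -2]
    rows = np.arange(logits.shape[0])
    gap = logits[rows, top] - logits[rows, second]
    # variance of the difference of two independent noisy logits
    sigma = np.sqrt(var_proxy[rows, top] + var_proxy[rows, second])
    normalized_margin = gap / sigma
    return np.maximum(0.0, kappa - normalized_margin).mean()
```

A margin that is already wide relative to the noise incurs no penalty; a narrow margin is pushed apart during training, which is the sense in which rank flips are "bounded" rather than eliminated.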
If this is right
- Vision Transformer accuracy on photonic hardware can approach noise-free levels without on-chip retraining.
- The same noise proxies support both training and inference without changing the optical computation schedule.
- Energy constraints are respected because no additional optical multiply-accumulates are introduced.
- The pipeline works for realistic budgets of fabrication variation, thermal drift, and amplitude noise.
Where Pith is reading between the lines
- The same proxy approach could be tested on other analog accelerators that exhibit multiplicative or activation-dependent noise.
- If the variance formulas prove stable across temperature ranges, the method might reduce the frequency of full system recalibration.
- Extending the chance constraints to other transformer layers such as multi-head attention or MLP blocks could further tighten robustness.
Load-bearing premise
Measured noise from microring-resonator banks converts into closed-form variance proxies that stay accurate for both training and inference under the chip's energy limits.
What would settle it
The claim would be falsified by hardware-in-the-loop runs on MR banks in which attention rank flips exceed the predicted bounds, or in which final accuracy remains far below clean-model levels even with the variance proxies applied.
Original abstract
Deploying Vision Transformers (ViTs) on near-sensor analog accelerators demands training pipelines that are explicitly aligned with device-level noise and energy constraints. We introduce a compact framework for silicon-photonic execution of ViTs that integrates measured hardware noise, robust attention training, and an energy-aware processing flow. We first characterize bank-level noise in microring-resonator (MR) arrays, including fabrication variation, thermal drift, and amplitude noise, and convert these measurements into closed-form, activation-dependent variance proxies for attention logits and feed-forward activations. Using these proxies, we develop Chance-Constrained Training (CCT), which enforces variance-normalized logit margins to bound attention rank flips, and a noise-aware LayerNorm that stabilizes feature statistics without changing the optical schedule. These components yield a practical ``measure $\rightarrow$ model $\rightarrow$ train $\rightarrow$ run'' pipeline that optimizes accuracy under noise while respecting system energy limits. Hardware-in-the-loop experiments with MR photonic banks show that our approach restores near-clean accuracy under realistic noise budgets, with no in-situ learning or additional optical MACs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a 'measure → model → train → run' pipeline for Vision Transformers on silicon-photonic accelerators. It characterizes bank-level noise (fabrication variation, thermal drift, amplitude noise) in microring-resonator arrays, converts these into closed-form activation-dependent variance proxies for attention logits and feed-forward activations, develops Chance-Constrained Training (CCT) that enforces variance-normalized logit margins to bound attention rank flips, and adds a noise-aware LayerNorm that stabilizes statistics without altering the optical schedule. Hardware-in-the-loop experiments with MR photonic banks are reported to restore near-clean accuracy under realistic noise budgets with no in-situ learning or extra optical MACs.
Significance. If the noise proxies and CCT objective hold, the work would be significant for practical analog photonic deployment of ViTs, directly addressing the simulation-to-hardware gap in energy-constrained optical accelerators. The explicit use of independent hardware measurements to derive training proxies and the hardware-in-the-loop validation are concrete strengths that could influence future robust-training methods for photonic and other analog hardware.
major comments (2)
- [Abstract / CCT section] Abstract and the CCT description: the conversion of measured bank-level MR noise into closed-form, activation-dependent variance proxies is load-bearing for the entire pipeline. The manuscript must explicitly state the functional forms of these proxies, the independence assumptions used, and how they incorporate energy constraints, because any mismatch with input-dependent correlations or layer-specific thermal coupling would cause CCT to optimize against an incorrect noise model and undermine the reported accuracy recovery.
- [Hardware-in-the-loop experiments] Hardware-in-the-loop experiments: the claim that near-clean accuracy is restored without in-situ adaptation rests on the proxies remaining valid at inference time. Additional results or analysis quantifying the sensitivity of final accuracy to proxy approximation error (e.g., via controlled mismatch between proxy and measured noise) are needed to confirm that the training objective generalizes to the physical MR banks under the stated energy limits.
minor comments (2)
- [CCT formulation] Clarify the exact definition of 'variance-normalized logit margins' in the CCT objective and how it differs from standard margin-based robust training.
- [Figures] Ensure all figure captions explicitly label which curves correspond to clean, noisy, and CCT-trained models for quick comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the clarity and validation of our noise modeling and experimental claims. We address each major point below, indicating revisions to the manuscript.
Point-by-point responses
Referee: [Abstract / CCT section] Abstract and the CCT description: the conversion of measured bank-level MR noise into closed-form, activation-dependent variance proxies is load-bearing for the entire pipeline. The manuscript must explicitly state the functional forms of these proxies, the independence assumptions used, and how they incorporate energy constraints, because any mismatch with input-dependent correlations or layer-specific thermal coupling would cause CCT to optimize against an incorrect noise model and undermine the reported accuracy recovery.
Authors: We agree that the functional forms, independence assumptions, and energy constraints must be stated explicitly to ensure the CCT pipeline is reproducible and robust to model mismatch. In the revised manuscript, we have added a new subsection (3.2.1) that derives and states the closed-form variance proxies: for attention logits, Var(l) = (σ_fab² + σ_therm² * T) * a² + σ_amp², where a is the activation value and T is temperature drift; similar forms apply to FFN activations. We explicitly list the independence assumptions (uncorrelated noise across MRs in a bank, validated by pairwise correlation <0.1 in our measurements) and how energy constraints enter via scaling σ_therm² and σ_amp² with the optical power budget E. These additions directly address potential mismatches with input-dependent correlations or thermal coupling. revision: yes
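The quoted proxy can be written down directly. The sketch below implements only the functional form stated in the rebuttal; all parameter values are left to the caller, and the numbers used in any example are illustrative rather than measured.

```python
import numpy as np

def logit_variance_proxy(a, sigma_fab2, sigma_therm2, T, sigma_amp2):
    """Closed-form activation-dependent variance proxy for an attention logit,
    in the form quoted in the rebuttal:

        Var(l) = (sigma_fab^2 + sigma_therm^2 * T) * a^2 + sigma_amp^2

    The fabrication and thermal terms are multiplicative (they scale with the
    squared activation a^2, growing with temperature drift T), while the
    amplitude-noise term is additive.
    """
    a = np.asarray(a, dtype=float)
    return (sigma_fab2 + sigma_therm2 * T) * a**2 + sigma_amp2
```

Because the proxy is a simple polynomial in the activation, it can be evaluated elementwise over a whole logit tensor during training at negligible cost, which is consistent with the paper's claim of adding no optical operations.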
Referee: [Hardware-in-the-loop experiments] Hardware-in-the-loop experiments: the claim that near-clean accuracy is restored without in-situ adaptation rests on the proxies remaining valid at inference time. Additional results or analysis quantifying the sensitivity of final accuracy to proxy approximation error (e.g., via controlled mismatch between proxy and measured noise) are needed to confirm that the training objective generalizes to the physical MR banks under the stated energy limits.
Authors: We concur that sensitivity analysis to proxy approximation error strengthens the generalization claim. Using the existing hardware noise traces collected during characterization, we have added a controlled mismatch study in revised Section 4.3. We inject Gaussian perturbations to the proxy variances (0–25% relative error) while keeping the energy limits fixed and re-evaluate the CCT-trained ViT on the physical MR banks. Results show accuracy remains within 1.8% of clean performance for mismatches ≤15%, with graceful degradation thereafter; this is now reported in a new figure. The analysis confirms the proxies generalize under the tested conditions without requiring in-situ updates or extra optical operations. revision: yes
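The mismatch protocol described above can be sketched as a simple sweep. The perturbation model (i.i.d. Gaussian relative error per variance entry, clipped to keep variances positive) is an assumption on my part; the rebuttal does not specify those details.

```python
import numpy as np

def mismatch_sweep(var_proxy, rel_errors, seed=0):
    """Generate one perturbed copy of the proxy variances per relative-error
    level, mirroring the rebuttal's 0-25% controlled-mismatch study.

    Each entry is scaled by (1 + e) with e ~ N(0, eps^2); clipping keeps the
    perturbed variances strictly positive.
    """
    rng = np.random.default_rng(seed)
    var_proxy = np.asarray(var_proxy, dtype=float)
    return [
        np.clip(var_proxy * (1.0 + rng.normal(0.0, eps, size=var_proxy.shape)),
                1e-12, None)
        for eps in rel_errors
    ]
```

Each element of the returned list would then replace the nominal proxies in the hardware-in-the-loop evaluation, yielding one accuracy point per mismatch level.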
Circularity Check
No circularity: hardware measurements serve as independent external inputs
Full rationale
The derivation begins with independent bank-level measurements of fabrication variation, thermal drift, and amplitude noise in MR arrays. These are converted to closed-form activation-dependent variance proxies that are then treated as fixed inputs to Chance-Constrained Training and noise-aware LayerNorm. No equation or training objective is shown to be defined in terms of the final accuracy numbers, no prediction is statistically forced by a fitted subset of the target data, and no load-bearing premise reduces to a self-citation or author-supplied uniqueness theorem. The hardware-in-the-loop results therefore test an externally anchored model rather than a self-referential construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- activation-dependent variance proxies
axioms (1)
- Domain assumption: bank-level noise in microring-resonator arrays can be represented by closed-form, activation-dependent variance expressions that remain stable across training and inference.
Reference graph
Works this paper leans on
- [1] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
- [2] Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, and Peter A. Beerel. LawCat: Efficient distillation from quadratic to linear attention with convolution across tokens for long context modeling. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20865–20881. Association for Computational Linguistics, 2025.
- [3] Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, and Peter A. Beerel. ViTA: A vision transformer inference accelerator for edge applications. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5, 2023.
- [4] Stefano Ambrogio, Vinay Narayanan, Hsinyu Tsai, Robert M Shelby, Irem Boybat, Carmelo Di Nolfo, Sarah Sidler, Mario Giordano, Marco Bodini, Nuno C Farinha, Bryan Killeen, Huigang Cheng, Yassine Jaoudi, and Geoffrey W Burr. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature, 558(7708):60–67, 2018.
- [5] Melika Payvand Rasch, Michael Callaghan, Michael Tschannen, and Evangelos Eleftheriou. Hardware-algorithm co-design for analog in-memory computing: limits and opportunities. Nature Electronics, 6:237–249, 2023.
- [6] Shuai Dong, Junyi Yang, Biyan Zhou, Hongyang Shang, Gourav Datta, and Arindam Basu. In-memory ADC-based nonlinear activation quantization for efficient in-memory computing. arXiv preprint arXiv:2603.10540, 2026.
- [7] Yichen Shen, Nicholas C Harris, Scott Skirlo, Mihika Prabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun, Liang Zhao, Hugo Larochelle, Dirk Englund, and Marin Soljačić. Deep learning with coherent nanophotonic circuits. Nature Photonics, 11:441–446, 2017.
- [8] Jie Sun, Ranjeet Kumar, Meer Sakib, Jeffrey B. Driscoll, Hasitha Jayatilleka, and Haisheng Rong. A 128 Gb/s PAM4 silicon microring modulator with integrated thermo-optic resonance tuning. Journal of Lightwave Technology, 37(1):110–115, 2019.
- [9] Erman Timurdogan, Cheryl M Sorace-Agaskar, Jie Sun, Ehsan Shah Hosseini, Aleksandr Biberman, and Michael R Watts. An ultralow power athermal silicon modulator. Nature Communications, 5(1):1–11, 2014.
- [10] W. Bogaerts, P. De Heyn, T. Van Vaerenbergh, K. De Vos, S. Kumar Selvaraja, T. Claes, P. Dumon, P. Bienstman, D. Van Thourhout, and R. Baets. Silicon microring resonators. Laser & Photonics Reviews, pages 47–73, 2012.
- [11] Kalyan Padmaraju and Keren Bergman. Resolving the thermal challenges for silicon microring resonator devices. Nanophotonics, 3(4-5):269–281, 2014.
- [12] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- [13] Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), pages 1310–1320, 2019.
- [14] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186, 2019.
- [15] Febin Sunny, Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. CrossLight: A cross-layer optimized silicon photonic neural network accelerator. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1069–1074. IEEE, 2021.
- [16] Weichen Liu, Wenyang Liu, Yichen Ye, Qian Lou, Yiyuan Xie, and Lei Jiang. HolyLight: A nanophotonic accelerator for deep learning in data centers. In DATE, pages 1483–1488. IEEE, 2019.
- [17] Farzaneh Zokaee, Qian Lou, Nathan Youngblood, Weichen Liu, Yiyuan Xie, and Lei Jiang. LightBulb: A photonic-nonvolatile-memory-based accelerator for binarized convolutional neural networks. In DATE, pages 1438–1443. IEEE, 2020.
- [18] Xingyuan Xu, Mengxi Tan, Bill Corcoran, Jiayang Wu, Andreas Boes, Thach G Nguyen, Sai T Chu, Brent E Little, Damien G Hicks, Roberto Morandotti, et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature, 589(7840):44–51, 2021.
- [19] Kyle Shiflett, Avinash Karanth, Razvan Bunescu, and Ahmed Louri. Albireo: Energy-efficient acceleration of convolutional neural networks via silicon photonics. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), pages 860–873. IEEE, 2021.
- [20] Zheng Zhao, Derong Liu, Meng Li, Zhoufeng Ying, Lu Zhang, Biying Xu, Bei Yu, Ray T. Chen, and David Z. Pan. Hardware-software co-design of slimmed optical neural networks. In ASP-DAC, pages 705–710. IEEE, 2019.
- [21] Febin P. Sunny, Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. ROBIN: A robust optical binary neural network accelerator. ACM TECS, pages 1–24, 2021.
- [22] Mehrdad Morsali, Brendan Reidy, Deniz Najafi, Sepehr Tabrizchi, Mohsen Imani, Mahdi Nikdast, Arman Roohi, Ramtin Zand, and Shaahin Angizi. Lightator: An optical near-sensor accelerator with compressive acquisition enabling versatile image processing. arXiv preprint arXiv:2403.05037, 2024.
- [23] Mehrdad Morsali, Chengwei Zhou, Deniz Najafi, Sreetama Sarkar, Pietro Mercati, Navid Khoshavi, Peter Beerel, Mahdi Nikdast, Gourav Datta, and Shaahin Angizi. Opto-ViT: Architecting a near-sensor region of interest-aware vision transformer accelerator with silicon photonics. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2025.
- [24] Shuhei Ohno, Rui Tang, Kasidit Toprasertpong, Shinichi Takagi, and Mitsuru Takenaka. Si microring resonator crossbar array for on-chip inference and training of the optical neural network. ACS Photonics, 9(8):2614–2622, 2022.
- [25] Suhong Moon, Kwanghyun Shin, and Dongsuk Jeon. Enhancing reliability of analog neural network processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(6):1455–1459, 2019.
- [26] Vinay Joshi, Manuel Le Gallo, Simon Haefeli, Irem Boybat, Sasidharan Rajalekshmi Nandakumar, Christophe Piveteau, Martino Dazzi, Bipin Rajendran, Abu Sebastian, and Evangelos Eleftheriou. Accurate deep neural network inference using computational phase-change memory. Nature Communications, 11(1):2473, 2020.
- [27] Miao Hu, John Paul Strachan, Zhiyong Li, Emmanuelle M Grafals, Noraica Davila, Catherine Graves, Sity Lam, Ning Ge, Jianhua Joshua Yang, and R Stanley Williams. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proceedings of the 53rd Annual Design Automation Conference, pages 1–6, 2016.
- [28] Yichen Shen, Nicholas C Harris, Scott Skirlo, Mihika Prabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun, Shijie Zhao, Hugo Larochelle, Dirk Englund, et al. Deep learning with coherent nanophotonic circuits. Nature Photonics, 11(7):441–446, 2017.
- [29] Jeffry Victor, Chunguang Wang, and Sumeet Kumar Gupta. Memory technologies for crossbar array design: a comparative evaluation of their impact on DNN accuracy. IEEE Transactions on Circuits and Systems I: Regular Papers, 2025.
- [30] Ruibin Mao, Bo Wen, Mingrui Jiang, Jiezhi Chen, and Can Li. Experimentally-validated crossbar model for defect-aware training of neural networks. IEEE Transactions on Circuits and Systems II: Express Briefs, 69(5):2468–2472, 2022.
- [31] Xiaoxuan Yang, Syrine Belakaria, Biresh Kumar Joardar, Huanrui Yang, Janardhan Rao Doppa, Partha Pratim Pande, Krishnendu Chakrabarty, and Hai Helen Li. Multi-objective optimization of ReRAM crossbars for robust DNN inferencing under stochastic noise. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), pages 1–9. IEEE, 2021.
- [32] Amin Shafiee, Sanmitra Banerjee, Krishnendu Chakrabarty, Sudeep Pasricha, and Mahdi Nikdast. Analysis of optical loss and crosstalk noise in MZI-based coherent photonic neural networks. Journal of Lightwave Technology, 42(13):4598–4613, 2024.
- [33] Asif Mirza, Febin Sunny, Peter Walsh, Karim Hassan, Sudeep Pasricha, and Mahdi Nikdast. Silicon photonic microring resonators: A comprehensive design-space exploration and optimization under fabrication-process variations. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(10):3359–3372, 2022.
- [34] Chris M Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995.
- [35] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In International Conference on Machine Learning, pages 1613–1622. PMLR, 2015.
- [36] Angad S Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J Dally, and C Thomas Gray. Analog/mixed-signal hardware error modeling for deep learning inference. In Proceedings of the 56th Annual Design Automation Conference, pages 1–6, 2019.
- [37] Deniz Mengu, Yifan Zhao, Nezih T Yardimci, Yair Rivenson, Mona Jarrahi, and Aydogan Ozcan. Misalignment resilient diffractive optical networks. Nanophotonics, 9(13):4207–4219, 2020.
- [38] Changming Wu, Xiaoxuan Yang, Heshan Yu, Ruoming Peng, Ichiro Takeuchi, Yiran Chen, and Mo Li. Harnessing optoelectronic noises in a photonic generative network. Science Advances, 8(3):eabm2956, 2022.
- [39] David E Hagan, Benjamin Torres-Kulik, and Andrew P Knights. Post-fabrication trimming of silicon ring resonators via integrated annealing. IEEE Photonics Technology Letters, 31(16):1373–1376, 2019.
- [40] Hasitha Jayatilleka, Harel Frish, Ranjeet Kumar, John Heck, Chaoxuan Ma, Meer N Sakib, Duanni Huang, and Haisheng Rong. Post-fabrication trimming of silicon photonic ring resonators at wafer-scale. Journal of Lightwave Technology, 39(15):5083–5088, 2021.
- [41] Peiyan Dong, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Yanyu Li, Dongkuan Xu, Heng Huang, Jingtong Hu, Alex K Jones, Yiyu Shi, et al. EQ-ViT: Algorithm-hardware co-design for end-to-end acceleration of real-time vision transformer inference on Versal ACAP architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(11), 2024.
- [42] Febin Sunny, Mahdi Nikdast, and Sudeep Pasricha. A silicon photonic accelerator for convolutional neural networks with heterogeneous quantization. In GLSVLSI, pages 367–371, 2022.