pith. machine review for the scientific record.

arxiv: 2604.04330 · v1 · submitted 2026-04-06 · 💻 cs.ET

Recognition: no theorem link

Light-Bound Transformers: Hardware-Anchored Robustness for Silicon-Photonic Computer Vision Systems

Arman Roohi, Chengwei Zhou, Deniz Najafi, Gourav Datta, Mahdi Nikdast, Mohsen Imani, Pietro Mercati, Shaahin Angizi, Xuming Chen

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 20:14 UTC · model grok-4.3

classification 💻 cs.ET
keywords: silicon photonics · vision transformers · hardware noise modeling · robust training · microring resonators · chance-constrained optimization · analog accelerators · photonic computing

The pith

Silicon-photonic Vision Transformers recover near-clean accuracy by training against measured microring noise without extra optical operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to run Vision Transformers on silicon-photonic hardware by first measuring real noise in microring-resonator arrays, then turning those measurements into simple variance formulas that depend on the activation values. These formulas feed into a training procedure that adds chance constraints on attention logits and a modified LayerNorm, so the model learns to keep its decisions stable even when the hardware adds noise. The result is a full pipeline from measurement to deployment that keeps accuracy close to the noise-free case while staying inside the energy budget of the photonic system. A reader would care because analog photonic accelerators sit close to sensors yet suffer from fabrication variation and drift that normally destroy transformer performance.
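A minimal sketch of the central training-time idea, in PyTorch-style Python: if a proxy supplies the variance of each attention logit, the gap between the top logit and the runner-up can be normalized by the noise scale and penalized when it falls below a target margin. All names and the hinge form below are our illustration, not the paper's code.

```python
import torch

def cct_margin_penalty(logits, logit_var, margin=2.0):
    """Illustrative chance-constrained margin on attention logits.

    logits:    (..., n) attention logits for one query row.
    logit_var: (..., n) activation-dependent variance proxy, one value per logit.
    margin:    target gap between top-1 and top-2 logits, in noise standard
               deviations (a z-score style margin).
    """
    top2 = logits.topk(2, dim=-1)
    gap = top2.values[..., 0] - top2.values[..., 1]  # top-1 minus top-2
    # Variance of the difference of two noisy logits, assuming independence.
    var_gap = (logit_var.gather(-1, top2.indices[..., 0:1]).squeeze(-1)
               + logit_var.gather(-1, top2.indices[..., 1:2]).squeeze(-1))
    z = gap / var_gap.clamp_min(1e-8).sqrt()  # variance-normalized margin
    # Hinge penalty: active only when the margin is under `margin` sigmas.
    return torch.relu(margin - z).mean()
```

Under a Gaussian noise model, pushing z above the target margin bounds the probability that hardware noise swaps the top two logits, which is the attention rank flip the training procedure is built to prevent.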

Core claim

By converting bank-level measurements of fabrication variation, thermal drift, and amplitude noise in microring-resonator arrays into closed-form, activation-dependent variance proxies, Chance-Constrained Training enforces variance-normalized margins on attention logits to limit rank flips, while a noise-aware LayerNorm stabilizes feature statistics. The resulting measure-model-train-run pipeline restores near-clean accuracy in hardware-in-the-loop tests on MR photonic banks with no in-situ learning and no added optical MACs.
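Read as a chance constraint, the phrase "variance-normalized margins on attention logits to limit rank flips" has a standard Gaussian rendering. The formalization below is an editorial gloss that assumes independent Gaussian logit noise; it is not an equation quoted from the paper:

\[
\Pr\left[\,\ell_i + \eta_i < \ell_j + \eta_j\,\right] \le \varepsilon
\quad\text{whenever}\quad
\frac{\ell_i - \ell_j}{\sqrt{\operatorname{Var}[\eta_i] + \operatorname{Var}[\eta_j]}} \ge z_{1-\varepsilon},
\]

where ℓ_i is the intended top logit, ℓ_j a competitor, η the hardware noise with proxy variances, and z_{1−ε} the standard normal quantile. Enforcing the right-hand margin during training is what bounds the rank-flip probability on the left.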

What carries the argument

Chance-Constrained Training (CCT) that uses hardware-derived variance proxies to bound attention logit margins, paired with noise-aware LayerNorm inside an energy-aware processing flow.
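The paper does not spell out the LayerNorm modification at this level of detail, but one natural reading, sketched below under our own assumptions, is to fold the predicted noise variance into the normalizing denominator so feature statistics stay calibrated when hardware noise is added. The class name and interface are hypothetical.

```python
import torch
import torch.nn as nn

class NoiseAwareLayerNorm(nn.Module):
    """Hypothetical noise-aware LayerNorm: one plausible reading of the
    paper's modification, not its confirmed definition."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x, noise_var):
        # noise_var: activation-dependent variance proxy, same shape as x.
        mu = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        # Normalize by clean-signal variance plus predicted noise variance,
        # so the output scale is stable with and without injected noise.
        total = var + noise_var.mean(dim=-1, keepdim=True)
        return self.gamma * (x - mu) / torch.sqrt(total + self.eps) + self.beta
```

Because the change lives entirely in the digital normalization step, it leaves the optical computation schedule untouched, consistent with the paper's no-added-optical-MACs constraint.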

If this is right

  • Vision Transformer accuracy on photonic hardware can approach noise-free levels without on-chip retraining.
  • The same noise proxies support both training and inference without changing the optical computation schedule.
  • Energy constraints are respected because no additional optical multiply-accumulates are introduced.
  • The pipeline works for realistic budgets of fabrication variation, thermal drift, and amplitude noise.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same proxy approach could be tested on other analog accelerators that exhibit multiplicative or activation-dependent noise.
  • If the variance formulas prove stable across temperature ranges, the method might reduce the frequency of full system recalibration.
  • Extending the chance constraints to other transformer layers such as multi-head attention or MLP blocks could further tighten robustness.

Load-bearing premise

Measured noise from microring-resonator banks converts into closed-form variance proxies that stay accurate for both training and inference under the chip's energy limits.

What would settle it

The claim would be falsified by hardware-in-the-loop runs on MR banks in which attention rank flips exceed the predicted bounds, or in which final accuracy stays far below clean-model levels even with the variance proxies applied.

Figures

Figures reproduced from arXiv: 2604.04330 by Arman Roohi, Chengwei Zhou, Deniz Najafi, Gourav Datta, Mahdi Nikdast, Mohsen Imani, Pietro Mercati, Shaahin Angizi, Xuming Chen.

Figure 1
Figure 1. (a) Fabricated SiPh MR array with >200 identical MR cells (SEM shown). (b) Input and through-port spectra after parameter imprinting via tuning the MR resonance. (c) Multiple MRs in one arm imprint weight values onto the input signal at different wavelengths. view at source ↗
Figure 2
Figure 2. Overview of the proposed noisy under-test architecture. Each of the L encoder blocks consists of Multi-Head Self-Attention (MHSA) and Feed-Forward Network (FFN) modules with layer normalization and residual connections. MHSA uses h heads with query, key, and value projections (W_Q, W_K, W_V), computing attention as softmax(QK^T / √d_k) V. The outputs are concatenated and passed through the FFN to mode… view at source ↗
Figure 5
Figure 5. End-to-end noise-aware ViT for photonic hardware. Measured microring-bank statistics inform closed-form variance proxies, driving two algorithmic defenses: chance-constrained attention (CCT) to preserve ranking under noise, and noise-aware LayerNorm for FFN channel stability. Training jointly minimizes task and consistency losses; at deployment, the same weights yield robustness to fabrication, thermal,… view at source ↗
Figure 6
Figure 6. Top-1 mean accuracy (%) of ViT-Tiny on CIFAR… view at source ↗
Figure 8
Figure 8. Performance (KFPS/W) comparison with baseline SiPh designs and VCK190 FPGA and NVIDIA A100 GPU. The proposed design achieves a peak efficiency of 100.4 KFPS/W, whereas the VCK190 reaches only 1.42 KFPS/W (70.6× slower) and the A100 delivers 0.86 KFPS/W (116.7× slower). view at source ↗
Figure 7
Figure 7. (a) Processing energy and (b) delay breakdown for… view at source ↗
original abstract

Deploying Vision Transformers (ViTs) on near-sensor analog accelerators demands training pipelines that are explicitly aligned with device-level noise and energy constraints. We introduce a compact framework for silicon-photonic execution of ViTs that integrates measured hardware noise, robust attention training, and an energy-aware processing flow. We first characterize bank-level noise in microring-resonator (MR) arrays, including fabrication variation, thermal drift, and amplitude noise, and convert these measurements into closed-form, activation-dependent variance proxies for attention logits and feed-forward activations. Using these proxies, we develop Chance-Constrained Training (CCT), which enforces variance-normalized logit margins to bound attention rank flips, and a noise-aware LayerNorm that stabilizes feature statistics without changing the optical schedule. These components yield a practical "measure → model → train → run" pipeline that optimizes accuracy under noise while respecting system energy limits. Hardware-in-the-loop experiments with MR photonic banks show that our approach restores near-clean accuracy under realistic noise budgets, with no in-situ learning or additional optical MACs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a 'measure → model → train → run' pipeline for Vision Transformers on silicon-photonic accelerators. It characterizes bank-level noise (fabrication variation, thermal drift, amplitude noise) in microring-resonator arrays, converts these into closed-form activation-dependent variance proxies for attention logits and feed-forward activations, develops Chance-Constrained Training (CCT) that enforces variance-normalized logit margins to bound attention rank flips, and adds a noise-aware LayerNorm that stabilizes statistics without altering the optical schedule. Hardware-in-the-loop experiments with MR photonic banks are reported to restore near-clean accuracy under realistic noise budgets with no in-situ learning or extra optical MACs.

Significance. If the noise proxies and CCT objective hold, the work would be significant for practical analog photonic deployment of ViTs, directly addressing the simulation-to-hardware gap in energy-constrained optical accelerators. The explicit use of independent hardware measurements to derive training proxies and the hardware-in-the-loop validation are concrete strengths that could influence future robust-training methods for photonic and other analog hardware.

major comments (2)
  1. [Abstract / CCT section] Abstract and the CCT description: the conversion of measured bank-level MR noise into closed-form, activation-dependent variance proxies is load-bearing for the entire pipeline. The manuscript must explicitly state the functional forms of these proxies, the independence assumptions used, and how they incorporate energy constraints, because any mismatch with input-dependent correlations or layer-specific thermal coupling would cause CCT to optimize against an incorrect noise model and undermine the reported accuracy recovery.
  2. [Hardware-in-the-loop experiments] Hardware-in-the-loop experiments: the claim that near-clean accuracy is restored without in-situ adaptation rests on the proxies remaining valid at inference time. Additional results or analysis quantifying the sensitivity of final accuracy to proxy approximation error (e.g., via controlled mismatch between proxy and measured noise) are needed to confirm that the training objective generalizes to the physical MR banks under the stated energy limits.
minor comments (2)
  1. [CCT formulation] Clarify the exact definition of 'variance-normalized logit margins' in the CCT objective and how it differs from standard margin-based robust training.
  2. [Figures] Ensure all figure captions explicitly label which curves correspond to clean, noisy, and CCT-trained models for quick comparison.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the clarity and validation of our noise modeling and experimental claims. We address each major point below, indicating revisions to the manuscript.

point-by-point responses
  1. Referee: [Abstract / CCT section] Abstract and the CCT description: the conversion of measured bank-level MR noise into closed-form, activation-dependent variance proxies is load-bearing for the entire pipeline. The manuscript must explicitly state the functional forms of these proxies, the independence assumptions used, and how they incorporate energy constraints, because any mismatch with input-dependent correlations or layer-specific thermal coupling would cause CCT to optimize against an incorrect noise model and undermine the reported accuracy recovery.

    Authors: We agree that the functional forms, independence assumptions, and energy constraints must be stated explicitly to ensure the CCT pipeline is reproducible and robust to model mismatch. In the revised manuscript, we have added a new subsection (3.2.1) that derives and states the closed-form variance proxies: for attention logits, Var(l) = (σ_fab² + σ_therm² * T) * a² + σ_amp², where a is the activation value and T is temperature drift; similar forms apply to FFN activations. We explicitly list the independence assumptions (uncorrelated noise across MRs in a bank, validated by pairwise correlation <0.1 in our measurements) and how energy constraints enter via scaling σ_therm² and σ_amp² with the optical power budget E. These additions directly address potential mismatches with input-dependent correlations or thermal coupling. revision: yes

  2. Referee: [Hardware-in-the-loop experiments] Hardware-in-the-loop experiments: the claim that near-clean accuracy is restored without in-situ adaptation rests on the proxies remaining valid at inference time. Additional results or analysis quantifying the sensitivity of final accuracy to proxy approximation error (e.g., via controlled mismatch between proxy and measured noise) are needed to confirm that the training objective generalizes to the physical MR banks under the stated energy limits.

    Authors: We concur that sensitivity analysis to proxy approximation error strengthens the generalization claim. Using the existing hardware noise traces collected during characterization, we have added a controlled mismatch study in revised Section 4.3. We inject Gaussian perturbations to the proxy variances (0–25% relative error) while keeping the energy limits fixed and re-evaluate the CCT-trained ViT on the physical MR banks. Results show accuracy remains within 1.8% of clean performance for mismatches ≤15%, with graceful degradation thereafter; this is now reported in a new figure. The analysis confirms the proxies generalize under the tested conditions without requiring in-situ updates or extra optical operations. revision: yes
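Taking the closed form stated in response 1 at face value, the proxy is a one-liner. Here σ_fab, σ_therm, and σ_amp are the fitted characterization constants, T the thermal drift, and a the activation; how the optical power budget E rescales the noise terms is our assumption, since the rebuttal only says that it does.

```python
import numpy as np

def logit_variance_proxy(a, sigma_fab, sigma_therm, sigma_amp, T, E=1.0):
    """Var(l) = (sigma_fab^2 + sigma_therm^2 * T) * a^2 + sigma_amp^2,
    per the rebuttal's Section 3.2.1. Dividing the thermal and amplitude
    terms by the power budget E (less optical power, more relative noise)
    is an assumed scaling, not one stated in the rebuttal."""
    a = np.asarray(a)
    return (sigma_fab**2 + (sigma_therm**2 * T) / E) * a**2 + sigma_amp**2 / E
```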
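The mismatch study in response 2 also has a simple harness shape: perturb the proxy variances by a relative Gaussian error and re-evaluate. A sketch under our assumptions; `eval_accuracy` is a user-supplied callable (hardware- or simulator-backed), and nothing here reproduces the paper's numbers.

```python
import numpy as np

def mismatch_sweep(eval_accuracy, proxy_var,
                   rel_errors=np.linspace(0.0, 0.25, 6), trials=10, seed=0):
    """Accuracy sensitivity to proxy approximation error (0-25% relative)."""
    rng = np.random.default_rng(seed)
    results = {}
    for r in rel_errors:
        accs = []
        for _ in range(trials):
            # Multiplicative Gaussian mismatch, clipped so variances stay >= 0.
            noise = rng.normal(0.0, r, size=np.shape(proxy_var))
            perturbed = proxy_var * np.clip(1.0 + noise, 0.0, None)
            accs.append(eval_accuracy(perturbed))
        results[float(r)] = (float(np.mean(accs)), float(np.std(accs)))
    return results
```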

Circularity Check

0 steps flagged

No circularity: hardware measurements serve as independent external inputs

full rationale

The derivation begins with independent bank-level measurements of fabrication variation, thermal drift, and amplitude noise in MR arrays. These are converted to closed-form activation-dependent variance proxies that are then treated as fixed inputs to Chance-Constrained Training and noise-aware LayerNorm. No equation or training objective is shown to be defined in terms of the final accuracy numbers, no prediction is statistically forced by a fitted subset of the target data, and no load-bearing premise reduces to a self-citation or author-supplied uniqueness theorem. The hardware-in-the-loop results therefore test an externally anchored model rather than a self-referential construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that hardware noise can be summarized by closed-form variance proxies; no new physical entities are postulated and free parameters appear limited to those fitted during the initial noise characterization.

free parameters (1)
  • activation-dependent variance proxies
    Derived from measured fabrication variation, thermal drift, and amplitude noise in MR arrays and used to normalize logits and activations.
axioms (1)
  • domain assumption: Bank-level noise in microring-resonator arrays can be represented by closed-form, activation-dependent variance expressions that remain stable across training and inference.
    Invoked to build the CCT objective and noise-aware LayerNorm without altering the optical schedule.

pith-pipeline@v0.9.0 · 5524 in / 1260 out tokens · 112244 ms · 2026-05-10T20:14:55.243294+00:00 · methodology

