On-chip 1 TOPS Hyperdimensional Photonic Tensor Core using a WDM Silicon Photonic Coherent Crossbar
Pith reviewed 2026-05-14 18:52 UTC · model grok-4.3
The pith
A silicon photonic crossbar achieves 0.96 TOPS for hyperdimensional tensor computations using time-space-wavelength multiplexing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate a 4-channel 2-input TSWDM Xbar incorporating 56 GHz EAMs and 4-channel multiplexing stages that functions as a 4x2x1 tensor-vector multiplication unit with 3.9 percent average error at 0.96 TOPS throughput; the same hardware delivers 93.3 percent accuracy on Iris classification at 4x10 to 4x30 GBd and 83.3 percent at 4x60 GBd, while WDM integration in the SDM architecture lowers operating laser power and supports scaling toward POPS-regime accelerators.
What carries the argument
The time-space-wavelength multiplexed (TSWDM) silicon photonic coherent crossbar, which unfolds multiply-accumulate operations over the time domain while distributing computation across spatial and wavelength channels.
Load-bearing premise
That the 4x2x1 unit performance and error rates will hold when scaling to larger arrays and higher channel counts without significant additional noise, crosstalk, or power penalties.
What would settle it
Fabricate and measure a scaled prototype with at least 8 spatial channels and 4 wavelength channels while recording whether average multiplication error stays below 5 percent at the projected higher data rates.
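This test depends on how "average multiplication error" is defined, and the summary does not say. The sketch below is one possibility. It assumes the error is the mean absolute deviation between measured and ideal outputs, normalized to the ideal full-scale output range. That is a common convention but not necessarily the authors' metric. `average_error_pct` and `passes_scaling_test` are hypothetical helpers, and the 5% threshold comes from the criterion above.

```python
def average_error_pct(measured, ideal, full_scale=None):
    """Mean absolute error as a percentage of the ideal output range.

    Assumption: normalization by the full-scale range of the ideal outputs;
    the paper's exact definition of "average error" may differ.
    """
    if not ideal or len(measured) != len(ideal):
        raise ValueError("measured and ideal must be non-empty and equal length")
    if full_scale is None:
        full_scale = max(ideal) - min(ideal)
    if full_scale == 0:
        raise ValueError("full-scale range must be non-zero")
    mae = sum(abs(m - i) for m, i in zip(measured, ideal)) / len(ideal)
    return 100.0 * mae / full_scale


def passes_scaling_test(measured, ideal, threshold_pct=5.0):
    """True if the average error stays below the stated 5% criterion."""
    return average_error_pct(measured, ideal) < threshold_pct


# Illustrative (made-up) readings, not data from the paper.
ideal = [0.0, 0.25, 0.5, 0.75, 1.0]
measured = [0.02, 0.24, 0.53, 0.74, 0.97]
print(round(average_error_pct(measured, ideal), 2))  # 2.0
print(passes_scaling_test(measured, ideal))          # True
```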
Original abstract
We demonstrate an on-chip 0.96 TOPS hyperdimensional photonic tensor core by utilizing a time-space-wavelength multiplexed (TSWDM) silicon photonic crossbar (Xbar). The novel architecture relies on serializing large matrix-vector or tensor-vector products by unfolding multiply-and-accumulate operations over the time domain, while simultaneously distributing the computational workload over different spatial and wavelength channels. We experimentally demonstrate the operation of a 4-channel 2-input TSWDM Xbar that incorporates 56 GHz electroabsorption modulators (EAMs) and 4-channel integrated multiplexing stages. Its successful operation as a 4x2x1 tensor-vector multiplication unit demonstrated an average error of 3.9%. Its performance as a photonic AI accelerator was also evaluated in the classification task of the Iris dataset, presenting experimental accuracies of 93.3% at data rates between 4x10 and 4x30 GBd, decreasing to 83.3% when the data rate increases to 4x60 GBd. Finally, we discuss the TSWDM Xbar scalability potential, revealing that the inclusion of a WDM scheme in the SDM architecture reduces the operating laser power, feasibly boosting the potential of constructing photonic accelerators with computational throughput in the POPS regime.
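The serialization idea can be shown with a small, noise-free model. The mapping below is an assumption for illustration, not the device's documented data layout.

- Each wavelength channel computes one dot product between its own row of a tensor `x` and a shared vector `v`.
- The N-term sum is split into chunks of S spatial inputs (S = 2 in the demonstrated Xbar).
- Each chunk occupies one time slot, and the partial sums are accumulated over the time slots.

`tswdm_tensor_vector` is a hypothetical name.

```python
from typing import List, Sequence


def tswdm_tensor_vector(
    x: Sequence[Sequence[float]],  # shape (W, N): one row per wavelength channel
    v: Sequence[float],            # length-N vector shared by all channels
    spatial_inputs: int = 2,       # S: spatial inputs used per time slot
) -> List[float]:
    """Idealized model of time-space-wavelength unfolding (no noise, no crosstalk).

    Hypothetical mapping: wavelength channel w computes dot(x[w], v); the
    N-term sum is split into ceil(N / S) time slots of S terms each, and the
    per-slot partial sums are accumulated over time.
    """
    n = len(v)
    if any(len(row) != n for row in x):
        raise ValueError("every row of x must have the same length as v")
    num_slots = -(-n // spatial_inputs)  # ceil(N / S)
    outputs = []
    for row in x:                        # wavelength channels (parallel in hardware)
        acc = 0.0
        for t in range(num_slots):       # serialization over the time domain
            lo = t * spatial_inputs
            hi = min(lo + spatial_inputs, n)
            # One time slot: S spatial products summed in a single detection.
            acc += sum(row[k] * v[k] for k in range(lo, hi))
        outputs.append(acc)
    return outputs


# 4 wavelength channels, 2 spatial inputs, a length-4 vector -> 2 time slots.
x = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2], [-1, 1, -1, 1]]
v = [1, 0.5, -1, 2]
print(tswdm_tensor_vector(x, v))  # [7.0, 2.5, 5.0, 2.5]
```

In this model the spatial and wavelength dimensions are fixed by hardware. A longer vector only adds time slots, which is why the throughput claim rests on symbol rate and channel count rather than on the vector length.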
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript experimentally demonstrates a 4-channel 2-input time-space-wavelength multiplexed (TSWDM) silicon photonic crossbar as a hyperdimensional tensor core. It achieves 0.96 TOPS throughput via 4x60 GBd serialization, with 3.9% average error on tensor-vector multiplications and Iris classification accuracies of 93.3% (4x10 to 4x30 GBd) dropping to 83.3% at 4x60 GBd. It discusses scalability to the POPS regime by incorporating WDM to reduce laser power requirements.
Significance. The direct hardware measurements on a fabricated 4x2x1 unit provide concrete, reproducible performance metrics (error rate and dataset accuracy) that support the small-scale tensor core operation. This strengthens the case for photonic accelerators in AI tasks if the multiplexing approach can be extended without prohibitive noise penalties.
major comments (2)
- [Scalability discussion] Scalability discussion (final section): The claim that WDM inclusion in the SDM architecture feasibly enables POPS-regime accelerators assumes crosstalk, phase noise, and power penalties do not accumulate prohibitively beyond the demonstrated 4 WDM channels and 2 inputs. No measurements or quantitative simulations of noise scaling for larger arrays are provided, despite the observed accuracy degradation at 60 GBd indicating rate-dependent effects that would likely compound in hyperdimensional configurations requiring higher effective dimensionality.
- [Experimental results] Experimental results section: The hyperdimensional tensor core claim rests on achieving high effective dimensionality through time multiplexing plus spatial/WDM scaling, yet only the 4x2x1 unit is fabricated and tested. The manuscript does not report how the demonstrated serialization maintains tensor operation fidelity when unfolding larger matrix-vector products, which is load-bearing for extending the 0.96 TOPS result to true hyperdimensional operation.
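The crosstalk concern in the first major comment can be made concrete with a worst-case estimate. The sketch assumes each of the other N-1 wavelength channels leaks into a victim channel with the same isolation (for example -25 dB, an illustrative figure not taken from the manuscript), and that the leaked powers add incoherently. This is a simplification: in a coherent crossbar, in-band leakage can also beat with the signal. `aggregate_crosstalk_db` is a hypothetical helper.

```python
import math


def aggregate_crosstalk_db(num_channels: int, per_channel_xt_db: float) -> float:
    """Worst-case aggregate crosstalk on one channel, in dB relative to the signal.

    Assumptions: every other channel leaks with identical isolation
    per_channel_xt_db, and leaked powers add incoherently (linear power sum).
    """
    if num_channels < 2:
        return float("-inf")  # no other channel to leak from
    leaked = (num_channels - 1) * 10 ** (per_channel_xt_db / 10)
    return 10 * math.log10(leaked)


for n in (4, 8, 16):
    print(n, round(aggregate_crosstalk_db(n, -25.0), 2))
# 4 -20.23
# 8 -16.55
# 16 -13.24
```

Under these assumptions, going from 4 to 16 channels costs about 7 dB of signal-to-crosstalk ratio. Measured isolation values would be needed to judge whether that is prohibitive.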
minor comments (2)
- [Abstract] Abstract: The title states '1 TOPS' while the text reports 0.96 TOPS; include a brief note on the exact calculation (e.g., from 4x60 GBd serialization) to avoid minor inconsistency.
- [Figures] Figure clarity: Ensure all experimental traces (e.g., error vs. data rate) include error bars or multiple runs to quantify measurement repeatability.
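The requested throughput note could be a one-line calculation. One accounting consistent with the reported 0.96 TOPS rests on two assumptions, and the manuscript's own accounting may differ:

- Each symbol on each of the 4 wavelength channels performs one MAC per spatial input (2 inputs).
- Each MAC counts as two operations (a multiply and an add).

Under these assumptions, 4 x 2 x 60 GBd x 2 = 0.96 TOPS. The function names below are hypothetical.

```python
import math


def throughput_ops_per_s(wavelengths: int, spatial_inputs: int, baud: float,
                         ops_per_mac: int = 2) -> float:
    """Peak operations per second of an idealized TSWDM crossbar.

    Assumptions: one MAC per spatial input per symbol per wavelength,
    and ops_per_mac operations (multiply + add) per MAC.
    """
    return wavelengths * spatial_inputs * baud * ops_per_mac


def wavelength_input_pairs_for(target_ops: float, baud: float,
                               ops_per_mac: int = 2) -> int:
    """Wavelength x spatial-input pairs needed to reach target_ops at a given baud."""
    return math.ceil(target_ops / (baud * ops_per_mac))


print(f"{throughput_ops_per_s(4, 2, 60e9) / 1e12:.2f} TOPS")  # 0.96 TOPS
print(wavelength_input_pairs_for(1e15, 60e9))                 # 8334
```

Under the same assumptions, reaching 1 POPS at 60 GBd would require about 8,334 wavelength-input pairs. This gives a rough sense of the scale behind the POPS discussion.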
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable feedback on our manuscript. We have addressed each of the major comments in detail below and will incorporate revisions to strengthen the paper.
Point-by-point responses
- Referee: [Scalability discussion] Scalability discussion (final section): The claim that WDM inclusion in the SDM architecture feasibly enables POPS-regime accelerators assumes crosstalk, phase noise, and power penalties do not accumulate prohibitively beyond the demonstrated 4 WDM channels and 2 inputs. No measurements or quantitative simulations of noise scaling for larger arrays are provided, despite the observed accuracy degradation at 60 GBd indicating rate-dependent effects that would likely compound in hyperdimensional configurations requiring higher effective dimensionality.
Authors: We agree that a more detailed analysis of noise scaling is necessary to support the scalability claims. In the revised version, we will add quantitative simulations of crosstalk and phase noise accumulation for arrays with up to 16 WDM channels, based on the experimental parameters measured in our 4-channel device. This will include an assessment of how the rate-dependent effects observed at 60 GBd impact larger hyperdimensional computations.
Revision: yes
- Referee: [Experimental results] Experimental results section: The hyperdimensional tensor core claim rests on achieving high effective dimensionality through time multiplexing plus spatial/WDM scaling, yet only the 4x2x1 unit is fabricated and tested. The manuscript does not report how the demonstrated serialization maintains tensor operation fidelity when unfolding larger matrix-vector products, which is load-bearing for extending the 0.96 TOPS result to true hyperdimensional operation.
Authors: The experimental demonstration focuses on the fundamental 4x2x1 tensor-vector multiplication unit, which validates the TSWDM approach. The serialization unfolds larger operations over time, and the measured 3.9% average error confirms the fidelity of the individual multiply-accumulate steps. For larger products, the overall error would accumulate with the number of serialized operations, but the per-step fidelity remains as demonstrated. We will revise the manuscript to include a detailed explanation of the unfolding process and an analysis of error propagation for larger hyperdimensional vectors.
Revision: partial
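The rebuttal's claim that error "would accumulate with the number of serialized operations" depends on the noise statistics. The Monte Carlo sketch below assumes each time slot adds zero-mean Gaussian noise of standard deviation `sigma_per_slot` to the running partial sum. The value 0.04 is illustrative only and is not derived from the reported 3.9% average error.

- With independent noise per slot, the accumulated error standard deviation should grow roughly as sigma * sqrt(T).
- With fully correlated noise (for example slow drift), it grows as sigma * T.

`accumulated_error_std` is a hypothetical helper.

```python
import math
import random
import statistics


def accumulated_error_std(num_slots: int, sigma_per_slot: float,
                          correlated: bool = False,
                          trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the std of the error accumulated over num_slots.

    Assumption: zero-mean Gaussian noise per time slot. If correlated is True,
    the same offset is added in every slot (drift-like); otherwise each slot
    draws independent noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        if correlated:
            err = num_slots * rng.gauss(0.0, sigma_per_slot)
        else:
            err = sum(rng.gauss(0.0, sigma_per_slot) for _ in range(num_slots))
        errors.append(err)
    return statistics.pstdev(errors)


for slots in (1, 4, 16, 64):
    est = accumulated_error_std(slots, sigma_per_slot=0.04)
    # The estimate should track the sqrt(T) prediction for independent noise.
    print(slots, round(est, 4), round(0.04 * math.sqrt(slots), 4))
```

Whether the relative error grows as well depends on how the output full-scale range scales with vector length. An error-propagation analysis would therefore need to state both the noise statistics and the normalization.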
Circularity Check
No circularity: the experimental metrics are obtained from direct hardware measurements on the fabricated 4x2x1 TSWDM crossbar.
Full rationale
The manuscript reports measured throughput (0.96 TOPS), average error (3.9%), and classification accuracy (93.3% at 10-30 GBd, 83.3% at 60 GBd) on a physically realized 4-channel 2-input silicon photonic device incorporating 56 GHz EAMs and integrated WDM stages. These figures are obtained by direct experimental characterization rather than by any derivation, fitting procedure, or self-referential equation that reduces to its own inputs. The scalability discussion (WDM reducing laser power for POPS potential) is qualitative and does not invoke fitted parameters, self-citations, or uniqueness theorems that would force the reported results. No load-bearing ansatz, renaming of known results, or self-definitional steps appear in the presented chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- data rate
axioms (1)
- Domain assumption: silicon photonic components can be monolithically integrated with electroabsorption modulators and wavelength multiplexers at the stated speeds.