arxiv: 1907.09077 · v1 · pith:FCYTBQWFnew · submitted 2019-07-22 · 💻 cs.NE · cs.ET · cs.LG · eess.SP

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

Pith reviewed 2026-05-24 18:12 UTC · model grok-4.3

classification 💻 cs.NE · cs.ET · cs.LG · eess.SP
keywords stochastic computing · AQFP · deep neural networks · superconducting logic · energy efficiency · hardware acceleration · random number generation

The pith

The first stochastic-computing DNN acceleration framework is built on AQFP superconducting technology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an acceleration framework for deep neural networks that pairs stochastic computing with Adiabatic Quantum-Flux-Parametron superconducting circuits. It argues that AQFP's deep pipelining, a consequence of AC clocking, and its single-buffer true random number generation align directly with stochastic computing's use of bit sequences for approximate calculations. A sympathetic reader would care because the combination targets ultra-high energy efficiency for DNN inference, potentially well beyond state-of-the-art CMOS, with direct relevance to high-performance computing and deep-space uses. The work presents itself as the initial development of such an integrated system.

Core claim

This work is the first to develop an SC-based DNN acceleration framework using AQFP technology. It builds on two AQFP traits: deep pipelining, which arises because every logic gate is driven by an AC clock signal, and true random number generation from a single AQFP buffer. Together these make AQFP especially compatible with stochastic computing's time-independent bit-sequence representation for approximate DNN computations.

What carries the argument

Stochastic computing, which represents values as time-independent bit sequences and tolerates approximate operations, paired with AQFP's AC-clocked deep pipelining and single-buffer random number generation.
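
To make the representation concrete, here is a minimal Python sketch, not the paper's implementation. It assumes the common unipolar convention, in which a value in [0, 1] is the fraction of 1s in the stream. The helper names `encode` and `decode` and the 16-bit stream length are illustrative choices, not taken from the paper.

```python
import random

def encode(value, length):
    """Unipolar encoding (assumed convention): a stream whose fraction of 1s approximates `value`."""
    ones = round(value * length)
    return [1] * ones + [0] * (length - ones)

def decode(stream):
    """The represented value is just the fraction of 1s."""
    return sum(stream) / len(stream)

stream = encode(0.375, 16)          # six 1s, ten 0s
shuffled = stream[:]
random.Random(0).shuffle(shuffled)  # reorder the bits in time

# Time-independence: the value depends on how many 1s there are, not on where they sit.
same_value = decode(stream) == decode(shuffled)

# Approximation: 0.3 is not a multiple of 1/16, so it is stored as 5/16.
approx = decode(encode(0.3, 16))

print(same_value, approx)
```

Longer streams shrink the quantization step, at the cost of more clock cycles per value.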

If this is right

  • DNN inference achieves ultra-high energy efficiency using AQFP hardware compared with state-of-the-art CMOS.
  • Stochastic bit streams, being time-independent, sidestep the read-after-write hazards that AQFP's deep pipelining otherwise makes hard to avoid.
  • Large-scale systems with tens of thousands of Josephson junctions become practical for DNN accelerators.
  • The approach supports DNN acceleration in high-performance computing and deep-space applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same AQFP-SC pairing might apply to other approximate-computing workloads beyond neural networks.
  • Energy gains could enable complex inference on power-constrained platforms such as satellites.
  • Direct hardware prototypes would be needed to confirm simulation-based efficiency claims.

Load-bearing premise

AQFP's deep pipelining from AC clock signals and its single-buffer true random number generation make the technology especially compatible with stochastic computing for DNN inference.

What would settle it

The central advantage would be disproved if a fabricated AQFP circuit running the proposed SC DNN framework, with its energy measured at acceptable accuracy, showed no substantial efficiency gain over CMOS.

Figures

Figures reproduced from arXiv: 1907.09077 by Ao Ren, Caiwen Ding, Jie Han, Ning Liu, Nobuyuki Yoshikawa, Olivia Chen, Ruizhe Cai, Wenhui Luo, Xuehai Qian, Yanzhi Wang.

Figure 1: Junction level schematic of an AQFP buffer.
Figure 2: Example of AQFP logic gates. [Figure labels spilled into the caption: data input, data output, AQFP logic block, clock-in phases 1 to 4, one clock cycle]
Figure 3: (a) Four-phase clocking scheme for AQFP …
Figure 5: Feature extraction block of CMOS-based SC …
Figure 6: Proposed SC-based DNN architecture using …
Figure 7: (a) 1-bit true RNG in AQFP; (b) Output dis…
Figure 8: True RNG cluster consisting of N × N unit true RNGs, where each unit is shared by four N-bit random numbers.
Text following the caption in the source: each two output random numbers only share a single bit in common. 4.2 Integration of Summation and Activation Function in CONV Layers. To overcome the difficulty in accumulator implementation in AQFP, we re-formulate the operation of SC-based feature extraction block in a different aspect. The stocha…
Figure 9: Layout of 1-bit true RNG using AQFP.
Text following the caption in the source: where N is the length of the stochastic stream and M is the number of inputs. The clip operation restricts the value between the given bounds. This formulation accounts for inner product (summation) and activation function. Consequently,
$$\sum_{i=1}^{N} SO_i = \mathrm{clip}\left(\sum_{i=1}^{N}\sum_{j=1}^{M} SP_{i,j} - \frac{M-1}{2}\times N,\ 0,\ N\right) \qquad (2)$$
That is, the total number of 1's in the output stochastic stream,…
Figure 10: Example of 8-input binary bitonic sorter.
Figure 11: (a) Bitonic sorter for even-numbered in…
Figure 13: Activated output of the proposed feature ex…
Figure 15: Categorization block implementation using …
Figure 14: Proposed bitonic sorter based sub-sampling …
Figure 16: AQFP chip testing.
Text following the caption in the source: 0's; the output is 0 otherwise. The relative value/importance of each output can be reflected in this way, thereby fulfilling the requirement of categorization block in FC layers. The proposed categorization logic can be realized using a simple majority chain structure. Thanks to the nature of AQFP technology, a three-input majority gate costs the same hardware resource as a two-input…
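
Two pieces of the excerpts above lend themselves to a short sketch: the right-hand side of Equation (2) and the three-input majority gate mentioned for the categorization block. The Python below is an illustration under stated assumptions, not the paper's circuit. `sp[i][j]` stands for SP_{i,j}, and the function names are hypothetical. The excerpt is cut off before it names the two-input gate, so the AND/OR identities shown are standard Boolean facts rather than something taken from the excerpt.

```python
def clipped_ones_count(sp):
    """Right-hand side of Eq. (2): clip(sum_i sum_j SP[i][j] - (M - 1) / 2 * N, 0, N).

    sp is a list of N rows, each holding M bits (one bit per input per time step).
    """
    n = len(sp)
    m = len(sp[0])
    total_ones = sum(sum(row) for row in sp)
    shifted = total_ones - (m - 1) / 2 * n
    return min(max(shifted, 0), n)

def maj3(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

# M = 3 inputs, N = 4 time steps; six 1s in total.
sp = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
ones_in_output = clipped_ones_count(sp)  # 6 - (3 - 1) / 2 * 4 = 2

# Tying one majority input to a constant yields the two-input gates.
and_ok = all(maj3(a, b, 0) == (a & b) for a in (0, 1) for b in (0, 1))
or_ok = all(maj3(a, b, 1) == (a | b) for a in (0, 1) for b in (0, 1))
```

The clip bounds mean the count of 1s in the output stream can never fall below 0 or exceed the stream length N, whatever the inputs.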
Original abstract

The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, it will be promising for AQFP in high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the deep pipelining nature since each AQFP logic gate is connected with an AC clock signal, which increases the difficulty to avoid RAW hazards; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the stochastic computing (SC) technique, which uses a time-independent bit sequence for value representation, and is compatible with the deep pipelining nature. Further, the application of SC has been investigated in DNNs in prior work, and the suitability has been illustrated as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.
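
The role random numbers play in SC can be sketched in a few lines of Python. This is a software illustration only: `random.SystemRandom` (operating-system entropy) stands in for the AQFP true RNG, the comparator-style generator and AND-gate multiplier are the textbook unipolar SC constructions rather than the paper's circuits, and all function names are hypothetical.

```python
import random

def sng(value, length, rng):
    """Comparator-style stochastic number generator: emit 1 when a fresh
    random draw falls below `value` (0 <= value <= 1)."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_multiply(xs, ys):
    """Unipolar SC multiplication: a bitwise AND of two independent streams."""
    return [x & y for x, y in zip(xs, ys)]

def estimate(stream):
    """Fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.SystemRandom()  # assumption: OS entropy as a stand-in for a hardware true RNG
a = sng(0.5, 4096, rng)
b = sng(0.25, 4096, rng)
product = estimate(sc_multiply(a, b))
print(f"exact 0.125, stochastic estimate {product:.3f}")
```

The estimate varies from run to run and its error shrinks as the stream length grows, which is the kind of approximate computation the abstract refers to.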

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to present the first stochastic-computing (SC) based DNN acceleration framework using Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology. It identifies AQFP's deep pipelining (due to AC clocking) and single-buffer true RNG as making the technology especially compatible with SC's time-independent bit-stream representation, and notes SC's prior suitability for approximate DNN computations; the work positions this combination as promising for ultra-low-energy inference in high-performance and space applications.

Significance. If a concrete framework with implementation details, hazard-resolution mechanisms, and energy/performance evaluations were provided and validated, the result would be significant as a novel bridge between superconducting logic families and stochastic computing for DNNs, potentially enabling orders-of-magnitude efficiency gains over CMOS. The manuscript as presented, however, contains no such elaboration, derivations, or results.

major comments (2)
  1. [Abstract] Abstract: the central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.
  2. [Abstract] Abstract: the manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.
minor comments (1)
  1. [Abstract] The abstract mentions '83,000 JJs' fabrication results from 2016 but does not cite the corresponding reference or explain its relevance to the proposed DNN framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address each major comment below. The abstract is intentionally concise, but we agree it can be strengthened to better preview the supporting material in the full manuscript. We will revise the abstract and add explicit references to the relevant sections, diagrams, and comparisons.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.

    Authors: We agree the abstract asserts the compatibility without derivation. The full manuscript derives this in Section II (AQFP characteristics) and Section III (SC compatibility): SC bit-streams are time-independent, which aligns with AQFP's AC-clocked deep pipelining and eliminates RAW hazards without additional buffering; a hazard-resolution scheme is described using the bit-stream representation itself. Figure 3 provides the timing diagram. Section II also compares the single-buffer true RNG in AQFP to CMOS RNG, which requires multiple gates or external entropy sources. We will revise the abstract to briefly reference these analyses and the sections/figure. revision: yes

  2. Referee: [Abstract] Abstract: the manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.

    Authors: The full manuscript substantiates the claim with a high-level architecture in Section III (including Figure 6), the SC-to-AQFP mapping and DNN accelerator design in Section IV, and a Related Work section that reviews prior SC-DNN accelerators (all CMOS-based) and prior AQFP circuits (none using SC). The 'first' claim follows from the absence of any prior work combining the two. We will revise the abstract to explicitly reference these sections and the comparison. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript presents an engineering framework whose central claim is novelty in combining SC with AQFP, justified by direct observational statements about compatibility (deep pipelining and single-buffer RNG) rather than any derivation chain, equations, or fitted quantities. No self-citations, ansatzes, or reductions of predictions to inputs appear in the provided text. The argument is propositional and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only; no free parameters, invented entities, or additional axioms beyond the stated compatibility of AQFP traits with SC are extractable.

axioms (1)
  • domain assumption AQFP exhibits deep pipelining and efficient true RNG via single buffer
    Invoked in abstract to justify SC compatibility

pith-pipeline@v0.9.0 · 5839 in / 998 out tokens · 15975 ms · 2026-05-24T18:12:20.490888+00:00 · methodology

