arxiv: 2606.25313 · v1 · pith:JCJ4PNCBnew · submitted 2026-06-24 · 💻 cs.DC · cs.AR

Programmable Probabilistic Computer with 1,000,000 p-bits

Pith reviewed 2026-06-25 20:42 UTC · model grok-4.3

classification 💻 cs.DC cs.AR
keywords p-bits · probabilistic computing · Ising models · distributed sampling · Gibbs sampling · spin glasses · FPGA · Max-Cut

The pith

Networking FPGAs creates a programmable probabilistic computer with one million p-bits whose accuracy is controlled by a single communication-to-update frequency ratio.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to connect multiple FPGAs so that they act as one large Ising sampler holding a million p-bits, with each device storing its couplings locally and exchanging only single-bit boundary values. It identifies the ratio eta of boundary-exchange frequency to local p-bit update frequency as the single parameter that decides whether the partitioned system matches an unpartitioned reference. Above a topology-dependent threshold in eta, the residual energy decays as it does on the monolithic GPU reference; below the threshold, the decay remains a power law but with a smaller exponent. A cluster mean-field model reproduces the same threshold and exponent reduction, which the paper takes as evidence that the tradeoff is a general feature of partitioned stochastic dynamics rather than an artifact of the hardware or of the Edwards-Anderson instances used for testing.

Core claim

The paper claims that partitioned stochastic dynamics are governed by the single timing ratio eta = f_comm / f_p-bit. When eta exceeds a topology-dependent threshold the distributed machine produces the same residual-energy decay as a monolithic GPU reference; when eta lies below the threshold the decay continues as a power law but with a reduced exponent. The cluster mean-field model reproduces both regimes, establishing the tradeoff as a universal property of partitioned stochastic dynamics and supplying a quantitative design rule for scaling probabilistic computers past single-chip limits.
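
The two regimes in this claim can be written compactly. This is only a notational sketch: the symbols \rho_E (residual energy), t (elapsed sampling time or sweeps), \kappa (decay exponent), \kappa_mono (exponent of the monolithic reference), and \eta_c (threshold) are names assumed here for illustration and are not taken from the paper.

```latex
\eta = \frac{f_{\mathrm{comm}}}{f_{\text{p-bit}}}, \qquad
\rho_E(t) \propto t^{-\kappa(\eta)}, \qquad
\kappa(\eta)
\begin{cases}
  = \kappa_{\mathrm{mono}}, & \eta \ge \eta_c \quad \text{(matches the monolithic reference)} \\
  < \kappa_{\mathrm{mono}}, & \eta < \eta_c \quad \text{(power law with reduced exponent)}
\end{cases}
```

The threshold \eta_c is described as topology-dependent, so it should be read as a function of the interaction graph and its partitioning rather than as a single constant.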

What carries the argument

The timing ratio eta = f_comm / f_p-bit that sets the boundary-state refresh rate required for partitioned Gibbs sampling to reproduce monolithic energy decay.
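
To make the role of the refresh rate concrete, here is a minimal software sketch of partitioned Gibbs sampling with stale boundary states. It is a toy model written for this note, not the authors' FPGA design: the function names, the adjacency-list format, the ring test graph, and the convention that boundaries are refreshed once every round(1/eta) sweeps (so 0 < eta <= 1) are all assumptions.

```python
import math
import random

def energy(neighbors, m):
    """Ising energy E = -sum_{i<j} J_ij m_i m_j (each edge is stored twice)."""
    return -0.5 * sum(J * m[i] * m[j] for i in range(len(m)) for j, J in neighbors[i])

def random_ring(n, seed=0):
    """Ring of n >= 3 spins with random +/-1 couplings, as an adjacency list."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        J = rng.choice((-1.0, 1.0))
        neighbors[i].append((j, J))
        neighbors[j].append((i, J))
    return neighbors

def partitioned_gibbs(neighbors, part, eta, beta, n_sweeps, seed=0):
    """Toy partitioned Gibbs sampler in which remote spins may be stale.

    neighbors : neighbors[i] is a list of (j, J_ij) pairs
    part      : part[i] is the index of the partition (device) owning spin i
    eta       : assumed boundary refreshes per local sweep, 0 < eta <= 1
    Returns the final spins and the energy recorded after every sweep.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    m = [rng.choice((-1, 1)) for _ in range(n)]   # true spin states
    n_parts = max(part) + 1
    # Each partition keeps its own copy of all spins. Only neighbors are ever
    # read, so in effect only boundary values matter, as in the hardware.
    view = [list(m) for _ in range(n_parts)]
    period = max(1, round(1.0 / eta))             # sweeps between exchanges
    energies = []
    for sweep in range(n_sweeps):
        if sweep % period == 0:                   # boundary exchange
            view = [list(m) for _ in range(n_parts)]
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            p = part[i]
            field = sum(J * view[p][j] for j, J in neighbors[i])
            # Gibbs (p-bit) rule: P(m_i = +1) = (1 + tanh(beta * field)) / 2
            m[i] = 1 if rng.random() < 0.5 * (1.0 + math.tanh(beta * field)) else -1
            view[p][i] = m[i]                     # owner sees its own update at once
        energies.append(energy(neighbors, m))
    return m, energies

# Usage: 16 spins split across two partitions, boundaries refreshed every 4 sweeps.
ring = random_ring(16, seed=1)
spins, trace = partitioned_gibbs(ring, [0] * 8 + [1] * 8, eta=0.25,
                                 beta=1.0, n_sweeps=20, seed=2)
```

With a single partition the copy is never stale and the loop reduces to ordinary sequential Gibbs sampling, which gives a monolithic baseline against which runs at different eta can be compared.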

If this is right

  • The architecture achieves more than a trillion flips per second while keeping every coupling weight in on-chip memory.
  • The same eta rule governs performance on three-dimensional Edwards-Anderson spin glasses, Max-Cut, and Boolean satisfiability.
  • Below the threshold, increasing parallelism still produces power-law energy reduction, but at a lower rate that quantifies the accuracy-throughput tradeoff.
  • The cluster mean-field model supplies a predictive tool for choosing the required communication frequency on any new interaction graph.
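
The second bullet depends on Max-Cut being expressible as an Ising problem. A minimal sketch of the standard textbook mapping follows: setting J_ij = -w_ij makes the cut weight equal to (W - E) / 2, where W is the total edge weight and E the Ising energy, so minimizing E maximizes the cut. The function names and the 4-cycle example are illustrative assumptions, and the paper's own encoding may differ.

```python
def maxcut_to_ising(edges):
    """Map weighted edges (i, j, w) to Ising couplings (i, j, J_ij) with J_ij = -w."""
    return [(i, j, -w) for i, j, w in edges]

def ising_energy(couplings, s):
    """E(s) = -sum over edges of J_ij * s_i * s_j, with s_i in {-1, +1}."""
    return -sum(J * s[i] * s[j] for i, j, J in couplings)

def cut_value(edges, s):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for i, j, w in edges if s[i] != s[j])

# Usage: an unweighted 4-cycle, where alternating spins cut every edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
couplings = maxcut_to_ising(edges)
s = [1, -1, 1, -1]
print(cut_value(edges, s), ising_energy(couplings, s))  # 4.0 -4.0
```

Because the mapping only changes the sign of the couplings, the same sampler and the same partitioning can be reused unchanged for Max-Cut instances.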

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frequency-ratio criterion is likely to appear in any distributed Monte Carlo sampler whose updates can be partitioned by locality.
  • Hardware networks that obey the eta rule could remove central-memory bottlenecks for sampling problems orders of magnitude larger than current single-chip limits.
  • Direct numerical checks of the mean-field exponent on two-dimensional or small-world graphs would test the claimed topology dependence without new hardware.

Load-bearing premise

That the threshold behavior and reduced-exponent power law seen on three-dimensional Edwards-Anderson spin glasses and reproduced by the cluster mean-field model extend to every partitioned stochastic dynamics and to the Max-Cut and satisfiability problems shown.

What would settle it

A measurement on a different topology or problem class in which the energy-decay exponent changes discontinuously or fails to match the mean-field prediction exactly at the eta value where the model expects the transition.
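
Such a test comes down to estimating a decay exponent from measured residual-energy curves and comparing it across eta values. One simple way to do that, sketched here under the assumption that the data follow rho ~ A * t**(-kappa) over the fitted range, is a least-squares line in log-log space. The function name and the synthetic numbers are invented for illustration and are not values from the paper.

```python
import math

def fit_power_law(t, rho):
    """Least-squares fit of rho ~ A * t**(-kappa) in log-log space.

    Returns (kappa, A). All entries of t and rho must be positive.
    """
    x = [math.log(v) for v in t]
    y = [math.log(v) for v in rho]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # slope of log(rho) against log(t)
    return -slope, math.exp(mean_y - slope * mean_x)

# Usage on synthetic, noise-free data with kappa = 0.2 and A = 0.3.
t = [10 ** k for k in range(1, 7)]
rho = [0.3 * ti ** -0.2 for ti in t]
kappa, amplitude = fit_power_law(t, rho)
```

On real data the fit window matters: early transients and late-time saturation should be excluded before the exponents at different eta are compared.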

Original abstract

Probabilistic computers built from p-bits have been proposed as hardware accelerators for sampling and optimizing Ising models, but existing systems have been confined to a single chip, capped by its capacity and memory bandwidth. Here we break this limit by networking FPGAs into a single Ising machine far larger than any one device could hold, realizing a programmable probabilistic computer with one million p-bits. The machine performs Gibbs sampling at over a trillion flips per second while keeping every coupling weight in local on-chip memory. During execution, devices exchange nothing but 1-bit boundary states. This architecture exposes a question fundamental to any distributed sampler: how frequently boundary information must be refreshed for a partitioned machine to behave as an unpartitioned one. Using three-dimensional Edwards-Anderson spin glasses, we show that the answer is set by a single timing ratio, eta = f_comm/f_p-bit, of the boundary-exchange frequency to the local p-bit update frequency. Above a topology-dependent threshold, the distributed machine matches a monolithic GPU reference. Below it, residual energy still decays as a power law but with a reduced exponent, turning parallelism into a quantifiable throughput-accuracy tradeoff. A theoretical cluster mean-field model reproduces the same behavior, showing that this tradeoff is a universal property of partitioned stochastic dynamics. These results provide a programmable million-p-bit platform, demonstrated across spin glasses, Max-Cut, and Boolean satisfiability, together with a quantitative design rule for scaling probabilistic computers beyond the single-chip limit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a networked FPGA-based probabilistic computer with one million p-bits capable of Gibbs sampling at over a trillion flips per second. It introduces the ratio eta = f_comm / f_p-bit and shows through experiments on 3D Edwards-Anderson spin glasses that above a topology-dependent threshold the distributed system matches a monolithic GPU reference, while below it the energy decay follows a power law with reduced exponent. A cluster mean-field model is claimed to reproduce this and establish the behavior as universal for partitioned stochastic dynamics. The platform is also demonstrated on Max-Cut and Boolean satisfiability problems.

Significance. This work demonstrates a scalable hardware platform for probabilistic computing and provides a quantitative guideline for partitioning such systems. If the universality of the eta-dependent tradeoff holds, it offers a fundamental insight into distributed stochastic dynamics with potential applications in optimization and sampling. The experimental scale (1M p-bits) and the modeling effort are strengths, though the generality of the model remains to be fully established.

major comments (3)
  1. [Abstract and cluster mean-field model] The assertion that the observed eta-dependent behavior is a 'universal property of partitioned stochastic dynamics' relies on a cluster mean-field model that incorporates mean-field closure within partitions and specific boundary update rules. The manuscript does not provide evidence that the reduced-exponent regime survives changes to these modeling choices or for dynamics beyond the 3D Edwards-Anderson spin glasses (e.g., different update schedules or non-Ising Hamiltonians).
  2. [Results on Max-Cut and SAT] The additional demonstrations on Max-Cut and Boolean satisfiability are mentioned but not accompanied by eta-sweep experiments or quantitative comparisons to the monolithic reference, which would be necessary to test whether the same scaling behavior applies.
  3. [Experimental details] The abstract and presumably the methods/results sections lack details on error bars, data exclusion criteria, exact partitioning topology, and quantitative measures of agreement between the mean-field model and hardware data, hindering assessment of the central experimental claims.
minor comments (1)
  1. Consider adding specific numerical values for the eta threshold and the reduced exponent in the abstract for concreteness.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications from the existing work and indicating revisions where appropriate to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract and cluster mean-field model] The assertion that the observed eta-dependent behavior is a 'universal property of partitioned stochastic dynamics' relies on a cluster mean-field model that incorporates mean-field closure within partitions and specific boundary update rules. The manuscript does not provide evidence that the reduced-exponent regime survives changes to these modeling choices or for dynamics beyond the 3D Edwards-Anderson spin glasses (e.g., different update schedules or non-Ising Hamiltonians).

    Authors: The cluster mean-field model is formulated with a general structure that applies mean-field closure to intra-partition dynamics and explicit boundary exchanges, independent of the specific 3D Edwards-Anderson Hamiltonian. It reproduces the experimental power-law behavior and eta threshold for the tested case, supporting the claim of universality for partitioned stochastic dynamics under the model's assumptions. We did not perform exhaustive sensitivity tests on alternative closures or non-Ising models, as the focus was on matching the hardware observations. In revision we will qualify the universality statement to explicitly note the model's assumptions and add a limitations paragraph. revision: partial

  2. Referee: [Results on Max-Cut and SAT] The additional demonstrations on Max-Cut and Boolean satisfiability are mentioned but not accompanied by eta-sweep experiments or quantitative comparisons to the monolithic reference, which would be necessary to test whether the same scaling behavior applies.

    Authors: The Max-Cut and SAT results were presented to illustrate the platform's programmability and applicability beyond spin glasses, using the same hardware configuration. The eta-sweep analysis and monolithic GPU comparisons were performed on the 3D Edwards-Anderson instances as the canonical benchmark for the scaling tradeoff. We agree that eta-sweeps on these problems would further test generality and will add a brief note on this scope limitation together with any available supporting data from the existing runs. revision: partial

  3. Referee: [Experimental details] The abstract and presumably the methods/results sections lack details on error bars, data exclusion criteria, exact partitioning topology, and quantitative measures of agreement between the mean-field model and hardware data, hindering assessment of the central experimental claims.

    Authors: We will expand the methods and results sections in revision to include: (i) error bars on all energy-decay and performance plots (computed from multiple independent runs), (ii) explicit statement that no data were excluded beyond standard convergence checks, (iii) the precise FPGA network topology and partitioning (a 4x4x4 grid of 64k-p-bit boards), and (iv) quantitative agreement metrics (e.g., fitted exponents and mean absolute deviation between model and hardware curves). These details were present in the supplementary material but will be moved to the main text. revision: yes

Circularity Check

0 steps flagged

No circularity: eta identified experimentally and reproduced by independent mean-field model

full rationale

The paper identifies the timing ratio eta experimentally from 3D Edwards-Anderson instances and shows that a separate cluster mean-field model reproduces the observed power-law exponent reduction without any indication that the model parameters are fitted to the target data or that the universality claim is defined into the model construction. No self-citations, self-definitional steps, or fitted inputs renamed as predictions appear in the provided text. The additional problem demonstrations are presented as applications rather than load-bearing derivations. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; p-bits and Ising sampling are drawn from prior literature. The mean-field model is introduced to explain the observed tradeoff.

axioms (2)
  • domain assumption: Gibbs sampling on the Ising model is correctly realized by the p-bit hardware updates.
    Invoked throughout the description of the probabilistic computer operation.
  • ad hoc to paper: The cluster mean-field model captures the universal behavior of any partitioned stochastic dynamics.
    Used to reproduce the power-law decay and threshold behavior seen in the FPGA experiments.

pith-pipeline@v0.9.1-grok · 5848 in / 1376 out tokens · 24721 ms · 2026-06-25T20:42:18.286887+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 3 linked inside Pith

  1. [1]

    Efficient processing of deep neural networks: A tutorial and survey.Proceedings of the IEEE, 105(12):2295–2329, 2017

    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S Emer. Efficient processing of deep neural networks: A tutorial and survey.Proceedings of the IEEE, 105(12):2295–2329, 2017

  2. [2]

    Google workloads for consumer devices: Mitigating data movement bottlenecks

    Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, et al. Google workloads for consumer devices: Mitigating data movement bottlenecks. InProceedings of the twenty-third international conference on architectural support for programming languages...

  3. [3]

    King, Jack Raymond, Trevor Lanting, Richard Harris, Alex Zucca, Fabio Altomare, Andrew J

    Andrew D. King, Jack Raymond, Trevor Lanting, Richard Harris, Alex Zucca, Fabio Altomare, Andrew J. Berkley, Kelly Boothby, Sara Ejtemaee, Colin Enderud, Emile Hoskinson, Shuiyuan Huang, Eric Ladizinsky, Allison J. R. MacDonald, Gaelen Marsden, Reza Molavi, Travis Oh, Gabriel Poulin-Lamarre, Mauricio Reis, Chris Rich, Yuki Sato, Nicholas Tsai, Mark V olkm...

  4. [4]

    Beyond-classical computation in quantum simulation.Science, 388 (6743):199–204, 2025

    Andrew D King, Alberto Nocera, Marek M Rams, Jacek Dziarmaga, Roeland Wiersema, William Bernoudy, Jack Raymond, Nitin Kaushal, Niclas Heinsdorf, Richard Harris, et al. Beyond-classical computation in quantum simulation.Science, 388 (6743):199–204, 2025

  5. [5]

    How to build a quantum supercomputer: Scaling from hundreds to millions of qubits.arXiv preprint arXiv:2411.10406, 2024

    Masoud Mohseni, Artur Scherer, K Grace Johnson, Oded Wertheim, Matthew Otten, Namit Anand, Navid Anjum Aadit, Yuri Alexeev, Gilad Ben-Shach, Kirk M Bresniker, et al. How to build a quantum supercomputer: Scaling from hundreds to millions of qubits.arXiv preprint arXiv:2411.10406, 2024

  6. [6]

    100,000-spin coherent Ising machine.Science advances, 7(40): eabh0952, 2021

    Toshimori Honjo, Tomohiro Sonobe, Kensuke Inaba, Takahiro Inagaki, Takuya Ikuta, Yasuhiro Yamada, Takushi Kazama, Koji Enbutsu, Takeshi Umeki, Ryoichi Kasahara, et al. 100,000-spin coherent Ising machine.Science advances, 7(40): eabh0952, 2021

  7. [7]

    4.6 A 144Kb annealing system composed of 9×16Kb annealing processor chips with scalable chip-to-chip connections for large-scale combinatorial optimization problems

    Takashi Takemoto, Kasho Yamamoto, Chihiro Yoshimura, Masato Hayashi, Masafumi Tada, Hiroaki Saito, Mayumi Mashimo, and Masanao Yamaoka. 4.6 A 144Kb annealing system composed of 9×16Kb annealing processor chips with scalable chip-to-chip connections for large-scale combinatorial optimization problems. In2021 IEEE International Solid- State Circuits Confere...

  8. [8]

    1.1 computing’s energy problem (and what we can do about it)

    Mark Horowitz. 1.1 computing’s energy problem (and what we can do about it). In2014 IEEE international solid-state circuits conference digest of technical papers (ISSCC), pages 10–14. IEEE, 2014

  9. [9]

    Illusion of large on-chip memory by networked computing chips for neural network inference

    Robert M Radway, Andrew Bartolo, Paul C Jolly, Zainab F Khan, Binh Q Le, Pulkit Tandon, Tony F Wu, Yunfeng Xin, Elisa Vianello, Pascal Vivet, et al. Illusion of large on-chip memory by networked computing chips for neural network inference. Nature Electronics, 4(1):71–80, 2021

  10. [10]

    The Future of Memory: Limits and Opportunities.arXiv preprint arXiv:2508.20425, 2025

    Samuel Dayo, Shuhan Liu, Peijing Li, Philip Levis, Subhasish Mitra, Thierry Tambe, David Tennenhouse, and H-S Philip Wong. The Future of Memory: Limits and Opportunities.arXiv preprint arXiv:2508.20425, 2025

  11. [11]

    AI and memory wall

    Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W Mahoney, and Kurt Keutzer. AI and memory wall. IEEE Micro, 44(3):33–39, 2024

  12. [12]

    Optimization by simulated annealing.Science, 220(4598):671–680, 1983

    Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. Optimization by simulated annealing.Science, 220(4598):671–680, 1983

  13. [13]

    Ising formulations of many NP problems.Frontiers in Physics, 2:5, 2014

    Andrew Lucas. Ising formulations of many NP problems.Frontiers in Physics, 2:5, 2014. ISSN 2296-424X

  14. [14]

    Ising machines as hardware solvers of combinatorial optimization problems.Nature Reviews Physics, 4(6):363–379, 2022

    Naeimeh Mohseni, Peter L McMahon, and Tim Byrnes. Ising machines as hardware solvers of combinatorial optimization problems.Nature Reviews Physics, 4(6):363–379, 2022

  15. [15]

    Roadmap for unconventional computing with nanotechnology

    Giovanni Finocchio, Jean Anne C Incorvia, Joseph S Friedman, Qu Yang, Anna Giordano, Julie Grollier, Hyunsoo Yang, Florin Ciubotaru, Andrii V Chumak, Azad J Naeemi, et al. Roadmap for unconventional computing with nanotechnology. Nano Futures, 8(1):012001, 2024

  16. [16]

    CMOS plus stochastic nanomagnets enabling heterogeneous computers for probabilistic inference and learning.Nature Communications, 15(1):2685, 2024

    Nihal Sanjay Singh, Keito Kobayashi, Qixuan Cao, Kemal Selcuk, Tianrui Hu, Shaila Niazi, Navid Anjum Aadit, Shun Kanai, Hideo Ohno, Shunsuke Fukami, and Kerem Y Camsari. CMOS plus stochastic nanomagnets enabling heterogeneous computers for probabilistic inference and learning.Nature Communications, 15(1):2685, 2024

  17. [17]

    Theory of spin glasses.Journal of Physics F: Metal Physics, 5(5):965, 1975

    Samuel Frederick Edwards and Phil W Anderson. Theory of spin glasses.Journal of Physics F: Metal Physics, 5(5):965, 1975

  18. [18]

    On the computational complexity of Ising spin glass models.Journal of Physics A: Mathematical and General, 15(10):3241, 1982

    Francisco Barahona. On the computational complexity of Ising spin glass models.Journal of Physics A: Mathematical and General, 15(10):3241, 1982

  19. [19]

    Infinite number of order parameters for spin-glasses.Physical Review Letters, 43(23):1754, 1979

    Giorgio Parisi. Infinite number of order parameters for spin-glasses.Physical Review Letters, 43(23):1754, 1979

  20. [20]

    World Scientific Publishing Company, 1987

    Marc M ´ezard, Giorgio Parisi, and Miguel Angel Virasoro.Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987

  21. [21]

    Ordered phase of short-range Ising spin-glasses.Physical review letters, 56(15):1601, 1986

    Daniel S Fisher and David A Huse. Ordered phase of short-range Ising spin-glasses.Physical review letters, 56(15):1601, 1986

  22. [22]

    Solvable model of a spin-glass.Physical review letters, 35(26):1792, 1975

    David Sherrington and Scott Kirkpatrick. Solvable model of a spin-glass.Physical review letters, 35(26):1792, 1975

  23. [23]

    Pushing the boundary of quantum advantage in hard combinatorial optimization with probabilistic computers.Nature Communications, 16(1):9193, 2025

    Shuvro Chowdhury, Navid Anjum Aadit, Andrea Grimaldi, Eleonora Raimondo, Atharva Raut, P Aaron Lott, Johan H Mentink, Marek M Rams, Federico Ricci-Tersenghi, Massimo Chiappini, et al. Pushing the boundary of quantum advantage in hard combinatorial optimization with probabilistic computers.Nature Communications, 16(1):9193, 2025

  24. [24]

    Janus: An FPGA-based system for high- performance scientific computing.Computing in Science & Engineering, 11(1):48–58, 2008

    Francesco Belletti, Maria Cotallo, Andr ´es Cruz, Luis Antonio Fernandez, Antonio Gordillo-Guerrero, Marco Guidetti, Andrea Maiorano, Filippo Mantovani, Enzo Marinari, Victor Martin-Mayor, et al. Janus: An FPGA-based system for high- performance scientific computing.Computing in Science & Engineering, 11(1):48–58, 2008. 28

  25. [25]

    Janus II: A new generation application- driven computer for spin-system simulations.Computer Physics Communications, 185(2):550–559, 2014

    Marco Baity-Jesi, Rachel A Ba ˜nos, Andres Cruz, Luis Antonio Fernandez, Jos ´e Miguel Gil-Narvi ´on, Antonio Gordillo- Guerrero, David Iniguez, Andrea Maiorano, Filippo Mantovani, Enzo Marinari, et al. Janus II: A new generation application- driven computer for spin-system simulations.Computer Physics Communications, 185(2):550–559, 2014

  26. [26]

    Highly optimized simulations on single- and multi-GPU systems of the 3D Ising spin glass model.Computer Physics Communications, 196:290–303, 2015

    Matteo Lulli, Massimo Bernaschi, and Giorgio Parisi. Highly optimized simulations on single- and multi-GPU systems of the 3D Ising spin glass model.Computer Physics Communications, 196:290–303, 2015

  27. [27]

    The QISG suite: High- performance codes for studying quantum Ising spin glasses.Computer Physics Communications, 298:109101, 2024

    Massimo Bernaschi, Isidoro Gonz ´alez-Adalid Pemart´ın, V´ıctor Mart´ın-Mayor, and Giorgio Parisi. The QISG suite: High- performance codes for studying quantum Ising spin glasses.Computer Physics Communications, 298:109101, 2024

  28. [28]

    Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems.Science advances, 5(4):eaav2372, 2019

    Hayato Goto, Kosuke Tatsumura, and Alexander R Dixon. Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems.Science advances, 5(4):eaav2372, 2019

  29. [29]

    High-performance combinatorial optimization based on classical mechanics.Science Advances, 7(6): eabe7953, 2021

    Hayato Goto, Kotaro Endo, Masaru Suzuki, Yoshisato Sakai, Taro Kanao, Yohei Hamakawa, Ryo Hidaka, Masaya Yamasaki, and Kosuke Tatsumura. High-performance combinatorial optimization based on classical mechanics.Science Advances, 7(6): eabe7953, 2021

  30. [30]

    Scaling out Ising machines using a multi-chip architecture for simulated bifurcation.Nature Electronics, 4(3):208–217, 2021

    Kosuke Tatsumura, Masaya Yamasaki, and Hayato Goto. Scaling out Ising machines using a multi-chip architecture for simulated bifurcation.Nature Electronics, 4(3):208–217, 2021

  31. [31]

    Efficient and scalable architecture for multiple- chip implementation of simulated bifurcation machines.IEEE Access, 12:36606–36621, 2024

    Tomoya Kashimata, Masaya Yamasaki, Ryo Hidaka, and Kosuke Tatsumura. Efficient and scalable architecture for multiple- chip implementation of simulated bifurcation machines.IEEE Access, 12:36606–36621, 2024

  32. [32]

    Increasing ising machine capacity with multi-chip architectures

    Anshujit Sharma, Richard Afoakwa, Zeljko Ignjatovic, and Michael Huang. Increasing ising machine capacity with multi-chip architectures. InProceedings of the 49th Annual International Symposium on Computer Architecture, pages 508–521, 2022

  33. [33]

    Next-generation probabilistic computing hardware with 3D MOSAICs, Illusion scale-up, and co-design.arXiv preprint arXiv:2409.11422, 2024

    Tathagata Srimani, Robert Radway, Masoud Mohseni, Kerem C ¸ amsarı, and Subhasish Mitra. Next-generation probabilistic computing hardware with 3D MOSAICs, Illusion scale-up, and co-design.arXiv preprint arXiv:2409.11422, 2024

  34. [34]

    P-bits for probabilistic spin logic.Applied Physics Reviews, 6(1): 011305, 2019

    Kerem Y Camsari, Brian M Sutton, and Supriyo Datta. P-bits for probabilistic spin logic.Applied Physics Reviews, 6(1): 011305, 2019

  35. [35]

    Stochastic p-bits for invertible logic.Physical Review X, 7(3):031014, 2017

    Kerem Yunus Camsari, Rafatul Faria, Brian M Sutton, and Supriyo Datta. Stochastic p-bits for invertible logic.Physical Review X, 7(3):031014, 2017

  36. [36]

    Probabilistic computing with p-bits.Applied Physics Letters, 119(15):150503, 2021

    Jan Kaiser and Supriyo Datta. Probabilistic computing with p-bits.Applied Physics Letters, 119(15):150503, 2021

  37. [37]

    Modular approach to spintronics.Scientific reports, 5:10571, 2015

    Kerem Yunus Camsari, Samiran Ganguly, and Supriyo Datta. Modular approach to spintronics.Scientific reports, 5:10571, 2015

  38. [38]

    Implementing p-bits with embedded MTJ.IEEE Electron Device Letters, 38(12):1767–1770, 2017

    Kerem Yunus Camsari, Sayeef Salahuddin, and Supriyo Datta. Implementing p-bits with embedded MTJ.IEEE Electron Device Letters, 38(12):1767–1770, 2017

  39. [39]

    Integer factorization using stochastic magnetic tunnel junctions.Nature, 573(7774):390–393, 2019

    William A Borders, Ahmed Z Pervaiz, Shunsuke Fukami, Kerem Y Camsari, Hideo Ohno, and Supriyo Datta. Integer factorization using stochastic magnetic tunnel junctions.Nature, 573(7774):390–393, 2019

  40. [40]

    Subnanosecond fluctuations in low-barrier nanomagnets.Physical Review Applied, 12(5):054056, 2019

    Jan Kaiser, Avinash Rustagi, Kerem Y Camsari, Jonathan Z Sun, Supriyo Datta, and Pramey Upadhyaya. Subnanosecond fluctuations in low-barrier nanomagnets.Physical Review Applied, 12(5):054056, 2019

  41. [41]

    Massively parallel probabilistic computing with sparse Ising machines.Nature Electronics, 5(7):460–468, 2022

    Navid Anjum Aadit, Andrea Grimaldi, Mario Carpentieri, Luke Theogarajan, John M Martinis, Giovanni Finocchio, and Kerem Y Camsari. Massively parallel probabilistic computing with sparse Ising machines.Nature Electronics, 5(7):460–468, 2022

  42. [42]

    All-to-all reconfigurability with sparse and higher-order Ising machines.Nature Communications, 15(1):8977, 2024

    Srijan Nikhar, Sidharth Kannan, Navid Anjum Aadit, Shuvro Chowdhury, and Kerem Y Camsari. All-to-all reconfigurability with sparse and higher-order Ising machines.Nature Communications, 15(1):8977, 2024

  43. [43]

    Statistics of the Three-Dimensional Ferromagnet (III).Journal of the Physical Society of Japan, 6(1): 31–35, 1951

    Takehiko Oguchi. Statistics of the Three-Dimensional Ferromagnet (III).Journal of the Physical Society of Japan, 6(1): 31–35, 1951

  44. [44]

    Statistical theory of superlattices.Proceedings of the Royal Society of London

    Hans A Bethe. Statistical theory of superlattices.Proceedings of the Royal Society of London. Series A-Mathematical and Physical Sciences, 150(871):552–575, 1935

  45. [45]

    Cluster variation method in statistical physics and probabilistic graphical models.Journal of Physics A: Mathematical and General, 38(33):R309–R339, 2005

    Alessandro Pelizzola. Cluster variation method in statistical physics and probabilistic graphical models.Journal of Physics A: Mathematical and General, 38(33):R309–R339, 2005

  46. [46]

    Correlated cluster mean-field theory for spin systems.Physical Review B, 79(14):144427, 2009

    Daisuke Yamamoto. Correlated cluster mean-field theory for spin systems.Physical Review B, 79(14):144427, 2009

  47. [47]

    Xing, Michael I

    Eric P. Xing, Michael I. Jordan, and Stuart Russell. A generalized mean field algorithm for variational inference in exponential families.arXiv preprint arXiv:1212.2512, 2012

  48. [48]

    George Karypis and Vipin Kumar. A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices.University of Minnesota, Department of Computer Science and Engineering, Army HPC Research Center, Minneapolis, MN, 38:7–1, 1998

  49. [49]

    KaHIP v3.00–Karlsruhe High Quality Partitioning–User Guide.arXiv preprint arXiv:1311.1714, 2013

    Peter Sanders and Christian Schulz. KaHIP v3.00–Karlsruhe High Quality Partitioning–User Guide.arXiv preprint arXiv:1311.1714, 2013

  50. [50]

    AMD Versal Premium VP1902 Adaptive SoC, 2023

    AMD. AMD Versal Premium VP1902 Adaptive SoC, 2023. Available at https://www.amd.com/en/products/ adaptive-socs-and-fpgas/versal/premium-series/vp1902.html

  51. [51]

    Parallel random numbers: as easy as 1, 2, 3

    John K Salmon, Mark A Moraes, Ron O Dror, and David E Shaw. Parallel random numbers: as easy as 1, 2, 3. InProceedings of 2011 international conference for high performance computing, networking, storage and analysis, pages 1–12, 2011

  52. [52]

    Gset: A benchmark library for graph partitioning and Max-Cut, 2003

    Yinyu Ye et al. Gset: A benchmark library for graph partitioning and Max-Cut, 2003. Available at https://web.stanford.edu/ ∼yyye/yyye/Gset/

  53. [53]

    Performance report of heuristic algorithm that cracked the largest Gset Ising problems (G81 cut = 14060)

    Kenneth M Zick. Performance report of heuristic algorithm that cracked the largest Gset Ising problems (G81 cut = 14060). arXiv preprint arXiv:2505.18508, 2025

  54. [54]

    Cosm: Collective Switched Motion for Fast and Accurate Sparse 29 Ising Optimization.arXiv preprint arXiv:2605.30355, 2026

    Kenneth M Zick, Nikhil Shukla, and Alexander Marakov. Cosm: Collective Switched Motion for Fast and Accurate Sparse 29 Ising Optimization.arXiv preprint arXiv:2605.30355, 2026

  55. [55]

    Navid Anjum Aadit, Andrea Grimaldi, Giovanni Finocchio, and Kerem Y . Camsari. Physics-inspired Ising Computing with Ring Oscillator Activated p-bits. In2022 IEEE 22nd International Conference on Nanotechnology (NANO), pages 393–396, 2022

  56. [56]

    Equation planting: a tool for benchmarking Ising machines.Physical Review Applied, 12(1):011003, 2019

    Itay Hen. Equation planting: a tool for benchmarking Ising machines.Physical Review Applied, 12(1):011003, 2019

  57. [57]

    Analytic and algorithmic solution of random satisfiability problems

    Marc M ´ezard, Giorgio Parisi, and Riccardo Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002

  58. [58]

    3SAT on an all-to-all-connected CMOS Ising solver chip.Scientific reports, 14(1):10757, 2024

    H ¨usrev Cılasun, Ziqing Zeng, Ramprasath S, Abhimanyu Kumar, Hao Lo, William Cho, William Moy, Chris H Kim, Ulya R Karpuzcu, and Sachin S Sapatnekar. 3SAT on an all-to-all-connected CMOS Ising solver chip.Scientific reports, 14(1):10757, 2024

  59. [59]

    An 8K-Spin Ising Machine IC with Reconfigurable Many-Body Spin Interactions and Limitless Multichip Extension

    Jaeyeong Kim and Jae-Yoon Sim. An 8K-Spin Ising Machine IC with Reconfigurable Many-Body Spin Interactions and Limitless Multichip Extension. In 2025 Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), pages 1–3. IEEE, 2025

  60. [60]

    Analyzing Hogwild parallel Gaussian Gibbs sampling. Advances in neural information processing systems, 26:2715–2723, 2013

    Matthew J Johnson, James Saunderson, and Alan Willsky. Analyzing Hogwild parallel Gaussian Gibbs sampling. Advances in neural information processing systems, 26:2715–2723, 2013

  61. [61]

    ASAP7: A 7-nm finFET predictive process design kit. Microelectronics Journal, 53:105–115, 2016

    Lawrence T Clark, Vinay Vashishtha, Lucian Shifren, Aditya Gujja, Saurabh Sinha, Brian Cline, Chandarasekaran Ramamurthy, and Greg Yeric. ASAP7: A 7-nm finFET predictive process design kit. Microelectronics Journal, 53:105–115, 2016

  62. [62]

    Universal Chiplet Interconnect Express (UCIe) Specification. UCIe Consortium, Technical Specification

    UCIe Consortium et al. Universal Chiplet Interconnect Express (UCIe) Specification. UCIe Consortium, Technical Specification. July, 2022

  63. [63]

    Bunch of wires: An open die-to-die interface

    Shahab Ardalan, Halil Cirit, Ramin Farjad, Mark Kuemerle, Ken Poulton, Suresh Subramanian, and Bapiraju Vinnakota. Bunch of wires: An open die-to-die interface. In 2020 IEEE Symposium on High-Performance Interconnects (HOTI), pages 9–16. IEEE, 2020

  64. [64]

    Foundry Monolithic 3D Unlocks Large Throughput Benefits: 3D Memory with Tucked Sense Amplifiers + Logic using Heterogeneous Silicon CMOS + Resistive RAM + Carbon Nanotube FETs

    S Choi, A Raut, T Wu, S Dayo, A Bechdolt, G Dutta, S Li, DT Rich, RH Yang, AC Yu, et al. Foundry Monolithic 3D Unlocks Large Throughput Benefits: 3D Memory with Tucked Sense Amplifiers + Logic using Heterogeneous Silicon CMOS + Resistive RAM + Carbon Nanotube FETs. In 2025 IEEE International Electron Devices Meeting (IEDM), pages 1–4. IEEE, 2025

  65. [65]

    NVIDIA Blackwell Platform: Advancing Generative AI and Accelerated Computing

    Ajay Tirumala and Raymond Wong. NVIDIA Blackwell Platform: Advancing Generative AI and Accelerated Computing. In 2024 IEEE Hot Chips 36 Symposium (HCS), pages 1–33. IEEE Computer Society, 2024

  66. [66]

    DAC-Free p-bits: Asynchronous Self-Coloring and On-Chip Annealing

    Kemal Selcuk, Navid Anjum Aadit, Corentin Delacour, Jared Quintana Silva, Nihal Sanjay Singh, Haruna Kaneko, Shun Kanai, Yu-Jui Wu, Yi-Hsuan Chen, Yu-Sheng Chen, et al. DAC-Free p-bits: Asynchronous Self-Coloring and On-Chip Annealing. In 2025 IEEE International Electron Devices Meeting (IEDM), pages 1–4. IEEE, 2025

  67. [67]

    An integrated-circuit-based probabilistic computer that uses voltage-controlled magnetic tunnel junctions as its entropy source. Nature Electronics, 8(9):784–793, 2025

    Christian Duffee, Jordan Athas, Yixin Shao, Noraica Davila Melendez, Eleonora Raimondo, Jordan A Katine, Kerem Y Camsari, Giovanni Finocchio, and Pedram Khalili Amiri. An integrated-circuit-based probabilistic computer that uses voltage-controlled magnetic tunnel junctions as its entropy source. Nature Electronics, 8(9):784–793, 2025

  68. [68]

    Experimental demonstration of an on-chip p-bit core based on stochastic magnetic tunnel junctions and 2D MoS2 transistors. Nature Communications, 15(1):4098, 2024

    John Daniel, Zheng Sun, Xuejian Zhang, Yuanqiu Tan, Neil Dilley, Zhihong Chen, and Joerg Appenzeller. Experimental demonstration of an on-chip p-bit core based on stochastic magnetic tunnel junctions and 2D MoS2 transistors. Nature Communications, 15(1):4098, 2024

  69. [69]

    Andrea Grimaldi, Luis Sánchez-Tejerina, Navid Anjum Aadit, Stefano Chiappini, Mario Carpentieri, Kerem Camsari, and Giovanni Finocchio. Spintronics-compatible Approach to Solving Maximum-Satisfiability Problems with Probabilistic Computing, Invertible Logic, and Parallel Tempering. Physical Review Applied, 17:024052, Feb 2022

  70. [70]

    L’hypothèse du champ moléculaire et la propriété ferromagnétique. Journal de Physique Théorique et Appliquée, 6(1):661–690, 1907

    Pierre Weiss. L’hypothèse du champ moléculaire et la propriété ferromagnétique. Journal de Physique Théorique et Appliquée, 6(1):661–690, 1907

  71. [71]

    Quantum correlated cluster mean-field theory applied to the transverse Ising model. Physical Review E, 93(6):062116, 2016

    FM Zimmer, M Schmidt, and Jonas Maziero. Quantum correlated cluster mean-field theory applied to the transverse Ising model. Physical Review E, 93(6):062116, 2016

  72. [72]

    Accelerating Adaptive Parallel Tempering with FPGA-based p-bits

    Navid Anjum Aadit, Masoud Mohseni, and Kerem Y Camsari. Accelerating Adaptive Parallel Tempering with FPGA-based p-bits. In 2023 VLSI Technology and Circuits Symposium, pages 1–2. IEEE, 2023

  73. [73]

    Algorithm portfolios and teams in parallel optimization

    Volodymyr P Shylo and Oleg V Shylo. Algorithm portfolios and teams in parallel optimization. In Optimization Methods and Applications: In Honor of Ivan V. Sergienko’s 80th Birthday, pages 481–493. Springer, 2017

  74. [74]

    Teams of global equilibrium search algorithms for solving the weighted maximum cut problem in parallel. Cybernetics and Systems Analysis, 51(1):16–24, 2015

    VP Shylo, Fred Glover, and IV Sergienko. Teams of global equilibrium search algorithms for solving the weighted maximum cut problem in parallel. Cybernetics and Systems Analysis, 51(1):16–24, 2015

  75. [75]

    CNFgen: A generator of crafted benchmarks

    Massimo Lauria, Jan Elffers, Jakob Nordström, and Marc Vinyals. CNFgen: A generator of crafted benchmarks. In International Conference on Theory and Applications of Satisfiability Testing, pages 464–473. Springer, 2017

  76. [76]

    In-datacenter performance analysis of a tensor processing unit

    Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th annual international symposium on computer architecture, pages 1–12, 2017

  77. [77]

    NVIDIA Tesla P100: The Most Advanced Datacenter Accelerator Ever Built

    NVIDIA Corporation. NVIDIA Tesla P100: The Most Advanced Datacenter Accelerator Ever Built. Technical Report WP-08019-001 v01.1, NVIDIA Corporation, 2016

  78. [78]

    Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning. IEEE Micro, 43(3):18–30, 2023

    Sean Lie. Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning. IEEE Micro, 43(3):18–30, 2023

  79. [79]

    Simba: Scaling deep-learning inference with multi-chip-module-based architecture

    Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, et al. Simba: Scaling deep-learning inference with multi-chip-module-based architecture. In Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, pages 14–27, 2019

  80. [80]

    Kartik Prabhu, Albert Gural, Zainab F Khan, Robert M Radway, Massimo Giordano, Kalhan Koul, Rohan Doshi, John W Kustin, Timothy Liu, Gregorio B Lopes, et al. CHIMERA: A 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-MByte on-chip foundry resistive RAM for efficient training and inference. IEEE Journal of Solid-State Circuits, 57(4):1013–1026, 2022