A 65 nm Multi-Modal Bayesian Inference Engine with 16.3 fJ/Sample Calibration-Free GRNG for Risk-Aware At-Home Skin Lesion Screening

Boyang Cheng; Danny Z. Chen; Jianbo Liu; Likai Pei; Ningyuan Cao; Steven Davis; Xueji Zhao; Zephan M. Enciso

arxiv: 2606.07439 · v2 · pith:RZL5SGUAnew · submitted 2026-06-05 · 💻 cs.AR

Steven Davis, Likai Pei, Jianbo Liu, Zephan M. Enciso, Boyang Cheng, Xueji Zhao, Danny Z. Chen, Ningyuan Cao

Pith reviewed 2026-06-27 20:16 UTC · model grok-4.3

classification 💻 cs.AR

keywords: Bayesian neural networks · compute-in-memory · Gaussian random number generator · skin lesion screening · edge AI · process variation · multimodal uncertainty · 65 nm CMOS

The pith

A 65 nm compute-in-memory chip performs in-word Mixture-of-Gaussian sampling to model uncertainty more expressively than unimodal Bayesian neural networks for at-home skin lesion screening.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a 65 nm hardware engine for fully on-device, privacy-preserving skin lesion screening that operates under variable at-home conditions. It introduces a compute-in-memory architecture that executes in-word Mixture-of-Gaussian sampling, enabling multi-modal uncertainty modeling beyond the single-Gaussian approach of conventional Bayesian neural networks. This change produces concrete gains: 1.4 times greater equal-risk operating coverage, more than 1.5 times better robustness to user-data perturbations, 5.5 times higher resilience to process variations, and 1.8 percent higher balanced accuracy. The design is completed by a calibration-free Gaussian random-number generator that exploits complementary process variation to reach 16.3 fJ per sample.

Core claim

The 65-nm multimodal Bayesian inference engine uses in-word Mixture-of-Gaussian sampling inside its compute-in-memory architecture. This improves uncertainty modeling beyond conventional unimodal Bayesian neural networks, increasing equal-risk operating coverage by 1.4x, improving robustness to user-data perturbations by >1.5x, enhancing process-variation resilience by 5.5x, and improving balanced accuracy by 1.8%. Hardware support comes from a calibration-free Gaussian random-number generator using complementary process variation at 16.3 fJ/sample and 168.6 GSa/s/mm² efficiency.

What carries the argument

In-word Mixture-of-Gaussian sampling, performed inside the compute-in-memory architecture, enables multi-modal uncertainty modeling and is supported by a calibration-free Gaussian random-number generator that exploits process variation.
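As a software analogy for the sampling described above, here is a minimal sketch of the generic two-step Mixture-of-Gaussian draw: a categorical choice of component, then a Gaussian draw from that component. It is not the chip's circuit. The function name `sample_mog` and all mixing weights, means, and standard deviations are hypothetical values chosen for illustration.

```python
import random

def sample_mog(weights, means, stds, rng):
    """Draw one value from a 1-D Mixture of Gaussians in two steps."""
    # Step 1: categorically select a component using the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Step 2: draw a Gaussian sample from the selected component.
    return rng.gauss(means[k], stds[k])

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical bimodal weight distribution (illustrative values only).
weights, means, stds = [0.3, 0.7], [-1.0, 2.0], [0.2, 0.5]
samples = [sample_mog(weights, means, stds, rng) for _ in range(50_000)]

# The sample mean should approach the mixture mean 0.3*(-1.0) + 0.7*2.0 = 1.1.
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f}")
```

A unimodal Bayesian neural network corresponds to the special case of a single component, where step 1 always selects the same Gaussian.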

If this is right

Equal-risk operating coverage increases by 1.4x over unimodal Bayesian neural networks.
Robustness to user-data perturbations improves by more than 1.5x.
Process-variation resilience is enhanced by 5.5x.
Balanced accuracy improves by 1.8%.
The architecture supplies an energy-efficient, privacy-preserving edge-AI solution for medical screening.
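The two evaluation metrics in this list can be made concrete with a short sketch. The abstract does not define "equal-risk operating coverage", so the code assumes the usual selective-prediction reading: accept the most confident predictions first and report the largest accepted fraction whose error rate stays within a target risk. Balanced accuracy is computed as the mean of per-class recall. The function names and toy data are hypothetical.

```python
def max_coverage_at_risk(uncertainty, correct, target_risk):
    """Largest accepted fraction (most confident first) whose error rate
    on the accepted set is at or below target_risk."""
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    best, errors = 0.0, 0
    for n_kept, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        if errors / n_kept <= target_risk:
            best = n_kept / len(order)
    return best

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy data (illustrative only, not from the paper).
coverage = max_coverage_at_risk(
    uncertainty=[0.1, 0.2, 0.3, 0.4, 0.5],
    correct=[True, True, False, True, False],
    target_risk=0.25,
)
bal_acc = balanced_accuracy(y_true=[0, 0, 0, 0, 1], y_pred=[0, 0, 0, 1, 1])
print(coverage, bal_acc)  # 0.8 0.875
```

Under this reading, a 1.4x coverage gain would mean that, at the same target risk, the multi-modal model can accept 1.4 times as many cases as the unimodal baseline before deferring to a clinician.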

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The multi-modal sampling technique could extend to other on-device medical imaging tasks where better uncertainty handling reduces over- or under-diagnosis in variable environments.
Using process variation directly for random numbers may enable similar low-energy stochastic hardware in additional sensor-based AI applications.
Wider adoption of such risk-aware engines might lower unnecessary clinical referrals by flagging only high-uncertainty cases for further review.

Load-bearing premise

The reported performance gains assume that the in-word Mixture-of-Gaussian sampling and complementary process-variation GRNG are correctly realized in silicon and that the measured improvements reflect actual hardware behavior under at-home conditions rather than simulation or post-selection.

What would settle it

Fabricate the 65 nm chip, run it on real skin lesion images captured in uncontrolled home lighting and user conditions, and directly compare its equal-risk coverage and accuracy against a unimodal Bayesian network implemented on the same silicon.

Figures

Figures reproduced from arXiv: 2606.07439 by Boyang Cheng, Danny Z. Chen, Jianbo Liu, Likai Pei, Ningyuan Cao, Steven Davis, Xueji Zhao, Zephan M. Enciso.

**Figure 2.** MoG Sampling requires two steps: (1) Categorically selecting one of …

**Figure 5.** (a) Distribution selector circuit architecture with mixing ratio ( …

**Figure 6.** (a) SOTA in-word GRNG hardware suffers from device-to-device …

**Figure 7.** (a) This work eliminates calibration by leveraging D2D variation by …

**Figure 10.** At-home screening offers rapid results and enhances user privacy, …

**Figure 11.** Annotated die photo of this work's MoG BNN hardware.

**Figure 12.** Area and energy breakdown of the 64×8 prototype MoG BNN tile …

**Figure 13.** (a) Tuning the GRNG bias voltages (VBC and VBD) impacts the sample's latency and time-pulse (TD) standard deviation (SD). This impacts the (b) GRNG sample energy, (c) maximum system operating frequency, and (d) system's network efficiency (TOPS/W).

**Figure 14.** (a) GRNG time pulse samples (49 distinct points per GRNG per …

**Figure 15.** (a) Simulated model resilience to increasing process variation within …

Abstract

We present a 65-nm risk-aware multimodal Bayesian inference engine for privacy-preserving, fully on-device skin lesion screening under uncontrolled at-home conditions. The proposed compute-in-memory architecture performs in-word Mixture-of-Gaussian sampling, improving uncertainty modeling beyond conventional unimodal Bayesian neural networks. This added probabilistic expressiveness increases equal-risk operating coverage by 1.4x, improves robustness to user-data perturbations by >1.5x, enhances process-variation resilience by 5.5x, and improves balanced accuracy by 1.8% over state-of-the-art unimodal Bayesian neural networks. Hardware robustness is further supported by calibration-free Gaussian random-number generation using complementary process variation, achieving 16.3 fJ/sample and 168.6 GSa/s/mm^2 efficiency. These results demonstrate a practical, energy-efficient, and risk-aware edge-AI solution for privacy-conscious medical screening.
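To give a feel for the two GRNG figures quoted in the abstract, the arithmetic below multiplies energy per sample by throughput per unit area to estimate the implied power density. This assumes both numbers are reported at the same operating point, which the abstract does not state. The per-inference sample count is a hypothetical value used only to show the unit conversion.

```python
# Figures quoted in the abstract.
ENERGY_PER_SAMPLE_J = 16.3e-15         # 16.3 fJ/sample
THROUGHPUT_SA_PER_S_PER_MM2 = 168.6e9  # 168.6 GSa/s/mm^2

# Assumption: both figures hold at the same operating point.
power_density_w_per_mm2 = ENERGY_PER_SAMPLE_J * THROUGHPUT_SA_PER_S_PER_MM2

# Hypothetical workload: one million Gaussian samples per inference.
SAMPLES_PER_INFERENCE = 1_000_000
grng_energy_per_inference_j = ENERGY_PER_SAMPLE_J * SAMPLES_PER_INFERENCE

print(f"implied GRNG power density: {power_density_w_per_mm2 * 1e3:.2f} mW/mm^2")
print(f"GRNG energy per inference:  {grng_energy_per_inference_j * 1e9:.1f} nJ")
```

Under these assumptions the GRNG array would draw roughly 2.75 mW per mm² at full rate, and one million samples would cost about 16.3 nJ.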

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Hardware claims for multimodal BNN with process-variation GRNG are specific but rest entirely on abstract-level assertions without visible data or methods.

The main takeaway is a 65 nm compute-in-memory design that runs multimodal Bayesian inference for on-device skin lesion screening, using in-word mixture-of-Gaussians sampling and a calibration-free GRNG drawn from complementary process variation.

What is new is the combination of multimodal sampling inside the memory array with that particular GRNG approach, plus the reported efficiency of 16.3 fJ per sample. The paper states clear numeric gains over unimodal baselines: 1.4x equal-risk coverage, >1.5x robustness to perturbations, 5.5x better process-variation resilience, and 1.8% higher balanced accuracy.

The work is straightforward about targeting privacy-preserving, energy-efficient medical screening under uncontrolled conditions, and it avoids obvious circular modeling.

The soft spot is that everything rests on the abstract. No measurement setup, chip photos, error bars, or raw results are available here, so it is impossible to judge whether the gains reflect actual silicon behavior or simulation choices. That single gap makes the central claims unverifiable for now.

This paper is aimed at hardware groups working on probabilistic edge AI and medical devices. Someone already building CIM accelerators or GRNG circuits could extract useful implementation details if the full text supports the numbers.

It deserves peer review because the application and the hardware choices are concrete enough to merit expert scrutiny of the measurements, even if revisions follow.

Referee Report

1 major / 1 minor

Summary. The manuscript describes a 65 nm compute-in-memory architecture implementing a multi-modal Bayesian inference engine for on-device skin lesion screening. It performs in-word Mixture-of-Gaussian sampling to improve uncertainty modeling over conventional unimodal Bayesian neural networks and incorporates a calibration-free GRNG based on complementary process variation. The work claims measured improvements of 1.4x in equal-risk operating coverage, >1.5x in robustness to user-data perturbations, 5.5x in process-variation resilience, and 1.8% in balanced accuracy, with the GRNG achieving 16.3 fJ/sample and 168.6 GSa/s/mm².

Significance. If the silicon measurements hold, the result would demonstrate a practical hardware path to more expressive probabilistic models at the edge, with direct relevance to privacy-preserving medical screening under uncontrolled conditions. The combination of in-word MoG sampling and process-variation GRNG is a concrete engineering contribution that could influence future risk-aware CIM designs.

major comments (1)

[Abstract] The central claims rest on measured hardware results from the 65 nm implementation (abstract). Without the methods and results sections detailing the test setup, number of dies measured, statistical analysis, and direct silicon-vs-simulation comparison for the reported 1.4x, 5.5x, and 1.8% gains, it is not possible to confirm that the improvements reflect actual fabricated behavior rather than simulation or post-selection effects.

minor comments (1)

[Abstract] The abstract states performance numbers without naming the exact unimodal BNN baselines or datasets used for the comparisons; adding these references would improve traceability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for emphasizing the need for transparent hardware validation details. We address the single major comment below.

Referee: [Abstract] The central claims rest on measured hardware results from the 65 nm implementation (abstract). Without the methods and results sections detailing the test setup, number of dies measured, statistical analysis, and direct silicon-vs-simulation comparison for the reported 1.4x, 5.5x, and 1.8% gains, it is not possible to confirm that the improvements reflect actual fabricated behavior rather than simulation or post-selection effects.

Authors: The referee correctly notes that the abstract's performance claims require explicit supporting evidence from the hardware characterization. While the manuscript body contains a results section describing the 65 nm test chip, probe-station setup, and measured GRNG and inference metrics, it does not currently provide the requested specifics (exact die count, statistical tests, or side-by-side silicon-simulation tables for the 1.4x/5.5x/1.8% figures). We therefore agree that additional detail is needed to allow independent verification and will expand the results section and add a summary table in the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is a hardware implementation report presenting measured silicon results from a 65 nm CIM chip. The abstract and available text contain no equations, derivations, fitted parameters, or self-citations that reduce any performance claim to an input by construction. All reported gains (1.4x coverage, >1.5x robustness, 5.5x resilience, 1.8% accuracy, 16.3 fJ/sample) are framed as empirical outcomes of the fabricated design rather than analytic predictions. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; review cannot enumerate design constants or background assumptions beyond the high-level architecture description.
