Learning Compact Boolean Networks

Martin Vechev; Shengpu Wang; Yani Zhang; Yuhao Mao

arXiv: 2602.05830 · v2 · submitted 2026-02-05 · cs.AI · cs.LG

Shengpu Wang, Yuhao Mao, Yani Zhang, Martin Vechev

Pith reviewed 2026-05-16 06:55 UTC · model grok-4.3

keywords: Boolean networks · compact architectures · adaptive discretization · vision benchmarks · FPGA inference · neural network compression · discrete optimization

The pith

A training method for Boolean networks achieves higher accuracy than prior work while using up to 47 times fewer Boolean operations on vision benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to train compact Boolean networks that deliver strong accuracy for resource-constrained hardware where only Boolean operations are allowed. It combines a parameter-free strategy for selecting connections, a compact convolutional architecture that exploits spatial locality with fewer operations than standard kernels, and an adaptive discretization step that limits the accuracy loss when moving from a relaxed continuous network to a discrete Boolean one. These elements together shift the accuracy-versus-cost trade-off, letting Boolean networks reach or surpass earlier results at far lower operation counts. A reader would care because Boolean networks support nanosecond-scale inference on FPGAs and similar platforms, which matters for edge devices and real-time systems. The gains hold on standard vision benchmarks and extend to other data modalities.
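
To make "only Boolean operations" and "operation count" concrete, here is a minimal sketch of evaluating a Boolean network and counting its gates. The `Gate` encoding, the `run_network` helper, and the `MAJORITY` example are hypothetical illustrations, not the paper's architecture or code.

```python
from typing import Callable, Dict, List, Tuple

# Two-input Boolean operations on 0/1 integers.
OPS: Dict[str, Callable[[int, int], int]] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nand": lambda a, b: 1 - (a & b),
}

# A gate names an operation and the two earlier wires it reads.
Gate = Tuple[str, int, int]


def run_network(inputs: List[int], gates: List[Gate]) -> List[int]:
    """Evaluate gates in order and return every wire value.

    Wires 0..len(inputs)-1 are the inputs; each gate appends one new wire.
    """
    wires = list(inputs)
    for op, i, j in gates:
        wires.append(OPS[op](wires[i], wires[j]))
    return wires


# Hypothetical example: majority of three bits,
# maj(a, b, c) = (a & b) | (c & (a ^ b)).
MAJORITY: List[Gate] = [
    ("and", 0, 1),  # wire 3 = a & b
    ("xor", 0, 1),  # wire 4 = a ^ b
    ("and", 2, 4),  # wire 5 = c & (a ^ b)
    ("or", 3, 5),   # wire 6 = output
]

output = run_network([1, 0, 1], MAJORITY)[-1]
operation_count = len(MAJORITY)  # one Boolean operation per gate
```

Under this encoding the cost the summary refers to is simply the number of gates, which is why fewer gates translate directly into a smaller circuit.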

Core claim

The authors show that a parameter-free connection-learning strategy, a compact convolutional Boolean architecture, and an adaptive discretization procedure together produce Boolean networks that improve the Pareto frontier over prior state-of-the-art methods, delivering higher accuracy at up to 47 times fewer Boolean operations on vision benchmarks; the same models also yield a 7 times smaller circuit on an FPGA while reaching 99.38 percent accuracy on MNIST at 6.48 nanoseconds latency.
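
The phrase "improve the Pareto frontier" has a precise meaning that a short sketch can pin down: a model is on the frontier if no other model is at least as cheap and at least as accurate while being strictly better in one of the two. The function name `pareto_front` and the sample points are assumptions for illustration; the numbers are made up and are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (boolean_ops, accuracy)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by increasing operation count."""
    front: List[Point] = []
    best_acc = float("-inf")
    # Visit cheapest models first; among equal cost, most accurate first.
    for ops, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every cheaper-or-equal model seen so far.
        if acc > best_acc:
            front.append((ops, acc))
            best_acc = acc
    return front


# Made-up (ops, accuracy) pairs, purely illustrative.
candidates = [(100, 0.90), (200, 0.95), (150, 0.88), (400, 0.95), (50, 0.80)]
front = pareto_front(candidates)
```

With these placeholder values, `(150, 0.88)` is dominated by `(100, 0.90)` and `(400, 0.95)` by `(200, 0.95)`, so both fall off the frontier. A new method improves the frontier when adding its models removes previously non-dominated points.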

What carries the argument

The adaptive discretization procedure that converts a continuously relaxed network into a discrete Boolean network while limiting accuracy loss.
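
The accuracy drop at stake can be seen on a single gate. The sketch below uses a softmax mixture over candidate operations, a relaxation common in the differentiable logic gate literature, and a plain argmax discretization. This is an assumption for illustration only: the summary does not describe the paper's relaxation or its adaptive procedure, and the names `relaxed_gate`, `discretize`, and `discrete_gate` are hypothetical.

```python
import math
from typing import Callable, Dict, List

# Real-valued relaxations of Boolean operations. They match the Boolean
# truth tables whenever a and b are exactly 0 or 1.
RELAXED: Dict[str, Callable[[float, float], float]] = {
    "and": lambda a, b: a * b,
    "or": lambda a, b: a + b - a * b,
    "xor": lambda a, b: a + b - 2 * a * b,
    "nand": lambda a, b: 1 - a * b,
}
NAMES: List[str] = list(RELAXED)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_gate(logits: List[float], a: float, b: float) -> float:
    """Training-time output: softmax-weighted mixture of all candidate ops."""
    weights = softmax(logits)
    return sum(w * RELAXED[name](a, b) for w, name in zip(weights, NAMES))


def discretize(logits: List[float]) -> str:
    """Plain argmax discretization (not the paper's adaptive procedure)."""
    return NAMES[max(range(len(logits)), key=lambda k: logits[k])]


def discrete_gate(logits: List[float], a: int, b: int) -> int:
    """Inference-time output: the single selected Boolean operation."""
    return int(RELAXED[discretize(logits)](a, b))


logits = [2.0, 1.5, 0.0, 0.0]             # hypothetical learned scores
relaxed_out = relaxed_gate(logits, 1, 0)   # mixture still leans on or/xor/nand
discrete_out = discrete_gate(logits, 1, 0) # argmax picks "and", which outputs 0
```

Here the mixture outputs roughly 0.47 on input (1, 0) while the discretized gate outputs 0, because the losing operations still carried substantial weight. Mismatches of this kind, accumulated over many gates, are the accuracy drop that a discretization procedure has to limit.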

If this is right

Boolean networks become practical for nanosecond-scale inference on FPGAs and similar hardware.
The resulting models produce substantially smaller circuits than earlier Boolean approaches.
The accuracy-cost trade-off improves across vision tasks and extends to other data modalities.
Resource-constrained deployments can use Boolean networks without sacrificing much accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The low-latency FPGA results point toward use in high-speed embedded control or signal-processing loops.
A parameter-free connection strategy may simplify training pipelines for other discrete or combinatorial network families.
The compact architecture could be combined with existing quantization or pruning tools to further reduce hardware footprint.

Load-bearing premise

The adaptive discretization step reliably shrinks the accuracy drop from continuous relaxation to discrete Boolean form without large biases or dataset-specific tuning.

What would settle it

A standard vision benchmark in which the adaptive discretization causes a large accuracy drop relative to the relaxed network or to prior Boolean methods.

Original abstract

Floating-point neural networks dominate modern machine learning but incur substantial inference costs, motivating emerging interest in Boolean networks for resource-constrained deployments. Since Boolean networks use only Boolean operations, they can achieve nanosecond-scale inference latency. However, learning Boolean networks that are both compact and accurate remains challenging because of their discrete, combinatorial structure. In this work we address this challenge via three novel, complementary contributions: (i) a new parameter-free strategy for learning effective connections, (ii) a novel compact convolutional Boolean architecture that exploits spatial locality while requiring fewer Boolean operations than existing convolutional kernels, and (iii) an adaptive discretization procedure that reduces the accuracy drop incurred when converting a continuously relaxed network into a discrete Boolean network. Across standard vision benchmarks, our method improves the Pareto frontier over prior state-of-the-art methods, achieving higher accuracy with up to $47\times$ fewer Boolean operations. This advantage also extends to other modalities. Further, on an FPGA, our model on MNIST achieves 99.38\% accuracy with 6.48 ns latency, surpassing the prior state-of-the-art in both accuracy and runtime, while generating a $7\times$ smaller circuit. Code and models are available at https://github.com/eth-sri/CompactLogic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces three contributions for learning compact Boolean networks: (i) a parameter-free strategy for learning connections, (ii) a novel compact convolutional Boolean architecture exploiting spatial locality, and (iii) an adaptive discretization procedure to minimize accuracy loss when converting relaxed networks to discrete Boolean form. It reports empirical improvements on vision benchmarks, advancing the Pareto frontier with higher accuracy at up to 47× fewer Boolean operations, plus FPGA results on MNIST achieving 99.38% accuracy at 6.48 ns latency with a 7× smaller circuit. Code and models are released.

Significance. If the empirical Pareto gains and FPGA measurements hold under scrutiny, the work offers a practical advance for resource-constrained inference by enabling nanosecond-latency Boolean networks that outperform prior art in both accuracy and operation count. The open release of code strengthens the contribution by supporting direct reproduction and extension.

major comments (2)

[§4] Experimental results: The reported accuracy and operation-count improvements come without error bars, standard deviations, or results across multiple random seeds and data splits, all of which are needed to establish that the Pareto-frontier gains are robust rather than artifacts of a single run.
[§3.3] Adaptive discretization: The procedure is presented as consistently reducing the accuracy drop without dataset-specific tuning or large biases, yet no ablation isolating its contribution (e.g., comparison with fixed-threshold discretization) is provided to confirm that it is load-bearing for the claimed gains.

minor comments (2)

[Abstract] The abstract states 'up to 47× fewer Boolean operations' without specifying the exact model, dataset, and baseline pair; the main text should make this correspondence explicit.
[Figures] Figure captions and legends would benefit from explicit mention of the number of runs or seeds used for each plotted point.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation for minor revision. We address the major comments below and will incorporate the suggested improvements in the revised manuscript.

Referee: [§4] Experimental results: The reported accuracy and operation-count improvements come without error bars, standard deviations, or results across multiple random seeds and data splits, all of which are needed to establish that the Pareto-frontier gains are robust rather than artifacts of a single run.

Authors: We agree that providing error bars and results from multiple runs would strengthen the evidence for the robustness of our Pareto-frontier improvements. In the revised version of the manuscript, we will include experiments run with multiple random seeds (at least 3-5) and report mean accuracy and operation counts with standard deviations and error bars where appropriate. This will demonstrate that the gains are not due to a single favorable run.
Revision: yes

Referee: [§3.3] Adaptive discretization: The procedure is presented as consistently reducing the accuracy drop without dataset-specific tuning or large biases, yet no ablation isolating its contribution (e.g., comparison with fixed-threshold discretization) is provided to confirm that it is load-bearing for the claimed gains.

Authors: We recognize the value of an ablation study to isolate the impact of the adaptive discretization procedure. Although the benefits are reflected in the end-to-end performance improvements, we will add a dedicated ablation in the revised manuscript comparing our adaptive discretization against a fixed-threshold baseline across the benchmarks. This will confirm its contribution to minimizing accuracy loss.
Revision: yes
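
The multi-seed reporting and the ablation comparison promised in these responses need little machinery. A minimal sketch, assuming per-seed accuracies are already collected in lists; the `summarize` helper and the numbers are placeholders, not measurements from the paper.

```python
import statistics
from typing import Dict, List


def summarize(runs: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation over per-seed results."""
    return {
        "mean": statistics.mean(runs),
        # The sample standard deviation needs at least two runs.
        "std": statistics.stdev(runs) if len(runs) > 1 else 0.0,
        "n": len(runs),
    }


# Placeholder per-seed accuracies for the two ablation arms.
adaptive = summarize([0.990, 0.992, 0.991])
fixed_threshold = summarize([0.985, 0.989, 0.980])

# Gap between arms; it is only meaningful relative to the two spreads.
mean_gain = adaptive["mean"] - fixed_threshold["mean"]
```

Reporting `mean ± std` with `n` for each plotted point would also address the minor comment about figure captions.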

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical ML contribution whose central claims are measured accuracy and operation-count improvements on vision benchmarks, supported by released code and models. No derivation chain exists that reduces a claimed prediction or first-principles result to a quantity defined solely by the paper's own fitted parameters, self-citations, or ansatzes. The three listed contributions (parameter-free connection learning, compact conv architecture, adaptive discretization) are presented as algorithmic procedures whose performance is externally validated rather than internally forced by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of the three proposed techniques. The parameter-free strategy implies few or no additional fitted hyperparameters for connection learning. No new physical entities are postulated.

axioms (1)

Domain assumption: Boolean operations can approximate the computations performed by floating-point neural networks with acceptable accuracy loss.
Basis: implicit in the motivation and discretization step described in the abstract.
