The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions

Gorgi Pavlov

arxiv: 2605.01637 · v1 · submitted 2026-05-02 · 💻 cs.LG · cs.CC· cs.DM· math.CO

The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions

Gorgi Pavlov This is my paper

Pith reviewed 2026-05-09 14:14 UTC · model grok-4.3

classification 💻 cs.LG cs.CCcs.DMmath.CO

keywords Banach-Butterfly InvariantWalsh-Hadamard butterflyBoolean functionscoordinate influencesSchur-convexityternary threshold functionsminimum supportFourier analysis

0 comments

The pith

Influence-dependent exponents on Walsh-Hadamard butterfly layers produce a strictly Schur-convex contraction invariant μ(f) for Boolean functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs the Banach-Butterfly Invariant by adapting the exponents in each layer of the Walsh-Hadamard butterfly factorization according to the coordinate influences of a Boolean function. This yields the explicit product formula μ(f) = ∏ 2^{-Inf_ℓ(f)/(1+Inf_ℓ(f))}, which is shown to be strictly Schur-convex in the influence vector and to obey the Jensen lower bound log₂μ(f) ≥ -I(f)/(1 + I(f)/n). Exhaustive enumeration at n=4 and sampling at n=5 demonstrates that μ separates functions sharing the same total influence while exhibiting a correlation with minimum support size that reverses direction between these two dimensions.

Core claim

The Banach-Butterfly Invariant μ(f) arises from assigning exponent p_ℓ = 1 + Inf_ℓ(f) to each layer ℓ of the Walsh-Hadamard butterfly factorization, producing the contraction invariant μ(f) = ∏_ℓ 2^{-Inf_ℓ(f)/(1+Inf_ℓ(f))}. This quantity is strictly Schur-convex in the influence vector (up to permutation), satisfies the Jensen lower bound on its base-2 logarithm, is algebraic in the Fourier coefficients, and separates 122 pairs of functions at n=3 that share identical total influence. It also displays distinct asymptotic scaling: ∼2^{-n/2} for parity, 2^{-Θ(√n)} for majority, and 2^{-1/2} for dictators.

What carries the argument

the influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization, with each layer ℓ receiving exponent p_ℓ = 1 + Inf_ℓ(f) to form the product contraction invariant μ(f)

If this is right

μ separates 122 pairs of n=3 functions that share identical total influence.
All 65,536 Boolean functions at n=4 have odd minimum support size with mean 6.42.
At fixed total influence, the conditional Spearman correlation of μ with minimum support equals +0.571 at n=4 but reverses to -0.38 at n=5.
μ obeys distinct scaling regimes: 2^{-n/2} for parity functions, 2^{-Θ(√n)} for majority, and 2^{-1/2} for dictators.
log₂μ is rational but not polynomial in the Fourier coefficients while μ itself remains algebraic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sign reversal in the μ-support correlation between n=4 and n=5 indicates that any predictive relationship between the invariant and support size is dimension-dependent rather than universal.
Because μ is algebraic, it supplies an exact, non-approximate classifier for distinguishing functions within the same total-influence class that could extend to structured enumeration tasks beyond the computed range.
The Schur-convexity and scaling classes together imply that influence concentration directly controls the value of μ, offering a potential handle for balancing influence vectors in related discrete-geometric problems.

Load-bearing premise

Assigning the exponent 1 plus the influence of each coordinate to the corresponding butterfly layer produces a contraction invariant that captures meaningful geometric properties of the Boolean function and its threshold representations.

What would settle it

A Boolean function at any n where the numerically computed μ violates the claimed Jensen lower bound log₂μ(f) ≥ -I(f)/(1 + I(f)/n) or where μ fails to be strictly Schur-convex under reordering of the influence vector.

Figures

Figures reproduced from arXiv: 2605.01637 by Gorgi Pavlov.

**Figure 1.** Figure 1: BBT contraction invariant log2 µ(f) versus number of variables n for five canonical function families. Parity decays linearly (−n/2), majority subexponentially (−Θ(√ n)), dictator is constant (−1/2), tribes grows logarithmically, and AND approaches 0. All curves match the analytical predictions of Corollary 3.9. 8.2 Cancellation Index At n = 3 with proven optimal masks, the cancellation index ρ shows Pears… view at source ↗

**Figure 2.** Figure 2: Left: Per-layer cancellation index at n = 3 (proven optimal masks). Right: Mean cancellation before and after repair at n = 4, showing repair reduces destructive butterfly interference. 8.4 Does µ(f) predict minimum support? An MILP-grounded answer The headline question for the contraction invariant is whether it correlates with the synthesis difficulty it was designed to diagnose — specifically, whether… view at source ↗

**Figure 3.** Figure 3: Distribution of algebraic degrees of µ(f) over all 256 Boolean functions at n = 3. The degree-35 peak (96 functions, 37.5%) corresponds to functions whose influence vectors produce the richest product structure. The bin at I(f) = 1.75 contains 20% of all functions and exhibits a Spearman correlation ρ = +0.571 between µ(f) and minimum support, with a mirror-image ρ = −0.564 between influence entropy and mi… view at source ↗

read the original abstract

We introduce the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization. For a Boolean function $f:\{-1,+1\}^n\to\{-1,+1\}$ with coordinate influences $\mathrm{Inf}_\ell(f)$, BBT assigns exponent $p_\ell = 1+\mathrm{Inf}_\ell(f)$ to butterfly layer $\ell$, yielding the contraction invariant $\mu(f)=\prod_\ell 2^{-\mathrm{Inf}_\ell/(1+\mathrm{Inf}_\ell)}$. We prove a Jensen lower bound $\log_2\mu(f) \ge -I(f)/(1+I(f)/n)$ and that $\mu$ is strictly Schur-convex in the influence vector (modulo permutation), giving scaling classes $\mu\sim 2^{-n/2}$ (parity), $2^{-\Theta(\sqrt{n})}$ (majority), $2^{-1/2}$ (dictators). $\log_2\mu$ is rational but not polynomial in the Fourier coefficients while $\mu$ is algebraic, and $\mu$ separates functions with identical total influence (122 pairs at $n=3$). Using the certified $n \le 4$ ternary Walsh-threshold universe from a companion synthesis manuscript as a finite testbed, we compute exact MILP minimum-support certificates for all 65,536 Boolean functions at $n=4$ (mean 6.42, max 9, all-odd by a parity argument) and on 10,000 of the 616,126 NPN-canonical representatives we enumerate at $n=5$ (matching OEIS A000370). Conditional Spearman $\rho(\mu,|\mathrm{supp}|)$ at fixed total influence is $+0.571$ in the largest stratum at $n=4$ but reverses to $-0.38$ at $n=5$ under both function-uniform and NPN-canonical sampling: $\mu$ is a valid Schur-convex concentration invariant, not a universal monotone predictor of minimum support across $n$. A companion application paper validates a real-valued WHT activation-energy proxy inspired by this theory on five pretrained LLMs at W2A16, cutting wikitext-2 perplexity by 15-58% versus vanilla auto-round; the transfer from Boolean theory to the real-valued proxy is qualitative, not formal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds a new Schur-convex invariant by weighting Walsh-Hadamard butterfly layers with per-coordinate influences, proves a Jensen bound, and reports an n-dependent reversal in correlation with minimum support.

read the letter

The core idea is straightforward: assign each butterfly layer an exponent 1 plus its influence, then form the product mu(f) = product 2^{-Inf_l / (1+Inf_l)}. This gives a contraction that is strictly Schur-convex in the influence vector and satisfies the stated Jensen lower bound on log mu. The scaling classes line up with the known total-influence asymptotics for parity, majority, and dictators, which is reassuring rather than surprising once the concavity is in place. The construction itself does not appear in earlier Boolean-analysis literature, and the separation of equal-total-influence functions at n=3 is a direct consequence worth noting.

Referee Report

0 major / 3 minor

Summary. The paper claims to introduce the Banach-Butterfly Invariant μ(f) = ∏_ℓ 2^{-Inf_ℓ(f)/(1+Inf_ℓ(f))} as an influence-adaptive contraction invariant on the Walsh-Hadamard butterfly factorization of Boolean functions. It proves the Jensen lower bound log₂ μ(f) ≥ -I(f)/(1 + I(f)/n) and strict Schur-convexity of μ in the influence vector. Scaling classes are derived for parity, majority, and dictator functions. Using MILP, it computes minimum supports for all 65536 functions at n=4 and samples at n=5, reporting conditional Spearman correlations that change sign between n=4 and n=5, concluding that μ is a Schur-convex concentration invariant but not a universal predictor of minimum support. A qualitative application to LLM quantization is noted in a companion paper.

Significance. Should the results hold, this work offers a new geometric invariant for Boolean functions that is directly constructed from influences without free parameters, with proofs relying on standard concavity and majorization arguments. The explicit scaling classes match known influence behaviors, and the small-n exhaustive and sampled computations provide concrete, verifiable data including the correlation reversal. This highlights the invariant's properties as a concentration measure while cautioning against overgeneralization. The machine-verified MILP certificates for n=4 add rigor to the empirical component.

minor comments (3)

The title references 'Ternary Polynomial Threshold Functions' but the abstract and claims focus exclusively on Boolean functions; the connection to ternary representations should be clarified early in the manuscript.
The companion synthesis manuscript for the n≤4 testbed is referenced but lacks a citation; include the arXiv identifier or title for completeness.
The abstract states that log₂μ is rational but not polynomial in the Fourier coefficients; this distinction would benefit from a brief explanation or example in the main text to aid readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

Thank you for your careful review and positive recommendation for minor revision. We are glad that the referee summary correctly captures the introduction of the Banach-Butterfly Invariant μ(f), its properties such as the Jensen lower bound and strict Schur-convexity, the derived scaling classes, and the computational results on minimum polynomial threshold supports for small n, including the sign change in conditional Spearman correlations. No major comments were specified in the report, so we have no specific rebuttals. We are prepared to make any minor revisions as needed.

Circularity Check

0 steps flagged

No significant circularity; derivation applies standard inequalities to explicit definition

full rationale

The paper defines μ(f) explicitly as the product ∏_ℓ 2^{-Inf_ℓ/(1+Inf_ℓ)} from the given influences Inf_ℓ(f). The claimed Jensen lower bound log₂μ(f) ≥ -I(f)/(1+I(f)/n) is obtained by applying the standard concavity of g(x)=x/(1+x) and Jensen's inequality to the vector of influences; strict Schur-convexity likewise follows from the majorization inequality for strictly concave g. These steps are direct applications of known inequalities to the definition and do not reduce by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. Scaling classes match known total-influence asymptotics for parity/majority/dictators, and n=4/5 enumerations are presented as concrete finite checks. The derivation chain is self-contained against external mathematical facts.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the newly introduced definition of the invariant together with the standard mathematical axiom of Jensen's inequality; no free parameters are fitted and the only invented entity is the invariant itself.

axioms (1)

standard math Jensen's inequality applies to the relevant convex function in the log bound derivation
Invoked for the lower bound log₂μ(f) ≥ -I(f)/(1+I(f)/n)

invented entities (1)

Banach-Butterfly Invariant μ(f) no independent evidence
purpose: To serve as a contraction invariant adapting to coordinate influences in the Walsh-Hadamard factorization
Newly defined in this work with no external falsifiable prediction provided in the abstract

pith-pipeline@v0.9.0 · 5763 in / 1600 out tokens · 88273 ms · 2026-05-09T14:14:19.319757+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

G. Pavlov. Differentiable logic synthesis: spectral coefficient selection via Sinkhorn-constrained composition. Manuscript under review, 2026. Companion paper establishing the certifiedn≤4 ternary Walsh-threshold representability used as the testbed in Section 7

work page 2026
[2]

G. Pavlov. Influence-weighted spectral rotations for extreme low-bit LLM quantization. Com- panion application paper, 2026. Empirical validation of an influence-adaptive WHT pre- quantization transform; reports the LLM PPL numbers summarised in Section 8.5

work page 2026
[3]

Bergh and J

J. Bergh and J. Löfström.Interpolation Spaces: An Introduction. Springer, 1976

work page 1976
[4]

O’Donnell.Analysis of Boolean Functions

R. O’Donnell.Analysis of Boolean Functions. Cambridge University Press, 2014

work page 2014
[5]

Linial, Y

N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learn- ability.Journal of the ACM, 40(3):607–620, 1993

work page 1993
[6]

Kushilevitz and Y

E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum.SIAM Journal on Computing, 22(6):1331–1348, 1993

work page 1993
[7]

R. Beigel. Perceptrons, PP, and the polynomial hierarchy.Computational Complexity, 4:339– 349, 1994

work page 1994
[8]

A. A. Sherstov. The intersection of two halfspaces has high threshold degree. InFOCS, 2009

work page 2009
[9]

Diakonikolas, P

I. Diakonikolas, P. Harsha, A. Klivans, R. Meka, P. Raghavendra, R. Servedio, and L.-Y. Tan. Bounding the average sensitivity and noise sensitivity of polynomial threshold functions. In STOC, 2010

work page 2010
[10]

J. A. Clarkson. Uniformly convex spaces.Transactions of the AMS, 40(3):396–414, 1936

work page 1936
[11]

A. W. Marshall, I. Olkin, and B. C. Arnold.Inequalities: Theory of Majorization and Its Applications. Springer, 2nd edition, 2011. 19

work page 2011
[12]

Aaronson and A

S. Aaronson and A. Wigderson. Algebrization: A new barrier in complexity theory.ACM Transactions on Computation Theory, 1(1):1–54, 2009

work page 2009
[13]

Baker, J

T. Baker, J. Gill, and R. Solovay. Relativizations of theP=?N Pquestion.SIAM Journal on Computing, 4(4):431–442, 1975

work page 1975
[14]

A. A. Razborov and S. Rudich. Natural proofs. InSTOC, pages 204–213, 1994

work page 1994
[15]

Courbariaux, I

M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks. InNeurIPS, 2016

work page 2016
[16]

J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han. AWQ: Activation-aware weight quantization for LLM compression and acceleration. InMLSys, 2024

work page 2024
[17]

Opti- mize weight rounding via signed gradient descent for the quantization of llms,

W. Cheng, W. Zhang, H. Shen, Y. Cai, X. He, and K. Lv. Optimize weight rounding via signed gradient descent for the quantization of LLMs.arXiv preprint arXiv:2309.05516, 2023

work page arXiv 2023
[18]

C. Zhu, S. Han, H. Mao, and W. J. Dally. Trained ternary quantization. InICLR, 2017

work page 2017
[19]

K. G. Beauchamp.Applications of Walsh and Related Functions. Academic Press, 1984

work page 1984
[20]

J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. InFOCS, pages 68–80, 1988

work page 1988
[21]

Friedgut

E. Friedgut. Boolean functions with low average sensitivity depend on few coordinates.Com- binatorica, 18(1):27–35, 1998

work page 1998
[22]

Gorji, A

A. Gorji, A. Amini, and M. Kowalski. A scalable Walsh-Hadamard regularizer to overcome the low-degree spectral bias of neural networks. InUAI, 2023

work page 2023
[23]

B. K. Petersen, M. L. Larma, T. N. Mundhenk, et al. Deep differentiable logic gate networks. InNeurIPS, 2022. A Reproducibility of the Computational Universality Theorem Theorem 7.1, imported from the companion synthesis manuscript [1], is a computational theorem verified by explicit certificate. For completeness, we describe the integer verification proc...

work page 2022
[24]

ConstructH n =H ⊗n 1 withH 1 = 1 1 1−1 (integer matrix, no floating-point)

work page
[25]

Computey=H nw(integer matrix-vector product). 20

work page
[26]

Atn= 4, this is2 4 ×2 4 = 256integer multiplications per function, with65,536functions total: approximately16.8×10 6 operations, completing in under 1 second on any modern CPU

Verifysign(y i) =f i for alli∈[2 n]. Atn= 4, this is2 4 ×2 4 = 256integer multiplications per function, with65,536functions total: approximately16.8×10 6 operations, completing in under 1 second on any modern CPU. Reproduction.The complete verification can be reproduced by running: cd boolean_fourier python3 -m bbt.exhaustive_validation --n 4 --repair Thi...

work page

[1] [1]

G. Pavlov. Differentiable logic synthesis: spectral coefficient selection via Sinkhorn-constrained composition. Manuscript under review, 2026. Companion paper establishing the certifiedn≤4 ternary Walsh-threshold representability used as the testbed in Section 7

work page 2026

[2] [2]

G. Pavlov. Influence-weighted spectral rotations for extreme low-bit LLM quantization. Com- panion application paper, 2026. Empirical validation of an influence-adaptive WHT pre- quantization transform; reports the LLM PPL numbers summarised in Section 8.5

work page 2026

[3] [3]

Bergh and J

J. Bergh and J. Löfström.Interpolation Spaces: An Introduction. Springer, 1976

work page 1976

[4] [4]

O’Donnell.Analysis of Boolean Functions

R. O’Donnell.Analysis of Boolean Functions. Cambridge University Press, 2014

work page 2014

[5] [5]

Linial, Y

N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learn- ability.Journal of the ACM, 40(3):607–620, 1993

work page 1993

[6] [6]

Kushilevitz and Y

E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum.SIAM Journal on Computing, 22(6):1331–1348, 1993

work page 1993

[7] [7]

R. Beigel. Perceptrons, PP, and the polynomial hierarchy.Computational Complexity, 4:339– 349, 1994

work page 1994

[8] [8]

A. A. Sherstov. The intersection of two halfspaces has high threshold degree. InFOCS, 2009

work page 2009

[9] [9]

Diakonikolas, P

I. Diakonikolas, P. Harsha, A. Klivans, R. Meka, P. Raghavendra, R. Servedio, and L.-Y. Tan. Bounding the average sensitivity and noise sensitivity of polynomial threshold functions. In STOC, 2010

work page 2010

[10] [10]

J. A. Clarkson. Uniformly convex spaces.Transactions of the AMS, 40(3):396–414, 1936

work page 1936

[11] [11]

A. W. Marshall, I. Olkin, and B. C. Arnold.Inequalities: Theory of Majorization and Its Applications. Springer, 2nd edition, 2011. 19

work page 2011

[12] [12]

Aaronson and A

S. Aaronson and A. Wigderson. Algebrization: A new barrier in complexity theory.ACM Transactions on Computation Theory, 1(1):1–54, 2009

work page 2009

[13] [13]

Baker, J

T. Baker, J. Gill, and R. Solovay. Relativizations of theP=?N Pquestion.SIAM Journal on Computing, 4(4):431–442, 1975

work page 1975

[14] [14]

A. A. Razborov and S. Rudich. Natural proofs. InSTOC, pages 204–213, 1994

work page 1994

[15] [15]

Courbariaux, I

M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks. InNeurIPS, 2016

work page 2016

[16] [16]

J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han. AWQ: Activation-aware weight quantization for LLM compression and acceleration. InMLSys, 2024

work page 2024

[17] [17]

Opti- mize weight rounding via signed gradient descent for the quantization of llms,

W. Cheng, W. Zhang, H. Shen, Y. Cai, X. He, and K. Lv. Optimize weight rounding via signed gradient descent for the quantization of LLMs.arXiv preprint arXiv:2309.05516, 2023

work page arXiv 2023

[18] [18]

C. Zhu, S. Han, H. Mao, and W. J. Dally. Trained ternary quantization. InICLR, 2017

work page 2017

[19] [19]

K. G. Beauchamp.Applications of Walsh and Related Functions. Academic Press, 1984

work page 1984

[20] [20]

J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. InFOCS, pages 68–80, 1988

work page 1988

[21] [21]

Friedgut

E. Friedgut. Boolean functions with low average sensitivity depend on few coordinates.Com- binatorica, 18(1):27–35, 1998

work page 1998

[22] [22]

Gorji, A

A. Gorji, A. Amini, and M. Kowalski. A scalable Walsh-Hadamard regularizer to overcome the low-degree spectral bias of neural networks. InUAI, 2023

work page 2023

[23] [23]

B. K. Petersen, M. L. Larma, T. N. Mundhenk, et al. Deep differentiable logic gate networks. InNeurIPS, 2022. A Reproducibility of the Computational Universality Theorem Theorem 7.1, imported from the companion synthesis manuscript [1], is a computational theorem verified by explicit certificate. For completeness, we describe the integer verification proc...

work page 2022

[24] [24]

ConstructH n =H ⊗n 1 withH 1 = 1 1 1−1 (integer matrix, no floating-point)

work page

[25] [25]

Computey=H nw(integer matrix-vector product). 20

work page

[26] [26]

Atn= 4, this is2 4 ×2 4 = 256integer multiplications per function, with65,536functions total: approximately16.8×10 6 operations, completing in under 1 second on any modern CPU

Verifysign(y i) =f i for alli∈[2 n]. Atn= 4, this is2 4 ×2 4 = 256integer multiplications per function, with65,536functions total: approximately16.8×10 6 operations, completing in under 1 second on any modern CPU. Reproduction.The complete verification can be reproduced by running: cd boolean_fourier python3 -m bbt.exhaustive_validation --n 4 --repair Thi...

work page