The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions
Pith reviewed 2026-05-09 14:14 UTC · model grok-4.3
The pith
Influence-dependent exponents on Walsh-Hadamard butterfly layers produce a strictly Schur-convex contraction invariant μ(f) for Boolean functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Banach-Butterfly Invariant μ(f) arises from assigning exponent p_ℓ = 1 + Inf_ℓ(f) to each layer ℓ of the Walsh-Hadamard butterfly factorization, producing the contraction invariant μ(f) = ∏_ℓ 2^{-Inf_ℓ(f)/(1+Inf_ℓ(f))}. This quantity is strictly Schur-convex in the influence vector (up to permutation), satisfies the Jensen lower bound on its base-2 logarithm, is algebraic in the Fourier coefficients, and separates 122 pairs of functions at n=3 that share identical total influence. It also displays distinct asymptotic scaling: ∼2^{-n/2} for parity, 2^{-Θ(√n)} for majority, and 2^{-1/2} for dictators.
What carries the argument
the influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization, with each layer ℓ receiving exponent p_ℓ = 1 + Inf_ℓ(f) to form the product contraction invariant μ(f)
If this is right
- μ separates 122 pairs of n=3 functions that share identical total influence.
- All 65,536 Boolean functions at n=4 have odd minimum support size with mean 6.42.
- At fixed total influence, the conditional Spearman correlation of μ with minimum support equals +0.571 at n=4 but reverses to -0.38 at n=5.
- μ obeys distinct scaling regimes: 2^{-n/2} for parity functions, 2^{-Θ(√n)} for majority, and 2^{-1/2} for dictators.
- log₂μ is rational but not polynomial in the Fourier coefficients while μ itself remains algebraic.
Where Pith is reading between the lines
- The sign reversal in the μ-support correlation between n=4 and n=5 indicates that any predictive relationship between the invariant and support size is dimension-dependent rather than universal.
- Because μ is algebraic, it supplies an exact, non-approximate classifier for distinguishing functions within the same total-influence class that could extend to structured enumeration tasks beyond the computed range.
- The Schur-convexity and scaling classes together imply that influence concentration directly controls the value of μ, offering a potential handle for balancing influence vectors in related discrete-geometric problems.
Load-bearing premise
Assigning the exponent 1 plus the influence of each coordinate to the corresponding butterfly layer produces a contraction invariant that captures meaningful geometric properties of the Boolean function and its threshold representations.
What would settle it
A Boolean function at any n where the numerically computed μ violates the claimed Jensen lower bound log₂μ(f) ≥ -I(f)/(1 + I(f)/n) or where μ fails to be strictly Schur-convex under reordering of the influence vector.
Figures
read the original abstract
We introduce the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization. For a Boolean function $f:\{-1,+1\}^n\to\{-1,+1\}$ with coordinate influences $\mathrm{Inf}_\ell(f)$, BBT assigns exponent $p_\ell = 1+\mathrm{Inf}_\ell(f)$ to butterfly layer $\ell$, yielding the contraction invariant $\mu(f)=\prod_\ell 2^{-\mathrm{Inf}_\ell/(1+\mathrm{Inf}_\ell)}$. We prove a Jensen lower bound $\log_2\mu(f) \ge -I(f)/(1+I(f)/n)$ and that $\mu$ is strictly Schur-convex in the influence vector (modulo permutation), giving scaling classes $\mu\sim 2^{-n/2}$ (parity), $2^{-\Theta(\sqrt{n})}$ (majority), $2^{-1/2}$ (dictators). $\log_2\mu$ is rational but not polynomial in the Fourier coefficients while $\mu$ is algebraic, and $\mu$ separates functions with identical total influence (122 pairs at $n=3$). Using the certified $n \le 4$ ternary Walsh-threshold universe from a companion synthesis manuscript as a finite testbed, we compute exact MILP minimum-support certificates for all 65,536 Boolean functions at $n=4$ (mean 6.42, max 9, all-odd by a parity argument) and on 10,000 of the 616,126 NPN-canonical representatives we enumerate at $n=5$ (matching OEIS A000370). Conditional Spearman $\rho(\mu,|\mathrm{supp}|)$ at fixed total influence is $+0.571$ in the largest stratum at $n=4$ but reverses to $-0.38$ at $n=5$ under both function-uniform and NPN-canonical sampling: $\mu$ is a valid Schur-convex concentration invariant, not a universal monotone predictor of minimum support across $n$. A companion application paper validates a real-valued WHT activation-energy proxy inspired by this theory on five pretrained LLMs at W2A16, cutting wikitext-2 perplexity by 15-58% versus vanilla auto-round; the transfer from Boolean theory to the real-valued proxy is qualitative, not formal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce the Banach-Butterfly Invariant μ(f) = ∏_ℓ 2^{-Inf_ℓ(f)/(1+Inf_ℓ(f))} as an influence-adaptive contraction invariant on the Walsh-Hadamard butterfly factorization of Boolean functions. It proves the Jensen lower bound log₂ μ(f) ≥ -I(f)/(1 + I(f)/n) and strict Schur-convexity of μ in the influence vector. Scaling classes are derived for parity, majority, and dictator functions. Using MILP, it computes minimum supports for all 65536 functions at n=4 and samples at n=5, reporting conditional Spearman correlations that change sign between n=4 and n=5, concluding that μ is a Schur-convex concentration invariant but not a universal predictor of minimum support. A qualitative application to LLM quantization is noted in a companion paper.
Significance. Should the results hold, this work offers a new geometric invariant for Boolean functions that is directly constructed from influences without free parameters, with proofs relying on standard concavity and majorization arguments. The explicit scaling classes match known influence behaviors, and the small-n exhaustive and sampled computations provide concrete, verifiable data including the correlation reversal. This highlights the invariant's properties as a concentration measure while cautioning against overgeneralization. The machine-verified MILP certificates for n=4 add rigor to the empirical component.
minor comments (3)
- The title references 'Ternary Polynomial Threshold Functions' but the abstract and claims focus exclusively on Boolean functions; the connection to ternary representations should be clarified early in the manuscript.
- The companion synthesis manuscript for the n≤4 testbed is referenced but lacks a citation; include the arXiv identifier or title for completeness.
- The abstract states that log₂μ is rational but not polynomial in the Fourier coefficients; this distinction would benefit from a brief explanation or example in the main text to aid readers.
Simulated Author's Rebuttal
Thank you for your careful review and positive recommendation for minor revision. We are glad that the referee summary correctly captures the introduction of the Banach-Butterfly Invariant μ(f), its properties such as the Jensen lower bound and strict Schur-convexity, the derived scaling classes, and the computational results on minimum polynomial threshold supports for small n, including the sign change in conditional Spearman correlations. No major comments were specified in the report, so we have no specific rebuttals. We are prepared to make any minor revisions as needed.
Circularity Check
No significant circularity; derivation applies standard inequalities to explicit definition
full rationale
The paper defines μ(f) explicitly as the product ∏_ℓ 2^{-Inf_ℓ/(1+Inf_ℓ)} from the given influences Inf_ℓ(f). The claimed Jensen lower bound log₂μ(f) ≥ -I(f)/(1+I(f)/n) is obtained by applying the standard concavity of g(x)=x/(1+x) and Jensen's inequality to the vector of influences; strict Schur-convexity likewise follows from the majorization inequality for strictly concave g. These steps are direct applications of known inequalities to the definition and do not reduce by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. Scaling classes match known total-influence asymptotics for parity/majority/dictators, and n=4/5 enumerations are presented as concrete finite checks. The derivation chain is self-contained against external mathematical facts.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Jensen's inequality applies to the relevant convex function in the log bound derivation
invented entities (1)
-
Banach-Butterfly Invariant μ(f)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
G. Pavlov. Differentiable logic synthesis: spectral coefficient selection via Sinkhorn-constrained composition. Manuscript under review, 2026. Companion paper establishing the certifiedn≤4 ternary Walsh-threshold representability used as the testbed in Section 7
work page 2026
-
[2]
G. Pavlov. Influence-weighted spectral rotations for extreme low-bit LLM quantization. Com- panion application paper, 2026. Empirical validation of an influence-adaptive WHT pre- quantization transform; reports the LLM PPL numbers summarised in Section 8.5
work page 2026
-
[3]
J. Bergh and J. Löfström.Interpolation Spaces: An Introduction. Springer, 1976
work page 1976
-
[4]
O’Donnell.Analysis of Boolean Functions
R. O’Donnell.Analysis of Boolean Functions. Cambridge University Press, 2014
work page 2014
- [5]
-
[6]
E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum.SIAM Journal on Computing, 22(6):1331–1348, 1993
work page 1993
-
[7]
R. Beigel. Perceptrons, PP, and the polynomial hierarchy.Computational Complexity, 4:339– 349, 1994
work page 1994
-
[8]
A. A. Sherstov. The intersection of two halfspaces has high threshold degree. InFOCS, 2009
work page 2009
-
[9]
I. Diakonikolas, P. Harsha, A. Klivans, R. Meka, P. Raghavendra, R. Servedio, and L.-Y. Tan. Bounding the average sensitivity and noise sensitivity of polynomial threshold functions. In STOC, 2010
work page 2010
-
[10]
J. A. Clarkson. Uniformly convex spaces.Transactions of the AMS, 40(3):396–414, 1936
work page 1936
-
[11]
A. W. Marshall, I. Olkin, and B. C. Arnold.Inequalities: Theory of Majorization and Its Applications. Springer, 2nd edition, 2011. 19
work page 2011
-
[12]
S. Aaronson and A. Wigderson. Algebrization: A new barrier in complexity theory.ACM Transactions on Computation Theory, 1(1):1–54, 2009
work page 2009
- [13]
-
[14]
A. A. Razborov and S. Rudich. Natural proofs. InSTOC, pages 204–213, 1994
work page 1994
-
[15]
M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks. InNeurIPS, 2016
work page 2016
-
[16]
J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han. AWQ: Activation-aware weight quantization for LLM compression and acceleration. InMLSys, 2024
work page 2024
-
[17]
Opti- mize weight rounding via signed gradient descent for the quantization of llms,
W. Cheng, W. Zhang, H. Shen, Y. Cai, X. He, and K. Lv. Optimize weight rounding via signed gradient descent for the quantization of LLMs.arXiv preprint arXiv:2309.05516, 2023
-
[18]
C. Zhu, S. Han, H. Mao, and W. J. Dally. Trained ternary quantization. InICLR, 2017
work page 2017
-
[19]
K. G. Beauchamp.Applications of Walsh and Related Functions. Academic Press, 1984
work page 1984
-
[20]
J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. InFOCS, pages 68–80, 1988
work page 1988
- [21]
- [22]
-
[23]
B. K. Petersen, M. L. Larma, T. N. Mundhenk, et al. Deep differentiable logic gate networks. InNeurIPS, 2022. A Reproducibility of the Computational Universality Theorem Theorem 7.1, imported from the companion synthesis manuscript [1], is a computational theorem verified by explicit certificate. For completeness, we describe the integer verification proc...
work page 2022
-
[24]
ConstructH n =H ⊗n 1 withH 1 = 1 1 1−1 (integer matrix, no floating-point)
-
[25]
Computey=H nw(integer matrix-vector product). 20
-
[26]
Verifysign(y i) =f i for alli∈[2 n]. Atn= 4, this is2 4 ×2 4 = 256integer multiplications per function, with65,536functions total: approximately16.8×10 6 operations, completing in under 1 second on any modern CPU. Reproduction.The complete verification can be reproduced by running: cd boolean_fourier python3 -m bbt.exhaustive_validation --n 4 --repair Thi...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.