Recognition: 2 theorem links
Lean Theorem · Neural Information Causality
Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3
The pith
Query-separated neural architectures induce random-access communication tasks bounded by information causality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B). Any independently certified physical capacity bound on the interface then implies the stricter bound I_N-RAC ≤ C_H. This separation treats the representation as a message whose performance is limited by communication constraints rather than by post-hoc capacity definitions. For CHSH-type correlation layers the same embedding produces nested Neural-RAC protocols whose biases multiply across depth, and stability of a one-bit bottleneck at arbitrary depth selects the Tsirelson threshold. The paper also gives an exact one-bit classical RAC benchmark, showing that the relevant quantum enhancement is fair query-conditioned access rather than total information beyond the bottleneck.
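The one-bit benchmark arithmetic is easy to reproduce. A minimal sketch, not the paper's code, assuming (as in the paper's closed forms) that the score sums the per-query informations 1 − h(P_K) of the induced binary channels:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# N = 2 uniform data bits, one classical bottleneck bit: Alice sends x = a0,
# Bob answers x for either query. Query-conditioned success probabilities:
P = {0: 1.0,   # query b = 0: Bob returns a0 exactly
     1: 0.5}   # query b = 1: x is independent of a1

# Score: sum of per-query informations through a binary channel, 1 - h(P_K).
I_nrac = sum(1 - h(P[K]) for K in P)
print(I_nrac)  # 1.0, saturating the one-bit capacity bound C_H = 1
```

The score saturates but never exceeds C_H = 1, illustrating the claim that the classical protocol cannot deliver fair query-conditioned access to both bits.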
What carries the argument
The embedding inequality I_N-RAC ≤ I(⃗a:H,B) that maps any query-separated neural computation onto a random-access communication experiment.
Load-bearing premise
Query-separated computation in neural architectures directly induces a random-access communication experiment to which information causality applies without further assumptions on the encoding or decoding maps.
What would settle it
A simulation or experiment in which a strictly query-separated network achieves measured I_N-RAC strictly larger than the simultaneously measured I(⃗a:H,B) while the interface mutual information is accurately estimated.
Original abstract
Query-separated computation forces a representation to play an operational role: data are encoded before a query is known, and a later decoder can answer only through the intermediate interface. In this regime the representation functions as a message rather than merely as a feature map. We formalize this observation by embedding information causality (IC) into representation learning, obtaining a framework called neural information causality (Neural-IC). The revised formulation separates two logically distinct statements. First, every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality $I_{\mathrm{N\text{-}RAC}}\le I(\vec a:H,B)$. Second, any independently certified physical capacity bound on the interface, such as a hard $m$-bit alphabet, a finite-precision register, or a power-constrained noisy channel, implies $I_{\mathrm{N\text{-}RAC}}\le C_H$. This separation avoids treating capacity as a post hoc definition and makes Neural-IC an operational diagnostic for query leakage, precision leakage, and episode-specific memory. We also provide an exact one-bit classical RAC benchmark, showing explicitly that the relevant quantum enhancement is not total information beyond the bottleneck, but fair query-conditioned access. For CHSH-type correlation layers, nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold. We extend the analysis to asymmetric seed biases, to multi-capacity finite-depth phase diagrams, and to correlated data via a conditional information score. Controlled simulations, including straight-through binary bottlenecks and deliberately leaky ablations, verify that apparent violations are accounted for by broken query separation or undercounted capacity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Neural Information Causality (Neural-IC), a framework that embeds information causality into query-separated neural architectures. It separates two claims: (1) every query-separated architecture induces a random-access communication experiment obeying the embedding inequality I_N-RAC ≤ I(⃗a:H,B), and (2) any independently certified physical capacity bound on the interface implies I_N-RAC ≤ C_H. The paper supplies an exact one-bit classical RAC benchmark, shows that nested CHSH-type correlation layers with one-bit bottleneck stability select the Tsirelson threshold, extends the analysis to asymmetric seed biases, multi-capacity finite-depth phase diagrams, and correlated data via a conditional information score, and reports controlled simulations (including straight-through binary bottlenecks and leaky ablations) that attribute apparent violations to broken query separation or undercounted capacity.
Significance. If the embedding is shown to be rigorous, the work supplies an operational diagnostic for query leakage, precision leakage, and episode-specific memory in neural representations by importing information-causality constraints. The explicit separation of the embedding inequality from capacity bounds avoids post-hoc definitions, the one-bit RAC benchmark clarifies that quantum enhancement concerns fair query-conditioned access rather than total information, and the stability argument for Tsirelson selection plus the ablation simulations provide concrete, falsifiable tests. These elements could usefully constrain capacity analyses in deep learning and quantum-inspired models.
major comments (3)
- [Abstract and formalization of embedding inequality] Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.
- [CHSH-type correlation layers and stability analysis] Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.
- [Controlled simulations and ablations] Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.
minor comments (2)
- [Abstract] Notation: the vector ⃗a and the subscript N-RAC should be defined at first use with an explicit reference to the corresponding RAC parties and message alphabet.
- [One-bit classical RAC benchmark] The one-bit classical RAC benchmark is presented as 'exact'; a short appendix deriving the classical bound from first principles (rather than citing it) would improve self-containedness.
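On the minor point about deriving the classical bound from first principles: for N = 2 the bound can be verified exhaustively. A hedged sketch (our own enumeration, not the paper's appendix) over all deterministic one-bit strategies; shared randomness cannot improve the uniform-average success, so this is the classical optimum:

```python
from itertools import product

# Brute force over all deterministic one-bit strategies for the N = 2 RAC:
# encoders f: (a0, a1) -> x in {0,1}; decoders g: (x, b) -> guess in {0,1}.
encoders = list(product([0, 1], repeat=4))   # f stored as a table indexed by 2*a0 + a1
decoders = list(product([0, 1], repeat=4))   # g stored as a table indexed by 2*x + b

best = 0.0
for f in encoders:
    for g in decoders:
        wins = 0
        for a0, a1, b in product([0, 1], repeat=3):   # uniform inputs and query
            x = f[2 * a0 + a1]
            guess = g[2 * x + b]
            wins += guess == (a0 if b == 0 else a1)
        best = max(best, wins / 8)
print(best)   # 0.75: the classical one-bit RAC success bound
```

The optimum 3/4 is attained by sending x = a0 and answering x for both queries, matching the exact benchmark the manuscript cites.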
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications and indicating where revisions will strengthen the presentation.
Point-by-point responses
Referee: Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.
Authors: The formalization section defines query-separated architectures as those in which the encoder has no access to the query vector and the decoder receives only the interface representation H together with the query. This definition directly supplies the required mapping: H is the non-adaptive message and the decoder is the RAC receiver. The equality of the mutual-information terms follows immediately from the operational definition of the induced experiment. Nevertheless, to eliminate any ambiguity regarding possible query-dependent pathways introduced by joint optimization, we will insert an explicit theorem together with its proof in the revised manuscript. The proof will show that the architectural constraints (encoder independence from query and decoder access limited to H) preclude the leakage mechanisms the referee identifies. revision: partial
Referee: Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.
Authors: The stability criterion is obtained by applying the embedding inequality recursively to the nested CHSH-type layers while enforcing that the one-bit bottleneck capacity remains finite at every depth. The maximum sustainable correlation bias is thereby fixed by the information-causality constraint alone; the Tsirelson value emerges as the unique number satisfying this recurrence. No external quantum bound is inserted. We will expand the relevant section with the full inductive derivation from the Neural-IC axioms to make the independence explicit and to forestall any appearance of circularity. revision: yes
Referee: Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.
Authors: We agree that full reproducibility requires these details. In the revised manuscript we will add a dedicated reproducibility subsection that states: (i) the operational definition of query separation (encoder output statistically independent of query, verified by zero mutual information under gradient flow), (ii) the data-exclusion rule (a run is discarded if any detected leakage exceeds a pre-specified threshold of 0.01 bits), and (iii) the error-analysis procedure (bootstrap resampling with 10 000 iterations to obtain 95 % confidence intervals on all reported information scores). These additions will render the simulation claims directly falsifiable. revision: yes
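The proposed error analysis can be prototyped directly. A minimal sketch of a percentile bootstrap on a per-query information score, using synthetic data; the 10,000-iteration figure is the authors', reduced here for speed:

```python
import math
import random

random.seed(0)

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_score(outcomes):
    """Per-query information 1 - h(P_hat) from a list of 0/1 trial successes."""
    return 1 - h(sum(outcomes) / len(outcomes))

# Synthetic Neural-RAC run: T trials with true success probability 0.85.
T = 500
trials = [1 if random.random() < 0.85 else 0 for _ in range(T)]

# Percentile bootstrap: resample trials with replacement, recompute the score.
B = 2000  # the manuscript proposes 10,000 iterations; fewer are used here
boot = sorted(info_score(random.choices(trials, k=T)) for _ in range(B))
lo, hi = boot[int(0.025 * B)], boot[int(0.975 * B)]
print(f"point estimate {info_score(trials):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because 1 − h(P̂) is a nonlinear function of the success rate, the bootstrap interval is asymmetric near P̂ close to 1/2 or 1, which is exactly the regime the critical-depth scans probe.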
Circularity Check
No significant circularity in the derivation of Neural-IC
full rationale
The paper's central derivation embeds the established information-causality principle into query-separated neural architectures by observing that such architectures correspond to random-access communication experiments. The embedding inequality I_N-RAC ≤ I(⃗a:H,B) is presented as a direct consequence of query separation, which is an operational definition rather than a derived result. The further implication I_N-RAC ≤ C_H follows from applying the independent IC bound to the interface capacity. The Tsirelson threshold arises from requiring stability of a one-bit bottleneck under arbitrary nesting depth, supported by an explicit classical RAC benchmark and simulations. Since the core claims rely on a mapping to a known physical principle and on new operational interpretations, without self-referential reductions or fitted predictions masquerading as derivations, the chain is self-contained and non-circular.
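The non-circularity claim is checkable numerically from the two formulas the review quotes: bias multiplication E ↦ E^n across depth and the score I_N-RAC(n, E) = 2^n [1 − h((1 + E^n)/2)]. A sketch that scans depths without ever inserting the quantum value; the test biases 0.70 and 0.75 bracket 1/√2 ≈ 0.707:

```python
import math

def h(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def I_nrac(n, E):
    # Depth-n nested protocol: 2**n queries, per-query bias E**n.
    return 2 ** n * (1 - h((1 + E ** n) / 2))

# Below the threshold the one-bit bound I <= 1 survives every depth tested;
# above it, a finite depth breaks the bound.
assert all(I_nrac(n, 0.70) <= 1.0 for n in range(1, 31))
first_violation = min(n for n in range(1, 31) if I_nrac(n, 0.75) > 1.0)
print(first_violation, 1 / math.sqrt(2))  # the threshold sits between the two biases
```

Tightening the bracket around the critical bias reproduces the paper's point that the selection comes from the stability recurrence alone, not from an inserted quantum bound.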
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: information causality holds for random-access communication experiments induced by any query-separated architecture.
invented entities (1)
- Neural-IC framework (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tagged unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tagged unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck selects the Tsirelson threshold"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Therefore (−1)^{st} E_{st} = 1/√2 for every input pair. Lemma 2 then gives Eq. (84). Theorem 11 is the tightness complement to Theorem 8. Neural-IC rules out E > 1/√2 under arbitrary-depth nesting, while quantum mechanics attains E = 1/√2.
- [2] The boundary is therefore not merely a formal upper bound; it is a physically realized frontier. C. Classical communication semantics, dense coding, and Holevo limits. IC is normally phrased for classical communication because the capacity unit is then unambiguous. If the transmitted bottleneck is quantum and entanglement-assisted, dense coding can cha…
- [3] The large-n approximation E_crit(n; C_H) ≃ (1/√2)(2 C_H ln 2)^{1/(2n)} (Eq. 94) explains the slow convergence of finite-depth scans near criticality. Finally, the controlled Neural-RAC ablations in Fig. 11 test the diagnostic interpretation directly at N = 8. The strict models use a straight-through binary bottleneck trained end-to-end with the encoder blind to the qu…
- [4] Industrial Technology Infrastructure Program · The finite-depth boundary approaches the quantum threshold from above, clarifying why slightly supercritical correlations may require sufficiently large nesting depth before violating the one-bit Neural-IC bound. FIG. 9. Capacity accounting checks for finite-precision and noisy interfaces. The diagonal is the ideal accounting line I_N-RAC = C_H. Lossless har…
- [5] …"winning" pairs each occur with probability (1+E)/4 and the two "losing"… · Isotropic CHSH cells as biased XOR constraints. We recall the isotropic CHSH cell (Definition 9): it is a no-signaling box with inputs (s,t) ∈ {0,1}² and outputs (A,B) ∈ {0,1}² such that the local outcomes are uniformly random and Pr[A⊕B = st | s,t] = (1+E)/2 for all s,t (A2), where E ∈ [0,1] is the bias (correlation strength). The key to all subsequent calculations is…
- [6] Tree notation for the nested protocol. We now define the nested (N, m) = (2^n, 1) protocol in a form convenient for proofs. The pictorial "pyramid" of Fig. 3 in Ref. [11] is precisely a full binary tree of depth n whose internal nodes correspond to CHSH cells. Indexing: let {0,1}^{≤n−1} denote the set of all binary strings ("words") of length at most n−1. Each word w…
- [7] Encoding and decoding rules. Encoding (Alice): for each internal node w ∈ {0,1}^{≤n−1}, once the children's messages x_{w0} and x_{w1} are available, Alice sets s_w := x_{w0} ⊕ x_{w1}, x_w := x_{w0} ⊕ A_w (A10). At the end of this upward recursion, Alice transmits the single-bit bottleneck message x := x_ε (A11), where ε denotes the empty word (the root). Decoding (Bob): Bob receives x and the query b = b_1 b_2 ··· b_n ∈ {0,1}^n…
- [8] One-step correctness identity and error propagation. We begin with a single-node identity that expresses exactly what Bob recovers from w when he applies the update Eq. (A13). Lemma 7 (One-step child recovery up to the local CHSH error): fix an internal node w. Let e_w be the CHSH error bit of the cell at w, e_w := (A_w ⊕ B_w) ⊕ s_w t_w (A15). Then, the encoding/decoding…
- [9] Even-parity probability and the bias multiplication E ↦ E^n. We now compute the probability that the parity in Eq. (A22) is zero. For completeness we prove the required identity in its most general form. Lemma 8 (Parity of i.i.d. biased bits): let e_1, …, e_n ∈ {0,1} be i.i.d. such that Pr[e_i = 0] = (1+E)/2, Pr[e_i = 1] = (1−E)/2 (A29), where E ∈ [−1,1]. Let E_⊕ :=…
- [10] Mutual information of a binary channel: exact formulas. In Neural-RAC the relevant object is the conditional channel a_K → β given the event (b = K). Because a_K and β are binary, mutual information can be written in closed form in terms of the (conditional) confusion matrix. This subsection records these formulas explicitly. B.1.1. General binary channel under an u…
- [11] From data to I_N-RAC: estimators and bias. We now address the practical question: given empirical samples from a Neural-RAC device (classical neural model, QNN, or a simulator), how does one estimate I_N-RAC? B.2.1. Two experimental designs. There are two natural sampling designs. Design A (per-query batching): for each K ∈ {0, …, N−1}, run T_K trials with t…
- [12] Confidence intervals and error propagation. A practical report of I_N-RAC should include an uncertainty statement. Here we describe a simple and robust method. B.3.1. Binomial confidence intervals for P̂: under the symmetry design Eq. (B16), the number of successes S := Σ_{t=1}^{T} 1{β^{(t)} = a^{(t)}_{b^{(t)}}} is Binomial(T, P). Hence, classical binomial confidence interval…
- [13] Critical-regime sample complexity: why Tsirelson is numerically delicate. A subtle point arises near the Tsirelson threshold. Even when the total information score is O(1), the per-query advantage can be exponentially small in the depth n. This section quantifies the phenomenon. Lemma 9 (Small-bias expansion of 1−h): let δ ∈ [−1,1] and set p = (1+δ)/2. Then, 1−h(p)…
- [14] From ±1 correlators to CHSH winning probabilities. The CHSH game is most transparently described in the bit language: the winning predicate is A⊕B = st (D1). Quantum measurement outcomes, however, are naturally ±1 random variables. The conversion is elementary but worth stating carefully because it is the algebraic hinge that connects quantum correlators to the…
- [15] CHSH twirling: reduction to an isotropic one-parameter family. The nesting analysis in Appendix A assumes an isotropic CHSH cell: Pr[A⊕B = st | s,t] = (1+E)/2 for all (s,t) (D9), for some bias E ∈ [0,1]. A natural question is: what if the raw correlations are not isotropic? Ref. [11] emphasizes that one can apply a purely local randomization (no communication) that pres…
- [16] A one-parameter quantum family: tuning E_iso up to Tsirelson. We now exhibit a simple quantum construction where a single angle parameter controls the effective isotropic bias E_iso. This is exactly the form one would like in a QNN module: a small set of trainable parameters controlling correlation strength. We use the Bell state |Φ⁺⟩ = (|00⟩ + |11⟩)/√2…
- [17] Let σ̂_x, σ̂_z be Pauli matrices. Alice's CHSH settings are fixed as Â_0 := σ̂_z, Â_1 := σ̂_x (D20). Bob's settings form a one-parameter family in the x–z plane: B̂_0(φ) := cos φ σ̂_z + sin φ σ̂_x, B̂_1(φ) := cos φ σ̂_z − sin φ σ̂_x (D21), where φ ∈ [0, π/4]. Theorem 16 (Closed-form CHSH correlator and effective isotropic bias for the family Eq. (D21)): let E_st(φ) := ⟨Φ⁺| Â_s ⊗ B̂_t(φ)…
- [18] Feeding this single scalar into the nested isotropic analysis of Appendix A immediately yields the predicted Neural-RAC behavior at depth n, namely P = (1 + E_iso(φ)^n)/2 (D26) and I_N-RAC(n, E_iso(φ)) = 2^n [1 − h((1 + E_iso(φ)^n)/2)] (D27). Fig. 14 visualizes this dependence for representative angles φ = 0, π/8, and π/4 against the Neural-IC limit m = 1 (dashed), showing that…
- [19] A. Graves, G. Wayne, and I. Danihelka, Neural Turing machines (2014), arXiv:1410.5401 [cs.NE].
- [20]
- [21]
- [22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems (2017).
- [23] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Proposed experiment to test local hidden-variable theories, Physical Review Letters 23, 880 (1969).
- [24] N. Brunner, D. Cavalcanti, S. Pironio, V. Scarani, and S. Wehner, Bell nonlocality, Reviews of Modern Physics 86, 419 (2014).
- [25] B. S. Cirel'son, Quantum generalizations of Bell's inequality, Letters in Mathematical Physics 4, 93 (1980).
- [26] M. Navascués, S. Pironio, and A. Acín, Bounding the set of quantum correlations, Physical Review Letters 98, 010401 (2007).
- [27] A. Tavakoli, A. Pozas-Kerstjens, P. Brown, and M. Araújo, Semidefinite programming relaxations for quantum correlations, Reviews of Modern Physics 96, 045006 (2024).
- [28]
- [29] M. Pawłowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Żukowski, Information causality as a physical principle, Nature 461, 1101 (2009).
- [30] S. Popescu and D. Rohrlich, Quantum nonlocality as an axiom, Foundations of Physics 24, 379 (1994).
- [31] W. van Dam, Implausible consequences of superstrong nonlocality (2005), arXiv:quant-ph/0501159.
- [32] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger, Limit on nonlocality in any world in which communication complexity is not trivial, Physical Review Letters 96, 250401 (2006).
- [33] P. Jain, M. Gachechiladze, and N. Miklin, Information causality as a tool for bounding the set of quantum correlations, Physical Review Letters 133, 160201 (2024).
- [34] N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method (2000), arXiv:physics/0004057.
- [35] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, Deep variational information bottleneck, in International Conference on Learning Representations (2017).
- [36]
- [37] A. M. Saxe, Y. Bansal, J. Dapello, M. S. Advani, A. Kolchinsky, B. D. Tracey, and D. D. Cox, On the information bottleneck theory of deep learning, Journal of Statistical Mechanics: Theory and Experiment 2019, 124020 (2019).
- [38] K. Kawaguchi, Z. Deng, X. Ji, and J. Huang, How does information bottleneck help deep learning?, in Proceedings of the 40th International Conference on Machine Learning (2023).
- [39] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
- [40]
- [41] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).
- [42] M. Schuld and N. Killoran, Quantum machine learning in feature Hilbert spaces, Physical Review Letters 122, 040504 (2019).
- [43] E. H. Lieb and M. B. Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy, Journal of Mathematical Physics 14, 1938 (1973).
- [44] C. H. Bennett and S. J. Wiesner, Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states, Physical Review Letters 69, 2881 (1992).
- [45] A. S. Holevo, Bounds for the quantity of information transmitted by a quantum communication channel, Problems of Information Transmission 9, 177 (1973).
- [46] L. Paninski, Estimation of entropy and mutual information, Neural Computation 15, 1191 (2003).
- [47] L. D. Brown, T. T. Cai, and A. DasGupta, Interval estimation for a binomial proportion, Statistical Science 16, 101 (2001).
- [48] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, 2013).
- [49] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, Sponge functions, in ECRYPT Hash Workshop (2007).
- [50] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, On the indifferentiability of the sponge construction, in Advances in Cryptology – EUROCRYPT 2008, Lecture Notes in Computer Science, Vol. 4965 (Springer, 2008), pp. 181–197.
- [51] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, Keccak implementation overview (2012), version 3.2.
- [52] J. Wetzels and W. Bokslag, Sponges and engines: An introduction to Keccak and Keyak (2015), arXiv:1510.02856 [cs.CR].