Recognition: 2 theorem links
Lean Theorem · Neural Information Causality
Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3
The pith
Query-separated neural architectures induce random-access communication tasks bounded by information causality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B). Any independently certified physical capacity bound on the interface then implies the stricter bound I_N-RAC ≤ C_H. This separation treats the representation as a message whose performance is limited by communication constraints rather than by post-hoc capacity definitions. For CHSH-type correlation layers the same embedding produces nested Neural-RAC protocols whose biases multiply across depth, and stability of a one-bit bottleneck at arbitrary depth selects the Tsirelson threshold. The paper also gives an exact one-bit classical RAC benchmark, showing that the relevant quantum enhancement is fair query-conditioned access rather than total information beyond the bottleneck.
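The one-bit benchmark arithmetic is easy to reproduce. A minimal sketch, not the paper's code, assuming (as in the paper's closed forms) that the score sums the per-query informations 1 − h(P_K) of the induced binary channels:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# N = 2 uniform data bits, one classical bottleneck bit: Alice sends x = a0,
# Bob answers x for either query. Query-conditioned success probabilities:
P = {0: 1.0,   # query b = 0: Bob returns a0 exactly
     1: 0.5}   # query b = 1: x is independent of a1

# Score: sum of per-query informations through a binary channel, 1 - h(P_K).
I_nrac = sum(1 - h(P[K]) for K in P)
print(I_nrac)  # 1.0, saturating the one-bit capacity bound C_H = 1
```

The score saturates but never exceeds C_H = 1, illustrating the claim that the classical protocol cannot deliver fair query-conditioned access to both bits.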
What carries the argument
The embedding inequality I_N-RAC ≤ I(⃗a:H,B) that maps any query-separated neural computation onto a random-access communication experiment.
Load-bearing premise
Query-separated computation in neural architectures directly induces a random-access communication experiment to which information causality applies without further assumptions on the encoding or decoding maps.
What would settle it
A simulation or experiment in which a strictly query-separated network achieves measured I_N-RAC strictly larger than the simultaneously measured I(⃗a:H,B) while the interface mutual information is accurately estimated.
Original abstract
Query-separated computation forces a representation to play an operational role: data are encoded before a query is known, and a later decoder can answer only through the intermediate interface. In this regime the representation functions as a message rather than merely as a feature map. We formalize this observation by embedding information causality (IC) into representation learning, obtaining a framework called neural information causality (Neural-IC). The revised formulation separates two logically distinct statements. First, every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality $I_{\mathrm{N\text{-}RAC}}\le I(\vec a:H,B)$. Second, any independently certified physical capacity bound on the interface, such as a hard $m$-bit alphabet, a finite-precision register, or a power-constrained noisy channel, implies $I_{\mathrm{N\text{-}RAC}}\le C_H$. This separation avoids treating capacity as a post hoc definition and makes Neural-IC an operational diagnostic for query leakage, precision leakage, and episode-specific memory. We also provide an exact one-bit classical RAC benchmark, showing explicitly that the relevant quantum enhancement is not total information beyond the bottleneck, but fair query-conditioned access. For CHSH-type correlation layers, nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold. We extend the analysis to asymmetric seed biases, to multi-capacity finite-depth phase diagrams, and to correlated data via a conditional information score. Controlled simulations, including straight-through binary bottlenecks and deliberately leaky ablations, verify that apparent violations are accounted for by broken query separation or undercounted capacity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Neural Information Causality (Neural-IC), a framework that embeds information causality into query-separated neural architectures. It separates two claims: (1) every query-separated architecture induces a random-access communication experiment obeying the embedding inequality I_N-RAC ≤ I(⃗a:H,B), and (2) any independently certified physical capacity bound on the interface implies I_N-RAC ≤ C_H. The paper supplies an exact one-bit classical RAC benchmark, shows that nested CHSH-type correlation layers with one-bit bottleneck stability select the Tsirelson threshold, extends the analysis to asymmetric seed biases, multi-capacity finite-depth phase diagrams, and correlated data via a conditional information score, and reports controlled simulations (including straight-through binary bottlenecks and leaky ablations) that attribute apparent violations to broken query separation or undercounted capacity.
Significance. If the embedding is shown to be rigorous, the work supplies an operational diagnostic for query leakage, precision leakage, and episode-specific memory in neural representations by importing information-causality constraints. The explicit separation of the embedding inequality from capacity bounds avoids post-hoc definitions, the one-bit RAC benchmark clarifies that quantum enhancement concerns fair query-conditioned access rather than total information, and the stability argument for Tsirelson selection plus the ablation simulations provide concrete, falsifiable tests. These elements could usefully constrain capacity analyses in deep learning and quantum-inspired models.
major comments (3)
- [Abstract and formalization of embedding inequality] Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.
- [CHSH-type correlation layers and stability analysis] Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.
- [Controlled simulations and ablations] Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.
minor comments (2)
- [Abstract] Notation: the vector ⃗a and the subscript N-RAC should be defined at first use with an explicit reference to the corresponding RAC parties and message alphabet.
- [One-bit classical RAC benchmark] The one-bit classical RAC benchmark is presented as 'exact'; a short appendix deriving the classical bound from first principles (rather than citing it) would improve self-containedness.
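On the minor point about deriving the classical bound from first principles: for N = 2 the bound can be verified exhaustively. A hedged sketch (our own enumeration, not the paper's appendix) over all deterministic one-bit strategies; shared randomness cannot improve the uniform-average success, so this is the classical optimum:

```python
from itertools import product

# Brute force over all deterministic one-bit strategies for the N = 2 RAC:
# encoders f: (a0, a1) -> x in {0,1}; decoders g: (x, b) -> guess in {0,1}.
encoders = list(product([0, 1], repeat=4))   # f stored as a table indexed by 2*a0 + a1
decoders = list(product([0, 1], repeat=4))   # g stored as a table indexed by 2*x + b

best = 0.0
for f in encoders:
    for g in decoders:
        wins = 0
        for a0, a1, b in product([0, 1], repeat=3):   # uniform inputs and query
            x = f[2 * a0 + a1]
            guess = g[2 * x + b]
            wins += guess == (a0 if b == 0 else a1)
        best = max(best, wins / 8)
print(best)   # 0.75: the classical one-bit RAC success bound
```

The optimum 3/4 is attained by sending x = a0 and answering x for both queries, matching the exact benchmark the manuscript cites.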
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications and indicating where revisions will strengthen the presentation.
Point-by-point responses
Referee: Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.
Authors: The formalization section defines query-separated architectures as those in which the encoder has no access to the query vector and the decoder receives only the interface representation H together with the query. This definition directly supplies the required mapping: H is the non-adaptive message and the decoder is the RAC receiver. The equality of the mutual-information terms follows immediately from the operational definition of the induced experiment. Nevertheless, to eliminate any ambiguity regarding possible query-dependent pathways introduced by joint optimization, we will insert an explicit theorem together with its proof in the revised manuscript. The proof will show that the architectural constraints (encoder independence from query and decoder access limited to H) preclude the leakage mechanisms the referee identifies. revision: partial
Referee: Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.
Authors: The stability criterion is obtained by applying the embedding inequality recursively to the nested CHSH-type layers while enforcing that the one-bit bottleneck capacity remains finite at every depth. The maximum sustainable correlation bias is thereby fixed by the information-causality constraint alone; the Tsirelson value emerges as the unique number satisfying this recurrence. No external quantum bound is inserted. We will expand the relevant section with the full inductive derivation from the Neural-IC axioms to make the independence explicit and to forestall any appearance of circularity. revision: yes
Referee: Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.
Authors: We agree that full reproducibility requires these details. In the revised manuscript we will add a dedicated reproducibility subsection that states: (i) the operational definition of query separation (encoder output statistically independent of query, verified by zero mutual information under gradient flow), (ii) the data-exclusion rule (a run is discarded if any detected leakage exceeds a pre-specified threshold of 0.01 bits), and (iii) the error-analysis procedure (bootstrap resampling with 10 000 iterations to obtain 95 % confidence intervals on all reported information scores). These additions will render the simulation claims directly falsifiable. revision: yes
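The proposed error analysis can be prototyped directly. A minimal sketch of a percentile bootstrap on a per-query information score, using synthetic data; the 10,000-iteration figure is the authors', reduced here for speed:

```python
import math
import random

random.seed(0)

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_score(outcomes):
    """Per-query information 1 - h(P_hat) from a list of 0/1 trial successes."""
    return 1 - h(sum(outcomes) / len(outcomes))

# Synthetic Neural-RAC run: T trials with true success probability 0.85.
T = 500
trials = [1 if random.random() < 0.85 else 0 for _ in range(T)]

# Percentile bootstrap: resample trials with replacement, recompute the score.
B = 2000  # the manuscript proposes 10,000 iterations; fewer are used here
boot = sorted(info_score(random.choices(trials, k=T)) for _ in range(B))
lo, hi = boot[int(0.025 * B)], boot[int(0.975 * B)]
print(f"point estimate {info_score(trials):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because 1 − h(P̂) is a nonlinear function of the success rate, the bootstrap interval is asymmetric near P̂ close to 1/2 or 1, which is exactly the regime the critical-depth scans probe.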
Circularity Check
No significant circularity in the derivation of Neural-IC
full rationale
The paper's central derivation embeds the established information-causality principle into query-separated neural architectures by observing that such architectures correspond to random-access communication experiments. The embedding inequality I_N-RAC ≤ I(⃗a:H,B) is presented as a direct consequence of query separation, which is an operational definition rather than a derived result. The further implication I_N-RAC ≤ C_H follows from applying the independent IC bound to the interface capacity. The Tsirelson threshold arises from requiring stability of a one-bit bottleneck under arbitrary nesting depth, supported by an explicit classical RAC benchmark and simulations. Since the core claims rely on a mapping to a known physical principle and on new operational interpretations, without self-referential reductions or fitted predictions masquerading as derivations, the chain is self-contained and non-circular.
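The non-circularity claim is checkable numerically from the two formulas the review quotes: bias multiplication E ↦ E^n across depth and the score I_N-RAC(n, E) = 2^n [1 − h((1 + E^n)/2)]. A sketch that scans depths without ever inserting the quantum value; the test biases 0.70 and 0.75 bracket 1/√2 ≈ 0.707:

```python
import math

def h(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def I_nrac(n, E):
    # Depth-n nested protocol: 2**n queries, per-query bias E**n.
    return 2 ** n * (1 - h((1 + E ** n) / 2))

# Below the threshold the one-bit bound I <= 1 survives every depth tested;
# above it, a finite depth breaks the bound.
assert all(I_nrac(n, 0.70) <= 1.0 for n in range(1, 31))
first_violation = min(n for n in range(1, 31) if I_nrac(n, 0.75) > 1.0)
print(first_violation, 1 / math.sqrt(2))  # the threshold sits between the two biases
```

Tightening the bracket around the critical bias reproduces the paper's point that the selection comes from the stability recurrence alone, not from an inserted quantum bound.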
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: information causality holds for random-access communication experiments induced by any query-separated architecture.
invented entities (1)
- Neural-IC framework (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tagged unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tagged unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck selects the Tsirelson threshold"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Therefore (−1)^{st} E_{st} = 1/√2 for every input pair. Lemma 2 then gives Eq. (84). Theorem 11 is the tightness complement to Theorem 8. Neural-IC rules out E > 1/√2 under arbitrary-depth nesting, while quantum mechanics attains E = 1/√2.
- [2] The boundary is therefore not merely a formal upper bound; it is a physically realized frontier. C. Classical communication semantics, dense coding, and Holevo limits. IC is normally phrased for classical communication because the capacity unit is then unambiguous. If the transmitted bottleneck is quantum and entanglement-assisted, dense coding can cha…
- [3] The large-n approximation E_crit(n; C_H) ≃ (1/√2)(2 C_H ln 2)^{1/(2n)} (Eq. 94) explains the slow convergence of finite-depth scans near criticality. Finally, the controlled Neural-RAC ablations in Fig. 11 test the diagnostic interpretation directly at N = 8. The strict models use a straight-through binary bottleneck trained end-to-end with the encoder blind to the qu…
- [4] Industrial Technology Infrastructure Program · The finite-depth boundary approaches the quantum threshold from above, clarifying why slightly supercritical correlations may require sufficiently large nesting depth before violating the one-bit Neural-IC bound. FIG. 9. Capacity accounting checks for finite-precision and noisy interfaces. The diagonal is the ideal accounting line I_N-RAC = C_H. Lossless har…
- [5] …"winning" pairs each occur with probability (1+E)/4 and the two "losing"… · Isotropic CHSH cells as biased XOR constraints. We recall the isotropic CHSH cell (Definition 9): it is a no-signaling box with inputs (s,t) ∈ {0,1}² and outputs (A,B) ∈ {0,1}² such that the local outcomes are uniformly random and Pr[A⊕B = st | s,t] = (1+E)/2 for all s,t (A2), where E ∈ [0,1] is the bias (correlation strength). The key to all subsequent calculations is…
- [6] Tree notation for the nested protocol. We now define the nested (N, m) = (2^n, 1) protocol in a form convenient for proofs. The pictorial "pyramid" of Fig. 3 in Ref. [11] is precisely a full binary tree of depth n whose internal nodes correspond to CHSH cells. Indexing: let {0,1}^{≤n−1} denote the set of all binary strings ("words") of length at most n−1. Each word w…
- [7] Encoding and decoding rules. Encoding (Alice): for each internal node w ∈ {0,1}^{≤n−1}, once the children's messages x_{w0} and x_{w1} are available, Alice sets s_w := x_{w0} ⊕ x_{w1}, x_w := x_{w0} ⊕ A_w (A10). At the end of this upward recursion, Alice transmits the single-bit bottleneck message x := x_ε (A11), where ε denotes the empty word (the root). Decoding (Bob): Bob receives x and the query b = b_1 b_2 ··· b_n ∈ {0,1}^n…
- [8] One-step correctness identity and error propagation. We begin with a single-node identity that expresses exactly what Bob recovers from w when he applies the update Eq. (A13). Lemma 7 (One-step child recovery up to the local CHSH error): fix an internal node w. Let e_w be the CHSH error bit of the cell at w, e_w := (A_w ⊕ B_w) ⊕ s_w t_w (A15). Then, the encoding/decoding…
- [9] Even-parity probability and the bias multiplication E ↦ E^n. We now compute the probability that the parity in Eq. (A22) is zero. For completeness we prove the required identity in its most general form. Lemma 8 (Parity of i.i.d. biased bits): let e_1, …, e_n ∈ {0,1} be i.i.d. such that Pr[e_i = 0] = (1+E)/2, Pr[e_i = 1] = (1−E)/2 (A29), where E ∈ [−1,1]. Let E_⊕ :=…
- [10] Mutual information of a binary channel: exact formulas. In Neural-RAC the relevant object is the conditional channel a_K → β given the event (b = K). Because a_K and β are binary, mutual information can be written in closed form in terms of the (conditional) confusion matrix. This subsection records these formulas explicitly. B.1.1. General binary channel under an u…
- [11] From data to I_N-RAC: estimators and bias. We now address the practical question: given empirical samples from a Neural-RAC device (classical neural model, QNN, or a simulator), how does one estimate I_N-RAC? B.2.1. Two experimental designs. There are two natural sampling designs. Design A (per-query batching): for each K ∈ {0, …, N−1}, run T_K trials with t…
- [12] Confidence intervals and error propagation. A practical report of I_N-RAC should include an uncertainty statement. Here we describe a simple and robust method. B.3.1. Binomial confidence intervals for P̂: under the symmetry design Eq. (B16), the number of successes S := Σ_{t=1}^{T} 1{β^{(t)} = a^{(t)}_{b^{(t)}}} is Binomial(T, P). Hence, classical binomial confidence interval…
- [13] Critical-regime sample complexity: why Tsirelson is numerically delicate. A subtle point arises near the Tsirelson threshold. Even when the total information score is O(1), the per-query advantage can be exponentially small in the depth n. This section quantifies the phenomenon. Lemma 9 (Small-bias expansion of 1−h): let δ ∈ [−1,1] and set p = (1+δ)/2. Then, 1−h(p)…
- [14] From ±1 correlators to CHSH winning probabilities. The CHSH game is most transparently described in the bit language: the winning predicate is A⊕B = st (D1). Quantum measurement outcomes, however, are naturally ±1 random variables. The conversion is elementary but worth stating carefully because it is the algebraic hinge that connects quantum correlators to the…
- [15] CHSH twirling: reduction to an isotropic one-parameter family. The nesting analysis in Appendix A assumes an isotropic CHSH cell: Pr[A⊕B = st | s,t] = (1+E)/2 for all (s,t) (D9), for some bias E ∈ [0,1]. A natural question is: what if the raw correlations are not isotropic? Ref. [11] emphasizes that one can apply a purely local randomization (no communication) that pres…
- [16] A one-parameter quantum family: tuning E_iso up to Tsirelson. We now exhibit a simple quantum construction where a single angle parameter controls the effective isotropic bias E_iso. This is exactly the form one would like in a QNN module: a small set of trainable parameters controlling correlation strength. We use the Bell state |Φ⁺⟩ = (|00⟩ + |11⟩)/√2…
- [17] Let σ̂_x, σ̂_z be Pauli matrices. Alice's CHSH settings are fixed as Â_0 := σ̂_z, Â_1 := σ̂_x (D20). Bob's settings form a one-parameter family in the x–z plane: B̂_0(φ) := cos φ σ̂_z + sin φ σ̂_x, B̂_1(φ) := cos φ σ̂_z − sin φ σ̂_x (D21), where φ ∈ [0, π/4]. Theorem 16 (Closed-form CHSH correlator and effective isotropic bias for the family Eq. (D21)): let E_st(φ) := ⟨Φ⁺| Â_s ⊗ B̂_t(φ)…
- [18] Feeding this single scalar into the nested isotropic analysis of Appendix A immediately yields the predicted Neural-RAC behavior at depth n, namely P = (1 + E_iso(φ)^n)/2 (D26) and I_N-RAC(n, E_iso(φ)) = 2^n [1 − h((1 + E_iso(φ)^n)/2)] (D27). Fig. 14 visualizes this dependence for representative angles φ = 0, π/8, and π/4 against the Neural-IC limit m = 1 (dashed), showing that…
- [19] A. Graves, G. Wayne, and I. Danihelka, Neural Turing machines (2014), arXiv:1410.5401 [cs.NE].
- [20]
- [21]
- [22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems (2017).
- [23] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Proposed experiment to test local hidden-variable theories, Physical Review Letters 23, 880 (1969).
- [24] N. Brunner, D. Cavalcanti, S. Pironio, V. Scarani, and S. Wehner, Bell nonlocality, Reviews of Modern Physics 86, 419 (2014).
- [25] B. S. Cirel'son, Quantum generalizations of Bell's inequality, Letters in Mathematical Physics 4, 93 (1980).
- [26] M. Navascués, S. Pironio, and A. Acín, Bounding the set of quantum correlations, Physical Review Letters 98, 010401 (2007).
- [27] A. Tavakoli, A. Pozas-Kerstjens, P. Brown, and M. Araújo, Semidefinite programming relaxations for quantum correlations, Reviews of Modern Physics 96, 045006 (2024).
- [28]
- [29] M. Pawłowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Żukowski, Information causality as a physical principle, Nature 461, 1101 (2009).
- [30] S. Popescu and D. Rohrlich, Quantum nonlocality as an axiom, Foundations of Physics 24, 379 (1994).
- [31] W. van Dam, Implausible consequences of superstrong nonlocality (2005), arXiv:quant-ph/0501159.
- [32] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger, Limit on nonlocality in any world in which communication complexity is not trivial, Physical Review Letters 96, 250401 (2006).
- [33] P. Jain, M. Gachechiladze, and N. Miklin, Information causality as a tool for bounding the set of quantum correlations, Physical Review Letters 133, 160201 (2024).
- [34] N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method (2000), arXiv:physics/0004057.
- [35] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, Deep variational information bottleneck, in International Conference on Learning Representations (2017).
- [36]
- [37] A. M. Saxe, Y. Bansal, J. Dapello, M. S. Advani, A. Kolchinsky, B. D. Tracey, and D. D. Cox, On the information bottleneck theory of deep learning, Journal of Statistical Mechanics: Theory and Experiment 2019, 124020 (2019).
- [38] K. Kawaguchi, Z. Deng, X. Ji, and J. Huang, How does information bottleneck help deep learning?, in Proceedings of the 40th International Conference on Machine Learning (2023).
- [39] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
- [40]
- [41] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).
- [42] M. Schuld and N. Killoran, Quantum machine learning in feature Hilbert spaces, Physical Review Letters 122, 040504 (2019).
- [43] E. H. Lieb and M. B. Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy, Journal of Mathematical Physics 14, 1938 (1973).
- [44] C. H. Bennett and S. J. Wiesner, Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states, Physical Review Letters 69, 2881 (1992).
- [45] A. S. Holevo, Bounds for the quantity of information transmitted by a quantum communication channel, Problems of Information Transmission 9, 177 (1973).
- [46] L. Paninski, Estimation of entropy and mutual information, Neural Computation 15, 1191 (2003).
- [47] L. D. Brown, T. T. Cai, and A. DasGupta, Interval estimation for a binomial proportion, Statistical Science 16, 101 (2001).
- [48] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, 2013).
- [49] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, Sponge functions, in ECRYPT Hash Workshop (2007).
- [50] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, On the indifferentiability of the sponge construction, in Advances in Cryptology – EUROCRYPT 2008, Lecture Notes in Computer Science, Vol. 4965 (Springer, 2008), pp. 181–197.
- [51] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, Keccak implementation overview (2012), version 3.2.
- [52] J. Wetzels and W. Bokslag, Sponges and engines: An introduction to Keccak and Keyak (2015), arXiv:1510.02856 [cs.CR].