Differentiable Conditional Mutual Information for Multi-Terminal Linear Gaussian Wireless Networks

Siqi Na; Tadashi Wadayama

arxiv: 2606.22301 · v1 · pith:T6JLUHSCnew · submitted 2026-06-21 · 💻 cs.IT · math.IT

Pith reviewed 2026-06-26 10:11 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords: conditional mutual information, Gaussian DAG, Schur complement, differentiable optimization, multi-terminal channels, automatic differentiation, rate region, wireless networks

The pith

Conditional mutual information in linear Gaussian networks is a log-determinant difference of Schur complements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a closed-form expression for any conditional mutual information I(V_A; V_B | V_C) among groups of nodes in a multi-terminal linear Gaussian wireless network. By representing the network as a linear Gaussian directed acyclic graph, all required node-pair covariances follow from a single forward recursion, after which the mutual information is obtained directly as a difference of two log-determinants of sub-block Schur complements. The entire construction uses only automatic-differentiation primitives, so any objective composed of finitely many such terms, including weighted sum-rates, secrecy rates, and full rate-region functions, is end-to-end differentiable with respect to all controllable parameters. A single reverse-mode sweep then supplies the gradient for all parameters at once.

Core claim

We obtain I(V_A;V_B | V_C) in closed form: from the node-pair covariances produced by one K-recursion forward pass, it is a log-determinant difference of two sub-block Schur complements of the support covariance. The construction is built entirely from automatic-differentiation primitives, so any differentiable function of finitely many conditional MIs is end-to-end differentiable in the design parameters; this broad class includes linear objectives (weighted sum-rate, secrecy), the rate functions of standard multi-terminal rate regions, and non-linear composites of these.
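Written out, the claim corresponds to the standard Gaussian identity I(V_A;V_B | V_C) = h(V_A | V_C) - h(V_A | V_B, V_C). The block below is a sketch of that identity, not a transcription of the paper's own equation: the symbol \Sigma for the support covariance and the choice of conditioning on the A block are assumptions made here, and the formula presumes circularly symmetric complex Gaussian nodes with nonsingular conditioning blocks (real-valued nodes pick up a factor 1/2).

```latex
\Sigma_{A\mid C} = \Sigma_{AA} - \Sigma_{AC}\,\Sigma_{CC}^{-1}\,\Sigma_{CA},
\qquad
I(V_A;V_B \mid V_C)
  = \log\det \Sigma_{A\mid C} - \log\det \Sigma_{A\mid B\cup C}.
```

Both terms are Schur complements of sub-blocks of the same covariance, so only matrix products, linear solves, and log-determinants are needed.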

What carries the argument

The K-recursion forward pass on a linear Gaussian DAG that produces all node-pair covariances, from which conditional MI is extracted as the log-determinant difference of two Schur complements.
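A minimal sketch of such a forward covariance pass, assuming each node obeys V_j = sum_i W_ji V_i + N_j with noise N_j independent of all earlier nodes. This page does not reproduce the paper's exact K-recursion, so the function `dag_covariances`, its arguments, and the two-hop example network are hypothetical illustrations of generic covariance propagation in topological order.

```python
import numpy as np

def dag_covariances(order, dims, parents, noise_cov):
    """Forward covariance pass over a linear Gaussian DAG (illustrative only).

    order     : node names in topological order
    dims      : {node: dimension}
    parents   : {node: {parent: gain W_ji}}, with V_j = sum_i W_ji V_i + N_j
    noise_cov : {node: covariance of the independent term N_j};
                for a source node this is its input covariance
    Returns the joint covariance K and each node's index slice.
    """
    offsets, start = {}, 0
    for v in order:
        offsets[v] = slice(start, start + dims[v])
        start += dims[v]
    K = np.zeros((start, start), dtype=complex)
    for j in order:
        sj = offsets[j]
        done = slice(0, sj.start)              # coordinates already processed
        # Gains of node j laid out against every earlier coordinate.
        W = np.zeros((dims[j], sj.start), dtype=complex)
        for i, Wji in parents.get(j, {}).items():
            W[:, offsets[i]] = Wji
        cross = W @ K[done, done]              # Cov(V_j, earlier nodes)
        K[sj, done] = cross
        K[done, sj] = cross.conj().T
        K[sj, sj] = cross @ W.conj().T + noise_cov[j]
    return K, offsets

# Hypothetical two-hop network: X1, X2 -> relay R -> destination Y.
rng = np.random.default_rng(0)
A1, A2, B = (rng.standard_normal((2, 2)) for _ in range(3))
Q1, Q2 = np.eye(2), 0.5 * np.eye(2)
K, idx = dag_covariances(
    order=["X1", "X2", "R", "Y"],
    dims={"X1": 2, "X2": 2, "R": 2, "Y": 2},
    parents={"R": {"X1": A1, "X2": A2}, "Y": {"R": B}},
    noise_cov={"X1": Q1, "X2": Q2, "R": 0.1 * np.eye(2), "Y": 0.1 * np.eye(2)},
)

# Direct (non-recursive) expressions for comparison.
K_RR = A1 @ Q1 @ A1.T + A2 @ Q2 @ A2.T + 0.1 * np.eye(2)
K_YY = B @ K_RR @ B.T + 0.1 * np.eye(2)
print(np.allclose(K[idx["Y"], idx["Y"]], K_YY))
```

Each node is visited once, and every pairwise covariance is filled in from blocks computed earlier, which is what makes a single pass sufficient on an acyclic graph.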

If this is right

Weighted sum-rate and secrecy-rate objectives become directly optimizable by projected gradient methods on the design parameters.
Rate-region maximization for MIMO multiple-access, broadcast, and interference channels can use the same gradient procedure.
The same construction applies without change to larger multi-hop networks.
Any non-linear but differentiable composite of several conditional MIs remains end-to-end differentiable.
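The projected-gradient use listed above can be sketched on a small real-valued two-user MAC. Everything here is an assumption for illustration: the channels, the power budget `P`, the step size, and the precoder parameterization Q_k = F_k F_k^T are hypothetical, and central finite differences stand in for the reverse-mode AD sweep so that the sketch depends only on NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
H = [rng.standard_normal((2, 2)) for _ in range(2)]   # hypothetical channels
P, sigma2 = 1.0, 0.5                                   # per-user power, noise variance

def sum_rate(F):
    """I(X1,X2;Y) in nats for Y = sum_k H_k F_k s_k + N, real s_k ~ N(0, I)."""
    Kyy = sigma2 * np.eye(2) + sum(Hk @ Fk @ Fk.T @ Hk.T for Hk, Fk in zip(H, F))
    # Real Gaussians: 1/2 * [log det Cov(Y) - log det Cov(Y | X1, X2)].
    return 0.5 * (np.linalg.slogdet(Kyy)[1] - 2 * np.log(sigma2))

def num_grad(f, F, eps=1e-6):
    """Central finite differences; a stand-in for automatic differentiation."""
    G = np.zeros_like(F)
    for i in np.ndindex(F.shape):
        d = np.zeros_like(F)
        d[i] = eps
        G[i] = (f(F + d) - f(F - d)) / (2 * eps)
    return G

def project(F):
    """Scale each user's precoder back onto the ball ||F_k||_F^2 <= P."""
    norms = np.sqrt((F ** 2).sum(axis=(1, 2), keepdims=True))
    return F * np.minimum(1.0, np.sqrt(P) / norms)

F = 0.01 * rng.standard_normal((2, 2, 2))   # (user, rows, cols), low-power start
rate_init = sum_rate(F)
for _ in range(300):
    F = project(F + 0.05 * num_grad(sum_rate, F))   # ascent step, then projection
rate_final = sum_rate(F)
print(rate_init, rate_final)
```

Swapping `sum_rate` for any other differentiable combination of conditional MIs leaves the loop unchanged; only the projection depends on the constraint set.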

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same covariance recursion might support joint optimization of discrete coding choices if they can be relaxed into the differentiable graph.
Feedback or time-varying channels could be handled by unfolding the recursion over multiple time steps while preserving the DAG structure.
Similar closed-form extractions may exist for other information quantities such as directed information in Gaussian settings.

Load-bearing premise

The multi-terminal network can be represented exactly as a linear Gaussian directed acyclic graph whose node-pair covariances are obtained by a single K-recursion forward pass.

What would settle it

For a two-user MIMO multiple-access channel with known closed-form rate region, compute the formula's value for a chosen covariance and compare it to the standard mutual-information expression evaluated on the same covariance.
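A check of this kind can be sketched numerically. The block below assumes circularly symmetric complex Gaussian inputs and noise, builds the joint covariance of (X1, X2, Y) directly from the linear model Y = H1 X1 + H2 X2 + N, and compares the Schur-complement value with the textbook expressions. The helper names `schur` and `cond_mi` and all channel values are hypothetical.

```python
import numpy as np

def schur(K, a, c):
    """Schur complement K_aa - K_ac K_cc^{-1} K_ca (K_aa itself if c is empty)."""
    Kaa = K[np.ix_(a, a)]
    if len(c) == 0:
        return Kaa
    Kac = K[np.ix_(a, c)]
    return Kaa - Kac @ np.linalg.solve(K[np.ix_(c, c)], Kac.conj().T)

def cond_mi(K, a, b, c=()):
    """I(V_A; V_B | V_C) in nats for circularly symmetric complex Gaussians."""
    a, b, c = list(a), list(b), list(c)
    _, ld_given_c = np.linalg.slogdet(schur(K, a, c))
    _, ld_given_bc = np.linalg.slogdet(schur(K, a, b + c))
    return ld_given_c - ld_given_bc

rng = np.random.default_rng(1)

def crandn(*shape):
    """Standard complex Gaussian samples with unit variance per entry."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H1, H2 = crandn(2, 2), crandn(2, 2)
F1, F2 = crandn(2, 2), crandn(2, 2)
Q1, Q2 = F1 @ F1.conj().T, F2 @ F2.conj().T      # input covariances
sigma2 = 0.5
Z = np.zeros((2, 2))
Kyy = H1 @ Q1 @ H1.conj().T + H2 @ Q2 @ H2.conj().T + sigma2 * np.eye(2)
K = np.block([[Q1, Z, Q1 @ H1.conj().T],
              [Z, Q2, Q2 @ H2.conj().T],
              [H1 @ Q1, H2 @ Q2, Kyy]])

X1, X2, Y = [0, 1], [2, 3], [4, 5]
via_schur = cond_mi(K, Y, X1, X2)                 # I(X1; Y | X2)
_, textbook = np.linalg.slogdet(np.eye(2) + H1 @ Q1 @ H1.conj().T / sigma2)
print(via_schur, textbook)
```

The two printed numbers should agree to floating-point precision; a mismatch would point to an error in either the covariance construction or the Schur-complement extraction.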

Figures

Figures reproduced from arXiv: 2606.22301 by Siqi Na, Tadashi Wadayama.

**Figure 2.** Rate-region maximization on a fixed two-user MIMO MAC (…)

**Figure 3.** Secure precoding on a fixed MIMO wiretap channel (…)

**Figure 4.** Rate-region maximization on a fixed multi-hop Gaussian MAC network

Original abstract

The rate regions of multi-terminal Gaussian channels (multiple-access, broadcast, interference, relay) are delimited by conditional mutual informations $I(V_A;V_B\,|\,V_C)$ among groups of input and output nodes; bringing such channels under differentiable physical-layer design therefore hinges on evaluating any such conditional MI, and its gradient, on a unified computation graph. Modeling the network as a linear Gaussian directed acyclic graph (Gaussian-DAG), we obtain $I(V_A;V_B\,|\,V_C)$ in closed form: from the node-pair covariances produced by one K-recursion forward pass, it is a log-determinant difference of two sub-block Schur complements of the support covariance. The construction is built entirely from automatic-differentiation (AD) primitives, so any differentiable function of finitely many conditional MIs is end-to-end differentiable in the design parameters; this broad class includes linear objectives (weighted sum-rate, secrecy), the rate functions of standard multi-terminal rate regions, and non-linear composites of these. A single reverse-mode AD sweep yields the Wirtinger gradient with respect to all controllable factors at once, so any such objective can be handled by projected gradient iterations without problem-specific gradient derivation. We demonstrate the framework on three experiments: rate-region maximization for a two-user MIMO multiple-access channel, secure precoding on a MIMO wiretap channel, and the same rate-region objective applied to a larger multi-hop multiple-access network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Gives a closed-form, AD-compatible expression for conditional MI on Gaussian DAGs via Schur complements after one covariance recursion, which removes the need for hand-derived gradients on standard multi-terminal rate objectives.

The paper's core move is to treat the multi-terminal linear Gaussian network as a Gaussian DAG, run a single K-recursion to obtain all node-pair covariances, then express I(V_A; V_B | V_C) directly as the log-det difference of two sub-block Schur complements of the joint covariance. Once that is granted, any differentiable function of finitely many such terms becomes end-to-end differentiable, so weighted sum-rate, secrecy rate, or composite objectives can be optimized by projected gradient steps without writing a new gradient expression for each topology.

That construction is new in the wireless information-theory literature. Prior work either left the gradients to be derived case-by-case or used black-box estimators; here the expression is algebraic and sits inside an AD graph. The three reported experiments (two-user MIMO MAC, MIMO wiretap, and a larger multi-hop MAC) are consistent with the modeling scope and show that the same code path handles different rate-region problems.

The main limitation is the modeling premise itself: the network must be exactly representable as a linear Gaussian DAG whose covariances come from one forward K-recursion. Topologies with cycles or essential nonlinearities fall outside the stated guarantee, though the paper does not claim to cover them. The derivation steps themselves appear to be standard covariance identities, so the algebraic risk looks low once the DAG representation is accepted.

The work is aimed at people doing physical-layer optimization on Gaussian multi-user channels who already use gradient methods. It supplies a reusable primitive rather than a single new rate region. A serious referee should see it because the primitive is cleanly stated, the experiments are on-point, and the scope is explicit.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that multi-terminal linear Gaussian wireless networks can be modeled exactly as linear Gaussian DAGs, so that node-pair covariances are obtained via a single K-recursion forward pass; the conditional mutual information (CMI) I(V_A; V_B | V_C) is then given in closed form as the log-determinant difference of two sub-block Schur complements of the support covariance. This construction uses only AD primitives, so any differentiable function of finitely many such CMIs, including standard rate-region objectives, is end-to-end differentiable, with gradients obtained from a single reverse-mode AD sweep. The framework is illustrated on rate-region maximization for a two-user MIMO MAC, secure precoding for a MIMO wiretap channel, and rate-region optimization on a multi-hop MAC network.

Significance. If the modeling choice and closed-form derivation hold, the work supplies a unified, parameter-free route to gradient-based physical-layer design for multi-terminal Gaussian channels that avoids problem-specific gradient derivations. Credit is due for the explicit reliance on standard covariance algebra and Schur-complement identities (no fitted parameters or circular definitions) together with the AD compatibility that directly yields Wirtinger gradients for linear and composite objectives. The three experiments stay within the stated scope of acyclic linear networks and are consistent with the claimed computational path.

major comments (1)

[Modeling section (K-recursion forward pass)] The K-recursion forward pass, described in the modeling section that precedes the closed-form MI expression, is load-bearing for the central claim that the covariances are obtained exactly from one pass. An explicit algebraic verification, or a small worked example confirming the recursion on a multi-terminal DAG (e.g., the two-user MIMO MAC), would let readers check directly for gaps or edge cases; this request is consistent with the low soundness rating on the derivation.

minor comments (2)

[Notation and preliminaries] Notation for the support covariance matrix and the sub-block Schur complements should be introduced with a short definitional paragraph before the log-det expression to improve readability for readers outside the immediate subfield.
[Experiments] Figure captions in the experimental sections would benefit from listing the precise channel dimensions, noise variances, and power constraints used, aiding reproducibility of the reported rate-region curves.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and constructive comment. We address the major comment below and will incorporate the requested verification in the revision.

Referee: [Modeling section (K-recursion forward pass)] The K-recursion forward pass, described in the modeling section that precedes the closed-form MI expression, is load-bearing for the central claim that the covariances are obtained exactly from one pass. An explicit algebraic verification, or a small worked example confirming the recursion on a multi-terminal DAG (e.g., the two-user MIMO MAC), would let readers check directly for gaps or edge cases; this request is consistent with the low soundness rating on the derivation.

Authors: We agree that an explicit worked example strengthens verifiability of the K-recursion. The recursion follows directly from the standard covariance propagation rule for linear Gaussian DAGs (each node's covariance is the sum of contributions from its parents plus independent noise), which is applied once per node in topological order. In the revised manuscript we will insert a short algebraic verification for the two-user MIMO MAC: we explicitly compute the node-pair covariances via the K-recursion for a 2x2 MIMO MAC with given channel matrices and noise variances, then confirm that the resulting joint covariance matrix matches the direct (non-recursive) block-matrix expression obtained from the linear model. This addition will be placed immediately after the general K-recursion statement and before the closed-form CMI expression.

Revision: yes
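For orientation, the propagation rule quoted in the response can be written out for the two-user MAC. This is a sketch under stated assumptions rather than the verification the authors promise: it takes Y = H_1 X_1 + H_2 X_2 + N with independent inputs X_k of covariance Q_k and noise covariance \sigma^2 I, and the symbols K, Q_k, H_k, and \sigma^2 are notation chosen here.

```latex
\begin{aligned}
K_{X_1X_1} &= Q_1, \qquad K_{X_2X_2} = Q_2, \qquad K_{X_1X_2} = 0, \\
K_{YX_k} &= \sum_{i} H_i K_{X_iX_k} = H_k Q_k, \qquad k = 1, 2, \\
K_{YY} &= \sum_{i} H_i K_{X_iY} + \sigma^2 I
        = H_1 Q_1 H_1^{\mathsf H} + H_2 Q_2 H_2^{\mathsf H} + \sigma^2 I .
\end{aligned}
```

The last line coincides with the covariance of Y obtained directly from the linear model, which is the agreement the promised worked example is meant to confirm.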

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via standard identities

The central result obtains I(V_A;V_B|V_C) as a log-det difference of Schur complements on covariances produced by a single forward K-recursion on a linear Gaussian DAG. This follows directly from elementary multivariate Gaussian algebra (covariance propagation and Schur complement identities) once the DAG representation is chosen; no parameter is fitted to data and then renamed as a prediction, no load-bearing premise reduces to a self-citation, and the AD compatibility is immediate from the closed-form expression being composed of differentiable matrix operations. The modeling choice itself is presented as representational rather than derived, and the three experiments lie inside the stated linear-Gaussian scope without requiring additional unstated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the modeling assumption that the network is exactly a linear Gaussian DAG and that node covariances are produced exactly by one K-recursion; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)

Domain assumption: The network is exactly representable as a linear Gaussian directed acyclic graph. Stated in the modeling sentence of the abstract.

Reference graph

Works this paper leans on

18 extracted references · 2 linked inside Pith

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.
[2] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[3] İ. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, Nov. 1999.
[4] D. P. Palomar and S. Verdú, “Gradient of mutual information in linear vector Gaussian channels,” IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 141–154, Jan. 2006.
[5] D. N. C. Tse and S. V. Hanly, “Multiaccess fading channels—Part I: Polymatroid structure, optimal resource allocation, and throughput capacities,” IEEE Trans. Inf. Theory, vol. 44, no. 7, pp. 2796–2815, Nov. 1998.
[6] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. NeurIPS, 2019, pp. 8024–8035.
[7] T. S. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE Trans. Inf. Theory, vol. 27, no. 1, pp. 49–60, Jan. 1981.
[8] T. M. Cover and A. A. El Gamal, “Capacity theorems for the relay channel,” IEEE Trans. Inf. Theory, vol. 25, no. 5, pp. 572–584, Sep. 1979.
[9] G. Caire and S. Shamai (Shitz), “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691–1706, Jul. 2003.
[10] T. Wadayama and S. Na, “Mutual information optimization via K-recursion and automatic differentiation for linear Gaussian wireless networks,” arXiv preprint arXiv:2606.06982, Jun. 2026. (linked inside Pith)
[11] T. Wadayama, “Information gradient for directed acyclic graphs: A score-based framework for end-to-end mutual information maximization,” arXiv preprint arXiv:2601.01789, Jan. 2026.
[12] M. I. Belghazi et al., “Mutual information neural estimation,” in Proc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 531–540.
[13] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, Jul. 2018. (linked inside Pith)
[14] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning: A survey,” J. Mach. Learn. Res., vol. 18, pp. 1–43, 2018.
[15] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, U.K.: Cambridge Univ. Press, 2010.
[16] R. D. Shachter and C. R. Kenley, “Gaussian influence diagrams,” Manage. Sci., vol. 35, no. 5, pp. 527–550, May 1989.
[17] D. Geiger and D. Heckerman, “Learning Gaussian networks,” in Proc. 10th Conf. Uncertainty Artif. Intell. (UAI), 1994, pp. 235–243.
[18] S. Sullivant, K. Talaska, and J. Draisma, “Trek separation for Gaussian graphical models,” Ann. Statist., vol. 38, no. 3, pp. 1665–1685, 2010.
