Differentiable Conditional Mutual Information for Multi-Terminal Linear Gaussian Wireless Networks
Pith reviewed 2026-06-26 10:11 UTC · model grok-4.3
The pith
Conditional mutual information in linear Gaussian networks is a log-determinant difference of Schur complements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We obtain I(V_A;V_B | V_C) in closed form: from the node-pair covariances produced by one K-recursion forward pass, it is a log-determinant difference of two sub-block Schur complements of the support covariance. The construction is built entirely from automatic-differentiation primitives, so any differentiable function of finitely many conditional MIs is end-to-end differentiable in the design parameters; this broad class includes linear objectives (weighted sum-rate, secrecy), the rate functions of standard multi-terminal rate regions, and non-linear composites of these.
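The closed form can be sketched in a few lines of NumPy. This sketch makes three assumptions. The nodes are zero-mean circularly-symmetric complex Gaussians, so entropies are log det(πe K) in nats; a real-valued model would carry an extra factor 1/2. The two Schur complements are read as the conditional covariance of V_A given V_C and of V_A given (V_B, V_C). The names `schur`, `logdet` and `cond_mi` are hypothetical and are not the paper's API.

```python
import numpy as np

def schur(K, a, c):
    """Conditional covariance K_aa - K_ac K_cc^{-1} K_ca of index set a given index set c."""
    Kaa = K[np.ix_(a, a)]
    if not c:                            # conditioning on nothing leaves K_aa unchanged
        return Kaa
    return Kaa - K[np.ix_(a, c)] @ np.linalg.solve(K[np.ix_(c, c)], K[np.ix_(c, a)])

def logdet(M):
    """log det of a Hermitian positive-definite matrix, via Cholesky."""
    M = (M + M.conj().T) / 2             # remove round-off asymmetry
    L = np.linalg.cholesky(M)
    return 2.0 * np.sum(np.log(np.abs(np.diag(L))))

def cond_mi(K, A, B, C=()):
    """I(V_A; V_B | V_C) in nats for a zero-mean circular complex Gaussian with covariance K.

    A, B, C are lists of scalar indices into K (the stacked entries of each node group).
    """
    A, B, C = list(A), list(B), list(C)
    return logdet(schur(K, A, C)) - logdet(schur(K, A, B + C))

# Scalar check: Y = X + N with X ~ CN(0, 1), N ~ CN(0, 0.5), so I(X;Y) = log(1 + 1/0.5) = log 3
K = np.array([[1.0, 1.0], [1.0, 1.5]])
print(cond_mi(K, [0], [1]))   # ≈ 1.0986 (= log 3)
```

Every step (indexing, `solve`, Cholesky, `log`) is a standard differentiable primitive. The same expression written in an AD framework would therefore inherit gradients automatically.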
What carries the argument
The K-recursion forward pass on a linear Gaussian DAG that produces all node-pair covariances, from which conditional MI is extracted as the log-determinant difference of two Schur complements.
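The exact form of the K-recursion is not reproduced on this page. The sketch below assumes it is the standard covariance propagation described in the authors' rebuttal: each node is x_j = Σ_{i∈pa(j)} A_ji x_i + n_j, with n_j independent of all earlier nodes, and nodes are visited once in topological order. Source nodes simply have n_j equal to their input. All function and argument names are hypothetical.

```python
import numpy as np

def k_recursion(dims, edges, noise):
    """One forward pass producing every node-pair block K[(j, k)] = E[x_j x_k^H].

    dims  : node dimensions; node indices 0..n-1 are assumed topologically sorted
    edges : dict {(j, i): A_ji} for each edge i -> j, with i < j
    noise : covariance of the independent innovation n_j at each node
            (for a source node this is its input covariance)
    """
    for (j, i) in edges:
        if i >= j:
            raise ValueError(f"edge {i}->{j} does not respect the topological order")
    K = {}
    for j in range(len(dims)):
        parents = [(i, A) for (jj, i), A in edges.items() if jj == j]
        for k in range(j):
            # n_j is independent of every earlier node, so only the parents contribute
            Kjk = np.zeros((dims[j], dims[k]), dtype=complex)
            for i, A in parents:
                Kjk += A @ K[(i, k)]
            K[(j, k)] = Kjk
            K[(k, j)] = Kjk.conj().T
        # self-covariance: K_jj = sum_i A_ji K_ij + Sigma_j
        Kjj = np.array(noise[j], dtype=complex)
        for i, A in parents:
            Kjj = Kjj + A @ K[(i, j)]
        K[(j, j)] = Kjj
    return K

def joint_cov(K, nodes):
    """Assemble the joint covariance of the listed nodes from the pairwise blocks."""
    return np.block([[K[(a, b)] for b in nodes] for a in nodes])
```

Because each node is processed once and reuses only blocks of earlier nodes, the pass is a single sweep. Its output can be checked against the closed-form (I - A)^{-1} Σ (I - A)^{-H} of the stacked linear model.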
If this is right
- Weighted sum-rate and secrecy-rate objectives become directly optimizable by projected gradient methods on the design parameters.
- Rate-region maximization for MIMO multiple-access, broadcast, and interference channels can use the same gradient procedure.
- The same construction applies without change to larger multi-hop networks.
- Any non-linear but differentiable composite of several conditional MIs remains end-to-end differentiable.
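The "projected" half of projected gradient ascent needs a projection onto the feasible set of design parameters. The sketch below assumes one common choice, a transmit covariance Q that is Hermitian positive semidefinite with sum-power constraint tr(Q) ≤ P; the paper's exact feasible sets are not stated here. Because this set is unitarily invariant, the Frobenius-norm projection only changes the eigenvalues, which are projected onto {λ ≥ 0, Σλ ≤ P}. The function names are hypothetical.

```python
import numpy as np

def project_simplex_le(v, P):
    """Euclidean projection of a real vector onto {w >= 0, sum(w) <= P}, assuming P > 0."""
    w = np.maximum(v, 0.0)
    if w.sum() <= P:
        return w
    # otherwise sum(w) = P is active: standard sort-based projection onto the simplex
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - P
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_power(Q, P):
    """Frobenius-norm projection of a Hermitian Q onto {Q PSD, trace(Q) <= P}."""
    Q = (Q + Q.conj().T) / 2
    lam, U = np.linalg.eigh(Q)
    lam = project_simplex_le(lam, P)
    return (U * lam) @ U.conj().T       # U diag(lam) U^H
```

One iteration would then read Q ← project_power(Q + μG, P). Here G is the gradient returned by AD and μ is a step size; both names are placeholders.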
Where Pith is reading between the lines
- The same covariance recursion might support joint optimization of discrete coding choices if they can be relaxed into the differentiable graph.
- Feedback or time-varying channels could be handled by unfolding the recursion over multiple time steps while preserving the DAG structure.
- Similar closed-form extractions may exist for other information quantities such as directed information in Gaussian settings.
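The unrolling idea in the second bullet can be made concrete with a small sketch, under the assumption that every feedback path carries at least one slot of delay. Each node is copied once per time slot. Same-slot edges must already be acyclic, and delayed edges point from slot t to slot t + 1, so every edge in the expanded graph points forward. The function `unroll` and its edge convention are hypothetical.

```python
def unroll(n, inst_edges, delayed_edges, T):
    """Time-expand a per-slot graph into a DAG over T slots.

    n             : nodes per slot, indexed in a topological order of inst_edges
    inst_edges    : (i, j) edges i -> j within the same slot (must satisfy i < j)
    delayed_edges : (i, j) edges from node i at slot t to node j at slot t + 1
    Returns edges between flattened node indices t * n + v.
    """
    if any(i >= j for i, j in inst_edges):
        raise ValueError("instantaneous edges must respect the per-slot order")
    out = []
    for t in range(T):
        out += [(t * n + i, t * n + j) for i, j in inst_edges]
        if t + 1 < T:
            out += [(t * n + i, (t + 1) * n + j) for i, j in delayed_edges]
    return out

# Two-node loop: forward link 0 -> 1 in each slot, feedback 1 -> 0 with one slot of delay
print(unroll(2, [(0, 1)], [(1, 0)], 3))
```

Every returned edge (p, c) satisfies p < c. The flattened indices are therefore a valid topological order for a one-pass recursion, at the cost of a graph that grows linearly with T.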
Load-bearing premise
The multi-terminal network can be represented exactly as a linear Gaussian directed acyclic graph whose node-pair covariances are obtained by a single K-recursion forward pass.
What would settle it
For a two-user MIMO multiple-access channel with known closed-form rate region, compute the formula's value for a chosen covariance and compare it to the standard mutual-information expression evaluated on the same covariance.
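A minimal version of this check, assuming circular complex Gaussian inputs and noise n ~ CN(0, σ²I). It builds the joint covariance of (x1, x2, y) for y = H1 x1 + H2 x2 + n and evaluates the three MAC bounds I(x1;y|x2), I(x2;y|x1) and I(x1,x2;y) with the Schur-complement formula. It then compares them with the standard log det(I + H Q H^H/σ²) expressions. Dimensions, seed and noise level are arbitrary illustrative values, and `cmi` is a hypothetical helper.

```python
import numpy as np

def cmi(K, A, B, C):
    """I(V_A; V_B | V_C) in nats as a log-det difference of two Schur complements."""
    def S(a, c):
        if not c:
            return K[np.ix_(a, a)]
        return K[np.ix_(a, a)] - K[np.ix_(a, c)] @ np.linalg.solve(K[np.ix_(c, c)], K[np.ix_(c, a)])
    ld = lambda M: np.linalg.slogdet(M)[1]
    return ld(S(A, C)) - ld(S(A, B + C))

rng = np.random.default_rng(1)
def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

nt, nr, sigma2 = 2, 2, 0.5                   # illustrative dimensions and noise variance
H1, H2 = crandn(nr, nt), crandn(nr, nt)
G1, G2 = crandn(nt, nt), crandn(nt, nt)
Q1, Q2 = G1 @ G1.conj().T, G2 @ G2.conj().T  # arbitrary positive-definite input covariances

# Joint covariance of (x1, x2, y); the inputs are independent
Z = np.zeros((nt, nt))
Kyy = H1 @ Q1 @ H1.conj().T + H2 @ Q2 @ H2.conj().T + sigma2 * np.eye(nr)
K = np.block([[Q1, Z, Q1 @ H1.conj().T],
              [Z, Q2, Q2 @ H2.conj().T],
              [H1 @ Q1, H2 @ Q2, Kyy]])
x1 = list(range(nt))
x2 = list(range(nt, 2 * nt))
y = list(range(2 * nt, 2 * nt + nr))

# Schur-complement values of the three MAC rate-region bounds
schur_vals = [cmi(K, x1, y, x2), cmi(K, x2, y, x1), cmi(K, x1 + x2, y, [])]

# Standard log-det expressions on the same covariances
ld = lambda M: np.linalg.slogdet(M)[1]
I = np.eye(nr)
std_vals = [ld(I + H1 @ Q1 @ H1.conj().T / sigma2),
            ld(I + H2 @ Q2 @ H2.conj().T / sigma2),
            ld(I + (Kyy - sigma2 * I) / sigma2)]
print(np.allclose(schur_vals, std_vals))
```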
Original abstract
The rate regions of multi-terminal Gaussian channels (multiple-access, broadcast, interference, relay) are delimited by conditional mutual informations $I(V_A;V_B\,|\,V_C)$ among groups of input and output nodes; bringing such channels under differentiable physical-layer design therefore hinges on evaluating any such conditional MI, and its gradient, on a unified computation graph. Modeling the network as a linear Gaussian directed acyclic graph (Gaussian-DAG), we obtain $I(V_A;V_B\,|\,V_C)$ in closed form: from the node-pair covariances produced by one K-recursion forward pass, it is a log-determinant difference of two sub-block Schur complements of the support covariance. The construction is built entirely from automatic-differentiation (AD) primitives, so any differentiable function of finitely many conditional MIs is end-to-end differentiable in the design parameters; this broad class includes linear objectives (weighted sum-rate, secrecy), the rate functions of standard multi-terminal rate regions, and non-linear composites of these. A single reverse-mode AD sweep yields the Wirtinger gradient with respect to all controllable factors at once, so any such objective can be handled by projected gradient iterations without problem-specific gradient derivation. We demonstrate the framework on three experiments: rate-region maximization for a two-user MIMO multiple-access channel, secure precoding on a MIMO wiretap channel, and the same rate-region objective applied to a larger multi-hop multiple-access network.
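For a real-valued objective f of a complex parameter F, the Wirtinger gradient used for ascent is G = ∂f/∂F*, and a perturbation dF changes f by 2 Re tr(G^H dF). For the single-link rate f(F) = log det(I + H F F^H H^H/σ²) the hand derivation gives G = H^H (σ²I + H F F^H H^H)^{-1} H F. The sketch below uses this formula as a reference, checking it against a central finite difference. No AD library is used, and the function names and values are illustrative. A check of this kind could also validate an AD-produced gradient, though complex-gradient conventions may differ between frameworks by conjugation or a factor of 2.

```python
import numpy as np

def rate(F, H, sigma2):
    """Rate in nats for precoder F: log det(I + H F F^H H^H / sigma2)."""
    M = np.eye(H.shape[0]) + H @ F @ F.conj().T @ H.conj().T / sigma2
    return np.linalg.slogdet(M)[1]

def rate_grad(F, H, sigma2):
    """Conjugate Wirtinger gradient d rate / d conj(F) = H^H (sigma2 I + H F F^H H^H)^{-1} H F."""
    M = sigma2 * np.eye(H.shape[0]) + H @ F @ F.conj().T @ H.conj().T
    return H.conj().T @ np.linalg.solve(M, H @ F)

rng = np.random.default_rng(2)
def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H, F, D = crandn(3, 2), crandn(2, 2), crandn(2, 2)   # channel, precoder, perturbation direction
sigma2, eps = 0.3, 1e-6
# Central difference of f along D versus the first-order term 2 Re tr(G^H D)
fd = (rate(F + eps * D, H, sigma2) - rate(F - eps * D, H, sigma2)) / (2 * eps)
analytic = 2 * np.real(np.trace(rate_grad(F, H, sigma2).conj().T @ D))
print(fd, analytic)
```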
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that multi-terminal linear Gaussian wireless networks can be modeled exactly as linear Gaussian DAGs, allowing node-pair covariances to be obtained via a single K-recursion forward pass; conditional mutual information I(V_A; V_B | V_C) is then given in closed form as the log-determinant difference of two sub-block Schur complements of the support covariance. This construction uses only AD primitives, so any differentiable function of finitely many such CMIs (including standard rate-region objectives) is end-to-end differentiable, with gradients obtained from a single reverse-mode AD sweep. The framework is illustrated on rate-region maximization for a two-user MIMO MAC, secure precoding for a MIMO wiretap channel, and rate-region optimization on a multi-hop MAC network.
Significance. If the modeling choice and closed-form derivation hold, the work supplies a unified, parameter-free route to gradient-based physical-layer design for multi-terminal Gaussian channels that avoids problem-specific gradient derivations. Credit is due for the explicit reliance on standard covariance algebra and Schur-complement identities (no fitted parameters or circular definitions) together with the AD compatibility that directly yields Wirtinger gradients for linear and composite objectives. The three experiments stay within the stated scope of acyclic linear networks and are consistent with the claimed computational path.
major comments (1)
- [Modeling section (K-recursion forward pass)] The K-recursion forward pass (described in the modeling section preceding the closed-form MI expression) is load-bearing for the central claim that covariances are obtained exactly from one pass; an explicit algebraic verification or small worked example confirming the recursion for a multi-terminal DAG (e.g., the two-user MIMO MAC case) would allow direct checking for gaps or edge cases, consistent with the low soundness rating on the derivation.
minor comments (2)
- [Notation and preliminaries] Notation for the support covariance matrix and the sub-block Schur complements should be introduced with a short definitional paragraph before the log-det expression to improve readability for readers outside the immediate subfield.
- [Experiments] Figure captions in the experimental sections would benefit from listing the precise channel dimensions, noise variances, and power constraints used, aiding reproducibility of the reported rate-region curves.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comment. We address the major comment below and will incorporate the requested verification in the revision.
- Referee: [Modeling section (K-recursion forward pass)] The K-recursion forward pass (described in the modeling section preceding the closed-form MI expression) is load-bearing for the central claim that covariances are obtained exactly from one pass; an explicit algebraic verification or small worked example confirming the recursion for a multi-terminal DAG (e.g., the two-user MIMO MAC case) would allow direct checking for gaps or edge cases, consistent with the low soundness rating on the derivation.
Authors: We agree that an explicit worked example strengthens verifiability of the K-recursion. The recursion follows directly from the standard covariance propagation rule for linear Gaussian DAGs (each node's covariance is the sum of contributions from its parents plus independent noise), which is applied once per node in topological order. In the revised manuscript we will insert a short algebraic verification for the two-user MIMO MAC: we explicitly compute the node-pair covariances via the K-recursion for a 2x2 MIMO MAC with given channel matrices and noise variances, then confirm that the resulting joint covariance matrix matches the direct (non-recursive) block-matrix expression obtained from the linear model. This addition will be placed immediately after the general K-recursion statement and before the closed-form CMI expression. revision: yes
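A minimal sketch of the promised verification follows; it is not the authors' revision. It takes the node order x1, x2, y for a 2×2 MIMO MAC and applies the parents-plus-noise rule node by node. It then compares the result with the direct block-matrix expression K = T diag(Q1, Q2, σ²I) T^H, where (x1, x2, y) = T (x1, x2, n). Channel matrices, input covariances and σ² are random or arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)
def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
Ct = lambda M: M.conj().T                    # Hermitian transpose

nt, nr, sigma2 = 2, 2, 0.1                   # 2x2 MIMO MAC, illustrative noise variance
H1, H2 = crandn(nr, nt), crandn(nr, nt)
G1, G2 = crandn(nt, nt), crandn(nt, nt)
Q1, Q2 = G1 @ Ct(G1), G2 @ Ct(G2)

# K-recursion in topological order x1, x2, y (y has parents x1, x2 plus noise)
K11, K22, K21 = Q1, Q2, np.zeros((nt, nt))   # independent sources
Ky1 = H1 @ K11 + H2 @ K21                    # E[y x1^H] = H1 Q1
Ky2 = H1 @ Ct(K21) + H2 @ K22                # E[y x2^H] = H2 Q2
Kyy = H1 @ Ct(Ky1) + H2 @ Ct(Ky2) + sigma2 * np.eye(nr)
K_rec = np.block([[K11, Ct(K21), Ct(Ky1)],
                  [K21, K22, Ct(Ky2)],
                  [Ky1, Ky2, Kyy]])

# Direct (non-recursive) block-matrix expression of the same linear model
Ztt, Ztr, Zrt = np.zeros((nt, nt)), np.zeros((nt, nr)), np.zeros((nr, nt))
T = np.block([[np.eye(nt), Ztt, Ztr],
              [Ztt, np.eye(nt), Ztr],
              [H1, H2, np.eye(nr)]])
Sigma = np.block([[Q1, Ztt, Ztr],
                  [Ztt, Q2, Ztr],
                  [Zrt, Zrt, sigma2 * np.eye(nr)]])
K_direct = T @ Sigma @ Ct(T)
print(np.allclose(K_rec, K_direct))
```

Any mismatch would point to a specific (j, k) block, which makes this form of check convenient for isolating edge cases.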
Circularity Check
No significant circularity; derivation is self-contained via standard identities
full rationale
The central result obtains I(V_A;V_B|V_C) as a log-det difference of Schur complements on covariances produced by a single forward K-recursion on a linear Gaussian DAG. This follows directly from elementary multivariate Gaussian algebra (covariance propagation and Schur complement identities) once the DAG representation is chosen; no parameter is fitted to data and then renamed as a prediction, no load-bearing premise reduces to a self-citation, and the AD compatibility is immediate from the closed-form expression being composed of differentiable matrix operations. The modeling choice itself is presented as representational rather than derived, and the three experiments lie inside the stated linear-Gaussian scope without requiring additional unstated assumptions.
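The elementary Gaussian algebra referred to above can be spelled out in a few lines. This assumes zero-mean circularly-symmetric complex Gaussian nodes; for real-valued nodes each log det acquires a factor 1/2. The first line uses the fact that a Gaussian conditional covariance is a Schur complement. The last line uses the block-determinant identity det K_{[XC]} = det K_{CC} det K_{X|C}, where K_{[XC]} denotes the joint covariance of (V_X, V_C).

```latex
h(V_A \mid V_C) = \log\det\!\big(\pi e\, K_{A|C}\big), \qquad
K_{A|C} = K_{AA} - K_{AC} K_{CC}^{-1} K_{CA}

I(V_A; V_B \mid V_C) = h(V_A \mid V_C) - h(V_A \mid V_B, V_C)
  = \log\det K_{A|C} - \log\det K_{A|BC}

\phantom{I(V_A; V_B \mid V_C)}
  = \log\det K_{[AC]} + \log\det K_{[BC]} - \log\det K_{[ABC]} - \log\det K_{CC}
```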
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The network is exactly representable as a linear Gaussian directed acyclic graph.
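Part of this axiom is mechanically checkable: the graph must be acyclic, and a topological order is what lets a one-pass recursion visit every node after its parents. The sketch below uses Kahn's algorithm; the function name and the (parent, child) edge convention are illustrative. It cannot check the linear-Gaussian half of the axiom, which remains a modeling assumption.

```python
from collections import deque

def topological_order(n, edges):
    """Return a topological order of nodes 0..n-1 for (parent, child) edges,
    or raise ValueError if the graph has a directed cycle (i.e. is not a DAG)."""
    children = [[] for _ in range(n)]
    indeg = [0] * n
    for p, c in edges:
        children[p].append(c)
        indeg[c] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)   # nodes with no parents
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:                             # all parents already placed
                queue.append(c)
    if len(order) != n:
        raise ValueError("graph contains a directed cycle; not a DAG")
    return order

# Two-user MAC: inputs x1 = 0 and x2 = 1 feed the receiver y = 2
print(topological_order(3, [(0, 2), (1, 2)]))
```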