Interpreting FCDNNs via RG on Exponential Family

Fuzhou Gong; Zigeng Xia

arxiv: 2606.00157 · v1 · pith:OFOJ2E2Unew · submitted 2026-05-29 · stat.ML · cs.AI · cs.LG · math.PR


Pith reviewed 2026-06-28 21:13 UTC · model grok-4.3

classification: stat.ML, cs.AI, cs.LG, math.PR

keywords: renormalization group, deep neural networks, exponential family, interpretability, feature extraction, statistical physics, fully connected DNNs, RG fixed points


The pith

When fully connected DNNs reach optimal parameters, feature layer outputs match renormalization group fixed points on exponential family data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a correspondence between the renormalization group (RG) method from statistical physics and the layer-wise transformations inside fully connected deep neural networks (DNNs). It extends an earlier result, proven for one-dimensional Ising model inputs, to continuous data drawn from the exponential family. The key result states that once training reaches its optimum, the characteristic parameters at the DNN feature layer equal the fixed points obtained by applying RG transformations to the input distribution. A sympathetic reader would see this as showing that DNN training performs an RG-style coarse-graining that isolates the dominant features of the data.
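As a point of reference for what an RG fixed point looks like in the earlier one-dimensional Ising setting, the textbook b = 2 decimation of the zero-field chain can be iterated numerically. This is a minimal sketch under that assumption: the abstract does not say which RG scheme the authors use, and the names `decimate` and `flow` are illustrative only.

```python
import math

def decimate(K):
    """One b = 2 decimation step for the zero-field 1D Ising chain.

    Summing out every second spin maps the nearest-neighbour coupling K
    to K' with tanh(K') = tanh(K)**2 (standard textbook recursion).
    """
    return math.atanh(math.tanh(K) ** 2)

def flow(K0, steps):
    """Return the couplings K0, K1, ..., K_steps along the RG flow."""
    couplings = [K0]
    for _ in range(steps):
        couplings.append(decimate(couplings[-1]))
    return couplings

if __name__ == "__main__":
    for K in flow(1.0, 6):
        print(f"{K:.8f}")
```

Starting from K = 1.0, the coupling shrinks at every step and approaches the trivial fixed point K* = 0, which the map leaves unchanged.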

Core claim

We prove that when the parameters of fully connected DNNs achieve their optimal value after training, the characteristic parameters of the feature layer output of DNNs are equal to the fixed points of the characteristic parameters of input data under RG method for continuous fields. This conclusion shows that the training process of DNNs is equivalent to RG calculation on this kind of data and therefore the network can extract main features from the input data just like RG. Also, the equivalence further validates the correspondence framework we have established, providing an explanation for the outstanding performance of DNNs on real-world data.

What carries the argument

The explicit mapping between RG transformations on continuous exponential-family fields and the successive layer mappings of a fully connected DNN, which forces optimal network outputs to coincide with RG fixed points.
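The structure of the claim can be written schematically as three relations. This is only a sketch in hypothetical notation, since the abstract gives no symbols: $\theta$, $\mathcal{R}$, $W$, $\mathcal{L}$ and $\hat{\theta}$ are all assumptions introduced here for illustration.

```latex
% Hypothetical notation, not taken from the paper:
%   \theta           characteristic parameters of the input distribution
%   \mathcal{R}      one RG transformation acting on \theta
%   W                network weights, \mathcal{L} the training loss
%   \hat{\theta}(W)  characteristic parameters of the feature-layer output
\begin{align}
  \theta^{*} &= \mathcal{R}(\theta^{*})
    && \text{(RG fixed point)} \\
  W^{*} &\in \arg\min_{W} \mathcal{L}(W)
    && \text{(trained optimum)} \\
  \hat{\theta}(W^{*}) &= \theta^{*}
    && \text{(claimed equality)}
\end{align}
```

Read this way, the first two lines are definitions and only the third carries content, which is why the construction of the map from layers to RG steps matters so much.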

If this is right

The training process of DNNs on exponential-family data is equivalent to RG calculation.
The network extracts main features from the input data in the same way RG does.
The correspondence framework is validated for continuous input data.
The equivalence accounts for strong DNN performance on real-world data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Numerical checks on standard exponential-family members such as Gaussians could directly test the predicted equality of parameters.
If the layer-wise mapping generalizes, similar RG interpretations might apply to convolutional or recurrent architectures.
The equivalence could motivate using RG flow equations to initialize or regularize network weights for matching data types.
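One simple way to see a Gaussian acting as a fixed point is the block-sum coarse-graining familiar from the central limit theorem: sum b independent zero-mean samples and rescale by sqrt(b). The sketch below assumes independent samples and this particular coarse-graining, neither of which is confirmed as the paper's RG scheme for continuous fields; all function names are hypothetical.

```python
import numpy as np

def block_rg_step(x, b=2):
    """Sum non-overlapping blocks of b samples, then rescale by sqrt(b)."""
    n = (x.size // b) * b          # drop a ragged tail, if any
    return x[:n].reshape(-1, b).sum(axis=1) / np.sqrt(b)

def excess_kurtosis(x):
    """Sample excess kurtosis; it is 0 for a Gaussian."""
    z = x - x.mean()
    return float((z**4).mean() / (z**2).mean() ** 2 - 3.0)

def flow_stats(x, steps, b=2):
    """Return [(variance, excess kurtosis)] before and after each step."""
    stats = [(float(x.var()), excess_kurtosis(x))]
    for _ in range(steps):
        x = block_rg_step(x, b)
        stats.append((float(x.var()), excess_kurtosis(x)))
    return stats

rng = np.random.default_rng(0)
# Gaussian input: variance and shape should stay put under the flow.
gauss_stats = flow_stats(rng.normal(0.0, 1.5, size=2**18), steps=4)
# Uniform input: variance is preserved, but the shape drifts toward Gaussian.
unif_stats = flow_stats(rng.uniform(-1.0, 1.0, size=2**18), steps=4)
```

Comparing `gauss_stats` with `unif_stats` shows the difference between starting at a fixed point and flowing toward one: the Gaussian's variance and excess kurtosis stay roughly constant, while the uniform sample's excess kurtosis moves toward zero.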

Load-bearing premise

The specific correspondence constructed between RG transformations on continuous exponential-family fields and the layer-wise mappings inside a fully connected DNN remains valid after the generalization from the one-dimensional Ising case.

What would settle it

After training a fully connected DNN to convergence on samples from a concrete exponential-family distribution, measure whether the characteristic parameters computed from the feature-layer outputs equal the numerically computed RG fixed-point parameters for the same distribution.
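The measurement half of such a test could look like the sketch below. It assumes, for illustration only, that the "characteristic parameters" are the natural parameters of a one-dimensional Gaussian and that a reference fixed point `eta_star` has been computed separately. No network is trained here: the feature outputs are synthetic stand-ins, and `gaussian_natural_params` and `matches_fixed_point` are hypothetical helper names.

```python
import numpy as np

def gaussian_natural_params(samples):
    """Moment-matching estimate of (eta1, eta2) = (mu/var, -1/(2 var))."""
    mu = samples.mean()
    var = samples.var()
    return np.array([mu / var, -0.5 / var])

def matches_fixed_point(feature_outputs, eta_star, rtol=0.05):
    """True if the estimated parameters agree with eta_star within rtol."""
    eta_hat = gaussian_natural_params(np.asarray(feature_outputs, dtype=float))
    return bool(np.allclose(eta_hat, eta_star, rtol=rtol, atol=0.0))

# Synthetic stand-in for feature-layer outputs (assumption: N(2, 0.5^2)).
rng = np.random.default_rng(1)
fake_features = rng.normal(2.0, 0.5, size=100_000)

# For N(2, 0.5^2): eta1 = 2 / 0.25 = 8 and eta2 = -1 / (2 * 0.25) = -2.
agrees = matches_fixed_point(fake_features, eta_star=[8.0, -2.0])
disagrees = matches_fixed_point(fake_features, eta_star=[4.0, -2.0])
```

The relative tolerance is a free choice; a real test would need to justify it from the sample size and the optimizer's convergence criterion.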

Figures

Figures reproduced from arXiv: 2606.00157 by Fuzhou Gong, Zigeng Xia.

**Figure 1.** Framework of our method. Our method is composed of three steps. Step 1: The input dataset of a DNN is regarded as a micro-scale statistical physical system. It can be analysed by the RG method to obtain a macro-scale system or a thermodynamic system with the characteristic parameters achieving their fixed points. This is called the canonical RG process. Step 2: The main features extracted by the DNN (usuall…

**Figure 2.** Fully Connected Network Structure. The input of this neural network is $x = (x_1, x_2, \dots, x_N) \in \mathbb{R}^N$, and the feature layer output is $\hat{y} = (\hat{y}_1, \hat{y}_2, \dots, \hat{y}_M) \in \mathbb{R}^M$. Between the input layer and the feature layer output there are $L$ hidden layers. For the forward propagation process of the network, see the description in [1], Section 4. In practice, the commonly used expression of the loss …
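The architecture in the Figure 2 caption (input in R^N, L hidden layers, feature output in R^M) can be sketched as a plain NumPy forward pass. The tanh activation, the random initialization, and the linear feature layer are assumptions made for illustration; the paper defers the actual forward propagation to [1], Section 4, and `init_fc` and `forward` are hypothetical names.

```python
import numpy as np

def init_fc(n_in, hidden_sizes, n_out, seed=0):
    """Create (weight, bias) pairs for a fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden_sizes, n_out]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Map input x (shape (N,) or (batch, N)) to the feature output."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)   # assumed activation for hidden layers
    W, b = params[-1]
    return h @ W + b             # assumed linear feature layer

# N = 5 inputs, L = 3 hidden layers of width 8, M = 3 feature outputs.
params = init_fc(5, [8, 8, 8], 3)
y_hat = forward(params, np.ones(5))
```

With L hidden layers there are L + 1 weight matrices, so `params` has four entries here and `y_hat` has shape `(3,)`.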

Original abstract

We consider establishing the interpretability theory of deep learning through constructing a corresponding relationship between the renormalization group (RG) method in statistical physics and the training process of deep neural networks (DNNs). We have proved the constructed relationship using the one-dimensional Ising model as the input data. In this paper we generalize our results to the case of continuous input data, which is a necessary preparation for applying the corresponding framework to real-world data. To be representative, we consider a class of data distribution in the exponential family. We prove that when the parameters of fully connected (FC) DNNs achieve their optimal value after training, the characteristic parameters of the feature layer output of DNNs are equal to the fixed points of the characteristic parameters of input data under RG method for continuous fields. This conclusion shows that the training process of DNNs is equivalent to RG calculation on this kind of data and therefore the network can extract main features from the input data just like RG. Also, the equivalence further validates the correspondence framework we have established, providing an explanation for the outstanding performance of DNNs on real-world data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends the authors' Ising result to exponential-family data and asserts an exact match between optimal DNN parameters and RG fixed points, but the abstract supplies no derivation, so the claim cannot be checked from what is given.


The paper's main move is generalizing their earlier one-dimensional Ising correspondence to continuous exponential-family distributions. They state that once a fully connected DNN reaches its trained optimum, the characteristic parameters at the feature layer equal the RG fixed points of the input parameters. That is the new piece: a claimed proof for the continuous case as preparation for real data.

What the work does cleanly is lay out the target equivalence in plain terms and note why the continuous extension matters. The abstract is direct about the goal and the intended interpretation that DNN training extracts features the same way RG does.

The soft spot is the missing steps. The abstract asserts the result but shows neither the explicit renormalization map on the sufficient statistics nor any check that the map survives the move from discrete spins to continuous fields without Jacobian or measure corrections. If those extra terms appear, the exact equality between trained parameters and fixed points would not hold. The stress-test concern lands here: the functional form of the RG transformation is not shown to be preserved, so the central claim cannot be evaluated from what is given.

This is for readers who are already interested in physics-style interpretations of deep learning and have followed the authors' Ising work. It would be useful to them only if the full manuscript contains the actual mapping and a clear error analysis. Otherwise it stays at the level of an unverified assertion.

I would send it to peer review so referees can check whether the continuous generalization really goes through without additional terms. The idea is narrow but worth verifying if the math is there.

Referee Report

3 major / 1 minor

Summary. The manuscript claims to generalize a previously established correspondence between renormalization group (RG) transformations and the training dynamics of fully connected deep neural networks (FCDNNs) from the one-dimensional Ising model to continuous data drawn from the exponential family. It asserts a proof that, once DNN parameters reach their post-training optima, the characteristic parameters of the feature-layer outputs are identical to the RG fixed points of the input data parameters. This equivalence is presented as showing that DNN training performs RG-like feature extraction on such data.

Significance. If the central mapping and fixed-point equality can be rigorously verified with explicit derivations, the result would supply a concrete physics-based account of why FCDNNs succeed at feature extraction on real-world data. The move to continuous exponential-family distributions is a necessary bridge toward practical applicability, and any machine-checked or parameter-free aspects of the RG-DNN correspondence would strengthen the contribution.

major comments (3)

[Abstract] Abstract, paragraph 3: the generalization from the 1D Ising case requires that the renormalization map acting on the sufficient statistics of the continuous exponential-family field retains exactly the same functional form, without additional Jacobian or measure-renormalization contributions. No explicit expression for this map, nor a demonstration that such extra terms are absent, is supplied; this step is load-bearing for the asserted equality between optimal DNN parameters and RG fixed points.
[Abstract] Abstract: the claim that 'the relationship has been proved' for continuous fields is stated without derivation steps, error bounds, or the concrete layer-to-RG-operator dictionary. Absent these elements the central equality cannot be checked and the generalization step remains unverified.
[Abstract] Abstract: the fixed-point relation appears to be obtained by construction of the correspondence rather than derived independently; it is therefore unclear whether the equality is a nontrivial consequence or follows tautologically from the chosen identification between DNN layers and RG operators.
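The Jacobian concern in the first major comment can be made concrete with a generic change of variables. The following is an illustration in standard exponential-family notation, not the paper's RG map: the scale factor $b$, the dimension $d$, the map $g$, and the choice of a one-dimensional Gaussian are all assumptions.

```latex
% Illustration only; b, d, g and the Gaussian choice are assumptions.
\begin{align}
  p(x \mid \eta) &= h(x)\,\exp\!\bigl(\eta^{\top} T(x) - A(\eta)\bigr),
    \qquad x \in \mathbb{R}^{d}, \\
  x = b\,x' \;\Rightarrow\;
  p'(x') &= b^{d}\, h(b x')\,\exp\!\bigl(\eta^{\top} T(b x') - A(\eta)\bigr).
\end{align}
% 1D Gaussian: T(x) = (x, x^2), h constant, d = 1.
\begin{align}
  \eta^{\top} T(b x') &= (b\,\eta_1)\,x' + (b^{2}\eta_2)\,x'^{2}
  \;\Rightarrow\; \eta' = (b\,\eta_1,\; b^{2}\eta_2), \\
  A(\eta') &= A(\eta) - \ln b .
\end{align}
% Nonlinear map x = g(x') in one dimension:
\begin{equation}
  p'(x') = h\bigl(g(x')\bigr)\,\lvert g'(x')\rvert\,
           \exp\!\bigl(\eta^{\top} T(g(x')) - A(\eta)\bigr).
\end{equation}
```

For a linear rescaling the Jacobian $b^{d}$ is independent of $x'$, so it only shifts the log-partition function and the density stays in the family with rescaled natural parameters. For a nonlinear map the factor $\lvert g'(x')\rvert$ depends on $x'$, and whether it can be absorbed without changing the functional form is exactly what the referee asks the authors to show.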

minor comments (1)

The abstract would be clearer if it named the specific member of the exponential family under consideration and the precise form of the RG transformation employed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful review and valuable comments on the generalization to continuous exponential-family data. We respond point-by-point below and will revise the manuscript to improve explicitness of the derivations and presentation.


Referee: [Abstract] Abstract, paragraph 3: the generalization from the 1D Ising case requires that the renormalization map acting on the sufficient statistics of the continuous exponential-family field retains exactly the same functional form, without additional Jacobian or measure-renormalization contributions. No explicit expression for this map, nor a demonstration that such extra terms are absent, is supplied; this step is load-bearing for the asserted equality between optimal DNN parameters and RG fixed points.

Authors: We agree that greater explicitness is needed. In Section 3 the RG map is constructed by rescaling the natural parameters while keeping the exponential-family form; the Jacobian contributes only an additive constant to the log-partition function that cancels in the fixed-point equations for the characteristic parameters. We will add a concise statement of this map and the cancellation to the abstract together with a pointer to the derivation.
Revision: yes

Referee: [Abstract] Abstract: the claim that 'the relationship has been proved' for continuous fields is stated without derivation steps, error bounds, or the concrete layer-to-RG-operator dictionary. Absent these elements the central equality cannot be checked and the generalization step remains unverified.

Authors: The abstract is a summary; the full layer-to-RG dictionary, step-by-step derivation, and verification that the correspondence is exact (hence no error bounds required) appear in Sections 2–4. We will revise the abstract to outline the principal steps and the dictionary.
Revision: yes

Referee: [Abstract] Abstract: the fixed-point relation appears to be obtained by construction of the correspondence rather than derived independently; it is therefore unclear whether the equality is a nontrivial consequence or follows tautologically from the chosen identification between DNN layers and RG operators.

Authors: The layer–RG identification is fixed by matching gradient-descent stationarity conditions to RG flow equations, as derived independently in our prior Ising work. The fixed-point equality is then obtained by solving those conditions under the exponential-family assumption; this matching is a nontrivial verification rather than a tautology. We will add clarifying language in the abstract and introduction.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; generalization presented as independent proof


The provided abstract states that a correspondence was proved for the 1D Ising model and is now generalized to continuous exponential-family fields, with the claim that optimal DNN parameters make feature-layer outputs equal RG fixed points. No equations or sections are supplied that exhibit a reduction of the fixed-point equality to the correspondence definition itself, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The derivation is therefore treated as self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the central claim rests on an unstated mapping between RG operators and DNN layers that is treated as given after the Ising-model case.

axioms (1)

Domain assumption: A bijective correspondence exists between the RG flow on continuous exponential-family fields and the forward pass of a fully connected DNN such that optimal network parameters coincide with RG fixed points.
This mapping is the load-bearing premise being generalized from the discrete Ising setting.

pith-pipeline@v0.9.1-grok · 5722 in / 1248 out tokens · 25360 ms · 2026-06-28T21:13:03.187752+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

[1] Xia, Z. G., Gong, F. Z., Interpreting deep learning by establishing a rigorous corresponding relationship with renormalization group on Ising model, Sci. China Math., 2026, 69(3), 793-812
[2] Gong, F. Z., Xia, Z. G., Interpreting Deep Learning by Establishing a Rigorous Corresponding Relationship with Renormalization Group, 2022, arXiv:2212.00005
[3] Zinn-Justin, J., Phase transitions and renormalization group, Oxford University Press, Oxford, 2007
[4] Pathria, R. K., Statistical Mechanics, Elsevier, Amsterdam, 2016
[5] OpenAI, Language models can explain neurons in language models, https://openai.com/index/language-models-can-explain-neurons-in-language-models/, 2023-05-09
[6] Anthropic, Mapping the Mind of a Large Language Model, https://www.anthropic.com/research/mapping-mind-language-model, 2024-05-21
[7] Guo, D. Y., Yang, D. J., Zhang, H. W., et al., Deepseek-r1 incentivizes reasoning in llms through reinforcement learning, Nature, 645(8081), 2025, 633-638
[8] OpenAI, Detecting misbehavior in frontier reasoning models, https://openai.com/index/chain-of-thought-monitoring/, 2025-03-10
[9] Wang, H. Z., Xu, Q. X., Liu, C., et al., Emergent hierarchical reasoning in llms through reinforcement learning, 2025, arXiv:2509.03646
[10] Tian, Y. D., Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking, 2025, arXiv:2509.21519
[11] Mehta, P., Schwab, D. J., An exact mapping between the variational renormalization group and deep learning, 2014, arXiv:1410.3831
[12] Mandt, S., Hoffman, M. D., Blei, D. M., Stochastic gradient descent as approximate Bayesian inference, J. Mach. Learn. Res., 18, 2017, 1-35
[13] Holley, R. A., Kusuoka, S., Stroock, D. W., Asymptotics of the spectral gap with applications to the theory of simulated annealing, J. Funct. Anal., 83, 1989, 333-347
[14] Holley, R. A., Stroock, D., Simulated annealing via Sobolev inequalities, Commun. Math. Phys., 115, 1988, 553-569
[15] Ma, S. K., Introduction to the renormalization group, Rev. Mod. Phys., 45(4), 1973, 589
[16] Morandi, G., Napoli, F., Ercolessi, E., Statistical mechanics: an intermediate course, World Scientific, Singapore, 2001
[17] Cui, K. Y., Gong, F. Z., The Behavior of Observables in Renormalization, Acta. Math. Sin.-English Ser., 2026, https://doi.org/10.1007/s10114-026-4322-7
[18] Hormander, L., An introduction to complex analysis in several variables, Elsevier, Amsterdam, 1973
[19] Krantz, S. G., Function theory of several complex variables, American Mathematical Soc., Providence, 2001
[20] Widder, D. V., The Laplace transform, Princeton University Press, Princeton, 1946
[21] Naina Mohammed, S. S., Jeevanandham, K., Basherrudin Mahmud Ahmed, A., et al., Generalization of multivariable Laplace transform based on Tsallis q-exponential and its inverse using Post-Widder's method, 2022, arXiv:2205.03545
