OUI as a Structural Observable: Towards an Activation-Centric View of Neural Network Training
Pith reviewed 2026-05-13 01:54 UTC · model grok-4.3
The pith
The Overfitting-Underfitting Indicator (OUI) acts as an early, activation-based signal that reveals whether neural network training is heading into a poor or promising regime before convergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OUI should be understood as a first practical observable of internal network structure. Across results, it consistently appears as an early, label-free, activation-based signal that reveals whether a network is entering a poor or promising training regime before convergence. In supervised learning it anticipates weight decay regimes; in reinforcement learning it discriminates learning-rate regimes early in PPO actor-critic; and in online control it can drive layer-wise weight decay adaptation. Read together with evidence that activation patterns tend to stabilize earlier than parameters, these results suggest a broader research direction: an activation-centric theory of training dynamics, with OUI as an empirical foothold toward it.
What carries the argument
The Overfitting-Underfitting Indicator (OUI), an activation-derived metric that extracts structural regime information from network activations to anticipate training trajectory quality ahead of loss or accuracy signals.
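A minimal sketch of how such an activation-derived statistic could be computed, assembled from the formula fragment surfaced in the Lean-link section of this page (u = min(s, B − s) over the binary mask m = 1{a > 0}, normalized by ⌊B/2⌋); the published definition may differ in detail:

```python
import numpy as np

def oui(activations: np.ndarray) -> float:
    """Sketch of an OUI-style statistic for one layer.

    `activations` has shape (B, d): B samples, d units. Assumes the
    min(s, B - s) / floor(B / 2) form from the page's Lean-link section;
    not guaranteed to match the paper's exact definition.
    """
    B = activations.shape[0]
    mask = activations > 0             # binary activation pattern
    s = mask.sum(axis=0)               # per-unit count of active samples
    u = np.minimum(s, B - s)           # distance from an all-on/all-off unit
    return float(u.mean() / (B // 2))  # normalize into [0, 1]
```

Under this form the statistic is 0 when every unit fires identically across the whole batch (a degenerate activation pattern) and 1 when every unit splits the batch evenly, which is one way a label-free quantity could separate training regimes.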
If this is right
- Weight decay selection in supervised training can be guided by OUI before full convergence occurs.
- Learning-rate regimes in PPO actor-critic reinforcement learning can be discriminated early using activation signals.
- Layer-wise weight decay can be adapted dynamically in online control tasks based on per-layer OUI readings.
- Training can be monitored and adjusted using internal activation structure instead of waiting for external performance metrics to evolve.
- An activation-centric theory of training dynamics becomes empirically testable with OUI as a foothold.
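The layer-wise adaptation item above can be sketched as a simple control loop. Everything here (the target band, the multiplicative factor, and the direction of each adjustment) is a hypothetical placeholder, since the paper's actual rule is not reproduced on this page:

```python
def adapt_weight_decay(wd: dict, oui_by_layer: dict,
                       band=(0.3, 0.7), factor=1.25) -> dict:
    """Hypothetical per-layer controller: nudge weight decay whenever a
    layer's OUI reading leaves a target band. Band, factor, and the
    direction of adjustment are illustrative assumptions, not taken
    from the paper."""
    lo, hi = band
    out = dict(wd)
    for layer, reading in oui_by_layer.items():
        if reading > hi:                      # overfitting-like reading
            out[layer] = wd[layer] * factor   # regularize harder
        elif reading < lo:                    # underfitting-like reading
            out[layer] = wd[layer] / factor   # regularize less
    return out
```

Called once per monitoring interval, a loop like this would let internal activation structure drive the regularization schedule instead of waiting for validation metrics.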
Where Pith is reading between the lines
- If OUI generalizes across domains, similar activation observables could be developed for unsupervised or self-supervised regimes to reduce wasted compute on failed runs.
- Early regime detection might enable automated intervention mechanisms that prune or restart training branches in large-scale experiments.
- OUI could be combined with other internal signals such as gradient statistics to build a more complete picture of when networks enter stable structural phases.
Load-bearing premise
Activation patterns stabilize earlier than parameters, and OUI extracts structural information independent of the external loss or accuracy curves it is meant to anticipate.
What would settle it
A controlled run on a new task or architecture where OUI values remain stable yet final performance diverges sharply from the regime OUI indicated, or where OUI tracks only after-the-fact loss rather than preceding it.
Figures
Original abstract
Activation functions are what make deep networks expressive: without them, the model collapses to a linear map. Yet we still evaluate training mostly from the outside, through loss, accuracy, return, or final calibration, while the internal structural evolution of the network remains largely unobserved. In this paper, we argue that the Overfitting--Underfitting Indicator (OUI) should be understood as a first practical observable of that internal structure. Across our recent results, OUI consistently appears as an early, label-free, activation-based signal that reveals whether a network is entering a poor or promising training regime before convergence. In supervised learning, it anticipates weight decay regimes; in reinforcement learning, it discriminates learning-rate regimes early in PPO actor--critic; and in online control, it can drive layer-wise weight decay adaptation. Read together with recent evidence that activation patterns tend to stabilize earlier than parameters, these results suggest a broader research direction: an activation-centric theory of training dynamics. OUI is becoming an empirical foothold toward this theory.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript positions the Overfitting-Underfitting Indicator (OUI) as a practical structural observable computed from activation patterns. It claims that OUI supplies an early, label-free signal capable of discriminating poor versus promising training regimes before convergence, with applications to weight-decay selection in supervised learning, learning-rate regime identification in PPO actor-critic, and layer-wise adaptation in online control. The argument rests on cited prior observations that activation statistics stabilize earlier than parameters and on the assertion that OUI extracts information independent of external loss or accuracy curves.
Significance. If the independence claim is substantiated with explicit controls, the work could open an activation-centric line of inquiry that complements loss-based monitoring and enables earlier, more targeted training interventions. The absence of new quantitative demonstrations or falsification tests in the present text, however, limits its immediate contribution to the literature.
major comments (2)
- [Abstract] Abstract: The central claim that OUI 'consistently appears as an early, label-free, activation-based signal' that anticipates regimes 'before convergence' is asserted without any quantitative evidence, error bars, or derivation steps supplied in the manuscript. The reference to 'recent results' is left unspecified, so the reader cannot assess whether the reported behavior is robust or merely descriptive.
- [Abstract] Abstract and introduction: No explicit test (partial correlation, Granger causality, or matched-loss counterfactual) is presented to show that OUI at early t carries incremental predictive power once loss(t) is controlled for. Without such a separation check, the asserted independence from external trajectories remains an untested assumption rather than a demonstrated property, directly undermining the 'structural observable' framing.
minor comments (1)
- [Abstract] The manuscript would benefit from a concise, self-contained definition or formula for OUI early in the text so that readers can evaluate the activation-centric claims without consulting prior work.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, clarifying the manuscript's scope as a position paper that synthesizes prior empirical work on OUI while strengthening the presentation of evidence and independence claims through targeted revisions.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that OUI 'consistently appears as an early, label-free, activation-based signal' that anticipates regimes 'before convergence' is asserted without any quantitative evidence, error bars, or derivation steps supplied in the manuscript. The reference to 'recent results' is left unspecified, so the reader cannot assess whether the reported behavior is robust or merely descriptive.
  Authors: The manuscript is a position paper whose claims rest on previously published empirical studies. In the revised version we explicitly cite those studies (including the specific references where quantitative results, error bars, and derivation details appear) and add a short summary paragraph in the introduction that recapitulates the key empirical patterns observed across supervised, reinforcement-learning, and control settings. This makes the abstract's assertions directly traceable without requiring the reader to consult external material. revision: yes
- Referee: [Abstract] Abstract and introduction: No explicit test (partial correlation, Granger causality, or matched-loss counterfactual) is presented to show that OUI at early t carries incremental predictive power once loss(t) is controlled for. Without such a separation check, the asserted independence from external trajectories remains an untested assumption rather than a demonstrated property, directly undermining the 'structural observable' framing.
  Authors: We agree that formal statistical separation tests would provide stronger support for the independence claim. The current argument for independence is based on OUI's construction—it is computed exclusively from activation statistics and does not require labels or loss values—together with prior evidence that activation patterns stabilize earlier than parameters. In the revised introduction we have added an explicit discussion of this limitation, clarifying that the practical utility demonstrated in the applications (early regime discrimination when loss curves remain similar) serves as complementary evidence, while outlining how partial-correlation or counterfactual analyses could be performed in follow-up work. We have also referenced the activation-stabilization literature to ground the structural-observable framing. revision: partial
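The separation check discussed above can be made concrete as a partial correlation: residualize both final performance and early OUI on early loss, then correlate the residuals. A nonzero result would indicate that early OUI carries information beyond loss(t). The variable names and the linear-control choice are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def partial_corr(final_perf, early_oui, early_loss):
    """Correlation between final performance and early OUI after
    linearly controlling both for early loss. A sketch of the
    referee's requested separation check, under a linear-control
    assumption."""
    def residualize(y, x):
        # Residual of y after regressing on [1, x].
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta
    perf = np.asarray(final_perf, dtype=float)
    oui_v = np.asarray(early_oui, dtype=float)
    loss = np.asarray(early_loss, dtype=float)
    r_perf = residualize(perf, loss)
    r_oui = residualize(oui_v, loss)
    return float(np.corrcoef(r_perf, r_oui)[0, 1])
```

On synthetic runs where performance depends on a signal invisible to the loss, the partial correlation stays high; if OUI merely tracked loss, it would collapse toward zero.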
Circularity Check
No circularity: OUI introduced as empirical activation statistic with independent experimental support
Full rationale
The manuscript presents OUI as a computed quantity derived from internal activation patterns and reports its observed correlation with training regimes across supervised, RL, and control settings. No equations or definitions are supplied in which OUI is constructed from the loss, accuracy, or regime labels it is claimed to anticipate; the indicator is described as label-free and activation-centric. Self-references to prior activation-stabilization observations are used only to motivate the broader research direction and do not serve as the sole justification for the current empirical claims. The derivation chain therefore remains observational rather than self-referential by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Activation patterns stabilize earlier than parameters during training
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (link: unclear) — OUI_i(t) = (1/d_l) Σ_j u_j^(l)(t) / ⌊B/2⌋, where u = min(s, B − s) is derived from the binary activation mask m = 1{a > 0}
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction (link: unclear) — activation patterns stabilize earlier than parameters; a two-timescale view of training