Spectral Surgery: Class-Targeted Post-Hoc Rebalancing via Hessian Spike Perturbation
Pith reviewed 2026-05-11 03:30 UTC · model grok-4.3 · Recognition: 2 Lean theorem links
The pith
Perturbing model weights along Hessian spike eigenvectors rebalances per-class accuracy without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Spectral Surgery directly perturbs model weights along the spike eigenvectors of the Hessian. It introduces a spike-class sensitivity matrix that records the directional derivative of each class accuracy along each spike, then solves a constrained optimization over the perturbation coefficients to raise accuracy on weak classes while preserving it on strong classes. An adaptive amplitude controller raises or lowers the total perturbation budget according to whether successive steps produce improvement signals. Experiments on CIFAR-10 and ISIC-2019 report gains in balanced accuracy together with lower standard deviation across classes.
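Under the first-order sensitivity model this claim implies, "raise weak classes while preserving strong ones" becomes a small linear program over the perturbation coefficients. The sketch below is one plausible formulation under that model, not the paper's published algorithm; `S`, `weak`, `strong`, `budget`, and `tol` are all assumed stand-ins.

```python
# Hedged sketch of one plausible reading of the constrained coefficient step:
# maximize predicted gain on weak classes, subject to a bound on predicted
# loss for strong classes and a per-coefficient amplitude budget.
# S, weak, strong, budget, tol are assumptions, not values from the paper.
import numpy as np
from scipy.optimize import linprog

def solve_coefficients(S, weak, strong, budget=1e-2, tol=1e-3):
    """S: (C, K) sensitivity matrix; weak/strong: lists of class indices."""
    K = S.shape[1]
    # linprog minimizes c @ alpha, so negate the summed weak-class gains.
    c = -S[weak].sum(axis=0)
    # Strong classes: predicted change S[strong] @ alpha >= -tol,
    # rewritten as -S[strong] @ alpha <= tol.
    A_ub = -S[strong]
    b_ub = np.full(len(strong), tol)
    bounds = [(-budget, budget)] * K  # per-coefficient amplitude budget
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x

# Usage (assumed): alpha = solve_coefficients(S, weak=[3, 8], strong=[0, 1, 2])
# delta_w = eigvecs @ alpha   # eigvecs: (n_params, K) spike eigenvectors
```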
What carries the argument
The spike eigenvectors of the Hessian, whose count matches the number of classes minus one, together with the spike-class sensitivity matrix that quantifies how each eigenvector affects per-class accuracy.
Load-bearing premise
The spike eigenvectors align with class-specific directions such that small perturbations chosen via the sensitivity matrix can improve weak-class accuracy without harming strong-class accuracy.
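Written out, the premise amounts to a first-order model of per-class accuracy along the spike directions (the notation below is assumed here, not taken from the paper):

```latex
% Weights w, spike eigenvectors v_k, coefficients alpha_k,
% per-class accuracies a_c, C classes.
\delta w = \sum_{k=1}^{C-1} \alpha_k v_k,
\qquad
S_{ck} = \lim_{\epsilon \to 0}
  \frac{a_c(w + \epsilon v_k) - a_c(w)}{\epsilon},
\qquad
\Delta a_c \approx \sum_{k=1}^{C-1} S_{ck}\,\alpha_k .
```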
What would settle it
Applying the optimized perturbations to a model trained on CIFAR-10 and finding no increase in balanced accuracy or an increase in per-class standard deviation on the test set would falsify the central claim.
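That falsifier is cheap to evaluate. Below is a minimal sketch of the before/after comparison, assuming a hypothetical per_class_accuracy helper; balanced accuracy is taken as the mean of per-class accuracies.

```python
# Minimal sketch of the stated falsification test: compare balanced accuracy
# and per-class standard deviation before and after the perturbation.
# per_class_accuracy is a hypothetical helper, not from the paper.
import numpy as np

def per_class_accuracy(preds, labels, num_classes):
    """preds, labels: integer arrays over the test set; returns length-C array."""
    return np.array([(preds[labels == c] == c).mean()
                     for c in range(num_classes)])

def rebalancing_report(acc_before, acc_after):
    """Both arguments: length-C arrays of per-class test accuracies."""
    delta_balanced = acc_after.mean() - acc_before.mean()
    delta_std = acc_after.std() - acc_before.std()
    # The central claim is falsified if balanced accuracy fails to rise
    # or the per-class spread grows.
    falsified = (delta_balanced <= 0) or (delta_std >= 0)
    return {"delta_balanced": delta_balanced,
            "delta_std": delta_std,
            "falsified": falsified}
```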
original abstract
The Hessian spectrum of trained deep networks exhibits a characteristic structure: a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues (spikes), confirming the relevance of Random Matrix Theory in deep learning. The spike count matches the number of classes minus one. While prior work has described this structure, no method has exploited it operationally to improve classification performance. We propose Spectral Surgery, a post-hoc optimization method that directly perturbs model weights along spike eigenvectors to rebalance per-class accuracy without retraining. We introduce (i) a spike-class sensitivity matrix that quantifies the directional derivative of each class's accuracy along each spike eigenvector, (ii) a constrained optimization of perturbation coefficients that targets weak classes while preserving strong ones, and (iii) an adaptive amplitude control that raises or lowers the perturbation budget based on iteration-level improvement signals. We obtain encouraging results on CIFAR-10 and ISIC-2019 on both balanced accuracy and standard deviation.
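For context on the machinery the abstract presupposes: spike eigenpairs are typically recovered matrix-free by combining Pearlmutter's exact Hessian-vector product [4] with Lanczos iteration [5], as in PyHessian [13]. A minimal PyTorch/SciPy sketch under that standard recipe follows; model, loss_fn, and batch are assumed stand-ins, and k = C - 1 would target the spike block.

```python
# Hedged sketch of spike extraction: Pearlmutter Hessian-vector products [4]
# fed to SciPy's Lanczos-based eigsh [5], as in PyHessian [13].
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def top_hessian_eigenpairs(model, loss_fn, batch, k):
    """Return the k largest Hessian eigenvalues/eigenvectors at the current weights."""
    params = [p for p in model.parameters() if p.requires_grad]
    n = sum(p.numel() for p in params)
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    def hvp(v):
        # H v = d(g(w) . v)/dw, computed without materializing H.
        v_t = torch.as_tensor(v, dtype=flat_grad.dtype, device=flat_grad.device)
        dot = torch.dot(flat_grad, v_t)
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).detach().cpu().numpy().astype(np.float64)

    op = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
    vals, vecs = eigsh(op, k=k, which="LA")  # Lanczos, largest eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```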
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Spectral Surgery, a post-hoc optimization technique that perturbs the weights of a trained deep network along the eigenvectors of the large outlier eigenvalues (spikes) of the Hessian to rebalance per-class accuracies without retraining. It introduces a spike-class sensitivity matrix that quantifies directional derivatives of each class accuracy along these eigenvectors, solves a constrained optimization over perturbation coefficients to boost weak classes while preserving strong ones, and uses an adaptive amplitude control driven by iteration-level improvement signals. Encouraging results are claimed on CIFAR-10 and ISIC-2019 in terms of balanced accuracy and per-class standard deviation.
Significance. If the central claims can be substantiated with rigorous definitions and quantitative evidence, the work would offer a novel operational exploitation of the known low-rank spike structure in the Hessian spectrum (whose spike count matches the number of classes minus one) for efficient post-hoc fairness adjustments. This could be practically significant for imbalanced or biased models where retraining is costly, extending random matrix theory insights into actionable interventions.
major comments (2)
- [Abstract / Method] Abstract and method description: the spike-class sensitivity matrix is defined via the directional derivative of per-class accuracy (an indicator-based discontinuous quantity) along each spike eigenvector. No surrogate (e.g., softmax probabilities, smoothed accuracy, or finite-difference protocol with explicit step size) is referenced, yet the constrained optimization of perturbation coefficients and the adaptive amplitude control both depend on this matrix being well-defined and stable.
- [Abstract] Abstract: the claim of 'encouraging results' on CIFAR-10 and ISIC-2019 supplies no quantitative deltas versus baselines, error bars, number of runs, or optimization details (e.g., how the perturbation amplitude budget is initialized or updated). This prevents verification that the rebalancing is statistically meaningful or that the sensitivity-matrix-driven perturbations outperform simpler alternatives.
minor comments (2)
- Clarify whether perturbations are applied to all layers or selected ones, and provide pseudocode or explicit equations for the constrained optimization and adaptive amplitude update rule to aid reproducibility (a hedged sketch of one possible reading follows this list).
- The manuscript should include a brief comparison to prior post-hoc rebalancing methods (e.g., logit adjustment or threshold tuning) to situate the novelty of the Hessian-based approach.
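Pending the authors' revision, one plausible shape for the requested pseudocode is a multiplicative grow/shrink schedule on the perturbation budget, driven by an iteration-level improvement signal. Everything below (grow, shrink, the choice of score, and the helper signatures) is an assumption, not the paper's actual rule.

```python
# Hedged sketch of the outer Spectral Surgery loop with adaptive amplitude
# control: grow the budget after an improving step, shrink it otherwise.
# sensitivity_fn, solve_fn, score_fn are assumed stand-ins (e.g. the
# finite-difference matrix, the constrained solver, balanced accuracy).
def spectral_surgery(weights, eigvecs, sensitivity_fn, solve_fn, score_fn,
                     budget=1e-2, grow=2.0, shrink=0.5, n_iters=10):
    best_score = score_fn(weights)
    for _ in range(n_iters):
        S = sensitivity_fn(weights, eigvecs)   # (C, K) sensitivity matrix
        alpha = solve_fn(S, budget)            # constrained coefficients
        candidate = weights + eigvecs @ alpha  # perturb along the spikes
        score = score_fn(candidate)            # iteration-level signal
        if score > best_score:                 # improvement: accept and grow
            weights, best_score = candidate, score
            budget *= grow
        else:                                  # no improvement: shrink budget
            budget *= shrink
    return weights
```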
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us identify areas for clarification and improvement in our manuscript. We address each major comment in detail below.
point-by-point responses
- Referee: [Abstract / Method] Abstract and method description: the spike-class sensitivity matrix is defined via the directional derivative of per-class accuracy (an indicator-based discontinuous quantity) along each spike eigenvector. No surrogate (e.g., softmax probabilities, smoothed accuracy, or finite-difference protocol with explicit step size) is referenced, yet the constrained optimization of perturbation coefficients and the adaptive amplitude control both depend on this matrix being well-defined and stable.
  Authors: We acknowledge that per-class accuracy, being based on indicator functions, is discontinuous, which could make direct derivatives ill-defined. However, in our implementation, we approximate these directional derivatives using finite differences with a small, fixed step size along the eigenvector. This provides a numerically stable sensitivity matrix that is used in the optimization. We will update the method description to explicitly detail this approximation, including the step size selection and any averaging procedures to ensure stability. (A sketch of this construction follows these responses.)
  revision: yes
- Referee: [Abstract] Abstract: the claim of 'encouraging results' on CIFAR-10 and ISIC-2019 supplies no quantitative deltas versus baselines, error bars, number of runs, or optimization details (e.g., how the perturbation amplitude budget is initialized or updated). This prevents verification that the rebalancing is statistically meaningful or that the sensitivity-matrix-driven perturbations outperform simpler alternatives.
  Authors: We agree that including quantitative details in the abstract would make the claims more verifiable. The full manuscript contains the specific results, including deltas in balanced accuracy and its standard deviation, along with experimental details such as the number of runs and baseline comparisons. We will revise the abstract to incorporate key quantitative findings, such as the improvement margins and optimization parameters, to better substantiate the encouraging results.
  revision: yes
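The finite-difference construction the authors describe in the first response admits a minimal sketch, assuming central differences and a fixed step eps; the paper's actual step-size selection and averaging procedures are not specified here.

```python
# Hedged sketch of the finite-difference sensitivity matrix: central
# differences of per-class accuracy along each spike eigenvector.
# eps and per_class_accuracy_at are assumptions, not the paper's protocol.
import numpy as np

def sensitivity_matrix(weights, eigvecs, per_class_accuracy_at, eps=1e-3):
    """eigvecs: (n_params, K); per_class_accuracy_at(w) -> length-C array."""
    K = eigvecs.shape[1]
    cols = []
    for k in range(K):
        v = eigvecs[:, k]
        acc_plus = per_class_accuracy_at(weights + eps * v)
        acc_minus = per_class_accuracy_at(weights - eps * v)
        cols.append((acc_plus - acc_minus) / (2.0 * eps))  # central difference
    return np.stack(cols, axis=1)  # S[c, k] = approx. d a_c along v_k
```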
Circularity Check
No circularity: the new constructs and the explicit optimization are defined independently of the rebalancing gain they are meant to produce
full rationale
The paper introduces the spike-class sensitivity matrix as a novel quantification of directional derivatives along Hessian spike eigenvectors, followed by a constrained optimization over perturbation coefficients and adaptive amplitude control. These elements are defined and applied as algorithmic contributions rather than reductions of the claimed rebalancing improvement to any fitted parameter, self-citation, or tautological renaming. Prior observations on Hessian spike structure are cited as background (not load-bearing uniqueness theorems from the same authors), and the central post-hoc perturbation procedure is presented as an operational exploitation of that structure via new optimization machinery. No step equates the output performance gain to the input definitions by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- perturbation amplitude budget
axioms (1)
- domain assumption: The Hessian spectrum of trained deep networks exhibits a continuous bulk of near-zero eigenvalues and a small number of large outlier eigenvalues whose count equals the number of classes minus one.
invented entities (1)
- spike-class sensitivity matrix (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · match: unclear · matched text: "spike-class sensitivity matrix that quantifies the directional derivative of each class's accuracy along each spike eigenvector"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · match: unclear · matched text: "C−1 large isolated eigenvalues called spikes"
Reference graph
Works this paper leans on
- [1] L. Sagun, U. Evci, V. U. Güney, Y. Dauphin, and L. Bottou. Empirical analysis of the Hessian of over-parametrized neural networks. arXiv:1706.04454, 2017.
- [2] B. Ghorbani, S. Krishnan, and Y. Xiao. An investigation into neural net optimization via Hessian eigenvalue density. In ICML, 2019.
- [3] V. Papyan. Traces of class/cross-class structure pervade deep learning spectra. JMLR, 21(167):1–64, 2020.
- [4] B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.
- [5] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255–282, 1950.
- [6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [7] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In ICCV, 2017.
- [8] Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie. Class-balanced loss based on effective number of samples. In CVPR, 2019.
- [9] K. Cao, C. Wei, A. Gaidon, N. Arechiga, and T. Ma. Learning imbalanced datasets with label-distribution-aware margin loss. In NeurIPS, 2019.
- [10]
- [11]
- [12] S. Fort and S. Ganguli. Emergent properties of the local geometry of neural loss landscapes. arXiv:1910.05929, 2019.
- [13] Z. Yao, A. Gholami, K. Keutzer, and M. W. Mahoney. PyHessian: Neural networks through the lens of the Hessian. In IEEE Big Data, 2020.
- [14] Y. Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM J. Numer. Anal., 17(5):687–706, 1980.
- [15] B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng, and Y. Kalantidis. Decoupling representation and classifier for long-tailed recognition. In ICLR, 2020.
- [16] A. K. Menon, S. Jayasumana, A. S. Rawat, H. Jain, A. Veit, and S. Kumar. Long-tail learning via logit adjustment. In ICLR, 2021.
- [17]