pith.

arxiv: 2606.06854 · v1 · pith:WQZKKYSO · submitted 2026-06-05 · 💻 cs.LG

The Geometry of Last-Layer Model Stealing

Pith reviewed 2026-06-27 22:36 UTC · model grok-4.3

classification 💻 cs.LG
keywords: model stealing, transformer networks, geometric analysis, last layer, model extraction, reverse engineering, machine learning security

The pith

Geometry identifies the exact conditions for perfectly copying a transformer's final layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies geometric analysis to a standard model-stealing technique to determine when the final layer of a transformer can be copied exactly. This matters because it clarifies which information about a model's architecture can, and cannot, be extracted from its outputs alone. The analysis shows that while the last layer can be replicated perfectly under certain conditions, deeper hidden layers cannot be fully reconstructed from the outputs. Overall, it maps what is and is not possible when reverse-engineering a model with stealing methods.

Core claim

Using geometry, the paper establishes the precise conditions under which a well-known stealing method can perfectly copy the final layer of a transformer network. It further shows that hidden layers impose clear limits on what can be reverse-engineered from the model's outputs: a complete network cannot be reconstructed from those outputs alone.
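One elementary reason hidden layers resist exact recovery, long known for feedforward nets (compare Sussmann [4]), is that distinct weight settings can compute the same input-output map. A minimal sketch, assuming a hypothetical two-layer ReLU network rather than a transformer and not taken from the paper: permuting the hidden units and rescaling each by a positive factor changes the weights but leaves every output unchanged.

```python
import numpy as np

def relu_net(x, W1, b1, W2):
    """Two-layer network f(x) = W2 @ relu(W1 @ x + b1), applied to each column of x."""
    return W2 @ np.maximum(W1 @ x + b1[:, None], 0.0)

rng = np.random.default_rng(4)
d_in, width, d_out = 5, 7, 3
W1 = rng.standard_normal((width, d_in))
b1 = rng.standard_normal(width)
W2 = rng.standard_normal((d_out, width))

# Hidden-layer symmetry: permute the hidden units, then rescale each by a positive factor.
perm = rng.permutation(width)
scale = rng.uniform(0.5, 2.0, width)
W1_alt = scale[:, None] * W1[perm]
b1_alt = scale * b1[perm]
W2_alt = W2[:, perm] / scale[None, :]  # undo the scaling after the ReLU

x = rng.standard_normal((d_in, 1000))  # 1000 random inputs as columns
same_outputs = np.allclose(relu_net(x, W1, b1, W2), relu_net(x, W1_alt, b1_alt, W2_alt))
print(same_outputs, np.allclose(W1, W1_alt))  # True False
```

ReLU commutes with positive scaling, so each rescaled unit's contribution is divided back out by `W2_alt`; no query to the outputs can distinguish the two parameter settings.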

What carries the argument

Geometric analysis of the stealing method applied to the last layer of transformers, identifying conditions for exact copying.
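The summary does not name the stealing method; it is most likely the attack of Carlini et al. [1], listed first among the references, whose central step stacks full logit vectors from many queries and reads the hidden width and the span of the final projection off a singular value decomposition. A minimal sketch of that step, assuming raw logit access, noise-free outputs, and a synthetic last layer `W` (the function names and sizes are hypothetical, not the paper's code):

```python
import numpy as np

def simulate_logits(W, n_queries, rng):
    """Hypothetical oracle: each query yields a hidden state h and the logits W @ h."""
    d = W.shape[1]
    H = rng.standard_normal((n_queries, d))  # one random hidden state per query
    return H @ W.T  # shape (n_queries, vocab_size)

def recover_last_layer(Q, tol=1e-8):
    """Estimate the hidden width and a basis for the column space of W from logits Q."""
    # Rows of Q are W @ h_i, so Q = H @ W.T and rank(Q) <= d.
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    d_hat = int(np.sum(s > tol * s[0]))  # numerical rank = estimated hidden width
    # The top d_hat right singular vectors span the row space of Q, i.e. col(W).
    W_hat = Vt[:d_hat].T  # shape (vocab_size, d_hat)
    return d_hat, W_hat

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 200, 16
W = rng.standard_normal((vocab_size, hidden_dim))  # hypothetical target last layer
Q = simulate_logits(W, n_queries=64, rng=rng)
d_hat, W_hat = recover_last_layer(Q)
print(d_hat)  # 16

# W_hat matches W only up to an invertible d x d matrix G: W == W_hat @ G.
G, *_ = np.linalg.lstsq(W_hat, W, rcond=None)
print(np.allclose(W_hat @ G, W))  # True
```

The final check reflects a known limitation of this style of attack: logits alone determine `W` only up to an invertible transformation `G`, since `W @ h` equals `W_hat @ (G @ h)` for every hidden state `h`.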

Load-bearing premise

The well-known stealing method admits a geometric analysis capable of yielding exact, verifiable conditions for perfect last-layer copying when the target is a transformer.

What would settle it

Applying the stealing method to a transformer under the derived geometric conditions, checking whether the copied last layer matches the original exactly, and contrasting the result with cases where the conditions are not met.
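Such a test can be sketched on synthetic data. The conditions used here (at least as many generic queries as hidden dimensions, and hidden states that span the full hidden space) are illustrative assumptions, not the paper's derived conditions; `copy_error` is a hypothetical helper that measures how far `W` lies from the best copy recoverable from the logits.

```python
import numpy as np

def copy_error(W, H, tol=1e-8):
    """Recovered rank and relative error of the best copy W_hat @ G of W from logits H @ W.T."""
    Q = H @ W.T
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank of the observed logits
    W_hat = Vt[:r].T  # basis of the recovered subspace
    # Best possible alignment: least-squares G with W ~ W_hat @ G.
    G, *_ = np.linalg.lstsq(W_hat, W, rcond=None)
    return r, np.linalg.norm(W_hat @ G - W) / np.linalg.norm(W)

rng = np.random.default_rng(1)
V, d = 100, 12
W = rng.standard_normal((V, d))

# Condition met: more generic queries than hidden dimensions.
H_ok = rng.standard_normal((40, d))
# Condition violated (a): fewer queries than hidden dimensions.
H_few = rng.standard_normal((8, d))
# Condition violated (b): hidden states confined to a 5-dimensional subspace.
H_flat = rng.standard_normal((40, 5)) @ rng.standard_normal((5, d))

for name, H in [("ok", H_ok), ("few queries", H_few), ("flat states", H_flat)]:
    r, err = copy_error(W, H)
    print(f"{name:12s} rank={r:2d} rel_error={err:.2e}")
```

Only the first case recovers the full 12-dimensional span with error at rounding level; in the other two the observed rank drops below `d` and part of `W` is simply never seen.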

Figures

Figures reproduced from arXiv: 2606.06854 by Snigdha Chandan Khilar.

Figure 1. The degree-1 part of the ideal. On a toy model with …
Figure 2. Left: regularity is load-bearing. Under logit noise the rank gap (R1) decays like 1/σ but never falls below 1; the projection recovery error (R2/R3) grows linearly. Right: below the last layer, the linear span (what the SVD reports) overstates the content; the intrinsic manifold dimension recovers it and exposes the nonlinear bottleneck.
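Both panels of Figure 2 can be mimicked on toy data. The summary does not define R1, R2 or R3, so this sketch assumes the rank gap is the ratio s_d / s_{d+1} of consecutive singular values (which can never fall below 1, since singular values are sorted) and measures recovery error as the sine of the largest principal angle between the true and recovered subspaces. For the right panel it uses a closed curve whose linear span is 6-dimensional, together with the maximum-likelihood form of the TwoNN intrinsic-dimension estimator, the estimator used by Ansuini et al. [5]. All sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Left panel (sketch): noisy logits from a synthetic last layer W.
V, d, n = 120, 8, 200
W = rng.standard_normal((V, d))
Q_clean = rng.standard_normal((n, d)) @ W.T
P_true = W @ np.linalg.pinv(W)  # orthogonal projector onto col(W)

def noisy_recovery(sigma):
    """Rank gap and subspace error when logits carry Gaussian noise of scale sigma."""
    Q = Q_clean + sigma * rng.standard_normal(Q_clean.shape)
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    gap = s[d - 1] / s[d]  # assumed reading of the rank gap: s_d / s_{d+1} >= 1
    B = Vt[:d].T  # orthonormal basis of the recovered d-dimensional subspace
    err = np.linalg.norm(P_true - B @ B.T, 2)  # sine of the largest principal angle
    return gap, err

for sigma in (1e-3, 1e-2, 1e-1):
    gap, err = noisy_recovery(sigma)
    print(f"sigma={sigma:.0e}  gap={gap:.3g}  err={err:.3g}")

# Right panel (sketch): a closed curve whose linear span is 6-dimensional.
def numerical_rank(A, tol=1e-8):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def two_nn_dimension(X):
    """Maximum-likelihood TwoNN estimate from 2nd/1st nearest-neighbour distance ratios."""
    sq = (X ** 2).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    np.fill_diagonal(D2, np.inf)  # exclude each point from its own neighbours
    r = np.sqrt(np.sort(D2, axis=1)[:, :2])  # distances to 1st and 2nd neighbours
    mu = r[:, 1] / r[:, 0]
    return len(X) / np.log(mu).sum()

theta = rng.uniform(0.0, 2 * np.pi, 1000)
F = np.column_stack([f(k * theta) for k in (1, 2, 3) for f in (np.cos, np.sin)])
X = F @ rng.standard_normal((6, 20))  # embed the curve linearly in R^20
print(numerical_rank(X), round(two_nn_dimension(X), 2))
```

The numerical rank reports 6, the dimension of the linear span, while the TwoNN estimate should come out close to 1, the dimension of the curve itself.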
Original abstract

This paper uses geometry to explain how a machine learning model can be stolen using an existing, well-known method. The author has shown the exact conditions required to perfectly copy the final layer of a transformer network. For the deeper, hidden layers, the author has explained clear limits. The author has also demonstrated that a hidden network cannot be fully reverse-engineered just by looking at the final results. The research clearly maps out what can and cannot be stolen from a model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper applies geometric analysis to a standard model stealing method, claiming to derive the exact conditions under which the final linear layer of a transformer can be perfectly copied, while identifying clear limits on recovering hidden layers from final outputs alone and mapping what aspects of a model can and cannot be stolen.

Significance. If the claimed geometric conditions are rigorously derived and the limits on hidden-layer recovery are shown to hold under standard assumptions, the work would provide a theoretical framework clarifying the boundaries of last-layer extraction attacks on transformers.

major comments (2)
  1. [Abstract] The claim that 'exact conditions' for perfect last-layer copying have been shown cannot be assessed, as no equations, derivations, or geometric constructions are visible to verify whether query distribution, normalization ambiguities, or output access assumptions are handled.
  2. [Full text] No derivations, proofs, or empirical checks are provided, so it is impossible to confirm whether the geometric analysis of the stealing method actually yields verifiable, parameter-free conditions for transformer final-layer recovery as asserted.
minor comments (1)
  1. The abstract would benefit from a brief statement of the specific stealing method analyzed and the precise output access model assumed.
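One concrete normalization ambiguity of the kind the first major comment mentions arises when an API returns log-probabilities instead of raw logits: log-softmax subtracts a different constant from each query's logits, which adds the all-ones direction to the observed span. A minimal sketch, assuming log-probabilities over the full vocabulary and a synthetic last layer (the names and sizes are hypothetical):

```python
import numpy as np

def log_softmax(Z):
    """Row-wise log-softmax; subtracts a per-row constant (the log-sum-exp) from the logits."""
    Z_max = Z.max(axis=1, keepdims=True)
    return Z - Z_max - np.log(np.exp(Z - Z_max).sum(axis=1, keepdims=True))

def numerical_rank(A, tol=1e-8):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(2)
V, d, n = 150, 10, 60
W = rng.standard_normal((V, d))  # hypothetical last layer
H = rng.standard_normal((n, d))  # hidden states for n queries
logits = H @ W.T
logprobs = log_softmax(logits)

# The per-query shift adds the all-ones direction: rank d + 1 instead of d.
print(numerical_rank(logits), numerical_rank(logprobs))  # 10 11
# Centering each row over the vocabulary removes the shift and the extra rank.
centered = logprobs - logprobs.mean(axis=1, keepdims=True)
print(numerical_rank(centered))  # 10
```

Centering recovers the column space of `W` projected away from the all-ones vector, so the component of `W` along that direction is lost; this is one simple remedy, not necessarily the one adopted by the paper or by Carlini et al. [1].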

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We agree that the current manuscript does not contain the explicit equations, derivations, geometric constructions, or empirical checks needed to substantiate the claims about exact conditions for last-layer recovery. We will revise the paper to include these elements.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'exact conditions' for perfect last-layer copying have been shown cannot be assessed, as no equations, derivations, or geometric constructions are visible to verify whether query distribution, normalization ambiguities, or output access assumptions are handled.

    Authors: We accept the point. The abstract asserts exact conditions without visible supporting mathematics. In revision we will either tone down the abstract or ensure the main text presents the geometric constructions, query-distribution requirements, normalization handling, and output-access assumptions at the outset so the claim can be assessed. revision: yes

  2. Referee: [Full text] No derivations, proofs, or empirical checks are provided, so it is impossible to confirm whether the geometric analysis of the stealing method actually yields verifiable, parameter-free conditions for transformer final-layer recovery as asserted.

    Authors: The observation is correct: the manuscript as written supplies neither derivations nor proofs nor checks. We will add the geometric analysis, the derivation of the parameter-free conditions, and any necessary empirical verification in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain self-contained with no self-referential reductions

Full rationale

The abstract describes applying geometry to an existing, well-known stealing method to derive exact conditions for last-layer copying in transformers, plus limits on hidden layers. No equations, fitted parameters, self-citations, or ansatzes are present in the provided text. No step reduces a claimed prediction or uniqueness result to a definition or prior self-citation by construction. The central claim analyzes an external method, so by the paper's own framing it does not rest on the paper's own constructions. This is the expected honest non-finding for a geometry-based explanation without load-bearing internal fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that geometry can be applied to the known method to produce exact conditions.

pith-pipeline@v0.9.1-grok · 5588 in / 1005 out tokens · 11709 ms · 2026-06-27T22:36:58.604270+00:00 · methodology


Reference graph

Works this paper leans on

7 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    N. Carlini et al., Stealing Part of a Production Language Model. ICML 2024. arXiv:2403.06634

  2. [2]

    S. Hohloch, T. Mestdag, K. Yasaka, The Cartan–Kähler theorem for exterior differential systems on transitive Lie algebroids. arXiv:2605.29083 (2026)

  3. [3]

    R. L. Bryant, S. S. Chern, R. B. Gardner, H. L. Goldschmidt, P. A. Griffiths, Exterior Differential Systems. Springer, 1991

  4. [4]

    H. J. Sussmann, Uniqueness of the weights for minimal feedforward nets with a given input–output map. Neural Networks 5(4):589–593, 1992

  5. [5]

    A. Ansuini, A. Laio, J. H. Macke, D. Zoccolan, Intrinsic dimension of data representations in deep neural networks. NeurIPS 2019. arXiv:1905.12784

  6. [6]

    M. Finlayson, S. Swayamdipta, X. Ren, Logits of API-protected LLMs leak proprietary information. arXiv:2403.09539 (2024)

  7. [7]

    S. Zanella-Béguelin, S. Tople, A. Paverd, B. Köpf, Grey-box extraction of natural language models. ICML 2021.