The Geometry of Last-Layer Model Stealing

Snigdha Chandan Khilar

arxiv: 2606.06854 · v1 · pith:WQZKKYSOnew · submitted 2026-06-05 · 💻 cs.LG

Pith reviewed 2026-06-27 22:36 UTC · model grok-4.3

classification 💻 cs.LG

keywords model stealing · transformer networks · geometric analysis · last layer · model extraction · reverse engineering · machine learning security

The pith

Geometry identifies the exact conditions for perfectly copying a transformer's final layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies geometric analysis to a standard model stealing technique to determine when the final layer of a transformer can be copied exactly. This matters because it clarifies the boundaries of what information about a model's architecture can be extracted from its outputs alone. The analysis shows that while the last layer can be perfectly replicated under certain conditions, deeper hidden layers cannot be fully reconstructed from final results. Overall, it maps the possibilities and impossibilities of reverse-engineering models through stealing methods.

Core claim

Using geometry, the paper establishes the precise conditions under which a well-known stealing method can perfectly copy the final layer of a transformer network. It further demonstrates that hidden layers impose clear limits on what can be reverse-engineered from the model's outputs, showing that a complete network cannot be reconstructed solely from final results.

What carries the argument

Geometric analysis of the stealing method applied to the last layer of transformers, identifying conditions for exact copying.

Load-bearing premise

The well-known stealing method admits a geometric analysis capable of yielding exact, verifiable conditions for perfect last-layer copying when the target is a transformer.

What would settle it

Applying the stealing method to a transformer under the derived geometric conditions and checking whether the copied last layer matches the original exactly, compared to cases where conditions are not met.
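A minimal sketch of such a check on synthetic data. It assumes the "well-known stealing method" is the SVD-based logit attack of Carlini et al. from the reference graph; the page itself does not name the method. The toy sizes, the Gaussian weights, and the variable names (`W`, `H`, `Q`, `G`) are all hypothetical. The sketch also assumes noise-free logits and queries whose hidden states span the full hidden space, and it illustrates why "exact" recovery can only mean recovery up to an invertible matrix `G`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
vocab, hidden, n_queries = 50, 8, 200

W = rng.normal(size=(vocab, hidden))       # "secret" last-layer weights
H = rng.normal(size=(hidden, n_queries))   # stand-in for final hidden states
Q = W @ H                                  # observed logits, one column per query

# Step 1: the numerical rank of the logit matrix reveals the hidden width.
U, s, _ = np.linalg.svd(Q, full_matrices=False)
est_hidden = int(np.sum(s > 1e-8 * s[0]))

# Step 2: the top singular directions span the column space of W, so
# W_hat matches W only up to an unknown invertible hidden x hidden matrix G.
W_hat = U[:, :est_hidden] * s[:est_hidden]

# Verification (needs the true W, so it is only possible in a controlled test):
# solve W_hat @ G = W in the least-squares sense and measure the residual.
G, *_ = np.linalg.lstsq(W_hat, W, rcond=None)
rel_err = np.linalg.norm(W_hat @ G - W) / np.linalg.norm(W)

print("estimated hidden width:", est_hidden)
print("relative recovery error:", rel_err)
```

In this idealized setting the residual should sit at floating-point precision. Repeating the run with fewer queries than hidden dimensions, or with noise added to `Q`, would be one way to probe the cases where the conditions are not met.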

Figures

Figures reproduced from arXiv: 2606.06854 by Snigdha Chandan Khilar.

**Figure 2.** Left: regularity is load-bearing. Under logit noise the rank gap (R1) decays like 1/σ but never falls below 1; the projection recovery error (R2/R3) grows linearly. Right: below the last layer, the linear span (what the SVD reports) overstates the content; the intrinsic manifold dimension recovers it and exposes the nonlinear bottleneck. … is the same as a circle in the plane: a circle is a one-dimensional o…
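The circle mentioned in the caption can be reproduced numerically. The sketch below is illustrative only and is not taken from the paper: it samples points on a unit circle, counts the dimension of their linear span with an SVD, and estimates the intrinsic dimension with the TwoNN maximum-likelihood estimator (the estimator used in the Ansuini et al. reference). The sample size and thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical sample size

# Points on the unit circle, embedded in the plane.
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Linear view: number of non-negligible singular values of the centered data.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
linear_dim = int(np.sum(s > 1e-8 * s[0]))

# Intrinsic view (TwoNN): for each point take the ratio mu = r2 / r1 of the
# distances to its second and first nearest neighbors; the maximum-likelihood
# estimate of the dimension is n / sum(log(mu)).
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(D, np.inf)                    # exclude self-distances
nearest = np.partition(D, 1, axis=1)[:, :2]    # columns: r1, r2
mu = nearest[:, 1] / nearest[:, 0]
intrinsic_dim = n / np.sum(np.log(mu))

print("linear span dimension:", linear_dim)
print("TwoNN intrinsic dimension estimate:", intrinsic_dim)
```

The SVD reports two dimensions because the circle does not fit in any line, while the TwoNN estimate should land close to one, which is the gap between linear span and manifold dimension that the right panel describes.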

Original abstract

This paper uses geometry to explain how a machine learning model can be stolen using an already existing well-known method. The author has shown the exact conditions required to perfectly copy the final layer of a transformer network. When looking deeper into the hidden layers the author has explained clear limits. The author has also demonstrated that a hidden network cannot be fully reverse engineered just by looking at the final results. The research clearly maps out what can and cannot be stolen from a model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper overlays geometry on a known stealing method to claim exact conditions for last-layer recovery in transformers, but the derivations and checks are not visible in the provided material.

The main takeaway is that this work applies a geometric lens to an existing model-stealing technique and states precise conditions under which the final layer of a transformer can be copied exactly, while also spelling out why the hidden layers stay out of reach from output access alone.

It does a reasonable job of drawing those boundaries in plain terms. The point that full reverse engineering is not possible from final results alone matches what most people in the area already expect from black-box settings, and framing it geometrically might help some readers visualize the reachable versus unreachable parts.

The soft spot is the missing support for the central claim. The abstract asserts exact conditions but shows no derivations, no equations, and no experiments that would let a reader verify whether the geometry actually produces those conditions or whether it rests on unstated assumptions about query coverage, normalization, or output access. The stress-test note on query distribution and finite-sample effects therefore lands as a live concern until the full math is examined.

This is for people already working on model extraction and ML security who want a reframing of known limits rather than new attacks. A reader looking for formally verified boundaries or reproducible results will not find enough here to build on directly.

I would send it to peer review only if the full manuscript contains the actual geometric derivations and some form of validation; otherwise the evidence is too thin to justify referee time.

Referee Report

2 major / 1 minor

Summary. The paper applies geometric analysis to a standard model stealing method, claiming to derive the exact conditions under which the final linear layer of a transformer can be perfectly copied, while identifying clear limits on recovering hidden layers from final outputs alone and mapping what aspects of a model can and cannot be stolen.

Significance. If the claimed geometric conditions are rigorously derived and the limits on hidden-layer recovery are shown to hold under standard assumptions, the work would provide a theoretical framework clarifying the boundaries of last-layer extraction attacks on transformers.

major comments (2)

[Abstract] The claim that "exact conditions" for perfect last-layer copying have been shown cannot be assessed, as no equations, derivations, or geometric constructions are visible to verify whether query distribution, normalization ambiguities, or output access assumptions are handled.
[Full text] No derivations, proofs, or empirical checks are provided, so it is impossible to confirm whether the geometric analysis of the stealing method actually yields verifiable, parameter-free conditions for transformer final-layer recovery as asserted.

minor comments (1)

The abstract would benefit from a brief statement of the specific stealing method analyzed and the precise output access model assumed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We agree that the current manuscript does not contain the explicit equations, derivations, geometric constructions, or empirical checks needed to substantiate the claims about exact conditions for last-layer recovery. We will revise the paper to include these elements.

Referee: [Abstract] The claim that "exact conditions" for perfect last-layer copying have been shown cannot be assessed, as no equations, derivations, or geometric constructions are visible to verify whether query distribution, normalization ambiguities, or output access assumptions are handled.

Authors: We accept the point. The abstract asserts exact conditions without visible supporting mathematics. In revision we will either tone down the abstract or ensure the main text presents the geometric constructions, query-distribution requirements, normalization handling, and output-access assumptions at the outset so the claim can be assessed.

Revision: yes

Referee: [Full text] No derivations, proofs, or empirical checks are provided, so it is impossible to confirm whether the geometric analysis of the stealing method actually yields verifiable, parameter-free conditions for transformer final-layer recovery as asserted.

Authors: The observation is correct: the manuscript as written supplies neither derivations nor proofs nor checks. We will add the geometric analysis, the derivation of the parameter-free conditions, and any necessary empirical verification in the revised version.

Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain self-contained with no self-referential reductions

The abstract describes applying geometry to an existing well-known stealing method to derive exact conditions for last-layer copying in transformers, plus limits on hidden layers. No equations, fitted parameters, self-citations, or ansatzes are present in the provided text. No step reduces a claimed prediction or uniqueness result to a definition or prior self-citation by construction. The central claim is an analysis of an external method, which is independent by the paper's own framing. This is the expected honest non-finding for a geometry-based explanation without load-bearing internal fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that geometry can be applied to the known method to produce exact conditions.

pith-pipeline@v0.9.1-grok · 5588 in / 1005 out tokens · 11709 ms · 2026-06-27T22:36:58.604270+00:00 · methodology

Reference graph

Works this paper leans on

7 extracted references · 4 canonical work pages · 1 internal anchor

[1] N. Carlini et al., Stealing Part of a Production Language Model. ICML 2024. arXiv:2403.06634

[2] S. Hohloch, T. Mestdag, K. Yasaka, The Cartan–Kähler theorem for exterior differential systems on transitive Lie algebroids. arXiv:2605.29083 (2026)

[3] R. L. Bryant, S. S. Chern, R. B. Gardner, H. L. Goldschmidt, P. A. Griffiths, Exterior Differential Systems. Springer, 1991

[4] H. J. Sussmann, Uniqueness of the weights for minimal feedforward nets with a given input–output map. Neural Networks 5(4):589–593, 1992

[5] A. Ansuini, A. Laio, J. H. Macke, D. Zoccolan, Intrinsic dimension of data representations in deep neural networks. NeurIPS 2019. arXiv:1905.12784

[6] M. Finlayson, S. Swayamdipta, X. Ren, Logits of API-protected LLMs leak proprietary information. arXiv:2403.09539 (2024)

[7] S. Zanella-Béguelin, S. Tople, A. Paverd, B. Köpf, Grey-box extraction of natural language models. ICML 2021