arxiv: 2606.09189 · v1 · pith:BTPN75WJnew · submitted 2026-06-08 · 💻 cs.CR · cs.AI

Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models

Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords EEG foundation models · attribute leakage · cross-encoder transfer · privacy audit · ridge decoder · membership inference · differential privacy

The pith

A ridge attribute decoder trained on one frozen EEG encoder transfers to every other tested encoder through a fitted linear bridge, with a subject-disjoint 95% CI lower bound of at least 0.081 in all six directions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper jointly audits EEG foundation models on raw reconstruction, membership inference, identity linkage, and attribute leakage. Each single-endpoint audit passes releases that still leak spectral attributes. A single ridge decoder trained on one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder. The authors prove a sufficient condition on projector overlap under which such a chained bridge attacker succeeds, and introduce an audit-endpoint disagreement score (AEDS) that flags leaks in cells where a head-level LiRA membership audit reaches only AUC 0.50-0.70. A Wiener-style noise-aware adaptive attacker, the LiRA audit, and DP-SGD at utility-preserving epsilon all leave the attribute channel essentially unchanged on the tested datasets.

Core claim

Each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with subject-disjoint matched-control 95% CI lower bound at least 0.081 across all six BIOT/LaBraM/EEGPT directions. A sufficient condition is proved for encoders sharing nontrivial attribute-coordinate projector overlap beta, admitting a chained ridge bridge with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and beta is back-solved in [0.008, 0.198].
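
The transfer audit described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the paper's pipeline: the function names, the synthetic "encoders", the regularization value `lam`, and the use of Pearson correlation as the score are all assumptions made here, since this page does not specify the paper's metric or hyperparameters.

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge weights for centered inputs: (X^T X + lam*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def make_synthetic(n_subjects=40, windows=50, d_a=16, d_b=12, seed=0):
    """Hypothetical stand-in data: two 'encoders' embedding a shared latent attribute."""
    rng = np.random.default_rng(seed)
    n = n_subjects * windows
    subject = np.repeat(np.arange(n_subjects), windows)
    attr = rng.normal(size=n)  # stand-in for a spectral attribute
    latent = np.column_stack([attr, rng.normal(size=(n, 3))])
    Za = latent @ rng.normal(size=(4, d_a)) + 0.5 * rng.normal(size=(n, d_a))
    Zb = latent @ rng.normal(size=(4, d_b)) + 0.5 * rng.normal(size=(n, d_b))
    return Za, Zb, attr, subject

def bridge_transfer_corr(Za, Zb, attr, subject, n_train_subjects=30, lam=1.0):
    """Train a decoder on encoder A, bridge B -> A, score on held-out subjects of B."""
    train = subject < n_train_subjects  # subject-disjoint split
    test = ~train
    # Center with training-subject statistics only.
    mu_a, mu_b, mu_y = Za[train].mean(0), Zb[train].mean(0), attr[train].mean()
    # Step 1: ridge attribute decoder fitted on encoder A embeddings only.
    w = ridge_fit(Za[train] - mu_a, attr[train] - mu_y, lam)
    # Step 2: linear bridge mapping encoder B embeddings into encoder A space.
    bridge = ridge_fit(Zb[train] - mu_b, Za[train] - mu_a, lam)
    # Step 3: apply the unchanged decoder to bridged embeddings of held-out subjects.
    pred = (Zb[test] - mu_b) @ bridge @ w + mu_y
    return float(np.corrcoef(pred, attr[test])[0, 1])

Za, Zb, attr, subject = make_synthetic()
corr = bridge_transfer_corr(Za, Zb, attr, subject)
print(f"held-out-subject transfer correlation: {corr:.3f}")
```

The point of the sketch is that the decoder never sees encoder B during training; only the bridge does. A matched control (for example, the same pipeline with a mismatched attribute) would be needed before reading the score as leakage.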

What carries the argument

A fitted linear bridge between encoders, which lets a ridge attribute decoder trained on one frozen model be applied to the others.

If this is right

  • Joint multi-endpoint audits can block releases that pass any single audit.
  • The audit-endpoint disagreement score is positive in all eight matched-CI cells with p<0.001.
  • Wiener-style noise-aware attackers, LiRA membership audits, and DP-SGD at every utility-preserving epsilon leave the attribute channel essentially unchanged.
  • The cross-encoder bridge theorem supplies a release-blocking criterion grounded in embedding overlap.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linear-bridge transfer may appear in foundation models for other biosignals or modalities.
  • Attribute sanitization stronger than DP-SGD on the head may be required to close the channel.
  • Measuring projector overlap beta directly on new model pairs would test whether the observed range is typical.

Load-bearing premise

A fitted linear bridge between encoders accurately captures attribute transfer without requiring additional unstated conditions on embedding distributions or subject matching beyond the stated disjoint splits.

What would settle it

A new pair of encoders where no linear bridge achieves a subject-disjoint 95% CI lower bound of 0.081 or higher for the transferred ridge decoder on held-out test splits.
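
A check of this kind could look like the sketch below, which bootstraps over subjects and compares the lower end of a 95% percentile interval against the 0.081 threshold quoted above. The per-subject scores are synthetic placeholders, and the percentile bootstrap over paired transfer-minus-control differences is an assumed procedure; this page does not state how the paper computes its matched-control interval.

```python
import numpy as np

def subject_bootstrap_lower_bound(transfer, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound on mean(transfer - control), resampling subjects."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(transfer, dtype=float) - np.asarray(control, dtype=float)
    n = diff.size
    idx = rng.integers(0, n, size=(n_boot, n))  # resample subjects with replacement
    boot_means = diff[idx].mean(axis=1)
    return float(np.quantile(boot_means, alpha / 2))

THRESHOLD = 0.081  # lower bound quoted in the text for all six directions

# Hypothetical per-subject scores for one encoder pair (not data from the paper).
rng = np.random.default_rng(1)
transfer = rng.normal(0.30, 0.05, size=25)
control = rng.normal(0.05, 0.05, size=25)

lb = subject_bootstrap_lower_bound(transfer, control)
print(f"95% CI lower bound: {lb:.3f}; meets threshold: {lb >= THRESHOLD}")
```

Resampling whole subjects rather than windows keeps the interval honest about between-subject variability, which is the unit the subject-disjoint claim is about.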

Figures

Figures reproduced from arXiv: 2606.09189 by Jianwei Tai.

Figure 1: Threat model and claim boundary. The supported …
Figure 2: Split-controlled representation leakage. Spectral …
Figure 3: Subject-disjoint matched-control confidence inter…
Figure 4: Cross-encoder transfer audit (left) and bridge …
Figure 5: Defense curves on the window split. Gaussian noise suppresses reference-set identity at high noise, while attribute …
Figure 7: Privacy-utility frontier for the tested noise grids.
Figure 6: DP-SGD privacy-utility frontier on the LaBraM …
Figure 9: Audit ladder. Each step removes one common …
Original abstract

EEG foundation-model releases are usually audited one endpoint at a time: raw-reconstruction, membership inference, identity linkage, or DP-SGD on the downstream head. We audit the same released embeddings under all four endpoints jointly, on BIOT, LaBraM, and EEGPT, and show that each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with subject-disjoint matched-control 95% CI lower bound at least 0.081 across all six BIOT/LaBraM/EEGPT directions. We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]. To turn the joint audit into a deployment-readable decision rule we introduce an audit-endpoint disagreement score (AEDS), prove sufficient conditions for its positivity, and bootstrap-calibrate it per cell; AEDS is positive in all eight matched-CI cells (BIOT/LaBraM/EEGPT on EEGMMI; LaBraM on Sleep-EDF, 54-channel LIMO, CHB-MIT pediatric scalp EEG) with p<0.001, while a head-level Carlini LiRA membership audit reaches AUC only 0.50-0.70. Standard defenses fail under audit: a Wiener-style noise-aware adaptive attacker, the LiRA audit, and DP-SGD at every utility-preserving epsilon in {4,8} leave the attribute channel essentially unchanged. The contribution is an audit framework that turns scattered single-endpoint defenses into a joint release decision, supported by a cross-encoder bridge theorem and adaptive-attacker, LiRA, and DP-SGD baselines; the audit licenses release-blocking, not raw-waveform exfiltration or held-out-subject identity recovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper claims that single-endpoint audits of EEG foundation models (BIOT, LaBraM, EEGPT) miss spectral attribute leakage, which is revealed by a cross-encoder transfer audit: a ridge attribute decoder trained on one frozen encoder transfers via a fitted linear bridge to held-out-subject splits of the others, yielding a subject-disjoint matched-control 95% CI lower bound of at least 0.081 across all six directions. It proves a sufficient condition for a chained ridge bridge attacker based on attribute-coordinate projector overlap beta (back-solved to [0.008, 0.198]), introduces a bootstrap-calibrated audit-endpoint disagreement score (AEDS) that is positive in all eight cells, and shows that Wiener-style, LiRA, and DP-SGD defenses leave the attribute channel intact.

Significance. If the cross-encoder transfer result and the supporting theorem hold after verification of the distributional assumptions, the work supplies a joint audit framework stronger than isolated membership or reconstruction attacks, directly supporting release-blocking decisions for EEG foundation models.

major comments (3)
  1. [Abstract] Abstract (sufficient condition paragraph): the proof states a centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0 and then back-solves beta in [0.008, 0.198] directly from the observed transfer performance; this makes the claimed lower bound dependent on the same fitted quantities it is intended to explain, creating a circularity risk for the joint-audit conclusion.
  2. [Abstract] Abstract (ridge bridge derivation): the sufficient condition implicitly requires centering and bounded-gain conditions on the embedding distributions plus dominant attribute-projector overlap; the reported experiments state subject-disjoint splits and matched-control CIs but do not report checks (e.g., cross-encoder covariance spectra or residual nonlinearity tests) that would confirm these conditions hold.
  3. [Results] Results (AEDS and CI cells): the claim that AEDS is positive with p<0.001 in all eight matched-CI cells and that the attribute lower bound is at least 0.081 rests on the linear bridge isolating the attribute channel; without explicit verification that residual subject or dataset covariates are not driving the transfer, the load-bearing claim that single-endpoint audits are insufficient does not yet follow.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below with clarifications and revisions to strengthen the presentation of assumptions and evidence, while preserving the core joint-audit findings.

point-by-point responses
  1. Referee: [Abstract] Abstract (sufficient condition paragraph): the proof states a centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0 and then back-solves beta in [0.008, 0.198] directly from the observed transfer performance; this makes the claimed lower bound dependent on the same fitted quantities it is intended to explain, creating a circularity risk for the joint-audit conclusion.

    Authors: The theorem derives the lower bound on chained-ridge attack performance strictly as a function of the (unobserved) projector overlap beta under the stated centering and bounded-gain assumptions; this derivation is independent of any particular empirical transfer value. The back-solving step is a separate, post-hoc calculation that converts the observed transfer performance into an implied range for beta to aid interpretation. We agree the abstract does not separate these steps clearly enough. In revision we will (i) state the theorem bound first without reference to the numerical interval, (ii) move the back-solving calculation and its confidence interval to the methods/appendix, and (iii) add an explicit sentence that the bound itself does not rely on the fitted quantities.
    revision: partial

  2. Referee: [Abstract] Abstract (ridge bridge derivation): the sufficient condition implicitly requires centering and bounded-gain conditions on the embedding distributions plus dominant attribute-projector overlap; the reported experiments state subject-disjoint splits and matched-control CIs but do not report checks (e.g., cross-encoder covariance spectra or residual nonlinearity tests) that would confirm these conditions hold.

    Authors: We accept that explicit diagnostic checks for the centering, bounded-gain, and approximate linearity assumptions would increase confidence in the applicability of the sufficient condition. In the revised manuscript we will add (a) cross-encoder covariance spectra for the three model pairs, (b) residual plots and a simple nonlinearity test (e.g., quadratic term significance) on the fitted bridges, and (c) confirmation that subject-disjoint splits preserve zero-mean centering after standardization. These diagnostics will appear in a new appendix subsection.
    revision: yes

  3. Referee: [Results] Results (AEDS and CI cells): the claim that AEDS is positive with p<0.001 in all eight matched-CI cells and that the attribute lower bound is at least 0.081 rests on the linear bridge isolating the attribute channel; without explicit verification that residual subject or dataset covariates are not driving the transfer, the load-bearing claim that single-endpoint audits are insufficient does not yet follow.

    Authors: The matched-control protocol already pairs each transfer trial with a same-subject, same-dataset control that receives identical preprocessing and split structure; any residual subject or dataset covariate would therefore appear equally in the control distribution, which is subtracted in the reported lower bounds. Nevertheless, to further isolate the attribute channel we will add two supplementary controls in revision: (i) attribute-label permutation tests that destroy the attribute signal while preserving all other covariates, and (ii) an additional covariate-matched subset analysis on the largest dataset (EEGMMI). These will be reported alongside the existing AEDS results.
    revision: partial
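
The attribute-label permutation test promised in the third response might be sketched as follows. This is an illustrative version on synthetic data, assuming a ridge decoder scored by Pearson correlation and a one-sided p-value with the usual add-one correction; the helper names and all parameter values are hypothetical, and a real audit would likely permute within or across subjects in a way that respects the split structure.

```python
import numpy as np

def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Fit a centered ridge decoder on the training set and predict on the test set."""
    mu_x, mu_y = X_tr.mean(0), y_tr.mean()
    Xc = X_tr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ w + mu_y

def permutation_p_value(X_tr, y_tr, X_te, y_te, n_perm=500, seed=0):
    """One-sided permutation p-value for the held-out decoding correlation."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(ridge_predict(X_tr, y_tr, X_te), y_te)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling training labels breaks the embedding-attribute link
        # while leaving the embeddings themselves untouched.
        y_perm = rng.permutation(y_tr)
        null[i] = np.corrcoef(ridge_predict(X_tr, y_perm, X_te), y_te)[0, 1]
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(observed), float(p)

# Hypothetical embeddings that carry the attribute along one direction.
rng = np.random.default_rng(0)
y = rng.normal(size=400)
X = np.outer(y, rng.normal(size=8)) + rng.normal(size=(400, 8))
obs, p = permutation_p_value(X[:300], y[:300], X[300:], y[300:])
print(f"observed correlation: {obs:.3f}, permutation p-value: {p:.4f}")
```

With `n_perm` permutations the smallest reportable p-value is 1/(n_perm + 1), so a claim such as p<0.001 needs at least 1000 permutations.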

Circularity Check

1 step flagged

Sufficient-condition bound on attribute overlap beta is back-solved from the same fitted ridge-bridge transfer performance it purports to explain

specific steps
  1. fitted input called prediction [Abstract]
    "We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]."

    The lower-bound expression is derived under the stated sufficient condition; beta is then obtained by inverting the observed transfer performance of the fitted linear bridge on the identical subject-disjoint test splits. The numerical claim therefore depends on the same ridge-regression outputs it is invoked to interpret, rendering the 'proof' non-independent.

full rationale

The paper states a theorem giving a lower bound on chained-ridge attacker gain in terms of an attribute-projector overlap parameter beta, then immediately back-solves numerical values for beta directly from the observed cross-encoder transfer accuracies on the held-out splits. Because the reported interval [0.008, 0.198] and the claim of 'nontrivial' overlap are obtained by inverting the same linear-bridge fit that constitutes the headline empirical result, the mathematical 'proof' does not supply an independent constraint; the bound is a re-expression of the fitted quantities. No external verification of the centering, bounded-gain, or projector-dominance assumptions is reported, so the derivation chain reduces to the input data by construction. This matches the fitted-input-called-prediction pattern at the level of the central theorem.
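
The algebra behind this objection can be written out as below. The symbol g for the observed centered gain is notation introduced here, and reading "back-solve" as setting the bound to equality at the observed gain is an assumption; this page does not show the paper's actual procedure.

```latex
% g: centered gain of the transferred decoder (notation assumed here).
% Theorem-style bound, as stated in the abstract:
g \;\ge\; \sqrt{\frac{\beta}{1+\tau^{2}}} \;-\; \varepsilon_{\mathrm{br}} \;-\; \rho_{0}

% Rearranging, an observed gain only bounds the overlap from above:
\beta \;\le\; (1+\tau^{2})\,\bigl(g + \varepsilon_{\mathrm{br}} + \rho_{0}\bigr)^{2}

% Back-solving under an assumed tight bound (equality at the observed gain \hat{g}):
\hat{\beta} \;=\; (1+\tau^{2})\,\bigl(\hat{g} + \varepsilon_{\mathrm{br}} + \rho_{0}\bigr)^{2}
```

Written this way, the back-solved overlap is a deterministic function of the observed gain and the nuisance terms, which is why it cannot serve as independent evidence for the bound.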

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Abstract-only view limits visibility. The central claim rests on the existence of a shared attribute-coordinate projector overlap beta, which is back-solved from data, and on the validity of ridge regression plus a linear bridge as the transfer model.

free parameters (2)
  • beta in [0.008, 0.198]
    Attribute-coordinate projector overlap, back-solved from the observed transfer performance.
  • ridge regularization parameter
    Implicit in the ridge attribute decoder; value not stated in abstract.
axioms (2)
  • domain assumption: Encoders share a nontrivial attribute-coordinate projector overlap beta
    Invoked as the sufficient condition enabling the chained ridge bridge attacker.
  • domain assumption: Subject-disjoint matched-control splits provide valid 95% CI bounds
    Used to claim the 0.081 lower bound across directions.

pith-pipeline@v0.9.1-grok · 5922 in / 1394 out tokens · 24703 ms · 2026-06-27T16:18:53.712726+00:00 · methodology

