Device Passport: Enabling Spatio-Temporal Pretrained Models to Generalize Across Input Layouts

Behrooz Mahasseni; Christopher M. Sandino; Ellen L. Zippi; Erdrin Azemi; Geeling Chau; Juri Minxha; Ran Liu; Wenhui Cui

arxiv: 2607.00249 · v1 · pith:AV6EIXELnew · submitted 2026-06-30 · 💻 cs.LG · eess.SP

Pith reviewed 2026-07-02 19:30 UTC · model grok-4.3

classification 💻 cs.LG eess.SP

keywords: biosignal foundation models · channel embedding · layout transfer · pretrained models · EEG · expert mixtures · device generalization · spatio-temporal models

The pith

Device Passport embeds each channel with both its activity and metadata so pretrained biosignal models can transfer across differing sensor layouts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Biosignal foundation models struggle when a new device uses a different arrangement of sensors than the data used for pretraining. The paper tests several channel embedding strategies under controlled subset-transfer conditions and on realistic ear-EEG transfer. Device Passport replaces simple lookup embeddings with expert mixture models whose inputs are each channel's functional activity plus its metadata. In the layout-transfer regimes that matter most for the work, this method matches or exceeds the strongest learned baseline. The authors conclude that embedding design is a central lever for reusing large pretrained models on new hardware.

Core claim

Device Passport learns experts and mixture models that take each channel's functional activity and metadata as input; when pretraining layouts differ substantially from the downstream layout, this embedding produces competitive overall performance and improves over the strongest learned baseline in the motivating transfer regimes.
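The gating idea in this claim can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `GatedExpertEmbedding`, the toy activity summary (mean and standard deviation), the layer sizes, and the random initialization are all assumptions made for the example. It only shows how a small MLP over activity features plus XYZ metadata can produce mixture weights over a shared set of expert vectors.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def activity_features(signal):
    """Toy activity summary (an assumption): mean and standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    return [mean, math.sqrt(var)]


class GatedExpertEmbedding:
    """Hypothetical MLP-gated mixture over expert vectors (illustrative only)."""

    def __init__(self, n_experts=4, dim=8, hidden=16, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        in_dim = 2 + 3  # two activity features plus (x, y, z)
        self.w1 = init(hidden, in_dim)
        self.w2 = init(n_experts, hidden)
        self.experts = init(n_experts, dim)  # would be learned in practice

    def gate(self, signal, xyz):
        """Mixture weights over experts for one channel."""
        x = activity_features(signal) + list(xyz)
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.w2]
        return softmax(logits)

    def embed(self, signal, xyz):
        """Channel embedding as the gate-weighted sum of expert vectors."""
        weights = self.gate(signal, xyz)
        dim = len(self.experts[0])
        return [sum(w * e[d] for w, e in zip(weights, self.experts)) for d in range(dim)]


model = GatedExpertEmbedding()
weights = model.gate([0.1, -0.2, 0.3, 0.0], (0.5, -0.1, 0.2))
embedding = model.embed([0.1, -0.2, 0.3, 0.0], (0.5, -0.1, 0.2))
```

Because the gate is a function of coordinates and activity rather than a channel index, it accepts any sensor position, including one never seen during pretraining. Whether the resulting embedding is useful there is the empirical question the paper addresses.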

What carries the argument

Device Passport: a channel embedding that routes each sensor through expert mixture models conditioned on both functional activity and metadata.
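The Figure 2 caption also mentions a cross-attention mechanism that attends to the learned experts. The sketch below shows one plausible reading using standard scaled dot-product attention: the channel's activity features and XYZ location form the query, and each expert contributes a key and a value. The function name `attend_to_experts`, the dimensions, and the choice of query versus key roles are assumptions, not details confirmed by the text above.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_experts(channel_input, w_query, expert_keys, expert_values):
    """Scaled dot-product attention from one channel to a set of experts.

    channel_input: activity features concatenated with XYZ (assumed layout).
    Returns the attended embedding and the attention weights.
    """
    query = [sum(w * x for w, x in zip(row, channel_input)) for row in w_query]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in expert_keys]
    attn = softmax(scores)
    dim = len(expert_values[0])
    out = [sum(a * v[d] for a, v in zip(attn, expert_values)) for d in range(dim)]
    return out, attn


# Random placeholder parameters; in a real model these would be learned.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


in_dim, key_dim, embed_dim, n_experts = 5, 6, 8, 4
w_query = rand_matrix(key_dim, in_dim)
expert_keys = rand_matrix(n_experts, key_dim)
expert_values = rand_matrix(n_experts, embed_dim)

channel_input = [0.02, 1.3, 0.5, -0.1, 0.2]  # [mean, std, x, y, z], illustrative
embedding, attention = attend_to_experts(channel_input, w_query, expert_keys, expert_values)
```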

If this is right

Pretrained models can be reused on new devices without collecting large matched datasets for every layout.
Mixture models that condition on both activity and metadata outperform embeddings that use only one of those signals in layout-transfer settings.
Performance gains appear in both synthetic subset-transfer tests and real ear-EEG recordings.
Channel embedding choice becomes a first-order design decision for any spatio-temporal biosignal foundation model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same metadata-plus-activity conditioning could be tested on non-EEG biosignals such as ECG or EMG arrays whose sensor counts vary across products.
If metadata includes physical coordinates, the method might also reduce the need for explicit spatial alignment steps before decoding.
A natural next measurement is whether the expert mixture weights themselves become interpretable as soft assignments of channels to functional roles across devices.

Load-bearing premise

Incorporating each channel's functional activity and metadata into expert mixture models will enable effective generalization when pretraining layouts differ substantially from the downstream decoding layout.

What would settle it

A controlled transfer experiment in which Device Passport shows no improvement or clear degradation relative to the strongest learned baseline on a new layout whose sensors differ substantially in number and placement from the pretraining set.
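One way to operationalize such a test is a paired per-subject comparison, similar in spirit to the difference plot described for Figure 5. The sketch below is an assumption about how one might tabulate it; the function names and the scores are placeholders for illustration and are not results from the paper.

```python
from statistics import mean


def paired_differences(baseline, candidate):
    """Per-subject candidate-minus-baseline score differences."""
    if baseline.keys() != candidate.keys():
        raise ValueError("both methods must be scored on the same subjects")
    return {subject: candidate[subject] - baseline[subject] for subject in sorted(baseline)}


def summarize(diffs):
    """Mean difference plus counts of subjects that improved or degraded."""
    values = list(diffs.values())
    return {
        "mean_diff": mean(values),
        "n_improved": sum(v > 0 for v in values),
        "n_degraded": sum(v < 0 for v in values),
    }


# Placeholder scores for illustration only; NOT results from the paper.
baseline_scores = {"s1": 0.70, "s2": 0.65, "s3": 0.80}
candidate_scores = {"s1": 0.74, "s2": 0.63, "s3": 0.80}

summary = summarize(paired_differences(baseline_scores, candidate_scores))
```

A mean difference near zero or below it, with as many degraded subjects as improved ones, on a layout that differs substantially from pretraining would count against the claim.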

Figures

Figures reproduced from arXiv: 2607.00249 by Behrooz Mahasseni, Christopher M. Sandino, Ellen L. Zippi, Erdrin Azemi, Geeling Chau, Juri Minxha, Ran Liu, Wenhui Cui.

**Figure 1.** Layout Transfer Challenge + Channel Embedding Techniques. (a) Pretraining often occurs on a single pretraining layout. (b) Decoding needs to work on variable downstream layouts. (c) Channel embedding methods help identify the origin of functional activity, but many techniques do not learn transferable representations due to channel layout mismatch between pretraining and decoding. (c.i) Identity lookup c…

**Figure 2.** Our method. We learn (a) MLP or (b) Cross-Attention mechanisms that combine channel activity with XYZ location to weight or attend to a set of learned experts.

… closely. The dataset used is the Temple University Hospital (TUH) EEG Corpus (Obeid & Picone, 2016), which primarily uses a 19-channel 10–20 electrode configuration across 14k subjects for a total of 27k hours of recording. Similar to CBraMod’s pro…

**Figure 4.** Variable pretraining layouts. Downstream decoding performance (AUROC, y-axis) versus number of fine-tuning subjects (x-axis), comparing ACPE across pretraining layouts (a) and channel embedding techniques across matched and mismatched layout-transfer settings (b–d). Higher AUROC is better; shaded bands summarize variation across random seeds. (a) Pretraining with different layouts (colors) hurts ACPE’s abi…

**Figure 5.** Pretraining on full TUH, fine-tuning on an ear-EEG form factor for sleep staging classification. (a) We plot the performance difference between ACPE and Device Passport across subjects at N = 6 training subjects as a focused comparison against the strongest prior learned baseline. Positive values indicate that the Device Passport variants improve over ACPE; negative values indicate that ACPE performs bette…

**Figure 6.** ACPE compared with traditional positional encoding methods. We compare ACPE against common channel embedding baselines (e.g., APE, Channel ID, and XYZ-based encodings) in the variable-layout pretraining setting.

D. EESM17 Subject Note

The subject numbering used in our EESM17 plots follows our evaluation order rather than the numbering used in (Mikkelsen et al., 2017). In particular, the poor-contact subjec…
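Figure 4 reports AUROC, where higher is better. For readers less familiar with the metric, the sketch below computes binary AUROC from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is a generic definition for illustration; the text above does not say how the paper computes or aggregates AUROC, and the quadratic loop is only suitable for small inputs.

```python
def auroc(labels, scores):
    """Binary AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ranked correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```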

Original abstract

New device layouts pose a challenging modeling problem due to the lack of large datasets for each specific layout. Biosignal foundation models offer a plausible solution if they are able to generalize to new layouts effectively. To improve cross-layout transfer, we study how different channel embedding techniques behave when pretraining layouts differ substantially from the downstream decoding layout. We propose Device Passport, a new channel embedding technique that learns experts and mixture models that take each channel's functional activity and metadata as input. This contrasts with prior embedding methods, which typically use only functional information or only metadata to look up learned or fixed positional embeddings. Across controlled subset-transfer experiments and realistic transfer to ear-EEG, Device Passport is competitive overall and improves over the strongest learned baseline in the layout-transfer regimes that motivate this work. These results suggest that channel embedding design is a key consideration when reusing large-scale pretrained biosignal models on new devices.
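The contrast the abstract draws with lookup-based embeddings can be made concrete with a small sketch. A table keyed by channel identity has nothing to return for a sensor that was absent during pretraining, while a fixed function of coordinates is defined everywhere. The class `ChannelLookup`, the channel name `EAR_L`, the sinusoidal encoding, and every coordinate and vector value below are hypothetical and chosen only for illustration.

```python
import math


class ChannelLookup:
    """Identity lookup: one stored vector per channel name seen in pretraining."""

    def __init__(self, table):
        self.table = dict(table)

    def embed(self, name):
        # Returns None when the channel was never seen during pretraining.
        return self.table.get(name)


def fixed_xyz_encoding(xyz, n_freqs=2):
    """Fixed sinusoidal encoding of a 3D sensor position (illustrative choice)."""
    out = []
    for coord in xyz:
        for k in range(n_freqs):
            freq = 2 ** k
            out.append(math.sin(freq * coord))
            out.append(math.cos(freq * coord))
    return out


# Hypothetical pretraining table with two scalp channels.
lookup = ChannelLookup({"Fp1": [0.1, 0.2], "Cz": [0.3, 0.4]})

seen = lookup.embed("Cz")       # stored vector
unseen = lookup.embed("EAR_L")  # None: no entry for the new sensor

# The coordinate-based encoding is defined for any position (made-up coordinates).
ear_encoding = fixed_xyz_encoding((0.7, -0.1, -0.3))
```

A fixed encoding solves the coverage problem but uses only metadata. The abstract positions Device Passport as combining metadata with functional activity instead.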

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Device Passport mixes per-channel activity stats and metadata through experts to handle layout shifts in biosignal models, but the abstract supplies no numbers so the gains cannot be checked.

Two things stand out right away. The paper introduces Device Passport, a channel embedding that routes through learned experts conditioned on both a channel's functional activity and its metadata. This is positioned against earlier methods that use only one of those signals. The abstract reports that the approach stays competitive overall and beats the strongest learned baseline on the layout-transfer cases that matter, including controlled subset transfers and ear-EEG.

The combination itself is the clearest novelty. Prior channel embeddings typically pick either activity-derived features or metadata lookup; here both feed the mixture model. That targets a practical pain point in biosignal foundation models where pretraining layouts rarely match the downstream device.

The idea is reasonable on paper. If the experts can separate layout-invariant structure from layout-specific cues, reuse of large pretrained models becomes more feasible without full retraining.

The main limitation is obvious from the abstract alone: no quantitative results, no dataset sizes, no error bars, and no description of how the routing features are computed or whether they were ablated. Without those, it is impossible to tell whether the reported edge comes from the proposed mechanism or from dataset quirks. The stress-test concern about layout-specific leakage in the activity or metadata features is therefore still open.

This work is aimed at researchers building or adapting foundation models for EEG and similar signals. Anyone already working on variable sensor layouts or transfer in physiological ML would find the setup relevant. It deserves a serious referee to examine the full experiments and check whether the routing actually isolates the claimed effect.

Referee Report

2 major / 1 minor

Summary. The paper introduces Device Passport, a channel embedding technique for biosignal foundation models that uses expert mixture models taking each channel's functional activity and metadata as input. This is proposed to improve generalization when pretraining layouts differ from downstream decoding layouts. The method is evaluated on controlled subset-transfer experiments and realistic transfer to ear-EEG, where it is reported to be competitive overall and to improve over the strongest learned baseline in layout-transfer regimes.

Significance. If the results hold, this work could have significant impact by enabling the reuse of large-scale pretrained biosignal models across different device layouts, which is a practical challenge in the field. The approach of conditioning expert mixtures on both activity statistics and metadata offers a novel way to handle layout variability compared to prior methods that use only functional information or only metadata.

major comments (2)

[Abstract] The abstract asserts competitive performance and specific improvements but supplies no quantitative results, error bars, dataset sizes, or experimental details, making it challenging to evaluate the claims without referring to the full experimental sections.
[Experiments] The manuscript does not appear to include a section that isolates the routing behavior of the expert mixture models on a deliberately out-of-distribution layout pair while holding other factors fixed. This leaves the central claim that the routing produces layout-robust channel embeddings vulnerable to alternative explanations based on dataset idiosyncrasies.

minor comments (1)

[Methods] Clarify the exact architecture of the expert mixture models and how metadata is encoded, as this is central to reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential impact of Device Passport. We address each major comment below.

Referee: [Abstract] The abstract asserts competitive performance and specific improvements but supplies no quantitative results, error bars, dataset sizes, or experimental details, making it challenging to evaluate the claims without referring to the full experimental sections.

Authors: We agree that including a small number of quantitative highlights in the abstract would help readers assess the claims more readily. We will revise the abstract to incorporate key metrics (e.g., relative improvement over the strongest baseline in layout-transfer settings) while remaining within typical length limits. revision: yes

Referee: [Experiments] The manuscript does not appear to include a section that isolates the routing behavior of the expert mixture models on a deliberately out-of-distribution layout pair while holding other factors fixed. This leaves the central claim that the routing produces layout-robust channel embeddings vulnerable to alternative explanations based on dataset idiosyncrasies.

Authors: The controlled subset-transfer experiments (Section 4.2) deliberately vary only the channel layout while drawing pretraining and downstream data from the same underlying dataset distribution, thereby holding dataset-specific factors fixed. These experiments therefore isolate the contribution of the routing mechanism to layout robustness. We will add an explicit paragraph in the experimental section clarifying this design choice and its relation to the routing behavior. revision: partial
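The design described in this response, varying only the channel layout within one dataset, can be sketched as a channel-subset split. The function below is an assumption about how such a split might be constructed, not the procedure from Section 4.2: the name `subset_transfer_layouts`, the subset sizes, and the overlap parameter are hypothetical. The channel labels are the standard 19-channel 10-20 names, used here only as a plausible pool given that the text describes TUH as primarily using a 19-channel 10-20 configuration.

```python
import random

# Standard 19-channel 10-20 labels (illustrative pool).
CHANNELS_10_20 = [
    "Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
    "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2",
]


def subset_transfer_layouts(channels, n_pretrain, n_downstream, overlap, seed=0):
    """Draw pretraining and downstream channel subsets sharing `overlap` channels."""
    if overlap > min(n_pretrain, n_downstream):
        raise ValueError("overlap cannot exceed either subset size")
    if n_pretrain + n_downstream - overlap > len(channels):
        raise ValueError("not enough channels for the requested subsets")
    rng = random.Random(seed)
    shuffled = list(channels)
    rng.shuffle(shuffled)
    shared = shuffled[:overlap]
    pretrain_only = shuffled[overlap:n_pretrain]
    downstream_only = shuffled[n_pretrain:n_pretrain + n_downstream - overlap]
    return sorted(shared + pretrain_only), sorted(shared + downstream_only)


pretrain_layout, downstream_layout = subset_transfer_layouts(
    CHANNELS_10_20, n_pretrain=10, n_downstream=6, overlap=2
)
```

Because both subsets come from the same recordings, subject population and acquisition hardware stay fixed and only the visible channels change. Setting `overlap=0` gives the fully mismatched case.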

Circularity Check

0 steps flagged

No circularity; empirical claims rest on experimental comparisons.

The manuscript proposes Device Passport as a channel embedding method using expert mixture models conditioned on per-channel functional activity and metadata. Its headline results (competitive performance and gains over baselines in layout-transfer regimes) are presented as outcomes of controlled subset-transfer experiments and ear-EEG transfer evaluations. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the text. The argument chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5717 in / 1006 out tokens · 25117 ms · 2026-07-02T19:30:31.007490+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 4 canonical work pages

[1] A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. International Conference on Learning Representations.

[2] CBraMod: A criss-cross brain foundation model for EEG decoding. International Conference on Learning Representations (2025). arXiv preprint arXiv:2412.07236.

[3] EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering, 2018.

[4] Conditional positional encodings for vision transformers. arXiv preprint arXiv:2102.10882.

[5] Attention is all you need. Advances in Neural Information Processing Systems.

[6] A unified, scalable framework for neural population decoding. Advances in Neural Information Processing Systems.

[7] Large brain model for learning generic representations with tremendous EEG data in BCI. arXiv preprint arXiv:2405.18765, 2024.

[8] Population transformer: Learning population-level representations of neural activity. arXiv.

[9] MEG and EEG data analysis with MNE-Python. Frontiers in Neuroinformatics, 2013.

[10] The Temple University Hospital EEG data corpus. Frontiers in Neuroscience, 2016.

[11] Automatic sleep staging using ear-EEG. BioMedical Engineering OnLine, 2017.

[12] You snooze, you win: the PhysioNet/Computing in Cardiology Challenge 2018. 2018 Computing in Cardiology Conference (CinC), 2018.

[13] PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 2000.

[14] BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering, 2004.

[15] A generic non-invasive neuromotor interface for human-computer interaction. Nature, 2025.

[16] Large-scale training of foundation models for wearable biosignals. arXiv preprint arXiv:2312.05409.
