
arxiv: 2606.20812 · v1 · pith:D3JC4FDB · submitted 2026-06-18 · 💻 cs.LG · cs.AI

B[FM]²: Brain Foundation Model via Flow Matching with SplitUNet

Pith reviewed 2026-06-26 18:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: EEG foundation model, flow matching, SplitUNet, continuous pretraining, synthetic EEG, brain-computer interface, multi-channel time series, raw waveform modeling

The pith

Pretraining directly on raw continuous EEG waveforms with flow matching and SplitUNet yields state-of-the-art results on seven of nine downstream tasks using roughly thirty times less data than prior foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that discretizing EEG into patches or tokens breaks the continuous rhythms that carry clinically relevant information. B[FM]^2 instead applies flow matching straight to the raw multi-channel waveform, learning a velocity field that maps noise to the observed signal without any masking or tokenization. SplitUNet addresses the time-electrode asymmetry by factoring each network block into separate one-dimensional convolutions along time and along electrodes, downsampling only the time axis so electrode positions stay fixed through the layers. If correct, this produces both stronger transfer to classification tasks and synthetic EEG that board-certified neurologists cannot tell apart from real recordings.
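A minimal NumPy sketch of the training objective and sampler described above. It assumes the common linear (rectified-flow) path x_t = (1 − t)·x0 + t·x1 from Gaussian noise x0 at t = 0 to an EEG segment x1 at t = 1, whose target velocity is x1 − x0; the paper's exact probability path, time sampling, and ODE solver are not stated in this summary. `velocity_net(x, t)` is a hypothetical stand-in for SplitUNet, and `flow_matching_loss` and `sample` are illustrative names.

```python
import numpy as np

def flow_matching_loss(velocity_net, x1, rng):
    """Flow-matching loss for a batch of raw EEG segments.

    x1: real signals, shape (batch, electrodes, time), placed at t = 1.
    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I).
    """
    x0 = rng.standard_normal(x1.shape)                   # Gaussian noise at t = 0
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1, 1))  # one time per segment
    xt = (1.0 - t) * x0 + t * x1                         # point on the straight path
    target = x1 - x0                                     # constant velocity of that path
    pred = velocity_net(xt, t)
    return float(np.mean((pred - target) ** 2))          # plain regression: no masks, no patches

def sample(velocity_net, shape, rng, steps=50):
    """Draw new segments by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1, 1), i * dt)
        x = x + dt * velocity_net(x, t)
    return x
```

The network sees the full (batch, electrodes, time) array at every step, so nothing is patched or hidden. A quick sanity check is a single-point dataset x1 = c, whose exact velocity field is (c − x)/(1 − t): with it the loss is zero up to rounding and the Euler sampler lands on c.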

Core claim

B[FM]^2 shows that a foundation model pretrained via continuous-time flow matching on the unaltered EEG waveform, using a SplitUNet velocity network that factorizes temporal and electrode processing, reaches new state-of-the-art performance on seven of nine standard downstream EEG classification benchmarks after pretraining on only 36,895 segments (approximately 307 hours), one to two orders of magnitude less data than existing EEG foundation models. The same model also generates synthetic EEG traces that two board-certified neurologists cannot distinguish from real recordings (Cohen's kappa = -0.096).
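A quick consistency check on the two budget figures: 36,895 segments come to roughly 307 hours if each segment is 30 s long, which matches the 30 s segments shown in Figures 3 and 5. The 30 s length is inferred here rather than stated as the pretraining window, and `segments_to_hours` is an illustrative helper.

```python
def segments_to_hours(n_segments, seconds_per_segment=30.0):
    """Total recording time in hours for n fixed-length segments."""
    return n_segments * seconds_per_segment / 3600.0

hours = segments_to_hours(36_895)   # assumes 30 s per segment
print(f"{hours:.1f} h")             # 307.5 h, matching the quoted ~307 h
print(f"{30 * hours:,.0f} h")       # 9,224 h: what a ~30x larger budget would be
```

The last line only restates the ≈30× ratio in hours; it is not a reported figure for any specific prior model.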

What carries the argument

SplitUNet, a velocity network whose blocks factorize into independent 1D temporal convolutions and 1D electrode convolutions while downsampling only along the time axis to keep electrode topology intact at every scale.
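The factorization and time-only downsampling can be sketched in NumPy as follows. This is an illustrative reading of the description, not the authors' code: `conv1d_along`, `split_conv_block`, `downsample_time`, and `encode` are hypothetical names, and the ReLU, average pooling, constant feature width, weights shared across stages, and zero padding at the ends of an ordered electrode list are simplifying assumptions (how the paper orders electrodes for the 1D electrode convolution is not stated here). The sketch omits the attention, normalization, and conditioning layers shown in Figure 2.

```python
import numpy as np

def conv1d_along(x, w, axis):
    """'Same'-padded 1D cross-correlation of x along one spatial axis.

    x: (f_in, electrodes, time); w: (f_out, f_in, k) with odd k.
    The kernel is shared across positions on the other spatial axis.
    """
    f_out, f_in, k = w.shape
    assert f_in == x.shape[0], "feature dimension mismatch"
    pad = k // 2
    widths = [(0, 0), (0, 0), (0, 0)]
    widths[axis] = (pad, pad)                   # zero-pad only the conv axis
    xp = np.pad(x, widths)
    n = x.shape[axis]
    out = np.zeros((f_out,) + x.shape[1:])
    for j in range(k):
        window = np.take(xp, np.arange(j, j + n), axis=axis)  # shifted copy
        out += np.einsum("oi,i...->o...", w[:, :, j], window)
    return out

def split_conv_block(x, w_time, w_elec):
    """Conv(1+1)D: a 1D temporal conv followed by a 1D electrode conv."""
    h = np.maximum(conv1d_along(x, w_time, axis=2), 0.0)  # temporal conv + ReLU
    return conv1d_along(h, w_elec, axis=1)                # electrode conv

def downsample_time(x):
    """Average-pool pairs of time steps; the electrode axis is left untouched."""
    f, e, t = x.shape
    return x[:, :, : t - t % 2].reshape(f, e, t // 2, 2).mean(axis=-1)

def encode(x, w_time, w_elec, stages=4):
    """Encoder path: each stage applies one block, then halves time only."""
    shapes = []
    for _ in range(stages):
        x = downsample_time(split_conv_block(x, w_time, w_elec))
        shapes.append(x.shape)
    return x, shapes

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 19, 256))        # (features, electrodes, time)
w_t = 0.1 * rng.standard_normal((8, 8, 9))   # temporal kernel, k_t = 9
w_e = 0.1 * rng.standard_normal((8, 8, 3))   # electrode kernel, k_e = 3
_, shapes = encode(x, w_t, w_e)
print(shapes)  # [(8, 19, 128), (8, 19, 64), (8, 19, 32), (8, 19, 16)]
```

Each input-output feature pair carries k_t + k_e weights (9 + 3 = 12 in the example) instead of the k_t · k_e = 27 of a full 2D kernel. Because pooling touches only the last axis, the 19 electrode rows keep their identity at every scale while time shrinks 256 → 128 → 64 → 32 → 16.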

If this is right

  • EEG foundation models become practical with far smaller pretraining budgets than previously required.
  • Downstream clinical and brain-computer-interface tasks can draw on a single backbone without task-specific discretization choices.
  • Synthetic EEG data generated by the model can serve as realistic augmentation or privacy-preserving substitutes for real recordings.
  • The same continuous flow-matching approach may reduce reliance on patching in other densely sampled time-series domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The time-electrode factorization in SplitUNet could be tested on other asymmetric sensor arrays such as MEG or multi-lead ECG.
  • If the performance gain holds, future work should measure how much of the improvement traces to the absence of discretization versus the flow-matching objective itself.
  • Scaling the pretraining corpus while keeping the raw-waveform inductive bias may further widen the gap over token-based models.

Load-bearing premise

Training on the raw continuous waveform with flow matching and SplitUNet factorization preserves fine-grained temporal dynamics and electrode positions better than any discretization method does, without introducing new artifacts that hurt downstream use.

What would settle it

A discretization-based transformer trained on the identical 36,895 segments reaches equal or higher accuracy on the same nine downstream tasks, or board-certified neurologists achieve positive Cohen's kappa when asked to distinguish the generated synthetic EEG from real recordings.
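For reference, Cohen's κ compares the observed agreement p_o between two label sequences with the agreement p_e expected by chance from their marginal label frequencies: κ = (p_o − p_e)/(1 − p_e). κ = 0 is chance level and negative values are below chance, so κ = −0.096 is consistent with readers who cannot separate real from generated EEG. Which sequences the paper compares (each rater's calls vs. ground truth, or rater vs. rater) is not specified in this summary; the labels below are hypothetical toy data, not the study's ratings, and `cohens_kappa` is an illustrative helper.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences a and b."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                 # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / n**2   # chance agreement
    if p_e == 1.0:
        return float("nan")  # undefined: both sequences use one identical label
    return (p_o - p_e) / (1.0 - p_e)

truth = ["real"] * 4 + ["gen"] * 4        # hypothetical toy labels, not study data
guess = ["real", "gen"] * 4               # a reader alternating answers
flipped = ["gen"] * 4 + ["real"] * 4      # a reader who is always wrong
print(cohens_kappa(truth, truth))         # 1.0   perfect discrimination
print(cohens_kappa(truth, guess))         # 0.0   chance level
print(cohens_kappa(truth, flipped))       # -1.0  systematically inverted
```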

Figures

Figures reproduced from arXiv: 2606.20812 by Ila Fiete, Jaedong Hwang, Kathleen Zhang, Konstantinos Kontras, Maarten De Vos, Maarten Vanmarcke, Paul Pu Liang, Wei Dai.

Figure 1: Continuous-time generative pretraining for EEG. (Top) The flow matching process maps Gaussian noise (t = 0.0) to continuous, multi-channel EEG signals (t = 1.0) along a continuous trajectory. (Left/Center) During pretraining, our proposed SplitUNet, a UNet velocity network in which every spatiotemporal convolution is factorized into a 1D temporal followed by a 1D electrode conv, with downsampling restrict…
Figure 2: Architecture of SplitUNet. (left) Four encoder stages halve only the time axis (electrode dimension preserved); a self-attention bottleneck mixes globally; the decoder mirrors the encoder with skip connections and time-only upsampling. (center) Each UNet block contains two residual sub-blocks separated by a linear-attention layer [Shen et al., 2021]; each sub-block alternates Conv(1+1)D operators with AdaL…
Figure 3: Real vs. B[FM]² EEG segments. A 30-second held-out TUEG segment (middle) and an unconditional B[FM]² sample (right), displayed in the 19-channel referential 10–20 montage (left; electrode layout from Ferrell et al. [2020]). Both panels exhibit both spatial and temporal coherence characteristic of clinical EEG. For example, notice how adjacent electrodes in the same brain region (e.g., the frontal red trace…
Figure 4: B[FM]² generates physiologically diverse EEG patterns. Three B[FM]² samples illustrating distinct brain-state patterns, curated from the unconditional generation pool by simple heuristics and displayed in the canonical 10–20 montage. (left) a sharp-wave / spike-like transient at ∼20 s. (center) δ-dominant slow-wave activity across all electrodes, characteristic of slow-wave sleep. (right) eyes-closed poste…
Figure 5: Real vs. B[FM]² EEG segments. Three 30 s held-out TUEG segments (top row) and three unconditional B[FM]² samples (bottom row), shown in the 19-channel referential 10–20 montage used for pretraining. Readers in the blinded study (…
Figure 6: Blinded neurologist rating interface. (a) Welcome screen describing the task, the 5-point realness Likert, and selecting the longitudinal bipolar ("double banana") montage (the standard clinical reading layout, which removes reference-electrode artifacts and matches neurologists' habitual reading view). (b) Per-trial rating screen: a 30 s segment in the bipolar montage at clinical display settings, with a 50…
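Figure 6 describes showing segments to the neurologists in the longitudinal bipolar ("double banana") montage after converting from the referential recordings. Each bipolar channel is the difference of two neighboring referential electrodes along a front-to-back chain. The sketch below assumes the common 19-electrode 10–20 layout with the older T3/T4/T5/T6 names and the usual four double-banana chains plus the midline pair; the exact pair list and channel naming used in the study are assumptions, and `to_bipolar` is an illustrative helper.

```python
import numpy as np

# 19-electrode 10-20 referential layout (older T3/T4/T5/T6 naming).
ELECTRODES = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8",
              "T3", "C3", "Cz", "C4", "T4",
              "T5", "P3", "Pz", "P4", "T6", "O1", "O2"]

# Longitudinal bipolar ("double banana") chains, front to back.
DOUBLE_BANANA = [
    ("Fp1", "F7"), ("F7", "T3"), ("T3", "T5"), ("T5", "O1"),  # left temporal
    ("Fp2", "F8"), ("F8", "T4"), ("T4", "T6"), ("T6", "O2"),  # right temporal
    ("Fp1", "F3"), ("F3", "C3"), ("C3", "P3"), ("P3", "O1"),  # left parasagittal
    ("Fp2", "F4"), ("F4", "C4"), ("C4", "P4"), ("P4", "O2"),  # right parasagittal
    ("Fz", "Cz"), ("Cz", "Pz"),                               # midline
]

def to_bipolar(x, names=ELECTRODES, pairs=DOUBLE_BANANA):
    """Convert a referential (electrodes, time) array to (len(pairs), time) bipolar."""
    idx = {name: i for i, name in enumerate(names)}
    return np.stack([x[idx[a]] - x[idx[b]] for a, b in pairs])
```

Because every bipolar channel is a difference, any signal common to all referential channels (such as activity picked up at the reference electrode) cancels, which is the sense in which this montage removes reference-electrode artifacts.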
Original abstract

EEG foundation models can learn generalizable representations from large-scale EEG corpora to enable single-backbone transfer across diverse clinical and brain-computer interface tasks. Existing models typically discretize the continuous multi-channel EEG waveform into patches or codebook tokens and train a transformer with masked self-supervision. Recognizing that this discretization fragments continuous brain rhythms and obscures fine-grained temporal dynamics, we present B[FM]² (Brain Foundation Model via Flow Matching), whose inductive bias aligns with the data by pretraining directly on the raw signal using continuous-time flow matching without patches, tokenization, or masking. However, multi-channel EEG signals pose an architectural challenge for flow matching: time is densely sampled and highly autocorrelated (thousands of timepoints), while the electrode axis is short (tens of channels) at distinct scalp positions. To address this time-electrode asymmetry, we introduce SplitUNet, a velocity network that factorizes each block into separate 1D temporal and 1D electrode convolutions and downsamples only along time, preserving electrode topology throughout the hierarchy. B[FM]² sets a new state of the art on 7 of 9 standard downstream EEG classification tasks, using a pretraining budget of only 36,895 segments (≈307 h), 1–2 orders of magnitude (≈30×) less than required by existing EEG foundation models. Further, it generates synthetic EEGs that two board-certified neurologists cannot distinguish from brain data (Cohen's κ = -0.096). https://jd730.github.io/projects/BFM2

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces B[FM]^2, an EEG foundation model pretrained directly on raw continuous multi-channel waveforms via flow matching, avoiding discretization, patches, or masking. To handle the time-electrode asymmetry (long autocorrelated time axis vs. short electrode axis), it proposes SplitUNet, which factorizes each block into separate 1D temporal and 1D electrode convolutions while downsampling only along time. The paper reports new state-of-the-art results on 7 of 9 standard downstream EEG classification tasks using only ~36,895 segments (~307 hours) of pretraining data—approximately 30x less than prior EEG foundation models—and shows that its generated synthetic EEGs are indistinguishable from real data by two board-certified neurologists (Cohen's κ = -0.096).

Significance. If the empirical results hold after verification, the work offers a concrete alternative to discretization-based EEG foundation models by aligning the pretraining objective and architecture more closely with the continuous, topology-preserving nature of scalp EEG. The SplitUNet design is a domain-specific architectural contribution that could generalize to other asymmetric multi-channel time-series settings. The reduced data requirement and neurologist study are notable strengths if supported by rigorous controls.

major comments (2)
  1. [§4] §4 (Experiments) and associated tables: the SOTA claims on 7/9 downstream tasks and the 30x data-efficiency statement are presented without visible baseline tables, exact accuracy numbers, error bars, or statistical significance tests against the cited prior EEG foundation models; this information is load-bearing for the central performance claim and must be supplied for assessment.
  2. [§3.2, §4.3] §3.2 (SplitUNet) and §4.3 (Ablations): no ablation isolating the contribution of the time-only downsampling and electrode-topology preservation versus a standard UNet or transformer baseline is reported; without this, it is unclear whether the claimed preservation of fine-grained temporal dynamics is responsible for the downstream gains or the reduced data budget.
minor comments (2)
  1. [§2] The notation B[FM]^2 is introduced without an explicit expansion or comparison to standard flow-matching notation; a brief clarification in §2 would improve readability.
  2. [§4.4] The neurologist discrimination study reports Cohen's κ = -0.096 but does not specify the number of trials, stimulus presentation protocol, or inter-rater agreement; these details belong in the methods or supplementary material.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of aligning the pretraining objective and architecture with the continuous nature of EEG. We address each major comment below and will revise the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments) and associated tables: the SOTA claims on 7/9 downstream tasks and the 30x data-efficiency statement are presented without visible baseline tables, exact accuracy numbers, error bars, or statistical significance tests against the cited prior EEG foundation models; this information is load-bearing for the central performance claim and must be supplied for assessment.

    Authors: We agree that the current presentation of results requires additional detail to fully substantiate the SOTA and data-efficiency claims. In the revised manuscript we will expand the tables in §4 to include exact per-task accuracy numbers for B[FM]^2 and all cited baselines, standard deviations across multiple runs or cross-validation folds, and statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) against the prior EEG foundation models. The 30x data comparison will also be presented with explicit hour counts and references for each prior model. revision: yes

  2. Referee: [§3.2, §4.3] §3.2 (SplitUNet) and §4.3 (Ablations): no ablation isolating the contribution of the time-only downsampling and electrode-topology preservation versus a standard UNet or transformer baseline is reported; without this, it is unclear whether the claimed preservation of fine-grained temporal dynamics is responsible for the downstream gains or the reduced data budget.

    Authors: We acknowledge that an ablation isolating the architectural choices would strengthen the claims. In the revised §4.3 we will add a controlled ablation that compares SplitUNet (time-only downsampling + separate 1D temporal/electrode convolutions) against (i) a standard 2D UNet and (ii) a transformer-based velocity network, while holding the pretraining data budget fixed. This will help quantify the contribution of topology preservation and time-only downsampling to the observed gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper advances an empirical EEG foundation model via flow matching on raw continuous waveforms and a SplitUNet architecture that factorizes temporal and electrode convolutions. All load-bearing claims (SOTA on 7/9 downstream tasks with ~30x less pretraining data, and neurologist-indistinguishable synthetic EEG) rest on external empirical benchmarks and human discrimination studies rather than any internal derivation, parameter fit to the target metric, or self-citation chain. No equations or architectural choices are shown to reduce by construction to the reported results; the inductive bias is presented as a design choice justified by data properties, with performance validated independently.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claim depends on standard neural-network training assumptions plus the untested premise that continuous flow matching better captures EEG dynamics than discretization; no new physical entities are postulated.

free parameters (2)
  • SplitUNet architecture hyperparameters
    Number of layers, channel widths, and downsampling factors chosen to fit the EEG data shape; values not stated in abstract.
  • Flow matching training schedule
    Noise schedule and velocity network capacity are fitted or tuned on the pretraining corpus.
axioms (1)
  • Domain assumption: continuous-time flow matching can be stably trained on densely sampled autocorrelated time series without discretization.
    Invoked when the paper states that avoiding patches preserves fine-grained temporal dynamics.
invented entities (1)
  • SplitUNet (no independent evidence)
    purpose: Velocity network that factorizes 1D temporal and 1D electrode convolutions while preserving electrode topology.
    New architectural component introduced to solve the time-electrode asymmetry.

pith-pipeline@v0.9.1-grok · 5850 in / 1505 out tokens · 33300 ms · 2026-06-26T18:18:21.301208+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 1 linked inside Pith

  1. [1] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688.

  2. [2] Ji-Hoon Jeong, Jeong-Hyun Cho, Young-Eun Lee, Seo-Hyun Lee, Gi-Hwan Shin, Young-Seok Kweon, José del R Millán, Klaus-Robert Müller, and Seong-Whan Lee. 2020 international brain–computer interface competition: A review. Frontiers in Human Neuroscience, 16:898300.

  3. [3] Yonghao Song, Xueyu Jia, Lie Yang, and Longhan Xie. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170.

  4. [4] A Related Work, A.1 Brain Foundation Model: EEG foundation models aim to learn a general representation from large-scale brain signals (i.e., EEG) to have better performance on diverse downstream tasks, as opposed to task-specific methods [Lawhern et al., 2018, Song et al., 2022]. Training approaches for current models are generally classified into two categories: contrastive learning [Chen et al., 2020] and masked reconstruction [He et al., 2022]...

  5. [5] ... proposes a unified framework for multiple pretraining datasets via 4D electrode positional encoding with block masking. CodeBrain [Ma et al., 2026] tokenizes both waveform and frequency for richer representation and leverages a state-space model [Gu et al., 2022]. Across these models, discrete tokenization and masked autoencoding remain central. However, ...

  6. [6] TUAB. A binary normal-vs-abnormal benchmark drawn from the TUH EEG Corpus [Obeid and Picone, 2016], with labels assigned by clinical neurologists. We adopt the 16-channel bipolar montage of the CBraMod [Wang et al., 2025] pipelines and 10 s non-overlapping windows on the released subject-disjoint split: ∼409k windows from ∼2,993 patients. TUEV. Six-class event...

  7. [7] ... Calibration bar: 1 s and 50 µV. (background). Same montage as TUAB; 5 s windows; subject-disjoint split; ∼112k events. Class frequencies are strongly skewed toward BCKG. PhysioNet-MI. Four-class motor imagery [Goldberger et al., 2000]: imagined left-fist, right-fist, both-fists, and both-feet movements from 109 subjects. We retain the original 64 channels (res...

  8. [8] ... we drop it and keep the six EEG derivations only. Subject-wise split, 30 s windows from ∼89k samples. Mumtaz. Binary major-depressive-disorder vs. healthy-control classification [Mumtaz et al., 2017] on 19 MDD patients and 15 controls, 19-channel 10–20 EEG at 256 Hz. The released subject-wise split with 5 s non-overlapping windows: ∼7,100 samples. MAT (Mental Arit...

  9. [9] ... (512 Hz, 10–20 montage). We retain the 29 channels available across all subjects and use 10 s windows (51,307 samples). Subjects PN16 and PN17 are held out as the test set; the remaining 12 are split 8:2 for training and validation. HMC. Five-class AASM sleep staging from the 151-subject PSG corpus of Alvarez-Estevez and Rijsman [2021]. We use the fo...

  10. [10] "double banana" ... at standard clinical display settings (10 mm/s paper speed, 7 µV/mm sensitivity, 0.3–70 Hz bandpass, 60 Hz notch). Although the original data and generated samples are in a referential montage, we convert them to a longitudinal bipolar montage to match the neurologists' routine reading convention. Real and generated segments were matched in duration and ...

  11. [11] ... and Zhou et al. [2025]. All B[FM]² entries are mean ± standard deviation over five seeds. Dataset descriptions are in Appendix C. G.1 Mumtaz (Mental Disorder Diagnosis). Table 7: Mumtaz (2-class, MDD vs. healthy control). External-baseline numbers from Ouahidi et al. [2025]. Columns: Method, Balanced Accuracy, AUC-PR, AUROC. EEGNet: 0.923 ± 0.010, 0.963 ± 0.009, 0.964 ± 0.009 ...

  12. [12] ... and Zhou et al. [2025]. Both source papers evaluate the publicly released checkpoints for BIOT [Yang et al., 2023a], LaBraM [Jiang et al., 2024], CBraMod [Wang et al., 2025], and REVE [Ouahidi et al., 2025]; the CSBrain paper additionally evaluates its own checkpoint. All baselines are reported on the same train/val/test splits our model uses, with the si...