B[FM]²: Brain Foundation Model via Flow Matching with SplitUNet
Pith reviewed 2026-06-26 18:18 UTC · model grok-4.3
The pith
Pretraining directly on raw continuous EEG waveforms with flow matching and SplitUNet yields state-of-the-art results on seven of nine downstream tasks using roughly thirty times less data than prior foundation models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
B[FM]² shows that a foundation model pretrained via continuous-time flow matching on the unaltered EEG waveform, using a SplitUNet velocity network that factorizes temporal and electrode processing, reaches new state-of-the-art accuracy on seven of nine standard downstream EEG classification benchmarks. It does so after pretraining on only 36,895 segments (approximately 307 hours), one to two orders of magnitude less data than existing EEG foundation models. It also generates synthetic EEG traces that two board-certified neurologists cannot distinguish from real brain data (Cohen's kappa = -0.096).
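The data-budget figures can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the segment length and the hour count for prior models are not stated on this page, so both values computed below are inferences from the quoted numbers, and the variable names are illustrative.

```python
# Figures quoted on this page.
SEGMENTS = 36_895
HOURS = 307   # stated as approximate
RATIO = 30    # "roughly thirty times less data"

# Inferred, not stated: average duration of one pretraining segment.
seconds_per_segment = HOURS * 3600 / SEGMENTS

# Inferred, not stated: pretraining hours implied for the prior models.
implied_prior_hours = HOURS * RATIO

print(f"{seconds_per_segment:.1f} s per segment")            # 30.0 s per segment
print(f"{implied_prior_hours} h implied for prior models")   # 9210 h implied for prior models
```

The quoted numbers are therefore consistent with segments of about 30 seconds, and a 30x ratio sits between one and two orders of magnitude, as claimed.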
What carries the argument
SplitUNet, a velocity network whose blocks factorize into independent 1D temporal convolutions and 1D electrode convolutions while downsampling only along the time axis to keep electrode topology intact at every scale.
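A minimal pure-Python sketch of one such block may help make the factorization concrete. Everything specific here is an assumption for illustration: the function names, the kernel values, the ReLU, average pooling as the downsampling step, and applying the two convolutions sequentially. The page only states that each block uses separate 1D temporal and 1D electrode convolutions and downsamples along time alone.

```python
# Hypothetical sketch of a SplitUNet-style block on an [electrode][time] grid.
# Feature channels, time conditioning, and learned weights are all omitted.

def conv1d_same(seq, kernel):
    """Zero-padded 'same' 1D cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(acc)
    return out

def temporal_conv(x, kernel):
    """Filter along time, independently for each electrode row."""
    return [conv1d_same(row, kernel) for row in x]

def electrode_conv(x, kernel):
    """Filter along the electrode axis, independently for each timepoint."""
    n_elec, n_time = len(x), len(x[0])
    cols = [conv1d_same([x[e][t] for e in range(n_elec)], kernel)
            for t in range(n_time)]
    return [[cols[t][e] for t in range(n_time)] for e in range(n_elec)]

def downsample_time(x, factor=2):
    """Average-pool along time only; the electrode count is unchanged."""
    return [[sum(row[i:i + factor]) / factor
             for i in range(0, len(row) - factor + 1, factor)]
            for row in x]

def split_block(x, k_time, k_elec, factor=2):
    h = temporal_conv(x, k_time)
    h = electrode_conv(h, k_elec)
    h = [[max(0.0, v) for v in row] for row in h]  # ReLU (assumed nonlinearity)
    return downsample_time(h, factor)

# Toy input: 4 electrodes, 16 timepoints.
x = [[float((e + 1) * (t % 4)) for t in range(16)] for e in range(4)]
y = split_block(x, k_time=[0.25, 0.5, 0.25], k_elec=[0.5, 0.0, 0.5])
print(len(y), len(y[0]))  # 4 8
```

The electrode dimension stays at 4 while time halves from 16 to 8, which is the property the paper credits with keeping electrode topology intact at every scale.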
If this is right
- EEG foundation models become practical with far smaller pretraining budgets than previously required.
- Downstream clinical and brain-computer-interface tasks can draw on a single backbone without task-specific discretization choices.
- Synthetic EEG data generated by the model can serve as realistic augmentation or privacy-preserving substitutes for real recordings.
- The same continuous flow-matching approach may reduce reliance on patching in other densely sampled time-series domains.
Where Pith is reading between the lines
- The time-electrode factorization in SplitUNet could be tested on other asymmetric sensor arrays such as MEG or multi-lead ECG.
- If the performance gain holds, future work should measure how much of the improvement traces to the absence of discretization versus the flow-matching objective itself.
- Scaling the pretraining corpus while keeping the raw-waveform inductive bias may further widen the gap over token-based models.
Load-bearing premise
Training on the raw continuous waveform with flow matching and SplitUNet factorization preserves fine-grained temporal dynamics and electrode positions better than any discretization method and without adding new artifacts that hurt downstream use.
What would settle it
A discretization-based transformer trained on the identical 36,895 segments reaches equal or higher accuracy on the same nine downstream tasks, or board-certified neurologists achieve positive Cohen's kappa when asked to distinguish the generated synthetic EEG from real recordings.
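To make the second criterion concrete, here is a minimal sketch of how Cohen's kappa is computed from two label sequences. The function name `cohen_kappa` and the toy labels are invented for illustration. The page does not say which two sequences the reported kappa of -0.096 compares (rater against rater, or rater against ground truth), so the rater-versus-truth framing below is an assumption.

```python
def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if both sequences were independent with their own marginals.
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        return float("nan")  # undefined when both sequences use one identical label
    return (p_o - p_e) / (1 - p_e)

# Made-up labels for illustration only; not data from the paper.
truth        = ["real", "real", "real", "real", "fake", "fake", "fake", "fake"]
chance_rater = ["real", "real", "fake", "fake", "real", "real", "fake", "fake"]
worse_rater  = ["real", "fake", "fake", "fake", "real", "real", "real", "fake"]

print(cohen_kappa(truth, chance_rater))  # 0.0
print(cohen_kappa(truth, worse_rater))   # -0.5
```

A value near zero means agreement no better than chance, and a small negative value such as -0.096 means agreement marginally below chance.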
Original abstract
EEG foundation models can learn generalizable representations from large-scale EEG corpora to enable single-backbone transfer across diverse clinical and brain-computer interface tasks. Existing models typically discretize the continuous multi-channel EEG waveform into patches or codebook tokens and train a transformer with masked self-supervision. Recognizing that this discretization fragments continuous brain rhythms and obscures fine-grained temporal dynamics, we present B[FM]$^2$ (Brain Foundation Model via Flow Matching), whose inductive bias aligns with the data by pretraining directly on the raw signal using continuous-time flow matching without patches, tokenization, or masking. However, multi-channel EEG signals pose an architectural challenge for flow matching: time is densely sampled and highly autocorrelated (thousands of timepoints), while the electrode axis is short (tens of channels) at distinct scalp positions. To address this time-electrode asymmetry, we introduce SplitUNet, a velocity network that factorizes each block into separate 1D temporal and 1D electrode convolutions and downsamples only along time, preserving electrode topology throughout the hierarchy. B[FM]$^2$ sets a new state of the art on 7 of 9 standard downstream EEG classification tasks, using a pretraining budget of only 36,895 segments ($\approx$ 307h), 1-2 orders of magnitude ($\approx$ 30x) less than required by existing EEG foundation models. Further, it generates synthetic EEGs that two board-certified neurologists cannot distinguish from brain data (Cohen's $\kappa =$ -0.096). https://jd730.github.io/projects/BFM2
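For readers unfamiliar with the objective, one common form of the conditional flow matching loss is written out below. The abstract does not state which probability path or noise prior B[FM]$^2$ uses, so the linear path, the Gaussian prior, and every symbol here are assumptions rather than the paper's definitions.

```latex
% Assumed notation: x_1 is a real EEG segment (electrodes x time),
% x_0 is a noise sample, t is the flow time, v_\theta is the velocity network.
\[
  x_0 \sim \mathcal{N}(0, I), \qquad
  t \sim \mathcal{U}[0, 1], \qquad
  x_t = (1 - t)\, x_0 + t\, x_1
\]
\[
  \mathcal{L}(\theta)
  = \mathbb{E}_{t,\, x_0,\, x_1}
    \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|_2^2
\]
```

Under this path the regression target $x_1 - x_0$ is the time derivative of $x_t$, and sampling integrates $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t)$ from $t = 0$ to $t = 1$. In the paper's setting the role of $v_\theta$ is played by SplitUNet, and no patching or masking step appears anywhere in the loss.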
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces B[FM]^2, an EEG foundation model pretrained directly on raw continuous multi-channel waveforms via flow matching, avoiding discretization, patches, or masking. To handle the time-electrode asymmetry (long autocorrelated time axis vs. short electrode axis), it proposes SplitUNet, which factorizes each block into separate 1D temporal and 1D electrode convolutions while downsampling only along time. The paper reports new state-of-the-art results on 7 of 9 standard downstream EEG classification tasks using only 36,895 segments (~307 hours) of pretraining data, approximately 30x less than prior EEG foundation models, and shows that its generated synthetic EEGs are indistinguishable from real data by two board-certified neurologists (Cohen's κ = -0.096).
Significance. If the empirical results hold after verification, the work offers a concrete alternative to discretization-based EEG foundation models by aligning the pretraining objective and architecture more closely with the continuous, topology-preserving nature of scalp EEG. The SplitUNet design is a domain-specific architectural contribution that could generalize to other asymmetric multi-channel time-series settings. The reduced data requirement and neurologist study are notable strengths if supported by rigorous controls.
major comments (2)
- [§4] §4 (Experiments) and associated tables: the SOTA claims on 7/9 downstream tasks and the 30x data-efficiency statement are presented without visible baseline tables, exact accuracy numbers, error bars, or statistical significance tests against the cited prior EEG foundation models; this information is load-bearing for the central performance claim and must be supplied for assessment.
- [§3.2, §4.3] §3.2 (SplitUNet) and §4.3 (Ablations): no ablation isolating the contribution of the time-only downsampling and electrode-topology preservation versus a standard UNet or transformer baseline is reported; without this, it is unclear whether the claimed preservation of fine-grained temporal dynamics is responsible for the downstream gains or the reduced data budget.
minor comments (2)
- [§2] The notation B[FM]^2 is introduced without an explicit expansion or comparison to standard flow-matching notation; a brief clarification in §2 would improve readability.
- [§4.4] The neurologist discrimination study reports Cohen's κ = -0.096 but does not specify the number of trials, stimulus presentation protocol, or inter-rater agreement; these details belong in the methods or supplementary material.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of aligning the pretraining objective and architecture with the continuous nature of EEG. We respond to each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§4] §4 (Experiments) and associated tables: the SOTA claims on 7/9 downstream tasks and the 30x data-efficiency statement are presented without visible baseline tables, exact accuracy numbers, error bars, or statistical significance tests against the cited prior EEG foundation models; this information is load-bearing for the central performance claim and must be supplied for assessment.
  Authors: We agree that the current presentation of results requires additional detail to fully substantiate the SOTA and data-efficiency claims. In the revised manuscript we will expand the tables in §4 to include exact per-task accuracy numbers for B[FM]^2 and all cited baselines, standard deviations across multiple runs or cross-validation folds, and statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) against the prior EEG foundation models. The 30x data comparison will also be presented with explicit hour counts and references for each prior model.
  Revision: yes
- Referee: [§3.2, §4.3] §3.2 (SplitUNet) and §4.3 (Ablations): no ablation isolating the contribution of the time-only downsampling and electrode-topology preservation versus a standard UNet or transformer baseline is reported; without this, it is unclear whether the claimed preservation of fine-grained temporal dynamics is responsible for the downstream gains or the reduced data budget.
  Authors: We acknowledge that an ablation isolating the architectural choices would strengthen the claims. In the revised §4.3 we will add a controlled ablation that compares SplitUNet (time-only downsampling + separate 1D temporal/electrode convolutions) against (i) a standard 2D UNet and (ii) a transformer-based velocity network, while holding the pretraining data budget fixed. This will help quantify the contribution of topology preservation and time-only downsampling to the observed gains.
  Revision: yes
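The paired significance testing promised in the first response can be sketched with the standard library alone. Note the substitution: the code below is an exact sign-flip permutation test, not the paired t-test or Wilcoxon signed-rank test the authors name, chosen here because it needs no external package. The function name and the per-run scores are made up for illustration and are not results from the paper.

```python
from itertools import product

def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation test on paired differences."""
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null, each paired difference is equally likely to have either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            count += 1
    return count / total

# Hypothetical paired scores (for example, matched seeds or folds); illustration only.
model_a = [0.81, 0.83, 0.80, 0.84, 0.82]
model_b = [0.79, 0.80, 0.79, 0.81, 0.80]
diffs = [a - b for a, b in zip(model_a, model_b)]

p = sign_flip_p_value(diffs)
print(p)  # 0.0625
```

Even when one model wins every pair, five pairs cannot push this two-sided p-value below 2/32 = 0.0625, and an exact Wilcoxon signed-rank test has the same floor. A revision that reports five runs per model would therefore need more pairs, or a parametric test with its own assumptions, to claim significance at the 0.05 level.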
Circularity Check
No significant circularity identified
The paper advances an empirical EEG foundation model via flow matching on raw continuous waveforms and a SplitUNet architecture that factorizes temporal and electrode convolutions. All load-bearing claims (SOTA on 7/9 downstream tasks with ~30x less pretraining data, and neurologist-indistinguishable synthetic EEG) rest on external empirical benchmarks and human discrimination studies rather than any internal derivation, parameter fit to the target metric, or self-citation chain. No equations or architectural choices are shown to reduce by construction to the reported results; the inductive bias is presented as a design choice justified by data properties, with performance validated independently.
Axiom & Free-Parameter Ledger
free parameters (2)
- SplitUNet architecture hyperparameters
- Flow matching training schedule
axioms (1)
- domain assumption: Continuous-time flow matching can be stably trained on densely sampled autocorrelated time series without discretization.
invented entities (1)
- SplitUNet: no independent evidence
Reference graph
Works this paper leans on
- [1] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688.
- [2] Ji-Hoon Jeong, Jeong-Hyun Cho, Young-Eun Lee, Seo-Hyun Lee, Gi-Hwan Shin, Young-Seok Kweon, José del R Millán, Klaus-Robert Müller, and Seong-Whan Lee. 2020 international brain–computer interface competition: A review. Frontiers in Human Neuroscience, 16:898300.
- [3] Yonghao Song, Xueyu Jia, Lie Yang, and Longhan Xie. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170.
- [4] A Related Work. A.1 Brain Foundation Model. EEG foundation models aim to learn a general representation from large-scale brain signals (i.e., EEG) to have better performance on diverse downstream tasks, as opposed to task-specific methods [Lawhern et al., 2018, Song et al., 2022]. Training approaches for current models are generally classified into two...
- [5] ... proposes a unified framework for multiple pretraining datasets via 4D electrode positional encoding with block masking. CodeBrain [Ma et al., 2026] tokenizes both waveform and frequency for richer representation and leverages a state-space model [Gu et al., 2022]. Across these models, discrete tokenization and masked autoencoding remain central. However, ...
- [6] TUAB. A binary normal-vs-abnormal benchmark drawn from the TUH EEG Corpus [Obeid and Picone, 2016], with labels assigned by clinical neurologists. We adopt the 16-channel bipolar montage of the CBraMod [Wang et al., 2025] pipelines and 10 s non-overlapping windows on the released subject-disjoint split: ∼409k windows from ∼2,993 patients. TUEV. Six-class event...
- [7] Calibration bar: 1 s and 50 µV. ... (background). Same montage as TUAB; 5 s windows; subject-disjoint split; ∼112k events. Class frequencies are strongly skewed toward BCKG. PhysioNet-MI. Four-class motor imagery [Goldberger et al., 2000]: imagined left-fist, right-fist, both-fists, and both-feet movements from 109 subjects. We retain the original 64 channels (res...
- [8] ... we drop it and keep the six EEG derivations only. Subject-wise split, 30 s window from ∼89k samples. Mumtaz. Binary major-depressive-disorder vs. healthy-control classification [Mumtaz et al., 2017] on 19 MDD patients and 15 controls, 19-channel 10–20 EEG at 256 Hz. The released subject-wise split with 5 s non-overlapping windows: ∼7,100 samples. MAT (Mental Arit...
- [9] ... (512 Hz, 10–20 montage). We retain the 29 channels available across all subjects and use 10 s windows (51,307 samples). Subjects PN16 and PN17 are held out as the test set; the remaining 12 are split 8:2 for training and validation. HMC. Five-class AASM sleep staging from the 151-subject PSG corpus of Alvarez-Estevez and Rijsman [2021]. We use the fo...
- [10] "double banana": ... at standard clinical display settings (10 mm/s paper speed, 7 µV/mm sensitivity, 0.3–70 Hz bandpass, 60 Hz notch). Although the original data and generated samples are in a referential montage, we convert them to a longitudinal bipolar montage to match the neurologists' routine reading convention. Real and generated segments were matched in duration and ...
- [11] ... and Zhou et al. [2025]. All B[FM]² entries are mean ± standard deviation over five seeds. Dataset descriptions are in Appendix C. G.1 Mumtaz (Mental Disorder Diagnosis). Table 7: Mumtaz (2-class, MDD vs. healthy control). External-baseline numbers from Ouahidi et al. [2025]. Columns: Method, Balanced Accuracy, AUC-PR, AUROC. EEGNet: 0.923±0.010, 0.963±0.009, 0.964±0.009 ...
- [12] ... and Zhou et al. [2025]. Both source papers evaluate the publicly released checkpoints for BIOT [Yang et al., 2023a], LaBraM [Jiang et al., 2024], CBraMod [Wang et al., 2025], and REVE [Ouahidi et al., 2025]; the CSBrain paper additionally evaluates its own checkpoint. All baselines are reported on the same train/val/test splits our model uses, with the si...