pith. machine review for the scientific record.

arxiv: 2605.03874 · v1 · submitted 2026-05-05 · 💻 cs.LG · cs.AI


Spatiotemporal Convolutions on EEG signal -- A Representation Learning Perspective on Efficient and Explainable EEG Classification with Convolutional Neural Nets


Pith reviewed 2026-05-07 16:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: convolutional, convolutions, classification, models, across, along, channel, cnns

The pith

2D spatiotemporal convolutions reduce training time on high-dimensional EEG data while maintaining performance and creating distinct representational geometries compared with concatenated 1D convolutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

EEG signals are time series recorded from electrodes on the scalp. Standard shallow CNNs for classifying these signals apply separate one-dimensional convolutions along the time axis and along the electrode axis, then combine the results. The authors instead test a single two-dimensional convolution that operates across both time and space at once. On a simple 3-channel task the two approaches perform similarly. On a 22-channel task the 2D version trains noticeably faster without loss of accuracy. The authors also measure how similar the internal patterns learned by each model are; the 1D and 2D networks produce clearly different geometries even when final accuracy is the same. Spectral feature importance stays comparable, so the speed gain is not explained by different frequency content. The work therefore shows that the architectural choice of how to mix spatial and temporal information changes the learned representations and the computational cost, even when the numerical operations are equivalent.
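The two designs described above can be sketched in a few lines of PyTorch (hypothetical filter counts and kernel sizes, not the authors' exact architectures): the factored model convolves along time and then across electrodes, while the joint model applies one kernel spanning both axes, and both produce identically shaped outputs on a 22-channel input.

```python
import torch
import torch.nn as nn

C, T = 22, 1000                  # hypothetical: 22 electrodes, 1000 time samples
x = torch.randn(8, 1, C, T)      # a batch of 8 EEG trials

# Factored design: 1D temporal conv, then 1D spatial conv, no activation between.
factored = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(1, 25)),   # slides along time only
    nn.Conv2d(16, 16, kernel_size=(C, 1)),   # mixes across electrodes only
)

# Joint design: a single 2D kernel spanning both electrodes and time.
joint = nn.Conv2d(1, 16, kernel_size=(C, 25))

print(factored(x).shape)  # torch.Size([8, 16, 1, 976])
print(joint(x).shape)     # torch.Size([8, 16, 1, 976])
```

Matching output shapes are what make the accuracy and training-time comparison well posed; the two layers nonetheless differ in expressivity, which is the referee's central objection below.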

Core claim

2D convolutions significantly reduce training time in high-dimensional tasks while maintaining performance and yield vastly different representational geometries than 1D models.

Load-bearing premise

That the observed difference in representational similarity and training time is caused by the convolution dimensionality rather than by other unstated differences in model capacity or optimization.

Figures

Figures reproduced from arXiv: 2605.03874 by Laurits Dixen, Paolo Burelli, Stefan Heinrich.

Figure 1. 1D and 2D convolution illustration.
Figure 2. Performance results of 1D (circles) and 2D (squares) models.
Figure 5. Representational Dissimilarity Matrices (RDMs) of datasets 1 and 2.
Original abstract

Classification of EEG signals using shallow Convolutional Neural Networks (CNNs) is a prevalent and successful approach across a variety of fields. Most of these models use independent one-dimensional (1D) convolutional layers along the spatial and temporal dimensions, which are concatenated without a non-linear activation layer between. In this paper, we investigate an alternative encoding that operates a bi-dimensional (2D) spatiotemporal convolution. While 2D convolutions are numerically identical to two concatenated 1D convolutions along the two dimensions, the impact on learning is still uncertain. We test 1D and 2D CNNs and a CNN+transformer hybrid model in a low-dimensional (3-channel) and a high-dimensional (22-channel) BCI motor imagery classification task. We observe that 2D convolutions significantly reduce training time in high-dimensional tasks while maintaining performance. We investigate the root of this improvement and find no difference in spectral feature importance. However, a clear pattern emerges in representational similarity across models: 1D and 2D models yield vastly different representational geometries. Overall, we suggest an improved model with a 2D convolutional layer for faster training and inference. We also highlight the importance of architecturally-driven encoding when processing complex multivariate signals, as reflected in internal representations rather than purely in performance metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper compares 1D and 2D convolutional layers in shallow CNNs (and a CNN+transformer hybrid) for EEG-based motor imagery classification on low-dimensional (3-channel) and high-dimensional (22-channel) BCI datasets. It asserts that 2D spatiotemporal convolutions are numerically identical to concatenated 1D temporal-then-spatial convolutions without intervening activations, yet observes that 2D models reduce training time in high-dimensional tasks while preserving accuracy, produce distinct representational geometries (via similarity analysis), and show no difference in spectral feature importance. The authors recommend 2D convolutions for efficiency and emphasize architecturally driven encoding effects visible in internal representations.

Significance. If the empirical patterns hold after correcting the architectural premise, the work would usefully illustrate how convolution dimensionality influences optimization speed and learned representations in multivariate time-series tasks, beyond raw accuracy. The representational similarity analysis is a positive element that moves discussion past performance metrics alone. However, the absence of error bars, statistical tests, and controlled capacity comparisons limits the strength of the efficiency and geometry claims.

major comments (3)
  1. [Abstract] Abstract and introduction: the repeated claim that '2D convolutions are numerically identical to two concatenated 1D convolutions along the two dimensions' is incorrect. The composition of independent 1D temporal and 1D spatial convolutions yields only rank-1 (separable) effective kernels, whereas a general 2D kernel can realize non-separable space-time filters. This expressivity gap means the observed differences in training time and representational geometry cannot be unambiguously attributed to 'spatiotemporal encoding' or dimensionality per se; they may simply reflect the greater capacity of the 2D model. The central recommendation for 2D layers therefore rests on an unexamined premise.
  2. [Results] Experimental section (results on high-dimensional task): no error bars, no statistical significance tests, and no description of data splits, cross-validation procedure, or hyperparameter search are reported. Without these, the claims of 'maintaining performance' and 'significantly reduce training time' cannot be evaluated reliably, especially given the reader's note on partial support for the performance and geometry observations.
  3. [Representational similarity analysis] Representational similarity analysis: the assertion of 'vastly different representational geometries' between 1D and 2D models requires quantitative controls for the capacity difference identified above (e.g., matching effective kernel rank or parameter count). Without such controls, it is unclear whether the geometry divergence is caused by the convolution type or by the architectural mismatch.
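The separability argument in major comment 1 can be verified directly: chaining a 1D temporal and a 1D spatial convolution with no activation in between equals one 2D convolution whose kernel is the outer product of the two 1D kernels, and such a kernel has rank 1, whereas a generic 2D kernel does not. A toy numpy check (sizes are illustrative, not the paper's):

```python
import numpy as np

def conv2d_valid(img, ker):
    """True 2D convolution (kernel flipped), 'valid' mode."""
    kf = ker[::-1, ::-1]
    kh, kw = ker.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kf)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32))        # 4 channels x 32 time samples
k_time = rng.standard_normal(5)         # 1D temporal kernel
k_space = rng.standard_normal(3)        # 1D spatial kernel

# Chain the 1D convolutions: along time, then along channels.
step1 = np.stack([np.convolve(row, k_time, mode="valid") for row in x])
step2 = np.stack([np.convolve(col, k_space, mode="valid")
                  for col in step1.T]).T

# Identical to one 2D convolution with the rank-1 outer-product kernel.
k_sep = np.outer(k_space, k_time)
assert np.allclose(step2, conv2d_valid(x, k_sep))
assert np.linalg.matrix_rank(k_sep) == 1

# A generic 3x5 kernel is almost surely not separable (rank > 1), so the
# 2D model can express filters the factored model cannot.
assert np.linalg.matrix_rank(rng.standard_normal((3, 5))) > 1
```

The expressivity gap is exactly the rank constraint: the factored model searches only over rank-1 space-time filters.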
minor comments (3)
  1. Clarify the precise 1D and 2D model architectures, including kernel sizes, whether any non-linearity is placed between the temporal and spatial 1D layers, and how the 2D kernel is initialized and regularized.
  2. Provide details on the CNN+transformer hybrid architecture and its relative performance to isolate the contribution of the convolutional front-end.
  3. Add a brief discussion of how the 2D model could be made parameter-matched to the 1D baseline (e.g., via depthwise-separable 2D convolutions) to strengthen the causal attribution.
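A quick way to see the capacity gap behind minor comment 3 (illustrative filter counts, not the paper's configuration): counting parameters shows the joint 2D layer carries substantially more weights than the factored temporal-plus-spatial pair at the same filter count.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

C, F = 22, 16   # hypothetical: 22 electrodes, 16 filters

factored = nn.Sequential(
    nn.Conv2d(1, F, kernel_size=(1, 25)),   # temporal: F*25 weights + F biases
    nn.Conv2d(F, F, kernel_size=(C, 1)),    # spatial: F*F*C weights + F biases
)
joint = nn.Conv2d(1, F, kernel_size=(C, 25))  # F*C*25 weights + F biases

print(n_params(factored), n_params(joint))  # 6064 8816
```

A parameter-matched comparison could shrink the joint layer's filter count, or constrain its kernels to be separable, before attributing geometry differences to the convolution type.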

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which has helped us clarify key distinctions in our work and strengthen the experimental reporting. We address each major comment below, with revisions planned for the manuscript.

Point-by-point responses
  1. Referee: [Abstract] the repeated claim that '2D convolutions are numerically identical to two concatenated 1D convolutions along the two dimensions' is incorrect. The composition of independent 1D temporal and 1D spatial convolutions yields only rank-1 (separable) effective kernels, whereas a general 2D kernel can realize non-separable space-time filters. This expressivity gap means the observed differences in training time and representational geometry cannot be unambiguously attributed to 'spatiotemporal encoding' or dimensionality per se; they may simply reflect the greater capacity of the 2D model.

    Authors: We acknowledge the imprecision in our original statement. Sequential 1D convolutions without an intervening non-linearity are indeed equivalent to separable (rank-1) 2D kernels, while standard 2D convolutions support non-separable filters and thus greater expressivity. We have revised the abstract and introduction to correct this description, explicitly noting the capacity difference and that observed efficiency gains and representational distinctions may partly stem from it. The empirical benefits for high-dimensional EEG tasks remain, and we now frame the recommendation around both efficiency and the ability to learn joint spatiotemporal patterns. revision: yes

  2. Referee: [Results] no error bars, no statistical significance tests, and no description of data splits, cross-validation procedure, or hyperparameter search are reported. Without these, the claims of 'maintaining performance' and 'significantly reduce training time' cannot be evaluated reliably.

    Authors: We agree that the experimental details were incomplete. The revised manuscript now reports error bars (standard deviation over multiple random seeds), includes statistical significance tests (paired t-tests on accuracy and training time), and fully describes the data splits (following BCI competition subject-specific protocols), cross-validation procedure, and hyperparameter search. These additions support reliable evaluation of the performance and efficiency claims. revision: yes

  3. Referee: [Representational similarity analysis] the assertion of 'vastly different representational geometries' between 1D and 2D models requires quantitative controls for the capacity difference identified above (e.g., matching effective kernel rank or parameter count). Without such controls, it is unclear whether the geometry divergence is caused by the convolution type or by the architectural mismatch.

    Authors: We accept the need for capacity controls. In the revision we add experiments that match parameter counts between 1D and 2D models (by scaling filter numbers) and include comparisons to explicitly separable 2D kernels. The representational similarity analysis still indicates distinct geometries linked to the spatiotemporal convolution choice, which we now discuss with these controls in place. revision: yes
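For readers unfamiliar with the metric family used in this analysis, linear CKA (Kornblith et al., reference [28] below) scores two activation matrices between 0 and 1 and is invariant to orthogonal transforms and isotropic scaling, so a score near 1 means the same geometry up to rotation. A minimal sketch, not the authors' code:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_stimuli, n_units)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))            # one model's layer activations
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))  # a random rotation

print(linear_cka(A, A))       # == 1 (up to float error): identical geometry
print(linear_cka(A, A @ Q))   # == 1: invariant to rotating the units
print(linear_cka(A, rng.standard_normal(A.shape)))  # small: unrelated geometry
```

With these invariances, a low 1D-vs-2D CKA score indicates genuinely different geometries rather than a mere re-parameterization of the same representation.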

Circularity Check

0 steps flagged

No significant circularity; experimental comparison on external benchmarks

Full rationale

The paper conducts direct empirical tests of 1D vs. 2D CNN architectures (plus a hybrid) on standard BCI motor-imagery datasets, measuring training time, accuracy, spectral importance, and representational similarity via independent metrics. No derivation chain reduces claims to fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatzes smuggled from prior author work. The central observations about representational geometries and training-time differences are obtained from the experiments themselves rather than from any self-referential construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard assumptions of deep learning (gradient descent finds useful minima, convolutional layers extract local patterns) plus the untested premise that the two architectures have comparable capacity. No new entities are postulated.

free parameters (1)
  • network depth, width, and learning-rate schedule
    Typical hyperparameters that must be chosen or tuned for each architecture; their values are not reported in the abstract.
axioms (1)
  • domain assumption: Convolutional layers with identical receptive fields extract comparable features regardless of whether they are factored into 1D stages or kept as a single 2D kernel.
    Invoked when the authors treat the numerical equivalence of 1D+1D and 2D as a baseline for comparing learning dynamics.

pith-pipeline@v0.9.0 · 5544 in / 1373 out tokens · 35164 ms · 2026-05-07T16:15:47.209539+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

28 extracted references · 1 canonical work page · 1 internal anchor

  1. [1]

    An automated system for epilepsy detection using EEG brain signals based on deep learning approach,

    I. Ullah, M. Hussain, E.-u.-H. Qazi, and H. Aboalsamh, “An automated system for epilepsy detection using EEG brain signals based on deep learning approach,” vol. 107, pp. 61–71

  2. [2]

A novel deep learning approach for classification of EEG motor imagery signals,

Y. R. Tabar and U. Halici, “A novel deep learning approach for classification of EEG motor imagery signals,” Journal of Neural Engineering, vol. 14, no. 1, p. 016003, Nov. 2016

  3. [3]

    EEG-based deep learning model for the automatic detection of clinical depression,

P. P. Thoduparambil, A. Dominic, and S. M. Varghese, “EEG-based deep learning model for the automatic detection of clinical depression,” Physical and Engineering Sciences in Medicine, vol. 43, no. 4, pp. 1349–1360, December 2020

  4. [4]

    Performance Analysis of Deep Learning Models for Detection of Autism Spectrum Disorder from EEG Signals,

M. Radhakrishnan, K. Ramamurthy, K. K. Choudhury, D. Won, and T. A. Manoharan, “Performance Analysis of Deep Learning Models for Detection of Autism Spectrum Disorder from EEG Signals,” Traitement du Signal, vol. 38, no. 3, pp. 853–863, June 2021

  5. [5]

    Deep learning-based EEG analysis to classify normal, mild cognitive impairment, and dementia: Algorithms and dataset,

M.-j. Kim, Y. C. Youn, and J. Paik, “Deep learning-based EEG analysis to classify normal, mild cognitive impairment, and dementia: Algorithms and dataset,” NeuroImage, vol. 272, p. 120054, May 2023

  6. [6]

    SleepTransformer: Automatic Sleep Staging With Interpretability and Uncertainty Quantification,

H. Phan, K. Mikkelsen, O. Y. Chén, P. Koch, A. Mertins, and M. De Vos, “SleepTransformer: Automatic Sleep Staging With Interpretability and Uncertainty Quantification,” IEEE Transactions on Biomedical Engineering, vol. 69, no. 8, pp. 2456–2467, August 2022

  7. [7]

    Brain decoding: Toward real-time reconstruction of visual perception,

Y. Benchetrit, H. Banville, and J.-R. King, “Brain decoding: Toward real-time reconstruction of visual perception,” March 2024

  8. [8]

    A Survey on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers,

X. Zhang, L. Yao, X. Wang, J. Monaghan, D. McAlpine, and Y. Zhang, “A Survey on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers,” Journal of Neural Engineering, vol. 1, no. 1, 2016

  9. [9]

    Deep learning with convolutional neural networks for EEG decoding and visualization,

R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017

  10. [10]

    EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces,

V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces,” Journal of Neural Engineering, vol. 15, no. 5, p. 056013, October 2018

  11. [11]

    EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization,

Y. Song, Q. Zheng, B. Liu, and X. Gao, “EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710–719, 2023

  12. [12]

    Peeling the Onion of Brain Representations,

N. Kriegeskorte and J. Diedrichsen, “Peeling the Onion of Brain Representations,” Annual Review of Neuroscience, vol. 42, pp. 407–432

  13. [13]

    A Shared Vision for Machine Learning in Neuroscience,

M.-A. T. Vu, T. Adalı, D. Ba, G. Buzsáki, D. Carlson, K. Heller, C. Liston, C. Rudin, V. S. Sohal, A. S. Widge, H. S. Mayberg, G. Sapiro, and K. Dzirasa, “A Shared Vision for Machine Learning in Neuroscience,” Journal of Neuroscience, vol. 38, no. 7, pp. 1601–1607

  14. [14]

    Relational inductive biases, deep learning, and graph networks,

P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu, “Relational inductive biases, deep learning, and graph networks”

  15. [15]

    Relational inductive biases, deep learning, and graph networks

    [Online]. Available: http://arxiv.org/abs/1806.01261

  16. [16]

Electroencephalography Signal Processing: A Comprehensive Review and Analysis of Methods and Techniques,

A. Chaddad, Y. Wu, R. Kateb, and A. Bouridane, “Electroencephalography Signal Processing: A Comprehensive Review and Analysis of Methods and Techniques,” Sensors (Basel, Switzerland), vol. 23, no. 14, p. 6434, July 2023

  17. [17]

    Emotion Recognition based on EEG using LSTM Recurrent Neural Network,

S. Alhagry, A. Aly, and R. A., “Emotion Recognition based on EEG using LSTM Recurrent Neural Network,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 10, 2017

  18. [18]

Deep Learning for EEG-Based Brain-Computer Interfaces: Representations, Algorithms and Applications

X. Zhang and L. Yao, Deep Learning for EEG-Based Brain-Computer Interfaces: Representations, Algorithms and Applications. WORLD SCIENTIFIC (EUROPE), October 2021

  19. [19]

    Epileptic Seizure Detection Based on EEG Signals and CNN,

M. Zhou, C. Tian, R. Cao, B. Wang, Y. Niu, T. Hu, H. Guo, and J. Xiang, “Epileptic Seizure Detection Based on EEG Signals and CNN,” Frontiers in Neuroinformatics, vol. 12, December 2018

  20. [20]

    EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks,

T. Song, W. Zheng, P. Song, and Z. Cui, “EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks,” IEEE Transactions on Affective Computing, vol. 11, no. 3, pp. 532–541, July 2020

  21. [21]

    Cross-Subject EEG Emotion Recognition With Self-Organized Graph Neural Network,

J. Li, S. Li, J. Pan, and F. Wang, “Cross-Subject EEG Emotion Recognition With Self-Organized Graph Neural Network,” Frontiers in Neuroscience, vol. 15, June 2021

  22. [22]

Deep Learning

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016

  23. [23]

MOABB: Trustworthy algorithm benchmarking for BCIs,

V. Jayaram and A. Barachant, “MOABB: Trustworthy algorithm benchmarking for BCIs,” Journal of Neural Engineering, vol. 15, no. 6, p. 066011, September 2018

  24. [24]

    Review of the BCI Competition IV,

M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C. Brunner, R. Leeb, C. Mehring, K. J. Miller, G. R. Müller-Putz, G. Nolte, G. Pfurtscheller, H. Preissl, G. Schalk, A. Schlögl, C. Vidaurre, S. Waldert, and B. Blankertz, “Review of the BCI Competition IV,” Frontiers in Neuroscience, vol. 6, p. 55, 2012

  25. [25]

    MEG and EEG data analysis with MNE-Python,

A. Gramfort, “MEG and EEG data analysis with MNE-Python,” Frontiers in Neuroscience, vol. 7, 2013

  26. [26]

    What is the fast Fourier transform?

W. Cochran, J. Cooley, D. Favin, H. Helms, R. Kaenel, W. Lang, G. Maling, D. Nelson, C. Rader, and P. Welch, “What is the fast Fourier transform?” Proceedings of the IEEE, vol. 55, no. 10, pp. 1664–1674, October 1967

  27. [27]

    Representational similarity analysis - connecting the branches of systems neuroscience,

    N. Kriegeskorte, M. Mur, and P. A. Bandettini, “Representational similarity analysis - connecting the branches of systems neuroscience,” Frontiers in Systems Neuroscience, vol. 2, November 2008

  28. [28]

    Similarity of Neural Network Representations Revisited,

S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of Neural Network Representations Revisited,” in Proceedings of the 36th International Conference on Machine Learning. PMLR, May 2019, pp. 3519–3529