KAST-BAR: Knowledge-Anchored Semantically-Dynamic Topology Brain Autoregressive Modeling for Universal Neural Interpretation
Pith reviewed 2026-05-14 19:23 UTC · model grok-4.3
The pith
KAST-BAR builds an EEG foundation model that aligns brain signals with expert medical knowledge through dynamic topology modeling and reports superior performance on six downstream tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KAST-BAR dynamically aligns physiological representations drawn from multi-level brain topology with an expert-level semantic space. It does so through a Dual-Stream Hierarchical Attention encoder that models local temporal dynamics alongside global spatial contexts, a Knowledge-Anchored Semantic Profiler that synthesizes physically grounded textual profiles, and a Semantic Text-Aware Refiner that reconstructs EEG representations using Latent Expert Queries. After pre-training on 21 diverse datasets, the model integrates expert medical knowledge into EEG representations and delivers superior performance across six downstream tasks.
What carries the argument
A Dual-Stream Hierarchical Attention encoder, combined with the Knowledge-Anchored Semantic Profiler and the Semantic Text-Aware Refiner, to capture brain topology and align signals with the semantic space.
Load-bearing premise
The proposed encoders and profilers accurately capture brain topology and align signals with semantics without introducing artifacts or overfitting to the training data.
What would settle it
Evaluating the model on an independent EEG dataset not included in the 21 pre-training sets and finding no performance advantage over prior methods would falsify the superiority claim.
Original abstract
While EEG foundation models have shown significant potential in universal neural decoding across tasks, their advancement remains constrained by inadequate modeling of complex spatiotemporal topology, as well as the inherent modality gap between low-level physiological signals and high-level textual semantics. To address these challenges, we propose a Knowledge-Anchored Semantically-Dynamic Topology Brain Autoregressive Model (KAST-BAR), which dynamically aligns physiological representations derived from multi-level brain topology with an expert-level semantic space. Specifically, we design a Dual-Stream Hierarchical Attention (DSHA) encoder that accurately captures the brain's intrinsic non-Euclidean topology by modeling local temporal dynamics with global spatial contexts. On this basis, a Knowledge-Anchored Semantic Profiler (KASP) is proposed to synthesize physically grounded and instance-level textual profiles, which subsequently drive a Semantic Text-Aware Refiner (STAR) to dynamically reconstruct EEG representations using Latent Expert Queries. By conducting large-scale pre-training on 21 diverse datasets to build a foundation model, KAST-BAR effectively integrates expert-level medical knowledge into EEG signal representations, consistently achieving superior performance across six downstream tasks. Our code is available at https://github.com/KAST-BAR/KAST-BAR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes KAST-BAR, a brain autoregressive foundation model for EEG signals that integrates expert medical knowledge via three new modules: a Dual-Stream Hierarchical Attention (DSHA) encoder to capture non-Euclidean spatiotemporal topology, a Knowledge-Anchored Semantic Profiler (KASP) to generate instance-level textual profiles from physiological signals, and a Semantic Text-Aware Refiner (STAR) that uses latent expert queries to dynamically align low-level EEG representations with high-level semantics. The central claim is that large-scale pre-training on 21 diverse datasets produces a model that consistently outperforms prior approaches on six downstream tasks.
Significance. If the performance claims and knowledge-integration mechanism are substantiated with rigorous evidence, the work could advance EEG foundation modeling by explicitly bridging the modality gap between raw physiological signals and textual medical semantics, offering a template for knowledge-anchored autoregressive architectures in neural decoding.
major comments (2)
- [Abstract] Abstract: the claim of 'consistently achieving superior performance across six downstream tasks' after pre-training on 21 datasets is presented without any quantitative metrics, baseline comparisons, ablation results, or error analysis, so the data-to-claim link cannot be evaluated.
- [Method] Method description: no details are supplied on the training objective, the loss terms that enforce knowledge anchoring between DSHA/KASP/STAR, the exact composition or preprocessing of the 21 datasets, or how non-Euclidean topology is encoded, leaving open the possibility that reported gains arise from standard autoregressive pre-training rather than the claimed semantic alignment.
minor comments (2)
- [Abstract] Abstract: the sentence 'dynamically aligns physiological representations derived from multi-level brain topology with an expert-level semantic space' uses vague phrasing; specify the exact alignment mechanism and any regularization used to prevent overfitting.
- [Abstract] The GitHub link is provided but no reproducibility checklist or hyperparameter table appears in the manuscript; add these to support verification of the pre-training setup.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the opportunity to clarify our work. We address each major comment point-by-point below, agreeing that certain aspects of the presentation require strengthening for better evaluation of our claims.
Point-by-point responses
-
Referee: [Abstract] Abstract: the claim of 'consistently achieving superior performance across six downstream tasks' after pre-training on 21 datasets is presented without any quantitative metrics, baseline comparisons, ablation results, or error analysis, so the data-to-claim link cannot be evaluated.
Authors: We agree that the abstract is a high-level summary and lacks specific quantitative metrics, which limits immediate evaluation of the claims. The full manuscript provides these details in the Experiments section, including performance tables with baseline comparisons, ablation studies isolating the contribution of each module, and error analyses across tasks. To address the concern directly, we will revise the abstract to include key quantitative highlights such as average accuracy improvements and statistical significance over baselines. revision: yes
-
Referee: [Method] Method description: no details are supplied on the training objective, the loss terms that enforce knowledge anchoring between DSHA/KASP/STAR, the exact composition or preprocessing of the 21 datasets, or how non-Euclidean topology is encoded, leaving open the possibility that reported gains arise from standard autoregressive pre-training rather than the claimed semantic alignment.
Authors: We acknowledge that the current method description could be more explicit on these points to rule out alternative explanations for the gains. The manuscript describes the DSHA encoder for non-Euclidean topology via hierarchical graph attention in Section 3.1, the combined autoregressive and semantic alignment losses (including contrastive terms for KASP/STAR anchoring) in Section 3.3, and the 21 datasets with preprocessing in Table 1 and Appendix A. However, to strengthen the link to semantic alignment, we will add a dedicated subsection on the full training objective with explicit loss equations, move key dataset composition details into the main text, and include further ablations comparing against pure autoregressive baselines without the knowledge-anchoring components. revision: yes
Circularity Check
No circularity: empirical architecture proposal with standard pre-training results
full rationale
The paper proposes DSHA+KASP+STAR components to align EEG topology with textual semantics and reports superior downstream performance after pre-training on 21 datasets. No mathematical derivation chain, first-principles prediction, or fitted parameter renamed as output is present in the provided text. Claims rest on experimental outcomes rather than any self-referential reduction (e.g., no equation where a 'prediction' equals a training fit by construction, and no load-bearing self-citation of uniqueness theorems). This is a standard empirical foundation-model paper whose central results are externally falsifiable via the released code and datasets.
Axiom & Free-Parameter Ledger
free parameters (1)
- Model hyperparameters and training settings
axioms (2)
- domain assumption: Brain signals possess intrinsic non-Euclidean spatiotemporal topology that hierarchical attention can accurately capture.
- domain assumption: Physiological EEG representations can be aligned with expert-level textual semantics via knowledge-anchored profiling.
invented entities (3)
-
Dual-Stream Hierarchical Attention (DSHA) encoder
no independent evidence
-
Knowledge-Anchored Semantic Profiler (KASP)
no independent evidence
-
Semantic Text-Aware Refiner (STAR)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
universal neural representations
doi: 10.1109/SPMB.2017.8257018. Wang, G., Liu, W., He, Y., Xu, C., Ma, L., and Li, H. EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. Wang, J., Zhao, S., Luo, Z., Zhou, Y., Jiang, H., Li, S., Li, T., and Pan, G. CBramod: A cris...
-
[2]
leverage Transformer architectures, treating EEG signals as flattened 1D sequences or 2D time-frequency images, and utilize Masked Autoencoder (MAE) or Autoregressive (AR) objectives to capture long-range dependencies. Although these methods have verified the Scaling Laws in neural data, they inherently neglect the non-Euclidean properties of brain signal...
work page 2026
-
[3]
Brain Topology Hierarchy (BTH) Specification: Adhering to the 5-scale BTH architecture in THD-BAR (Yang et al., 2025), we organize electrodes into a coarse-to-fine hierarchy (B1 → B5) based on spatial proximity and anatomical divisions, as illustrated in Figure 7. Specifically, Level 1 (Whole Brain, B1) treats all channels as a single global unit to capture...
work page 2025
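To make the coarse-to-fine organization concrete, here is a minimal sketch of a BTH-style electrode grouping; the channel names and level assignments are illustrative, not the paper's exact B1-B5 specification.

```python
# Illustrative coarse-to-fine electrode hierarchy in the spirit of the 5-scale BTH.
# Channel names follow the 10-20 system; the exact KAST-BAR groupings are assumed.
CHANNELS = ["Fp1", "Fp2", "F3", "F4", "C3", "C4", "P3", "P4", "O1", "O2", "T3", "T4"]

B1 = {"whole_brain": CHANNELS}                          # Level 1: one global unit
B2 = {"left":  [c for c in CHANNELS if c[-1] in "13"],  # Level 2: hemispheres
      "right": [c for c in CHANNELS if c[-1] in "24"]}
B3 = {"frontal": ["Fp1", "Fp2", "F3", "F4"],            # Level 3: anatomical lobes
      "central": ["C3", "C4"], "parietal": ["P3", "P4"],
      "occipital": ["O1", "O2"], "temporal": ["T3", "T4"]}
# Levels 4-5 would subdivide the lobes further, down to individual channels.
```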
-
[4]
Detailed network architecture specifications are provided in Table 7
Vector Quantizer Configuration. We employ the standard VQ-VAE (Van Den Oord et al., 2017) mechanism to discretize the multi-channel continuous EEG signals output by the DSHA encoder. Detailed network architecture specifications are provided in Table 7.
work page 2017
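A minimal sketch of the standard VQ-VAE quantization step referenced above, assuming nearest-neighbor codebook lookup with the usual codebook and commitment losses; the codebook size and latent dimension are placeholders, not the Table 7 configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Standard VQ-VAE quantizer sketch (Van den Oord et al., 2017);
    codebook size and latent dimension are illustrative placeholders."""
    def __init__(self, num_codes=8192, dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):  # z: (batch, tokens, dim) continuous latents from the encoder
        # Squared Euclidean distance from each latent to every codebook entry.
        d = (z.pow(2).sum(-1, keepdim=True)
             - 2 * z @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(-1))
        idx = d.argmin(dim=-1)                  # discrete EEG token ids
        z_q = self.codebook(idx)
        # Codebook loss + commitment loss; straight-through estimator for gradients.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```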
-
[5]
DSHA Structure: Dual-Stream Interaction Mechanism. Going beyond simple unidirectional multi-scale feature aggregation, DSHA adopts an explicit Bidirectional Progressive Interaction Strategy to address the dilution of local details or the absence of global context. Specifically, the model maintains two parallel processing streams: • Global Refinement Stream: I...
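A minimal sketch of what one bidirectional dual-stream block could look like, assuming cross-attention carries the interaction in both directions; the module names and sizes are assumptions, not the paper's exact DSHA implementation.

```python
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Sketch of a bidirectional progressive interaction block: two parallel
    streams, each refined by self-attention and then by the other stream."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.local_attn  = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_from_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_from_local = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, local_x, global_x):
        # Intra-stream self-attention: local temporal dynamics, global spatial context.
        local_x  = local_x  + self.local_attn(local_x, local_x, local_x)[0]
        global_x = global_x + self.global_attn(global_x, global_x, global_x)[0]
        # Bidirectional interaction: each stream queries the other.
        local_x  = local_x  + self.local_from_global(local_x, global_x, global_x)[0]
        global_x = global_x + self.global_from_local(global_x, local_x, local_x)[0]
        return local_x, global_x
```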
-
[6]
KASP Feature Operators (Φ): Mathematical Formulation and Rationale. The KASP module utilizes a set of deterministic operators to extract robust physical features. Based on the logic defined in the main text, the detailed formulations are as follows: • Temporal Statistical Operator (ϕ_stat): Corresponding to the signal amplitude distribution modeling mention...
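A sketch of deterministic feature operators in the spirit of ϕ_stat and ϕ_spec above, assuming per-channel amplitude statistics and a Welch-based spectral summary; the exact KASP formulations are only partially quoted here, so these definitions are assumptions.

```python
import numpy as np
from scipy.signal import welch

def phi_stat(x):
    """Temporal statistical operator sketch: per-channel amplitude statistics.
    x: (channels, samples) EEG array."""
    return {"mean": x.mean(axis=1), "std": x.std(axis=1),
            "energy": (x ** 2).mean(axis=1)}   # mean power; exact definition assumed

def phi_spec(x, fs=200):
    """Spectral operator sketch: per-channel peak frequency and relative delta power."""
    freqs, psd = welch(x, fs=fs, axis=1)
    band = (freqs >= 0.5) & (freqs < 4.0)      # delta band, 0.5-4 Hz
    return {"peak_freq": freqs[psd.argmax(axis=1)],
            "delta_power": psd[:, band].sum(axis=1) / psd.sum(axis=1)}
```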
-
[7]
STAR: Construction Logic and Design Philosophy. The design of the Semantic Text-Aware Refiner (STAR) is rooted in the cognitive principle of “Active Perception”. • Why use “Latent Experts”? EEG data streams are variable in length and extremely long, whereas LLMs require condensed inputs of fixed length. We initialize a fixed set of learnable vectors (Q_lat) to...
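A sketch of the latent-expert idea described above: a fixed set of learnable query vectors cross-attends over an arbitrarily long EEG token sequence and emits a fixed-length summary an LLM can consume. Query count and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LatentExpertPooler(nn.Module):
    """Sketch: learnable queries Q_lat compress a variable-length EEG token
    sequence into a fixed-length summary via cross-attention."""
    def __init__(self, num_queries=32, dim=256, heads=8):
        super().__init__()
        self.q_lat = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, eeg_tokens):  # eeg_tokens: (batch, seq_len, dim), seq_len varies
        q = self.q_lat.unsqueeze(0).expand(eeg_tokens.size(0), -1, -1)
        summary, _ = self.cross_attn(q, eeg_tokens, eeg_tokens)
        return summary              # (batch, num_queries, dim), fixed length
```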
-
[8]
The model is trained using the AdamW optimizer (lr = 5e-5) with a batch size of 128 for 50 epochs
Stage 1: Self-Supervised Reconstruction Configuration. We adopt a patch size of 200 (corresponding to 1 second). The model is trained using the AdamW optimizer (lr = 5e-5) with a batch size of 128 for 50 epochs. Detailed model hyperparameters and training configurations are presented in Table 7. As shown in Figure 10(b), the training losses converge withi...
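The quoted settings translate directly into code; a sketch assuming a 200 Hz sampling rate (so a 200-sample patch spans 1 second) and a placeholder module standing in for the DSHA encoder.

```python
import torch

PATCH = 200  # 1-second patches at an assumed 200 Hz sampling rate

def patchify(x: torch.Tensor) -> torch.Tensor:
    """Split a (channels, samples) recording into (channels, n_patches, 200)."""
    n = x.shape[1] // PATCH
    return x[:, : n * PATCH].reshape(x.shape[0], n, PATCH)

model = torch.nn.Linear(PATCH, 256)  # placeholder for the DSHA encoder
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
# Training-loop sketch: 50 epochs over batches of 128, minimizing a reconstruction loss.
```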
-
[9]
Stage 2: Joint Autoregressive Pre-training Configuration. We construct the hybrid sequence S input to the BAR model using the following format: [BOS] KASP_Profile (S_text) [SEP] STAR_Summary (S_sem) [SEP] EEG_Tokens (S_EEG) [EOS]. This explicit separation ensures that the LLM can distinguish between the static knowledge context and dynamic signal tokens. ...
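A sketch of the hybrid-sequence assembly, assuming the text spans are already tokenized by the LLM tokenizer and the EEG tokens are VQ codebook indices offset into a reserved ID range; the special-token IDs are placeholders.

```python
def build_hybrid_sequence(kasp_profile_ids, star_summary_ids, eeg_token_ids,
                          bos=1, sep=2, eos=3):
    """Assemble [BOS] KASP_Profile [SEP] STAR_Summary [SEP] EEG_Tokens [EOS].
    Token IDs are illustrative; text spans come from the LLM tokenizer and
    EEG tokens from the VQ codebook (assumed offset into a separate ID range)."""
    return ([bos] + list(kasp_profile_ids) + [sep]
            + list(star_summary_ids) + [sep]
            + list(eeg_token_ids) + [eos])
```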
-
[10]
Multi-task Instruction Tuning Strategy. To adapt the foundation model to specific downstream tasks while mitigating catastrophic forgetting, we employ a Decoupled Update Strategy: • Adapter Decoupling: We freeze the pre-trained LoRA adapter or merge it into the backbone, and initialize a new, task-specific LoRA adapter for the Supervised Fine-Tuning (SFT) st...
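A minimal LoRA sketch illustrating the decoupling: the base weight stays frozen, the old adapter can be merged into it, and a fresh adapter is then trained for the SFT stage. This is a generic LoRA layer, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Tiny LoRA sketch: frozen base weight plus a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # backbone stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

    def merge(self):
        """Fold the adapter into the backbone (the 'merge into backbone' option)."""
        with torch.no_grad():
            self.base.weight += (self.B @ self.A) * self.scale

# Decoupled-update sketch: merge the pre-training adapter, then wrap the same
# base with a fresh task-specific adapter so only the new A/B matrices train.
# old = LoRALinear(base); old.merge(); new = LoRALinear(base)
```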
-
[11]
Hyperparameter Configuration. Table 9 summarizes the hyperparameters used for downstream fine-tuning in detail. Compared to pre-training, we use a smaller global batch size (64) to accommodate the GPU memory overhead of task-specific gradients. A cosine learning rate schedule with a warm-up ratio of 0.1 is adopted to stabilize the initial adaptation proce...
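The warm-up/cosine schedule described above is straightforward to reproduce; a sketch using PyTorch's LambdaLR, with total_steps supplied by the caller.

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_warmup(optimizer, total_steps, warmup_ratio=0.1):
    """Linear warm-up over the first 10% of steps, then cosine decay to zero."""
    warmup = int(total_steps * warmup_ratio)

    def factor(step):
        if step < warmup:
            return step / max(1, warmup)                   # linear warm-up
        progress = (step - warmup) / max(1, total_steps - warmup)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    return LambdaLR(optimizer, factor)
```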
-
[12]
Training Dynamics and Loss Curves. Figure 10(c) illustrates the training dynamics during the SFT stage. The training loss exhibits a rapid decline within the first 2-3 epochs, significantly faster than in the pre-training stage. This rapid convergence validates the efficacy of our pre-trained representations. Concurrently, the validation perplexity (dashed...
-
[13]
It effectively mitigates the bias introduced by skewed class distributions
Balanced Accuracy (B-Acc): B-Acc is defined as the arithmetic mean of the recall of the positive class and that of the negative class. It effectively mitigates the bias introduced by skewed class distributions. $\text{B-Acc} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)$ (22), where TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives, respectively.
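Eq. (22) in code, with a quick sanity check on toy confusion counts:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Binary B-Acc, Eq. (22): mean of positive-class and negative-class recall."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Recall+ = 80/100 = 0.8, Recall- = 900/1000 = 0.9, so B-Acc = 0.85.
assert abs(balanced_accuracy(tp=80, fn=20, tn=900, fp=100) - 0.85) < 1e-12
```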
-
[14]
Area Under the Receiver Operating Characteristic Curve (AUROC):AUROC quantifies the generalization ability of the model across all classification thresholds. It is calculated as the area under the curve plotting the True Positive Rate (TPR) against the False Positive Rate (FPR). The value ranges from 0.5 to 1, with a higher value indicating better discrim...
-
[15]
AUC-PR focuses specifically on the quality of positive predictions
Area Under the Precision-Recall Curve (AUC-PR): In scenarios with extreme class imbalance (where positive samples are rare), AUROC may overestimate performance. AUC-PR focuses specifically on the quality of positive predictions. It is the area under the curve plotting Precision against Recall. $\text{Precision} = \frac{TP}{TP+FP}, \quad \text{Recall} = \frac{TP}{TP+FN}$ (23)
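Both threshold-free metrics from the two entries above are one-liners in scikit-learn; note that average_precision_score is one common estimator of the area under the precision-recall curve, not the only one.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

y_true  = [0, 0, 0, 0, 1, 1, 0, 1]                    # binary labels (toy data)
y_score = [0.1, 0.2, 0.15, 0.3, 0.8, 0.6, 0.55, 0.4]  # positive-class scores

auroc  = roc_auc_score(y_true, y_score)               # AUROC, threshold-free
auc_pr = average_precision_score(y_true, y_score)     # AUC-PR (average precision)
```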
-
[16]
Temporal Stats (ϕ_stat): Mean 0.12, Std 14.5, Energy 250.4
-
[17]
Spectral Features (ϕ_spec): Mean Peak Freq 2.5 Hz, Delta Power 0.65
-
[18]
Spatial Features (ϕ_spat): Channel T3 (Left Temporal) shows ... [Analysis Requirements] 1. Dataset Task Description: Describe the general experimental paradigm. 2. Task-Related Prior Knowledge: List relevant neuroscience background. 3. Signal Physical Features: Objectively describe time, frequency, and spatial features. [Output Format] Please respond strictl...
-
[19]
Balanced Accuracy (B-Acc): In the multi-class setting, B-Acc is defined as the macro-average of the recall scores over all classes. $\text{B-Acc} = \frac{1}{C}\sum_{i=1}^{C} \text{Recall}_i$ (24), where C is the total number of classes and Recall_i is the recall for class i.
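Eq. (24) coincides with scikit-learn's balanced_accuracy_score, which is defined as the macro-average recall:

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

macro_recall = recall_score(y_true, y_pred, average="macro")  # Eq. (24)
assert abs(balanced_accuracy_score(y_true, y_pred) - macro_recall) < 1e-12
```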
-
[20]
Cohen’s Kappa Coefficient (κ): Cohen’s Kappa measures the agreement between the model’s predictions and the ground-truth labels, correcting for agreement occurring by chance. $\kappa = \frac{p_o - p_e}{1 - p_e}$ (25), where p_o is the observed agreement (accuracy) and p_e is the expected agreement by chance.
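Eq. (25) in code, with a worked value:

```python
def cohens_kappa(p_o, p_e):
    """Eq. (25): chance-corrected agreement between predictions and labels."""
    return (p_o - p_e) / (1.0 - p_e)

# Observed accuracy 0.85 against chance agreement 0.50 yields kappa = 0.70.
assert abs(cohens_kappa(0.85, 0.50) - 0.70) < 1e-12
```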
-
[21]
Weighted F1-Score (F1-W): To balance precision and recall while accounting for the support of each class, we employ the Weighted F1-Score. It is calculated as the weighted sum of per-class F1-scores, where the weight w_i corresponds to the...
work page 2018
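The support-weighted average described above maps to scikit-learn's average="weighted" option:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 1, 2, 0]

# Per-class F1-scores averaged with weights w_i proportional to class support.
f1_w = f1_score(y_true, y_pred, average="weighted")
```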
-
[22]
Advantage of Cross-scale Feature Capture: CSBrain introduces a Cross-scale Spatiotemporal Tokenization (CST) mechanism, designed to explicitly aggregate local high-frequency transients and global low-frequency rhythms. For specific spectral anomalies in the TUSL task (such as slowing waves), this multi-scale inductive bias is more effective at precisely lo...
-
[23]
Negative Transfer in Multi-Task Learning: TUSL represents a highly specific clinical anomaly detection task, with a data distribution vastly different from tasks like emotion recognition or cognitive workload. In unified multi-task modeling, forcibly optimizing these semantically conflicting tasks simultaneously may introduce gradient interference, leading...
discussion (0)