pith. sign in

arxiv: 2606.00129 · v1 · pith:QKYXTTYGnew · submitted 2026-05-28 · 💻 cs.LG · cs.AI

A Shared Valence Axis Across Modern LLMs and Human EEG: The Saturation Regularity

Pith reviewed 2026-06-29 08:20 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords valence axisLLM EEG alignmentsaturation regularityemotion decodingresidual ensemblingaffective computingbrain decoding
0
0 comments X

The pith

A valence direction from nine sentences in LLMs projects linearly onto EEG and is rediscovered by classifiers, yet alignment training distorts the saturated direction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a one-dimensional valence direction called the V-axis from modern LLMs using nine emotion-evocative sentences and validates its stability through zero-shot sentiment transfer and consistency across fourteen models. This direction maps onto EEG recordings from 123 subjects viewing affective videos via a simple linear projection, and multiple EEG classifiers trained only on task labels independently recover the same direction inside their feature spaces. Attempts to align LLM and EEG representations with twenty-five different supervision strategies, including distillation and contrastive losses, produce no gains and often lower accuracy. The authors formalize this outcome as the saturation regularity, in which task labels already place the decoder at the target direction so extra gradients mainly warp the basin while the within-class residual receives little signal. Gains instead arise from ensembling models that differ in their residual subspaces, delivering a 10.5 percent lift in balanced accuracy on FACED that replicates on SEED-V.

Core claim

The V-axis is a stable one-dimensional direction extracted from LLMs with nine sentences that linearly tracks stimulus valence in EEG and is spontaneously recovered by EEG emotion classifiers; twenty-five alignment methods fail because task labels saturate the target direction, leaving the within-class residual as the source of further improvement through residual ensembling.

What carries the argument

The V-axis, a one-dimensional valence direction built from nine emotion-evocative sentences in LLMs, which acts as the shared structure that aligns with EEG features and is rediscovered by classifiers.

If this is right

  • The V-axis transfers zero-shot to sentiment benchmarks and remains consistent across fourteen LLMs.
  • A single linear projection of EEG features tracks the V-axis position of each affective stimulus.
  • Thirty-six EEG emotion classifiers trained without reference to the V-axis rediscover the same direction in their internal representations.
  • Twenty-five alignment strategies including knowledge distillation, representational similarity, contrastive, and topographic losses fail to improve or actively reduce decoding accuracy.
  • Ensembling across residual diversity improves balanced accuracy by 10.5 percent on FACED and replicates on SEED-V.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The saturation regularity suggests that once a target direction is reachable by labels alone, further multimodal supervision may systematically underperform compared with methods that preserve residual variation.
  • Residual ensembling could be tested on other brain-decoding tasks where a low-dimensional target direction is already accessible from task labels.
  • The within-class residual subspace may encode subject-specific or stimulus-specific emotional nuance that standard supervision overlooks.

Load-bearing premise

The V-axis built from nine specific sentences forms a stable valence representation whose linear projection onto EEG captures the same structure independent of sentence selection or model choice.

What would settle it

A replication in which a new set of nine sentences or a different group of LLMs yields a V-axis whose projection fails to track EEG stimulus valence or is not recovered inside the trained classifiers.

Figures

Figures reproduced from arXiv: 2606.00129 by Kilichbek Haydarov, Mohamed Elhoseiny, Xuhui Liu, Yousef A. Radwan, Yuqian Fu.

Figure 1
Figure 1. Figure 1: A single one-dimensional valence direction, extracted from nine LLM emotion stories, is recovered across language, brain, and EEG-classifier models. (a) Zero-shot probe AUC / |r| of the V-axis (PC1 of nine Qwen3-4B story-class centroids at the penultimate layer) on standard text-sentiment benchmarks; no labels are seen during axis construction. (b) Stimulus-level scatter of LLM V-axis projection vs. cohort… view at source ↗
Figure 2
Figure 2. Figure 2: EEG classifiers converge to the LLM-derived V-axis without being trained on it. (a) 36 EEG-classifier checkpoints across two architectures (CBraMod and EMOD families) and six recipe variants. x-axis: absolute correlation between class-PC1 of the penultimate features and the LLM V-axis (no V-axis loss used in training). y-axis: FACED 9-class balanced accuracy. Linear fit r=+0.885 (p=7.8×10−13 , n=36). (b) M… view at source ↗
Figure 3
Figure 3. Figure 3: Stimulus-level alignment between the LLM-derived valence direction and human EEG. Each point is one of the 28 FACED video stimuli; x is the LLM V-axis projection of the textual stimulus description, y is the single ridge-regression prediction of that projection from all 160 cohort channel–band features (32 channels × 5 frequency bands, averaged over 123 subjects and over each clip). Cohort Pearson r=+0.87 … view at source ↗
Figure 4
Figure 4. Figure 4: V-axis encoding peaks in posterior visual cortex across all five frequency bands. (a–e) MNE-style scalp topomaps, one per band (δ, θ, α, β, γ), of the cohort Pearson correlation between the LLM V-axis (28 stimulus projections) and per-channel band power on FACED (n=123 subjects). Diverging colourmap: positive (red) vs. negative (blue) correlation; symmetric limits across bands. Gold circles mark the top ch… view at source ↗
Figure 5
Figure 5. Figure 5: Binary pole-vs-pole AUC across 20 unrelated concepts (sorted descending), plus the [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Cross-concept transfer matrix (best-orientation AUC). Diagonal: self-classification (white [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Cross-lingual V-axis recovery. Direct causal-Qwen transfer collapses (English-bias); mT0- [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Per-LLM brain-prediction quality. Top-tier 7 LLMs all exceed the 95th percentile of their [PITH_FULL_IMAGE:figures/full_fig_p028_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Frontal-alpha asymmetry replicates in direction with smaller magnitude than posterior [PITH_FULL_IMAGE:figures/full_fig_p030_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: The cohort V-axis is a 9-stimulus contrast and a between-subject phenomenon. Top: [PITH_FULL_IMAGE:figures/full_fig_p031_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Time-resolved V-axis encoding across the 30-second clip. [PITH_FULL_IMAGE:figures/full_fig_p031_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: V-axis-aligned channels form a coordinated network with F7 as frontal hub. Left: scalp [PITH_FULL_IMAGE:figures/full_fig_p032_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Forest plot of all V-axis-as-supervision interventions (Table 9) ranked by [PITH_FULL_IMAGE:figures/full_fig_p034_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: ∆BACC from V-axis supervision plotted against base-recipe BACC across ∼25 interven￾tions. The sign of the effect transitions in the interval [0.62, 0.66]. percentile of this null (pone = 0.066). We treat the within-class residual (Dresid ≈ Dm − 9) as the statistically robust signal. Class-PC1 basin saturation. The 10 SOTA-pool checkpoints exhibit a saturated class-PC1 V-axis range: |r| ∈ [0.60, 0.77] acro… view at source ↗
Figure 15
Figure 15. Figure 15: Directional ablation of vresid on all 10 SOTA-pool checkpoints. Top: per-checkpoint BACC drops (red) compared to matched-norm random-direction ablation (grey); the V-axis effect is well above the null on every checkpoint (mean z ≈ 7.7, all p < 0.001). Bottom: per-checkpoint ablation ∆BACC vs. within-class residual encoding |r|; the population-mean drop (∆ = −0.0157) confirms the residual-encoding mechanis… view at source ↗
Figure 16
Figure 16. Figure 16: Recipe ablation cascade from the EMOD baseline at [PITH_FULL_IMAGE:figures/full_fig_p036_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Two-tier ensemble theory. Per-checkpoint within-class V-axis residual [PITH_FULL_IMAGE:figures/full_fig_p036_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Two-tier ensemble gain across four benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p037_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: FACED 9-class test confusion. Left: best single checkpoint (BACC [PITH_FULL_IMAGE:figures/full_fig_p039_19.png] view at source ↗
read the original abstract

Large language models (LLMs) have emerged as powerful representation learners whose internal features increasingly align with human cognition. We study whether modern LLMs can serve as a lens for understanding neural representations in the human brain, focusing on emotional valence in EEG. We first build a one-dimensional valence direction, the V-axis, from modern LLMs using only nine emotion-evocative sentences. We validate it through zero-shot transfer to sentiment benchmarks and cross-model consistency across fourteen LLMs. We then show that this LLM-derived direction maps onto human neural activity. On a public EEG cohort of 123 subjects watching affective videos, a single linear projection on EEG features tracks the V-axis position of each stimulus. Moreover, 36 EEG emotion classifiers trained without exposure to the V-axis spontaneously rediscover the same direction in their internal representations, suggesting that the same valence structure emerges in both language models and human electrophysiology. Yet this convergence does not provide an effective training signal. We test twenty-five alignment strategies, including knowledge distillation, representational similarity, contrastive, and topographic losses; none improve decoding, and sixteen significantly reduce accuracy. We formalize this result as the saturation regularity: once task labels alone drive a brain-decoding network onto the target direction, additional supervision mainly distorts an already-saturated basin, while the load-bearing within-class residual receives little useful gradient. This regularity also indicates where improvement should come from: the residual subspace unreachable by supervision. Motivated by this insight, we ensemble across residual diversity rather than supervising the basin, improving balanced accuracy by 10.5% over the prior best on FACED, with the same effect replicated on SEED-V.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that a one-dimensional valence direction (V-axis) constructed from nine emotion-evocative sentences in LLMs shows zero-shot sentiment transfer and consistency across 14 LLMs; this direction linearly maps to EEG features from 123 subjects watching affective videos; 36 label-trained EEG classifiers spontaneously rediscover the same direction; 25 alignment strategies (distillation, similarity, contrastive, topographic) fail to improve and often reduce accuracy; this is formalized as the saturation regularity (task labels saturate the target direction while within-class residuals receive little gradient); ensembling across residual diversity yields a 10.5% balanced-accuracy gain over prior best on FACED, replicated on SEED-V.

Significance. If the results hold, the work provides large-scale empirical evidence (14 LLMs, 123 subjects, 36 classifiers, 25 strategies, two datasets) for a shared valence structure between modern LLMs and human EEG, plus a practical insight that further supervision is ineffective once labels saturate the direction and that residual ensembling can improve decoding. The scale of the experiments is a clear strength.

major comments (3)
  1. [V-axis construction (Methods)] V-axis construction (Methods section describing sentence selection): The direction is derived from exactly nine specific emotion-evocative sentences with validation limited to zero-shot transfer and cross-model consistency across 14 LLMs. No ablation on sentence choice, synonym substitution, or alternative affective prompts is reported. This choice is load-bearing for the saturation regularity, the EEG linear-projection claim, and the 10.5% gain, because a different nine-sentence set could produce a less stable direction whose rediscovery by classifiers and resistance to alignment losses would not hold.
  2. [Results (EEG mapping and ensembling)] Statistical reporting (Results on EEG mapping, classifier rediscovery, and 10.5% gain): The manuscript reports consistent results across 123 subjects and 36 classifiers but provides no visible error bars, exact statistical tests (e.g., p-values or confidence intervals for the 10.5% improvement), or full subject exclusion criteria. This leaves the central saturation claim and the improvement only moderately supported.
  3. [Saturation regularity formalization] Formalization of saturation regularity (section introducing the regularity): The claim that task labels saturate the target direction and that additional supervision distorts an already-saturated basin while the within-class residual receives little gradient is presented as an empirical observation from the 25 alignment runs rather than derived from a specific equation or gradient analysis. A more precise definition tying the observation to the linear projection or loss landscape would be needed to make the regularity load-bearing rather than descriptive.
minor comments (1)
  1. [Methods] The linear projection weights from EEG features to the V-axis are listed among free parameters; a brief clarification of how they are obtained (e.g., via regression on stimulus V-axis positions) would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, proposing revisions where the points identify gaps in the current manuscript.

read point-by-point responses
  1. Referee: [V-axis construction (Methods)] V-axis construction (Methods section describing sentence selection): The direction is derived from exactly nine specific emotion-evocative sentences with validation limited to zero-shot transfer and cross-model consistency across 14 LLMs. No ablation on sentence choice, synonym substitution, or alternative affective prompts is reported. This choice is load-bearing for the saturation regularity, the EEG linear-projection claim, and the 10.5% gain, because a different nine-sentence set could produce a less stable direction whose rediscovery by classifiers and resistance to alignment losses would not hold.

    Authors: The manuscript does not report ablations on sentence selection or synonym substitution. The nine sentences were chosen to span a valence range using established affective norms, with robustness evidenced by zero-shot sentiment transfer and consistency across 14 LLMs. We will add a supplementary note discussing the rationale for sentence choice and explicitly acknowledging the lack of ablation studies as a limitation, but new experiments on alternative sets are not feasible within the current scope. revision: partial

  2. Referee: [Results (EEG mapping and ensembling)] Statistical reporting (Results on EEG mapping, classifier rediscovery, and 10.5% gain): The manuscript reports consistent results across 123 subjects and 36 classifiers but provides no visible error bars, exact statistical tests (e.g., p-values or confidence intervals for the 10.5% improvement), or full subject exclusion criteria. This leaves the central saturation claim and the improvement only moderately supported.

    Authors: We agree that the current statistical reporting is insufficient. The revision will add error bars to all figures, report exact p-values and confidence intervals for the 10.5% gain and other key metrics, and expand the Methods section with full subject exclusion criteria and preprocessing details. revision: yes

  3. Referee: [Saturation regularity formalization] Formalization of saturation regularity (section introducing the regularity): The claim that task labels saturate the target direction and that additional supervision distorts an already-saturated basin while the within-class residual receives little gradient is presented as an empirical observation from the 25 alignment runs rather than derived from a specific equation or gradient analysis. A more precise definition tying the observation to the linear projection or loss landscape would be needed to make the regularity load-bearing rather than descriptive.

    Authors: The saturation regularity is presented as an empirical observation from the alignment experiments. In revision we will add a more precise definition, including a brief mathematical characterization that ties the saturation effect to the linear projection geometry and the gradient dynamics under task-label supervision. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper constructs the V-axis empirically from nine LLM sentences, validates it via zero-shot transfer and cross-model checks on independent data, then reports separate empirical observations on an EEG cohort (linear projection tracking, 36 classifiers rediscovering the direction without exposure to V-axis, and 25 alignment strategies failing to improve decoding). The saturation regularity is formalized directly from these test outcomes rather than any equation that defines it by construction from fitted parameters or prior self-citations. The residual-ensemble improvement is a downstream method motivated by the observations but not equivalent to them. No load-bearing self-citation, ansatz smuggling, or renaming of known results appears in the derivation; the chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The central claim rests on the empirical construction of the V-axis from nine sentences and the observation that 25 alignment strategies produce no gain. Free parameters include sentence selection and linear projection fitting. Domain assumptions include one-dimensional valence and linear comparability across modalities. No invented entities carry independent evidence outside the reported experiments.

free parameters (2)
  • Selection of nine emotion-evocative sentences
    Determines the V-axis direction; specific choice may affect downstream alignment and saturation results.
  • Linear projection weights from EEG features to V-axis
    Fitted or chosen to map EEG onto the LLM-derived direction.
axioms (2)
  • domain assumption Emotional valence is adequately captured by a single linear axis in both LLM embeddings and EEG signals.
    Invoked when constructing the V-axis and when claiming EEG projections track it.
  • domain assumption Task-label supervision alone is sufficient to reach the valence direction without additional alignment losses.
    Underlies the saturation regularity claim.
invented entities (2)
  • V-axis no independent evidence
    purpose: One-dimensional valence direction shared between LLMs and EEG.
    Introduced as the central construct; no external falsifiable prediction supplied.
  • Saturation regularity no independent evidence
    purpose: Formal description of why extra alignment harms performance.
    Derived from the 25-strategy experiments; no independent evidence outside this study.

pith-pipeline@v0.9.1-grok · 5856 in / 1517 out tokens · 33043 ms · 2026-06-29T08:20:45.636641+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

    arXiv:2306.03341. Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. InAdvances in Neural Information Processing Systems, 2022a. arXiv:2205.05638. Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recogni...

  2. [2]

    Saturation (FACED-9, EMOD backbone, V-axis target)

    URLhttps://transformer-circuits.pub/2026/emotions/index.html. Shobhita Sundaram, Stephanie Fu, Lukas Muttenthaler, Netanel Y Tamir, Lucy Chai, Simon Kornblith, Trevor Darrell, and Phillip Isola. When does perceptual alignment benefit vision representations? InAdvances in Neural Information Processing Systems, 2024. arXiv:2410.10817. Yonglong Tian, Dilip K...

  3. [3]

    Rewrite the following so it preserves all emotional content but uses different words and sentence structures: {story}

    Institutional review board (IRB) approvals or equivalent for research with human subjects 20 Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country ...

  4. [4]

    Theta-gamma cross-frequency coupling does not mediate the V-axis(Section 7, Ap- pendix S9): Tort modulation indexρ(PAC,Vr) = +0.082,p= 0.667

  5. [5]

    linear p= 0.020

    Mutual-information control is not significant(Section 6): MI p= 0.115 vs. linear p= 0.020. The V-axis-EEG relationship is essentially linear. 39

  6. [6]

    Within-class residual ( r= +0.74 , p= 0.014) is the statistically robust signal

    Random-direction null on cross-arch correlation(Section 4): r= +0.885 at the 93.5th percentile of a 1000-direction null, pone = 0.066 . Within-class residual ( r= +0.74 , p= 0.014) is the statistically robust signal

  7. [7]

    Per-subject V-axis adaptation does not transfer.The per-subject best-channel oracle reaches |r|= 0.62 , but no clean estimator transfers more than+0.001 over the cohort top-K

  8. [8]

    cross-arch0.6791(n.s.)

    Cross-architecture ensembling does not add diversity.Mixing d∈ {4,6,8,10} check- points gives within-arch0.6798vs. cross-arch0.6791(n.s.)

  9. [9]

    Mega-ensembles plateau at 10 checkpoints.15-, 20-, 25-checkpoint pools all hit BACC ≤0.6948(Appendix S13)

  10. [10]

    Stimulus-aggregation re-ranking does not help.Operating at the 28-stimulus level instead of trial level gives BACC=1.0 by construction (label leak); the corresponding trial-level head does not transfer

  11. [11]

    Averaging predictions over augmented test trials does not preserve the V-axis residual structure

    Test-time augmentation at K=5 does not improve the ensemble( 0.6753→0.6720 ). Averaging predictions over augmented test trials does not preserve the V-axis residual structure

  12. [12]

    Toxicity: recipe scope.The 9-prompt PCA recipe attains AUC 0.59 on Jigsaw toxic comments, below the 17/20 working concepts in our library (Appendix S4); toxicity appears to be encoded in a higher-rank subspace than the late-layer PC1

  13. [13]

    Arousal asymmetry: vision yes, text and brain no.OASIS arousal r= 0.803 in vision; ≤13% NRC arousal recovery in text; r∈[0.18,0.41] brain alignment across LLMs (Appendix S6)

  14. [14]

    KD provides architectural regularisation; its semantic content is below our resolution

    LLM-content of KD is irrelevant at this scale.Replacing the 9-D LLM-derived class prototypes with random orthonormal 9-D directions changes BACC by ≤0.003 . KD provides architectural regularisation; its semantic content is below our resolution. S19 Reproducibility Release commitment.Code, configs, model checkpoints, and figure-generation scripts will be r...