Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks
Pith reviewed 2026-05-08 17:06 UTC · model grok-4.3
The pith
Non-semisimple Jordan blocks realize a distance-modulated phase basis for relative positional encoding
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The non-semisimple Jordan block structure for the relative translation operator produces oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$ for causal lag $d$. When the block is defective, this exactly realizes the distance-modulated phase basis $d e^{i\omega d}$, and the authors give the exact one-parameter representation, its real block form, and the required contragredient query map.
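The core claim can be checked directly. Below is a minimal NumPy sketch (my own illustration, not code from the paper): raising a defective 2x2 complex Jordan block to the power d gives $\lambda^d$ on the diagonal and $d\,\lambda^{d-1}$ off the diagonal, whose real and imaginary parts are exactly the four oscillatory-polynomial features above.

```python
import numpy as np

# Illustrative parameters (not from the paper): eigenvalue lam = e^{-gamma + i*omega}.
gamma, omega, d = 0.05, 0.3, 7
lam = np.exp(-gamma + 1j * omega)
J = np.array([[lam, 1.0], [0.0, lam]])  # defective: a single eigenvector, nilpotent part N != 0

Jd = np.linalg.matrix_power(J, d)  # relative translation by causal lag d

# Diagonal entry: the pure rotary/decay feature e^{-gamma d} e^{i omega d}.
assert np.allclose(Jd[0, 0], lam**d)

# Off-diagonal entry: d * lam^(d-1), i.e. the distance-modulated phase basis
# d e^{i omega d}, up to the decay factor e^{-gamma d} and the constant 1/lam.
assert np.allclose(Jd[0, 1], d * lam**(d - 1))
feat = d * np.exp(-gamma * d) * np.exp(1j * omega * d) / lam
assert np.allclose(Jd[0, 1], feat)
```

Taking real and imaginary parts of the two entries recovers $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$ and their $d$-multiplied counterparts.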
What carries the argument
The defective complex Jordan block that unites the rotary eigenvalue with the nilpotent response inside the generator of the relative positional operator
Load-bearing premise
The non-semisimple Jordan structure supplies an inductive bias for coupled phase-distance interactions that cannot be obtained by simply combining existing rotary and additive channels.
What would settle it
Whether a direct-sum model that applies RoPE for phase and an additive bias for distance can match Jordan-RoPE on the synthetic task built around distance-modulated phase interactions. If it can, the load-bearing premise fails.
original abstract
Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{i\omega d}$, rather than merely adding a separate distance channel to RoPE. We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Jordan-RoPE, a relative positional encoding obtained from non-semisimple representations of complex Jordan blocks. It derives an exact one-parameter operator whose real block form produces coupled oscillatory-polynomial features (e.g., d e^{-γd} cos(ωd), d e^{-γd} sin(ωd)) for causal lag d, formulates the required contragredient query action, and distinguishes this from stabilized variants that break the exact group law. Kernel diagnostics and a synthetic language-modeling task demonstrate utility when the target contains distance-modulated phase interactions; on a small WikiText-103 byte-level model a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The authors frame the contribution as structural rather than a broad performance claim.
Significance. If the non-semisimple Jordan structure supplies an inductive bias for distance-modulated phases that cannot be recovered by direct-sum combinations of existing channels, the construction supplies a principled, parameter-light extension of rotary encodings. The algebraic derivation of the exact operator and its real block form is cleanly executed from the defective Jordan block, and the synthetic task provides direct structural support for the coupled-feature claim. These elements constitute the primary strengths of the work.
major comments (2)
- [§3] §3 (exact Jordan-RoPE construction): the central claim that the defective block realizes a distance-modulated phase basis d e^{iωd} 'rather than merely adding a separate distance channel' is algebraically correct, yet the manuscript provides no expressivity comparison or ablation against a parameterized linear combination of RoPE and an independent distance bias with matching degrees of freedom; the synthetic task in §4.1 matches the Jordan form by construction and therefore does not rule out equivalent logit functions from other parameterizations.
- [§4.2] §4.2 (WikiText-103 experiment): results are reported exclusively for the scaled-exact variant, which §3.3 explicitly states breaks the exact group law; no performance numbers are given for the pure exact Jordan-RoPE operator despite its role as the load-bearing theoretical object, weakening the link between the algebraic construction and the empirical evidence.
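The first major comment can be made concrete with a small sketch (assumptions mine, not part of the manuscript): over a range of causal lags, the coupled Jordan feature $d\cos(\omega d)$ is not a linear combination of the separate phase and distance features that a direct-sum RoPE-plus-bias parameterization exposes, which is why a matched-degrees-of-freedom ablation is the right test.

```python
import numpy as np

# Can d*cos(omega*d) be written as a linear combination of the direct-sum
# features {1, d, cos(omega*d), sin(omega*d)}?  Illustrative omega and lag range.
omega = 0.3
d = np.arange(0, 64, dtype=float)
target = d * np.cos(omega * d)  # coupled Jordan feature

basis = np.stack([np.ones_like(d), d,
                  np.cos(omega * d), np.sin(omega * d)], axis=1)
coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
residual = np.linalg.norm(basis @ coef - target)

# A large residual means the coupled feature lies outside the span of the
# separate channels as *fixed* features; it does not settle what a trained
# direct-sum model can express at the logit level, which is the referee's point.
print(residual)
```

This only shows the features are linearly independent; whether the coupling yields a useful inductive bias is exactly what the requested ablation would measure.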
minor comments (3)
- [§2] §2 (related work): the discussion of group-theoretic views of translation-invariant encodings would benefit from an explicit pointer to the specific representation-theoretic references that motivate the non-semisimple case.
- [Figure 2] Figure 2 (kernel diagnostics): the plotted attention logits would be clearer if the x-axis were labeled with explicit lag values and if the curves for different (ω, γ) pairs were distinguished by line style as well as color.
- [Notation] Notation throughout: the symbol E_p for the positional embedding matrix is introduced without an immediate reminder of its dimension and the precise action of the contragredient map on the query side; a one-line definition or small matrix example would remove ambiguity.
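On the notation comment, the contragredient action can be pinned down in a few lines (my own construction with hypothetical parameter values, offered as the kind of small example the referee requests): applying $P(j)$ to keys and the inverse-transpose $P(i)^{-\top}$ to queries makes the bilinear logit depend only on the relative lag, even when $P$ is non-orthogonal.

```python
import numpy as np

# A non-orthogonal one-parameter positional map P(n) = J^n from a defective block.
lam = np.exp(-0.05 + 0.3j)  # illustrative eigenvalue
J = np.array([[lam, 1.0], [0.0, lam]])
P = lambda n: np.linalg.matrix_power(J, n)  # negative n uses the inverse

rng = np.random.default_rng(0)
q = rng.standard_normal(2) + 1j * rng.standard_normal(2)
k = rng.standard_normal(2) + 1j * rng.standard_normal(2)

i, j = 9, 4                       # positions; causal lag d = i - j = 5
qi = np.linalg.inv(P(i)).T @ q    # contragredient query map P(i)^{-T}
kj = P(j) @ k                     # direct key map
logit = qi @ kj                   # plain bilinear form (no conjugation)

# Relative invariance: (P(i)^{-T} q)^T (P(j) k) = q^T P(i)^{-1} P(j) k
#                                               = q^T P(j - i) k.
assert np.allclose(logit, q @ (P(j - i) @ k))
```

For orthogonal maps such as plain RoPE, $P^{-\top} = P$, which is why the contragredient action is invisible until the positional map becomes non-orthogonal.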
Simulated Author's Rebuttal
We thank the referee for the thorough review and for highlighting both the algebraic strengths of the Jordan-RoPE construction and the areas where additional evidence would strengthen the manuscript. We address each major comment below and describe the revisions we will implement.
point-by-point responses
Referee: [§3] §3 (exact Jordan-RoPE construction): the central claim that the defective block realizes a distance-modulated phase basis d e^{iωd} 'rather than merely adding a separate distance channel' is algebraically correct, yet the manuscript provides no expressivity comparison or ablation against a parameterized linear combination of RoPE and an independent distance bias with matching degrees of freedom; the synthetic task in §4.1 matches the Jordan form by construction and therefore does not rule out equivalent logit functions from other parameterizations.
Authors: We agree that an explicit comparison would better isolate the contribution of the coupled structure. The defective Jordan block produces a specific non-semisimple coupling (the nilpotent term multiplies the oscillatory factor inside the same representation) that is not equivalent to an arbitrary linear combination of independent RoPE and distance channels; the latter would require separate parameters and would not satisfy the same group law. Nevertheless, we acknowledge that the synthetic task is constructed around the Jordan features and therefore cannot alone rule out other parameterizations. In the revision we will add an ablation that compares Jordan-RoPE against a parameterized linear combination of RoPE plus an independent distance bias with matched degrees of freedom, both on the synthetic task and on kernel diagnostics. This will clarify whether the coupled basis supplies an inductive bias beyond what separate channels can achieve. revision: partial
Referee: [§4.2] §4.2 (WikiText-103 experiment): results are reported exclusively for the scaled-exact variant, which §3.3 explicitly states breaks the exact group law; no performance numbers are given for the pure exact Jordan-RoPE operator despite its role as the load-bearing theoretical object, weakening the link between the algebraic construction and the empirical evidence.
Authors: We accept this criticism. The scaled-exact variant was used in the main WikiText-103 runs for numerical stability, since the exact operator’s nilpotent component produces unbounded shear for large lags. To tighten the connection between the theoretical object and the experiments, we will add performance numbers for the pure exact Jordan-RoPE on the synthetic language-modeling task (where distances are controlled) and, where feasible, on the byte-level WikiText-103 model (with appropriate truncation or reduced scale). We will also expand the discussion in §3.3 and §4.2 to explain the stability trade-off and to state explicitly that the scaled variant is a practical approximation rather than a replacement for the exact construction. revision: yes
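The stability trade-off the authors invoke can be seen numerically (a sketch under my own parameter choices, isolating the pure rotary case with $\gamma = 0$): the nilpotent component makes the exact operator's shear grow linearly in the lag, so its norm is unbounded over long contexts, which motivates a bounded scaled variant.

```python
import numpy as np

# Pure rotary defective block: unit-modulus eigenvalue, no decay to damp the shear.
omega = 0.3
lam = np.exp(1j * omega)
J = np.array([[lam, 1.0], [0.0, lam]])

# Off-diagonal magnitude of J^d is |d * lam^(d-1)| = d, since |lam| = 1:
# the shear grows without bound as the lag increases.
shears = [abs(np.linalg.matrix_power(J, d)[0, 1]) for d in (1, 10, 100)]
print(shears)  # grows like d: ~1, ~10, ~100
```

With $\gamma > 0$ the factor $e^{-\gamma d}$ eventually dominates, but for small $\gamma$ the transient growth can still be large at the context lengths used in training.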
Circularity Check
No circularity: Jordan-RoPE is a direct algebraic construction from non-semisimple blocks
full rationale
The paper defines Exact Jordan-RoPE explicitly as the action of a defective complex Jordan block containing both a rotary eigenvalue and a nilpotent component. The resulting features (d e^{-γd} cos(ωd), d e^{-γd} sin(ωd), etc.) are obtained by direct matrix exponentiation or linear action on the lag d, without any parameter fitting, data-dependent calibration, or reduction to a prior result via self-citation. The group-theoretic motivation is external; the central operator is not defined in terms of its own outputs, nor is any uniqueness theorem imported from the authors' previous work. The WikiText and synthetic experiments are downstream evaluations, not part of the derivation chain. This is a self-contained mathematical construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- complex eigenvalue and nilpotent coefficient
axioms (1)
- (standard math) Existence of a non-semisimple one-parameter representation of the additive group of integers
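The single axiom is elementary to verify mechanically (a sketch with illustrative parameters of my own): powers of the defective block define a representation of $(\mathbb{Z}, +)$, i.e. $\rho(a+b) = \rho(a)\rho(b)$, and the representation is non-semisimple because the block is not diagonalizable.

```python
import numpy as np

# rho(n) = J^n for a defective Jordan block J; illustrative eigenvalue.
lam = np.exp(-0.05 + 0.3j)
J = np.array([[lam, 1.0], [0.0, lam]])
rho = lambda n: np.linalg.matrix_power(J, n)

# Group law of (Z, +): rho(a + b) = rho(a) @ rho(b), including negative lags.
assert np.allclose(rho(3 + 4), rho(3) @ rho(4))
assert np.allclose(rho(5 - 2), rho(5) @ rho(-2))

# Non-semisimplicity: J has a repeated eigenvalue but rank(J - lam*I) = 1,
# so there is only one eigenvector and J is not diagonalizable.
assert np.linalg.matrix_rank(J - lam * np.eye(2)) == 1
```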
Reference graph
Works this paper leans on
- [1] Attention Is All You Need. Advances in Neural Information Processing Systems.
- [2] Self-Attention with Relative Position Representations. Proceedings of NAACL-HLT.
- [3] Dai, Zihang; Yang, Zhilin; Yang, Yiming; Carbonell, Jaime; Le, Quoc V.; Salakhutdinov, Ruslan. Transformer-XL.
- [4] RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
- [5] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. International Conference on Learning Representations.
- [6] A Length-Extrapolatable Transformer. arXiv:2212.10554.
- [7] Extending Context Window of Large Language Models via Positional Interpolation. arXiv:2306.15595.
- [8] Peng, Bowen; Quesnelle, Jeffrey; Fan, Honglu; Shippole, Enrico.
- [9] Ding, Yiran; Zhang, Li Lyna; Zhang, Chengruidong; Xu, Yuanyuan; Shang, Ning; Xu, Jiahang; Yang, Fan; Yang, Mao. LongRoPE.
- [10] Algebraic Positional Encodings. Advances in Neural Information Processing Systems.
- [11] Ostmeier, Sophie; Axelrod, Brian; Varma, Maya; Moseley, Michael; Chaudhari, Akshay S.; Langlotz, Curtis.
- [12] Using Group Theory to Explore the Space of Positional Encodings for Attention. 2026.
- [13] Group Representational Position Encoding. arXiv:2512.07805.
- [14] Efficiently Modeling Long Sequences with Structured State Spaces. International Conference on Learning Representations.
- [15] Diagonal State Spaces are as Effective as Structured State Spaces. Advances in Neural Information Processing Systems.
- [16] Hyena Hierarchy: Towards Larger Convolutional Language Models. International Conference on Machine Learning.
- [17] Pointer Sentinel Mixture Models. International Conference on Learning Representations.
discussion (0)