Recognition: 2 theorem links
Lean Theorem · Group Representational Position Encoding
Pith reviewed 2026-05-17 00:00 UTC · model grok-4.3
The pith
GRAPE models positions as group actions on features, recovering RoPE and ALiBi exactly while adding low-cost extensions for cross-feature coupling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRAPE derives positional encodings from group actions: a position n acts via G(n) = exp(n ω L) with L a rank-2 skew-symmetric matrix for the multiplicative case, yielding a relative compositional map in SO(d); this recovers RoPE exactly when the d/2 planes are canonical coordinate pairs with log-uniform eigenvalues. For the additive case, rank-1 or low-rank unipotent actions in GL produce logit biases that recover ALiBi and FoX exactly while preserving relative properties and streaming cacheability. Learned commuting subspaces and compact non-commuting mixtures extend the geometry to capture cross-subspace coupling at O(d) and O(r d) cost per head respectively.
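Here is a minimal NumPy sketch of the multiplicative case. It assumes a generator L = ab⊤ − ba⊤ built from two random vectors, ω = 0.3, and hypothetical helpers `expm` and `rank2_generator`; none of these come from the paper's code. It checks that G(n) lies in SO(d) and that the query-key score depends only on the offset m − n.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series
    (scipy.linalg.expm would also work; this keeps the sketch NumPy-only)."""
    norm = np.linalg.norm(M, ord=1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**k
    result, term = np.eye(len(M)), np.eye(len(M))
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    for _ in range(k):
        result = result @ result
    return result

def rank2_generator(a, b):
    """Hypothetical helper: rank-2 skew-symmetric generator L = a b^T - b a^T."""
    return np.outer(a, b) - np.outer(b, a)

rng = np.random.default_rng(0)
d, omega = 8, 0.3
L = rank2_generator(rng.standard_normal(d), rng.standard_normal(d))

def G(n):
    """Position map G(n) = exp(n * omega * L)."""
    return expm(n * omega * L)

q, k = rng.standard_normal(d), rng.standard_normal(d)
n, m = 5, 12
score_abs = (G(n) @ q) @ (G(m) @ k)   # query rotated at n, key rotated at m
score_rel = q @ (G(m - n) @ k)        # uses only the offset m - n
print(np.isclose(score_abs, score_rel))          # True
print(np.allclose(G(n).T @ G(n), np.eye(d)))     # True: G(n) is orthogonal
print(np.isclose(np.linalg.det(G(n)), 1.0))      # True: determinant +1, so G(n) is in SO(d)
```

The identity used is G(n)⊤G(m) = exp(−nωL) exp(mωL) = G(m − n), which holds because every G(n) is a power series in the same matrix L.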
What carries the argument
The group action map G(n) realized either as the matrix exponential exp(n ω L) of a rank-2 skew-symmetric generator in SO(d) or as a low-rank unipotent element in GL that adds a bias to logits.
If this is right
- Any choice of group generator produces an encoding that is exactly relative and compositional.
- RoPE is recovered precisely when the generators act on fixed coordinate planes with log-uniform spectrum.
- ALiBi and FoX arise exactly from rank-1 unipotent actions that add relative logit biases.
- Learned commuting subspaces extend the geometry at O(d) cost per head while preserving closed-form evaluation.
- Compact non-commuting mixtures allow richer cross-subspace coupling at O(r d) cost per head.
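The RoPE and learned-subspace bullets can be illustrated together. The NumPy sketch below assumes the interleaved-pair convention and base 10000 of RoFormer-style implementations, and replaces the learned orthogonal basis E with a random orthogonal matrix; all helper names are hypothetical. It first checks that exponentiating the commuting sum Σ θ_i L_i over canonical planes, with log-uniform θ_i = base^(−2i/d), reproduces a direct RoPE rotation. It then applies the same rotations in the basis E and checks that attention logits are unchanged when every position is shifted by a constant.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**k
    result, term = np.eye(len(M)), np.eye(len(M))
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    for _ in range(k):
        result = result @ result
    return result

def rotate_pairs(Y, angles):
    """Rotate each pair (y[2i], y[2i+1]) by angles[..., i]; O(d) work per vector."""
    cos, sin = np.cos(angles), np.sin(angles)
    Z = np.empty_like(Y)
    Z[..., 0::2] = Y[..., 0::2] * cos - Y[..., 1::2] * sin
    Z[..., 1::2] = Y[..., 0::2] * sin + Y[..., 1::2] * cos
    return Z

def rope_generator(theta):
    """Commuting sum sum_i theta_i L_i with L_i = e_{2i+1} e_{2i}^T - e_{2i} e_{2i+1}^T."""
    d = 2 * len(theta)
    L = np.zeros((d, d))
    for i, t in enumerate(theta):
        L[2 * i + 1, 2 * i], L[2 * i, 2 * i + 1] = t, -t
    return L

d, base, n = 16, 10000.0, 7
theta = base ** (-np.arange(0, d, 2) / d)     # theta_i = base^(-2i/d): log-uniform spectrum
rng = np.random.default_rng(1)
x = rng.standard_normal(d)
print(np.allclose(expm(n * rope_generator(theta)) @ x, rotate_pairs(x, n * theta)))  # True

# Learned commuting subspaces: G(n) = E R(n theta) E^T with E orthogonal.
E, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random stand-in for a learned basis
B, T, H = 2, 5, 3
Q, K = rng.standard_normal((B, T, H, d)), rng.standard_normal((B, T, H, d))

def grape_m(X, positions):
    """Apply E R(n theta) E^T to every vector of X with shape (B, T, H, d)."""
    angles = positions[None, :, None, None] * theta   # shape (1, T, 1, d/2)
    return rotate_pairs(X @ E, angles) @ E.T

def logits(offset):
    pos = np.arange(T) + offset
    return np.einsum("bihd,bjhd->bhij", grape_m(Q, pos), grape_m(K, pos))

print(np.allclose(logits(0.0), logits(100.0)))   # True: only i - j matters
```

The pair rotation is O(d) per vector. The two products with E cost O(d²) as written; reaching the quoted O(d) per head presumably requires folding E into the query and key projections, which is an assumption here rather than something this page states.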
Where Pith is reading between the lines
- The same group-action lens could be applied to other sequence models that currently use hand-designed positional biases.
- Closed-form matrix exponentials for low-rank generators may simplify hardware kernels for very long contexts.
- Differences in empirical behavior between RoPE and ALiBi may trace directly to the algebraic properties of their underlying groups.
- The framework suggests testing whether optimizing the generator spectrum alone, without learned subspaces, already improves length generalization.
Load-bearing premise
The learned commuting subspaces and non-commuting mixtures will produce useful feature coupling in practice without creating new optimization difficulties.
What would settle it
Train identical long-context language models that differ only in replacing standard RoPE with a GRAPE version using learned non-commuting mixtures, then measure whether perplexity on held-out long sequences improves, stays flat, or degrades.
Original abstract
We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n) = \exp(n \, \omega \, \mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Overall, GRAPE provides a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project page: https://github.com/model-architectures/GRAPE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GRAPE, a group-theoretic framework for positional encodings that unifies multiplicative rotations in SO(d) (recovering RoPE exactly via rank-2 skew-symmetric generators and matrix exponentials) with additive logit biases from unipotent actions in GL (recovering ALiBi and FoX). It proposes extensions via learned commuting subspaces at O(d) cost and compact non-commuting mixtures at O(r d) cost to capture cross-subspace feature coupling while preserving closed-form exponentials, relative compositionality, and norm preservation.
Significance. If the algebraic recoveries and preservation properties hold, GRAPE supplies a principled Lie-group design space for long-context positional geometry that subsumes existing methods as exact special cases rather than approximations. The exact algebraic identities for RoPE/ALiBi and the emphasis on relative, compositional, streaming-cacheable maps are strengths; however, the practical value of the learned extensions hinges on whether they deliver measurable gains without new optimization issues.
major comments (2)
- [Abstract / Multiplicative GRAPE extensions] Abstract and the section on Multiplicative GRAPE extensions: the assertion that learned non-commuting mixtures 'strictly extend this geometry' while 'preserving an exact relative law' and 'closed-form matrix exponential' lacks an explicit derivation showing that the effective generator remains skew-symmetric (or that the map stays exactly norm-preserving and compositional) when the subspaces fail to commute; without this, the unification risks being a reparametrization whose extra degrees of freedom do not guarantee the claimed properties.
- [Section on learned commuting subspaces and non-commuting mixtures] Section on learned commuting subspaces and non-commuting mixtures: the O(d) and O(r d) cost claims and the statement that these capture 'useful cross-subspace feature coupling' are presented without accompanying optimization analysis or ablation results demonstrating absence of performance regressions or increased non-convexity when jointly optimizing the generators L with model weights.
minor comments (2)
- [Multiplicative GRAPE definition] Clarify the precise construction of the rank-2 skew-symmetric generator L and the log-uniform spectrum choice that recovers RoPE as an exact special case (include the relevant equation).
- [Additive GRAPE] Add a short remark on how the unipotent actions in Additive GRAPE ensure streaming cacheability is preserved exactly when recovering ALiBi.
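The remark requested in the last comment can be illustrated with a small NumPy sketch. It uses the ALiBi generator A_h = −β_h E with E = e_{d+2} e_{d+1}^⊤ on a (d + 2)-dimensional augmented space, as quoted from the paper's Corollary J.5. The query and key lifts [q, 0, 1] and [k, 1, 0] are our own illustrative assumption. Because A is nilpotent, exp(mA) = I + mA exactly, and G(i)G(−j) = G(i − j). Each key can therefore be transformed once, cached, and reused for all later queries. The same running-sum idea covers FoX: with per-token gates f_t, the bias Σ_{l=j+1}^{i} log f_l equals c_i − c_j for prefix sums c_t, and a constant gate e^{−β} reduces it to ALiBi.

```python
import numpy as np

d, beta, T = 4, 0.5, 6
rng = np.random.default_rng(1)

# ALiBi as a unipotent action on an augmented (d + 2)-dimensional space.
E = np.zeros((d + 2, d + 2))
E[d + 1, d] = 1.0                         # E = e_{d+2} e_{d+1}^T in 1-based indexing
A = -beta * E                             # A_h = -beta_h E, nilpotent: A @ A == 0

def G(m):
    """exp(m A) = I + m A exactly, since A @ A == 0."""
    return np.eye(d + 2) + m * A

def lift_query(q, i):
    # Assumed lift q~ = [q, 0, 1]; G(i)^T q~ is computed once for the new query.
    return G(i).T @ np.concatenate([q, [0.0, 1.0]])

def lift_key(k, j):
    # Assumed lift k~ = [k, 1, 0]; G(-j) k~ is cached once and never recomputed.
    return G(-j) @ np.concatenate([k, [1.0, 0.0]])

Q, K = rng.standard_normal((T, d)), rng.standard_normal((T, d))
key_cache = np.stack([lift_key(K[j], j) for j in range(T)])  # append-only cache
i = T - 1
scores = key_cache @ lift_query(Q[i], i)
alibi = K @ Q[i] - beta * (i - np.arange(T))                 # q.k - beta (i - j)
print(np.allclose(scores, alibi))                           # True

# FoX: gates f_t in (0, 1); bias D[i, j] = sum_{l=j+1}^{i} log f_l for j <= i.
def fox_bias(log_f):
    c = np.cumsum(log_f)                        # one running scalar per token
    D = c[:, None] - c[None, :]                 # c_i - c_j
    causal = np.tril(np.ones((len(c), len(c)), dtype=bool))
    return np.where(causal, D, -np.inf)         # future positions masked

causal = np.tril(np.ones((T, T), dtype=bool))
idx = np.arange(T)
D_const = fox_bias(np.full(T, -beta))           # constant gate f = exp(-beta)
print(np.allclose(D_const[causal], -beta * (idx[:, None] - idx[None, :])[causal]))  # True
log_f = -np.logaddexp(0.0, -rng.standard_normal(T))   # log sigmoid of random gate logits
print(np.all(fox_bias(log_f)[causal] <= 1e-12))       # True: gates only decay
```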
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on the GRAPE framework. We address each major comment below with clarifications grounded in the manuscript's algebraic constructions and indicate the revisions we will make.
Point-by-point responses
-
Referee: [Abstract / Multiplicative GRAPE extensions] Abstract and the section on Multiplicative GRAPE extensions: the assertion that learned non-commuting mixtures 'strictly extend this geometry' while 'preserving an exact relative law' and 'closed-form matrix exponential' lacks an explicit derivation showing that the effective generator remains skew-symmetric (or that the map stays exactly norm-preserving and compositional) when the subspaces fail to commute; without this, the unification risks being a reparametrization whose extra degrees of freedom do not guarantee the claimed properties.
Authors: We thank the referee for this observation. In the non-commuting mixture construction, each component is equipped with its own skew-symmetric generator L_k supported on a low-dimensional subspace. The effective generator is the sum L = sum_k L_k. This is a plain sum, not a direct sum, because the supports may overlap; that overlap is what allows the L_k to fail to commute. Because the vector space of skew-symmetric matrices is closed under addition, L remains skew-symmetric even when the individual L_k do not commute. Consequently exp(n ω L) is orthogonal for all n, guaranteeing exact norm preservation. The relative law holds because exp((n+m) ω L) = exp(n ω L) exp(m ω L) for any fixed matrix L (scalar multiples of the same matrix always commute). The closed-form matrix exponential is unchanged. We will insert an explicit derivation of these three properties, including the verification that the Lie-algebra closure and the exponential homomorphism are preserved, in the revised manuscript. revision: yes
-
Referee: [Section on learned commuting subspaces and non-commuting mixtures] Section on learned commuting subspaces and non-commuting mixtures: the O(d) and O(r d) cost claims and the statement that these capture 'useful cross-subspace feature coupling' are presented without accompanying optimization analysis or ablation results demonstrating absence of performance regressions or increased non-convexity when jointly optimizing the generators L with model weights.
Authors: The stated complexities follow directly from the constructions. Commuting subspaces admit simultaneous block-diagonalization, reducing the per-head cost to d/2 independent 2-by-2 rotations, i.e. O(d) work per vector. Non-commuting mixtures are represented via a rank-r collection of generators whose exponential can be applied with O(r d) arithmetic operations per vector. These are asymptotic operation counts for the forward pass, not training-time claims. The manuscript is primarily a theoretical unification; we therefore did not include joint-optimization ablations. The additional parameters in L are structured and low-dimensional, so their inclusion does not alter the convexity properties of the overall loss beyond those already present in standard transformer training. We will expand the cost derivations with explicit operation counts and add a short discussion of optimization considerations, while noting that comprehensive empirical ablation of training dynamics lies outside the current scope. revision: partial
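Both responses can be spot-checked numerically. The NumPy sketch below uses random generators, ω = 0.2, and a small `expm` helper, all illustrative assumptions rather than the paper's code. The first part sums rank-2 skew-symmetric generators with overlapping supports and checks skew-symmetry, orthogonality, and the one-parameter group law. It also shows that the exponential has to be taken of the summed generator, since the product of the per-component exponentials is a different matrix. The second part shows one way, not necessarily the paper's, to reach an O(r d) per-vector cost. L = Σ_k (a_k b_k^⊤ − b_k a_k^⊤) acts only on the span of its 2r vectors, so with an orthonormal basis Q of that span, exp(tL) = I + Q(exp(tS) − I)Q^⊤ for a 2r × 2r skew matrix S.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**k
    result, term = np.eye(len(M)), np.eye(len(M))
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    for _ in range(k):
        result = result @ result
    return result

rng = np.random.default_rng(2)
d, r, omega = 6, 3, 0.2
gens = []
for _ in range(r):
    a, b = rng.standard_normal(d), rng.standard_normal(d)
    gens.append(np.outer(a, b) - np.outer(b, a))   # L_k = a_k b_k^T - b_k a_k^T
L = sum(gens)                                     # plain sum; supports overlap

def G(n):
    return expm(n * omega * L)

print(np.allclose(gens[0] @ gens[1], gens[1] @ gens[0]))    # False: non-commuting
print(np.allclose(L, -L.T))                                 # True: still skew-symmetric
print(np.allclose(G(7).T @ G(7), np.eye(d)))                # True: orthogonal
print(np.allclose(G(3) @ G(4), G(7)))                       # True: G(n) G(m) = G(n + m)
print(np.allclose(expm(gens[0] + gens[1]), expm(gens[0]) @ expm(gens[1])))  # False

def apply_mixture_exp(A_vecs, B_vecs, x, t):
    """exp(t L) x for L = sum_k (a_k b_k^T - b_k a_k^T), without forming a d x d matrix.

    A_vecs, B_vecs have shape (r, d). The QR and the 2r x 2r exponential are
    setup work (O(d r^2 + r^3)) that a kernel could cache; the last line is the
    O(r d) per-vector work.
    """
    r = A_vecs.shape[0]
    Q, R = np.linalg.qr(np.concatenate([A_vecs, B_vecs]).T)   # W = [a_1..a_r, b_1..b_r] = Q R
    J = np.block([[np.zeros((r, r)), np.eye(r)], [-np.eye(r), np.zeros((r, r))]])
    S = R @ J @ R.T                                            # L = W J W^T = Q S Q^T, S skew
    core = expm(t * S) - np.eye(2 * r)
    return x + Q @ (core @ (Q.T @ x))

d2, r2 = 64, 2
A2, B2 = rng.standard_normal((r2, d2)), rng.standard_normal((r2, d2))
L2 = sum(np.outer(a, b) - np.outer(b, a) for a, b in zip(A2, B2))
x = rng.standard_normal(d2)
print(np.allclose(apply_mixture_exp(A2, B2, x, 0.5), expm(0.5 * L2) @ x))   # True
```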
Circularity Check
GRAPE derivation is algebraically self-contained with exact recoveries of RoPE and ALiBi as special cases.
full rationale
The paper constructs positional encodings directly from Lie-group actions: G(n) = exp(n ω L) for skew-symmetric L in SO(d) (Multiplicative GRAPE) and rank-1 unipotent actions in GL (Additive GRAPE). RoPE is recovered exactly when d/2 planes are canonical coordinate pairs with log-uniform spectrum; ALiBi and FoX are recovered as exact rank-1 unipotent special cases. These identities are algebraic, not data-driven fits. Learned commuting subspaces and non-commuting mixtures are introduced as mathematical extensions that preserve closed-form exponentials and relative compositionality by construction. No load-bearing step reduces to a fitted parameter, self-citation chain, or ansatz smuggled from prior work; the framework is self-contained against standard Lie-group mathematics.
Axiom & Free-Parameter Ledger
free parameters (1)
- rank-2 skew-symmetric generator L
axioms (2)
- standard math: Matrix exponential of skew-symmetric generators yields elements of SO(d)
- standard math: Unipotent actions in GL produce additive logit biases
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (echoes?)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
G(n) = exp(nωL) with L = ab⊤ − ba⊤ ∈ so(d); Rodrigues formula exp(L) = I + (sin s/s)L + ((1 − cos s)/s²)L²; recovers RoPE on canonical planes with log-uniform spectrum.
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (unclear?)
UNCLEAR: relation between the paper passage and the cited Recognition theorem.
Multi-subspace commuting sum L_RoPE = ∑ θ_i L_i, block-diagonal product of planar rotations.
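The Rodrigues formula in the first entry can be checked against a generic matrix exponential, together with the spectral claim that exp(nηL) has eigenvalues e^{±inθ} and 1 with multiplicity d − 2. The sketch assumes s = √(‖a‖²‖b‖² − (a·b)²), which is the rotation rate of L = ab⊤ − ba⊤ in its plane (so L³ = −s²L), and per-step angle θ = ηs ≤ η‖a‖‖b‖. The fixed vectors, η, and helper names are illustrative.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**k
    result, term = np.eye(len(M)), np.eye(len(M))
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    for _ in range(k):
        result = result @ result
    return result

def rodrigues_exp(a, b):
    """Closed-form exp(L) for L = a b^T - b a^T, using L^3 = -s^2 L."""
    L = np.outer(a, b) - np.outer(b, a)
    s2 = max((a @ a) * (b @ b) - (a @ b) ** 2, 0.0)
    if s2 < 1e-24:                        # a parallel to b: use the series limit
        return np.eye(len(a)) + L + 0.5 * (L @ L)
    s = np.sqrt(s2)
    return np.eye(len(a)) + (np.sin(s) / s) * L + ((1 - np.cos(s)) / s2) * (L @ L)

a = np.array([1.0, 2.0, 0.0, -1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0, 0.0, 2.0, -1.0])
d, eta, n = len(a), 0.1, 3
L = np.outer(a, b) - np.outer(b, a)
s = np.sqrt((a @ a) * (b @ b) - (a @ b) ** 2)    # sqrt(7 * 7 - 1) = sqrt(48)

print(np.allclose(L @ L @ L, -s**2 * L))                              # True
print(np.allclose(rodrigues_exp(n * eta * a, b), expm(n * eta * L)))  # True

theta = eta * s                                  # about 0.693 <= eta * |a| |b| = 0.7
eig = np.linalg.eigvals(expm(n * eta * L))
near_one = np.abs(eig - 1) < 1e-6
print(near_one.sum())                            # 4, i.e. d - 2
print(np.allclose(np.sort(np.angle(eig[~near_one])), [-n * theta, n * theta]))  # True
```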
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks
Jordan-RoPE realizes a non-semisimple relative positional operator that produces coupled oscillatory-polynomial features such as d e^{i omega d} for causal query-key lags.
Reference graph
Works this paper leans on
-
[1]
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, and Petar Veličković. Round and round we go! What makes rotary positional encodings useful? In International Conference on Learning Representations (ICLR 2025). URL https://arxiv.org/abs/2410.06205. Also arXiv:2410.06205.
-
[2]
Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150.
-
[4]
Extending Context Window of Large Language Models via Positional Interpolation
URL https://arxiv.org/abs/2306.15595. Ta-Chung Chi, Ting-Han Fan, Peter J Ramadge, and Alexander Rudnicky. KERPLE: Kernelized relative positional embedding for length extrapolation. Advances in Neural Information Processing Systems, 35:8386–8399, 2022a. Ta-Chung Chi, Ting-Han Fan, Alexander I Rudnicky, and Peter J Ramadge. Dissecting transformer length e...
-
[6]
URL https://arxiv.org/abs/2405.18719. Brian C Hall. Lie groups, Lie algebras, and representations. In Quantum Theory for Mathematicians, pages 333–366. Springer.
-
[7]
Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, and Omer Levy. Transformer language models without positional encodings still learn positional information. arXiv preprint arXiv:2203.16634.
-
[8]
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654.
-
[9]
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, and Xia Hu. LLM maybe LongLM: Self-extend LLM context window without tuning. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), volume 235, pages 22099–22114. PMLR.
-
[10]
Rethinking positional encoding in language pre-training
Published as a conference paper at ICLR 2026
Guolin Ke, Di He, and Tie-Yan Liu. Rethinking positional encoding in language pre-training. arXiv preprint arXiv:2006.15595.
-
[11]
Functional interpolation for relative positions improves long context transformers
Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, and Srinadh Bhojanapalli. Functional interpolation for relative positions improves long context transformers. arXiv preprint arXiv:2310.04418.
-
[13]
Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh
URL https://arxiv.org/abs/2503.02130. Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh. Learning to encode position for transformer with continuous dynamical model. In International Conference on Machine Learning, pages 6327–6335. PMLR.
-
[14]
Decoupled weight decay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
-
[15]
YaRN: Efficient Context Window Extension of Large Language Models
Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole. YaRN: Efficient context window extension of large language models. arXiv preprint arXiv:2309.00071.
-
[16]
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A Smith, and Mike Lewis. Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409.
-
[17]
Linearized relative positional encoding
Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, and Yiran Zhong. Linearized relative positional encoding. arXiv preprint arXiv:2307.09270.
-
[18]
Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, and Joel Veness. Randomized positional encodings boost length generalization of transformers. arXiv preprint arXiv:2305.16843.
-
[19]
Learning the RoPEs: Better 2D and 3D position encodings with STRING
Connor Schenck, Isaac Reid, Mithun George Jacob, Alex Bewley, Joshua Ainslie, David Rendleman, Deepali Jain, Mohit Sharma, Avinava Dubey, Ayzaan Wahid, et al. Learning the RoPEs: Better 2D and 3D position encodings with STRING. arXiv preprint arXiv:2502.02562.
-
[20]
Self-Attention with Relative Position Representations
Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.
-
[21]
The curious case of absolute position embeddings
Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, Joelle Pineau, Dieuwke Hupkes, and Adina Williams. The curious case of absolute position embeddings. arXiv preprint arXiv:2210.12574.
-
[22]
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yuancheng Zhang, Shengfeng Pan, Shengyu Ge, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864.
-
[23]
A length-extrapolatable transformer
Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, and Furu Wei. A length-extrapolatable transformer. arXiv preprint arXiv:2212.10554.
-
[24]
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023a. Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Niko-...
-
[25]
Length generalization of causal transformers without position encoding
Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, and Xiaoling Wang. Length generalization of causal transformers without position encoding. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14024–14040, Bangkok, Thailand, August
-
[26]
doi: 10.18653/v1/2024.findings-acl
Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl
-
[27]
Ulme Wennberg and Gustav Eje Henter
URL https://aclanthology.org/2024.findings-acl.834/. Ulme Wennberg and Gustav Eje Henter. The case for translation-invariant self-attention in transformer-based language models. arXiv preprint arXiv:2106.01950.
-
[28]
Da-transformer: Distance-aware transformer
Chuhan Wu, Fangzhao Wu, and Yongfeng Huang. DA-Transformer: Distance-aware transformer. arXiv preprint arXiv:2010.06925.
-
[29]
Effective long-context scaling of foundation models
Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, et al. Effective long-context scaling of foundation models. arXiv preprint arXiv:2309.16039.
-
[30]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025a. Songlin Yang, Yikang Shen, Kaiyue Wen, Shawn Tan, Mayank Mishra, Liliang Ren, Rameswar Panda, and Yoon Kim. Path attention: Position encoding via accumulating household...
-
[31]
Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, et al. DAPE: Data-adaptive positional encoding for length extrapolation. Advances in Neural Information Processing Systems, 37:26659–26700.
-
[32]
Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, and Sujian Li. PoSE: Efficient context window extension of LLMs via positional skip-wise training. arXiv preprint arXiv:2309.10400.
-
[33]
J.2 Multi-subspace GRAPE-M and RoPE
Appendix contents: A Related Work; B Application in Multi-Head Attention; C Forgetting Transformer as a Special Additive GRAPE; D Non-Commuting Multiplicative GRAPE; E Composition of Additive GRAPE and Multiplicative GRAPE; F Comparison with LieRE; G 2D and 3D GRAPE for Vision and Multimodal Position Encod...
-
[34]
$p_{u,h}$ Positional Embedding/Representation: a vector derived from token-local features, obtained via a linear projection followed by RMS normalization.
A RELATED WORK
Positional information in Transformers can mainly be categorized into the following classes: (a) absolute encodings (sinusoidal or learned) (Vaswani et al., 2017; Devlin et al., 2019; Neishi and Yoshin...
-
[35]
and with context-scaling procedures (Xiong et al., 2023; Chen et al., 2023; Peng et al., 2023; Zhu et al., 2023; Jin et al., 2024). Beyond 1D language modeling, 2D RoPE and variants adapt rotary encodings to 2D grids by applying rotations along spatial axes, and have been shown to improve high-resolution extrapolation in Vision Transformers and related vi...
-
[36]
designs separable, translation-invariant RoPE-style encodings that scale to 2D and 3D coordinates in vision and robotics settings (Ostmeier et al., 2025; Schenck et al., 2025). GRAPE-M identifies RoPE as commuting rank-2 exponentials in SO(d) and extends it to learned subspaces and compact non-commuting mixtures, in closed form and much more efficiently. Compared...
-
[37]
and related kernelized/randomized forms (Chi et al., 2022a;b; Li et al., 2023; Ruoss et al.,
-
[38]
are captured exactly by GRAPE-A as unipotent actions in the general linear group GL that preserve the same relative law and streaming cacheability. Importantly, forgetting mechanisms are additive: the Forgetting Transformer (FoX) implements a learnable per-head exponential decay in the attention logits and is a specific GRAPE-A / GRAPE-AP instance impos...
-
[39]
The headwise gates $f_{t,h}$ add $O(1)$ parameters and negligible computation. Special cases and composition. If $f_{t,h} \equiv e^{-\beta_h}$ (constant per head), then $D_{ij,h} = -\beta_h (i-j)$ and FoX reduces to exact ALiBi (Section 4.2). More generally, FoX composes additively with the multiplicative (orthogonal) GRAPE acting on $(q,k)$ as in Eq. (5.3), preserving the norm-preservation of the rota...
-
[40]
The method then applies the matrix exponential of this generator to get a rotational position map
encode positional information by learning a skew-symmetric generator in $\mathfrak{so}(d)$. The method then applies the matrix exponential of this generator to get a rotational position map. For each attention head, the method learns one skew matrix. Its exponential gives a dense orthogonal operator on queries and keys. Positions then correspond to elements of a one-parameter ...
-
[41]
This gives a clear way to impose axis-aligned or radial recency bias in vision and multimodal models
The update matrix then stays unipotent, and the exact relative composition law still holds. This gives a clear way to impose axis-aligned or radial recency bias in vision and multimodal models.
H ALGORITHMIC DETAILS AND PSEUDOCODE
This appendix contains the detailed pseudocode.
Algorithm 1 Commuting Multi-Subspace GRAPE-M
Require: $Q, K \in \mathbb{R}^{B \times L \times H \times d}$, orthogonal $E \in$...
-
[42]
If $b = Ja$ (Section 2.4) and $\|a\| = 1$, then $s = 1$ and $\theta = \eta$
Corollary J.2 (Phase bounds and orthogonality). The per-step rotation angle of $\exp(\eta L)$ on $U$ equals $\theta = \eta s$ and satisfies $0 \le \theta \le \eta \|a\| \|b\|$, with equality when $a \perp b$. If $b = Ja$ (Section 2.4) and $\|a\| = 1$, then $s = 1$ and $\theta = \eta$. Exponential spectrum. For any $n \in \mathbb{Z}$, $\sigma(\exp(nL)) = \{e^{\pm i n s}\} \cup \{1\}^{d-2}$. Hence $\rho(\exp(nL)) = 1$, the map is unitary (orthogonal), and all Lyapunov exponents are zero. Periodicit...
-
[43]
(4.7), let $E := e_{d+2} e_{d+1}^\top$ so that $A_h = -\beta_h E$
Corollary J.5 (ALiBi and Additive GRAPE (GRAPE-A) conditioning numbers). For the exact ALiBi generator in Eq. (4.7), let $E := e_{d+2} e_{d+1}^\top$ so that $A_h = -\beta_h E$. Then $G_{\mathrm{add},h}(m) = I + m A_h = I - m \beta_h E = I + sE$ with $s = -m \beta_h$, and the only nontrivial singular values follow from Eq. (J.1). For the single-vector additive lift Eq. (4.1) with $A = \begin{pmatrix} 0 & u_{\mathrm{shift}} \\ 0^\top & 0 \end{pmatrix}$ and $\|u_{\mathrm{shift}}\| = 1$, the...
-
[44]
These bounds are conservative but dimension-free. In the canonical rank-1 case of Lemma J.4 with $\|A\|_2 = 1$, one has the sharper small-$|s|$ behavior $\sigma_{\max}(I + sA) = 1 + \frac{|s|}{2} + O(s^2)$ and $\sigma_{\min}(I + sA) = 1 - \frac{|s|}{2} + O(s^2)$. Proof. Use the triangle inequality $\|(I + sA)x\|_2 \le \|x\|_2 + |s| \, \|A\|_2 \|x\|_2$ and its reverse form applied to $(I + sA)^{-1} = I - sA$; see also Weyl inequalities for s...
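The singular-value statements in the Corollary J.5 and small-|s| excerpts can be checked directly. For E = e_{d+2} e_{d+1}^⊤, the matrix I + sE differs from the identity only in the 2 × 2 block [[1, 0], [s, 1]]. That block has singular values (√(s² + 4) ± |s|)/2, whose product is 1 and whose larger member behaves as 1 + |s|/2 + O(s²) for small |s|. The NumPy sketch below uses d = 4 and a few sample values of s, all illustrative; it does not reproduce the paper's Eq. (J.1), which is not shown on this page.

```python
import numpy as np

def alibi_singular_values(s, d=4):
    """Largest and smallest singular values of I + s E, E = e_{d+2} e_{d+1}^T (1-based)."""
    E = np.zeros((d + 2, d + 2))
    E[d + 1, d] = 1.0
    sv = np.linalg.svd(np.eye(d + 2) + s * E, compute_uv=False)
    return sv.max(), sv.min()

def closed_form_sigma_max(s):
    """Larger singular value of the only nontrivial block [[1, 0], [s, 1]]."""
    return (abs(s) + np.sqrt(s**2 + 4)) / 2

for s in [-3.0, -0.5, 1e-3, 2.0]:               # in ALiBi, s = -m * beta_h
    smax, smin = alibi_singular_values(s)
    assert np.isclose(smax, closed_form_sigma_max(s))
    assert np.isclose(smin, 1 / smax)            # det = 1, so the pair multiplies to 1
    assert smax <= 1 + abs(s)                    # conservative triangle-inequality bound
    assert abs(smax - (1 + abs(s) / 2)) <= s**2  # 1 + |s|/2 + O(s^2) for small |s|
print("all checks passed")
```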