pith. machine review for the scientific record.

arxiv: 2605.08833 · v1 · submitted 2026-05-09 · 💻 cs.AI

Recognition: 2 theorem links · Lean Theorem

FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

Jinshuai Yang, Lixin Li, Mengqi Li, Wensheng Lin

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:34 UTC · model grok-4.3

classification 💻 cs.AI
keywords state space models · long sequence modeling · fractional measures · projection operators · recurrent architectures · temporal analysis · Long Range Arena
0 comments

The pith

FRACTAL integrates fractional measure theory into state space model projections to retain scale-invariant memory while increasing sensitivity to recent signal changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to resolve a core tension in sequence modeling: how to keep unbounded historical context across long timescales without losing the ability to register abrupt short-term variations. Standard high-order polynomial projection operators in state space models force a choice between uniform measures that dilute recent information and exponential measures that discard global structure. By introducing fractional measures, the authors derive projection operators whose spectral properties can be characterized analytically and whose singularity index can be tuned to boost recent sensitivity while leaving the scale-invariant memory spectrum intact. This construction is placed inside a diagonalized state space model simply by adjusting the initialization of the input projection, yielding an average score of 87.11 percent on the Long Range Arena benchmark.
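Since the construction is claimed to live entirely in the input-projection initialization, a minimal sketch helps fix ideas. The Python below is an editorial illustration assuming an S5-style diagonal recurrence; `init_B_fractional`, its power-law weighting, and the role of `alpha` are hypothetical stand-ins, not the paper's actual construction.

```python
import numpy as np

# Editorial sketch: an S5-style diagonal SSM whose transition spectrum
# (Lambda) is left untouched while only the input projection B is
# re-initialized. The power-law weighting below is an assumed stand-in
# for the paper's fractional-measure-derived initialization.

def init_B_fractional(state_dim, input_dim, alpha=0.5, seed=None):
    """Hypothetical initializer: reweight input-projection rows by a
    power law with singularity index alpha (larger alpha emphasizes
    recent signal, per the paper's stated intent)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((state_dim, input_dim)) / np.sqrt(input_dim)
    weights = (1.0 + np.arange(state_dim)) ** (-alpha)
    return B * weights[:, None]

def diagonal_ssm_scan(Lambda, B, C, u):
    """Sequential form of x_{k+1} = Lambda * x_k + B u_k, y_k = Re(C x_k)."""
    x = np.zeros(Lambda.shape[0], dtype=complex)
    outputs = []
    for u_k in u:
        x = Lambda * x + B @ u_k
        outputs.append((C @ x).real)
    return np.stack(outputs)

# Toy run: same spectrum, fractional-measure-style input initialization.
N, D, T = 16, 4, 32
Lambda = np.exp(-0.1 + 1j * np.linspace(0.1, 3.0, N))   # stable diagonal modes
C = np.random.default_rng(0).standard_normal((D, N)).astype(complex)
u = np.random.default_rng(1).standard_normal((T, D))
y = diagonal_ssm_scan(Lambda, init_B_fractional(N, D, alpha=0.5, seed=2), C, u)
```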

Core claim

By integrating fractional measure theory into recursive memory updates, FRACTAL derives projection operators with analytically characterized spectral properties and a tunable singularity index. This permits the model to amplify sensitivity to recent signal perturbations while preserving the spectral structure that encodes scale-invariant memory dynamics. The theoretical construction is realized inside a simplified diagonalized state space framework by modulating input projection initialization, enabling simultaneous capture of multi-scale temporal features and producing an average score of 87.11 percent on the Long Range Arena benchmark, including 61.85 percent on ListOps.

What carries the argument

Projection operators derived from fractional measure theory with a tunable singularity index, realized by modulating input projection initialization inside a diagonalized state space framework.

If this is right

  • The approach yields 87.11 percent average accuracy on the Long Range Arena, exceeding the S5 baseline.
  • Multi-scale temporal features are captured simultaneously without requiring extensive hyperparameter search.
  • Spectral structure for scale-invariant memory is retained while recent-signal sensitivity is increased.
  • The construction applies directly to any diagonalized state space model by a change in projection initialization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The tunable singularity index could be scheduled during training to adapt to sequences whose dominant timescales shift over time.
  • The same fractional construction might be ported to non-diagonal state space realizations if the spectral analysis can be extended.
  • Tasks that combine long periodic structure with rare sharp events, such as certain sensor streams or financial tick data, become natural test cases.

Load-bearing premise

Modulating input projection initialization inside the diagonalized state space framework with the fractional measure produces simultaneous multi-scale capture without introducing uncharacterized performance trade-offs.

What would settle it

A measurement showing that the derived operators deviate from the claimed spectral properties or that the model loses accuracy on sequences that require both long-range scale invariance and high-resolution recent perturbations.
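One concrete form such a measurement could take, sketched under assumptions: build a synthetic sequence that superimposes a long-memory component on rare sharp spikes, and check whether accuracy degrades on either ingredient. Everything below (the generator and its parameters) is an editorial construction; the paper specifies no such probe.

```python
import numpy as np

def multiscale_probe(T=4096, tail_exponent=-0.6, spike_rate=0.005, seed=0):
    """Editorial test signal: a power-law-correlated slow component (a cheap
    stand-in for long-range, scale-invariant structure) plus rare abrupt
    spikes (the high-resolution recent perturbations the claim covers)."""
    rng = np.random.default_rng(seed)
    kernel = (1.0 + np.arange(T)) ** tail_exponent        # heavy-tailed memory
    slow = np.convolve(rng.standard_normal(T), kernel)[:T]
    slow /= np.abs(slow).max()
    spikes = (rng.random(T) < spike_rate) * rng.choice([-3.0, 3.0], size=T)
    return slow + spikes, slow, spikes

signal, slow, spikes = multiscale_probe()
# A model meeting the claim should reconstruct `slow` (long-range memory)
# without smoothing away `spikes` (recent-signal sensitivity).
```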

Figures

Figures reproduced from arXiv: 2605.08833 by Jinshuai Yang, Lixin Li, Mengqi Li, Wensheng Lin.

Figure 1. The computational architecture of FRACTAL. Top (Phase 1): offline initialization. The multi-scale α derives fractional operators, spectrally decomposed to produce Λ and a physically informed B̃_phys. Bottom (Phase 2): online computation. The instantiated parameters are discretized and executed via efficient MIMO parallel scans.

Figure 2. Memory measure comparison. (a) The fractional measure (FRACTAL, α = 0.5) compared with LegS (uniform), LagT (exponential), and LegT (window). The fractional measure achieves recency sensitivity while maintaining power-law tails for long-term retention. (b) Effect of the singularity index α: increasing α shifts focus toward recent history while preserving the scale-invariant structure.

Figure 3. Structure of the fractional HiPPO matrix A(α) for different singularity indices. The lower-triangular structure is preserved across all α, with diagonal elements invariant at A_nn = n + 1. Off-diagonal coupling increases with α, reflecting enhanced basis mixing.
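For orientation, the standard HiPPO-LegS matrix already has the lower-triangular shape and A_nn = n + 1 diagonal that Figure 3 describes; the sketch below constructs it and marks where an α-dependence of the stated kind would enter. The α-scaling here is inferred only from the caption ("off-diagonal coupling increases with α"), not taken from the paper.

```python
import numpy as np

def hippo_legs(N):
    """Standard HiPPO-LegS transition matrix (Gu et al., 2020):
    A[n, n] = n + 1, A[n, k] = sqrt((2n+1)(2k+1)) for n > k, else 0."""
    A = np.zeros((N, N))
    for n in range(N):
        A[n, n] = n + 1
        for k in range(n):
            A[n, k] = np.sqrt((2 * n + 1) * (2 * k + 1))
    return A

def fractional_hippo_guess(N, alpha):
    """Editorial guess at A(alpha): diagonal held at n + 1 (invariant per
    Figure 3), off-diagonal coupling scaled up with alpha. Illustrative
    only; the paper's actual fractional operator is not reproduced here."""
    A = hippo_legs(N)
    return np.diag(np.diag(A)) + (1.0 + alpha) * np.tril(A, k=-1)

print(fractional_hippo_guess(4, 0.5))   # lower triangular for any alpha
```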
read the original abstract

Effective sequence modeling fundamentally requires balancing the retention of unbounded history with the high-resolution detection of abrupt short-term variations common in real-world phenomena. However, existing state space models (SSMs) relying on high-order polynomial projection operators (HiPPO) face a critical trade-off where uniform measures dilute recent information to maintain timescale invariance, while exponential measures sacrifice global context to capture local dynamics. This paper proposes a Fractional Recurrent Architecture for Computational Temporal Analysis of Long sequences (FRACTAL), a novel architecture integrating fractional measure theory into recursive memory updates to address this limitation. By deriving projection operators with analytically characterized spectral properties and a tunable singularity index, the proposed method amplifies sensitivity to recent signal perturbations while preserving the spectral structure that encodes scale-invariant memory dynamics. This theoretical innovation is instantiated within a simplified diagonalized state space framework by modulating input projection initialization to enable simultaneous capture of multi-scale temporal features. FRACTAL achieves an average score of 87.11% on the Long Range Arena benchmark, including 61.85% on the ListOps task, outperforming the S5 model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces FRACTAL, a state space model (SSM) architecture that integrates fractional measure theory into recursive memory updates to balance retention of unbounded history with high-resolution detection of short-term variations. It claims to derive projection operators with analytically characterized spectral properties and a tunable singularity index that amplify sensitivity to recent perturbations while preserving the scale-invariant memory encoded in HiPPO-like spectra. This is instantiated in a simplified diagonalized SSM framework (S5-style) via modulation of input projection initialization, yielding 87.11% average accuracy on the Long Range Arena benchmark (including 61.85% on ListOps) and outperforming the S5 baseline.

Significance. If the theoretical derivations hold and performance gains are shown to stem from the fractional construction rather than tuning, the work could advance SSMs for long sequences by providing a principled mechanism for multi-scale temporal capture without the uniform-vs-exponential measure trade-off. The LRA results indicate potential practical value, and the approach builds directly on existing diagonalized frameworks, which may preserve computational efficiency.

major comments (2)
  1. [Abstract and §3 (Theoretical Derivation)] The central claim that projection operators derived from fractional measure theory have 'analytically characterized spectral properties' and a tunable singularity index that 'preserves the spectral structure' is unsupported: no explicit operator forms, eigenvalue analysis, stability bounds, or proofs are given to verify that modulating the input projections leaves the state-transition eigenvalues unchanged and introduces no unanalyzed damping or phase shifts.
  2. [§5 (Experiments) and Table 1 (LRA results)] The reported 87.11% LRA average and ListOps score come without ablation studies, error analysis, or controls isolating the singularity index from initialization variance and hyperparameter effects, which undermines the assertion that the gains arise from simultaneous multi-scale capture without trade-offs.
minor comments (1)
  1. [Abstract] The abstract refers to a 'simplified diagonalized state space framework' without clarifying its precise relation to, or differences from, the S5 model beyond input projection modulation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, acknowledging areas where the presentation can be strengthened and outlining specific revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §3 (Theoretical Derivation)] The central claim that projection operators derived from fractional measure theory have 'analytically characterized spectral properties' and a tunable singularity index that 'preserves the spectral structure' is unsupported: no explicit operator forms, eigenvalue analysis, stability bounds, or proofs are given to verify that modulating the input projections leaves the state-transition eigenvalues unchanged and introduces no unanalyzed damping or phase shifts.

    Authors: We acknowledge that the theoretical derivations in §3 would benefit from greater explicitness to fully substantiate the claims. In the revised manuscript, we will expand §3 to include the explicit closed-form expressions for the fractional projection operators, a complete eigenvalue decomposition of the modulated state-transition matrices, rigorous stability bounds, and formal proofs that the singularity-index modulation of input projections leaves the original HiPPO-like eigenvalues unchanged and introduces neither additional damping nor phase shifts. These additions will directly support the abstract statements while preserving the diagonalized S5-style framework. revision: yes

  2. Referee: [§5 (Experiments) and Table 1 (LRA results)] The reported 87.11% LRA average and ListOps score come without ablation studies, error analysis, or controls isolating the singularity index from initialization variance and hyperparameter effects, which undermines the assertion that the gains arise from simultaneous multi-scale capture without trade-offs.

    Authors: We agree that stronger experimental controls are required to isolate the contribution of the fractional construction. In the revision, we will add a dedicated ablation subsection that systematically varies the singularity index while holding initialization variance, learning-rate schedules, and all other hyperparameters fixed across multiple random seeds. We will report mean accuracies with standard deviations and statistical significance tests for the LRA tasks (including ListOps) to demonstrate that the observed gains are attributable to the multi-scale temporal capture enabled by the tunable singularity index rather than tuning artifacts. revision: yes
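The diagonal-case invariance promised in response 1 can at least be illustrated numerically: in x_{k+1} = Λ x_k + B u_k with diagonal Λ, re-initializing B rescales per-mode input gain but cannot change each mode's damping or phase. A minimal check, assuming "modulation" means an elementwise reweighting of B (the paper's exact form may differ):

```python
import numpy as np

# Minimal numerical version of the invariance claimed in response 1.
# B enters only the input path of x_{k+1} = lam * x_k + b * u_k, so the
# per-mode dynamics lam**k are identical under any reweighting of b.
rng = np.random.default_rng(0)
N, T = 8, 64
lam = np.exp(-0.1 + 1j * rng.uniform(0.0, 3.0, N))   # diagonal transition modes
b = rng.uniform(0.5, 1.5, N)                          # baseline input gains
b_mod = b * (1.0 + np.arange(N)) ** -0.5              # assumed "modulation"

def impulse_modes(lam, b, T):
    """Per-mode impulse response x_k = lam**k * b for a unit impulse."""
    return lam[None, :] ** np.arange(T)[:, None] * b[None, :]

h, h_mod = impulse_modes(lam, b, T), impulse_modes(lam, b_mod, T)
# Normalizing out the input gain leaves identical dynamics lam**k per mode,
# i.e., no change in damping or phase; only input sensitivities differ.
assert np.allclose(h / h[0], h_mod / h_mod[0])
```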
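And a sketch of the harness response 2 commits to, with hypothetical names throughout: `train_and_eval`, the α grid, and the task list are placeholders, since no training code accompanies the abstract.

```python
import itertools
import numpy as np

def train_and_eval(task, alpha, seed):
    """Placeholder for the authors' (unavailable) training pipeline.
    Returns a dummy accuracy so the harness runs end to end."""
    rng = np.random.default_rng(abs(hash((task, alpha, seed))) % 2**32)
    return 50.0 + 40.0 * rng.random()

# Vary only the singularity index across seeds; hold everything else fixed.
ALPHAS = (0.0, 0.25, 0.5, 0.75)   # assumed grid; Figure 2 shows alpha = 0.5
SEEDS = range(5)
TASKS = ("listops", "text", "retrieval", "image", "pathfinder", "path-x")

summary = {}
for task, alpha in itertools.product(TASKS, ALPHAS):
    accs = [train_and_eval(task, alpha, s) for s in SEEDS]
    summary[(task, alpha)] = (float(np.mean(accs)), float(np.std(accs)))
# Report mean and std per (task, alpha); significance tests would compare
# each alpha setting against the alpha = 0 control at fixed hyperparameters.
```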

Circularity Check

0 steps flagged

No circularity identified from the provided abstract and context

full rationale

The abstract describes deriving projection operators from fractional measure theory with analytically characterized spectral properties and a tunable singularity index, then instantiating them by modulating input projection initialization in a diagonalized SSM. No equations, self-citations, or reductions are quoted that would make any claimed result equivalent to its inputs by construction. The LRA performance is presented as an empirical outcome rather than a forced prediction. Per the audit's hard rules, absent explicit quotes exhibiting self-definitional, fitted-input, or self-citation load-bearing steps, the finding is: no significant circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the unverified assumption that fractional measure theory yields analytically characterizable spectral properties when inserted into SSM updates, plus a tunable parameter whose value is not derived from first principles.

free parameters (1)
  • singularity index
    Tunable parameter introduced to control sensitivity to recent perturbations while aiming to preserve scale-invariance.
axioms (1)
  • domain assumption: Fractional measure theory supplies projection operators with analytically characterized spectral properties suitable for recursive memory updates.
    Invoked as the foundation for the theoretical innovation in the abstract; one illustrative functional form is sketched after this ledger.
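Neither the abstract nor the figure captions give the measure a closed form. As a reading aid only, one family consistent with Figure 2's description (power-law tails plus tunable recency emphasis) is sketched below; this is an editorial guess, not FRACTAL's definition.

```latex
% Illustrative measure family (editorial guess, not the paper's):
% a power-law weight over past times s <= t with singularity index alpha.
\[
  \omega_t^{(\alpha)}(s) \;\propto\; (t - s + \epsilon)^{-\alpha},
  \qquad 0 < \alpha < 1, \quad \epsilon > 0.
\]
% Larger alpha concentrates weight near the present (recency sensitivity),
% while the power-law tail in (t - s) preserves scale-invariant long-term
% retention, qualitatively matching panels (a) and (b) of Figure 2.
```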

pith-pipeline@v0.9.0 · 5496 in / 1212 out tokens · 58681 ms · 2026-05-12T02:34:50.852894+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 2 internal anchors

  1. Blelloch, G. E. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, 1990.
  2. Gu, A. and Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.
  3. Gu, A., Dao, T., Ermon, S., Rudra, A., and Ré, C. HiPPO: Recurrent memory with optimal polynomial projections. In Advances in Neural Information Processing Systems, volume 33, pp. 1474–1487, 2020.
  4. Gu, A., Goel, K., Gupta, A., and Ré, C. On the parameterization and initialization of diagonal state space models. In Advances in Neural Information Processing Systems, volume 35, pp. 35971–35983, 2022a.
  5. Gu, A., Goel, K., and Ré, C. Efficiently modeling long sequences with structured state spaces. In The International Conference on Learning Representations, 2022b.
  6. Gupta, A., Gu, A., and Berant, J. Diagonal state spaces are as effective as structured state spaces. In Advances in Neural Information Processing Systems, volume 35, pp. 22982–22994, 2022.
  7. Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8): 1735–1780, 1997.
  8. Leland, W. E., Taqqu, M. S., Willinger, W., and Wilson, D. V. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1): 1–15, 1994.
  9. Mandelbrot, B. B. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk. Selecta Volume E. Springer Science & Business Media, 2013.
  10. Pang, G., Lu, L., and Karniadakis, G. E. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41(4): A2603–A2626, 2019.
  11. Podlubny, I. Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of Their Solution and Some of Their Applications. Elsevier, 1998.
  12. Smith, J. T., Warrington, A., and Linderman, S. W. Simplified state space layers for sequence modeling. In The International Conference on Learning Representations, 2023.
  13. Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., Rao, J., Yang, L., Ruder, S., and Metzler, D. Long Range Arena: A benchmark for efficient transformers. In International Conference on Learning Representations, 2021.
  14. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
  15. Voelker, A., Kajić, I., and Eliasmith, C. Legendre memory units: Continuous-time representation in recurrent neural networks. In Advances in Neural Information Processing Systems, volume 32, pp. 15544–15553, 2019.
  16. Wang, J., Wen, Y., Gou, Y., Ye, Z., and Chen, H. Fractional-order gradient descent learning of BP neural networks with Caputo derivative. Neural Networks, 89: 19–30, 2017.
  17. West, B. J. Fractal Physiology and Chaos in Medicine, volume 16. World Scientific, 2012.

  18. Gu, A., Dao, T., Ermon, S., Rudra, A., and Ré, C. HiPPO: Recurrent memory with optimal polynomial projections. In Advances in Neural Information Processing Systems, volume 33, 2020.
  19. Gu, A., Goel, K., and Ré, C. Efficiently modeling long sequences with structured state spaces. In The International Conference on Learning Representations, 2022.
  20. Smith, J. T., Warrington, A., and Linderman, S. W. Simplified state space layers for sequence modeling. In The International Conference on Learning Representations, 2023.
  21. Bengio, Y., Simard, P., and Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2): 157–166, 1994.
  22. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
  23. Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., Rao, J., Yang, L., Ruder, S., and Metzler, D. Long Range Arena: A benchmark for efficient transformers. In International Conference on Learning Representations, 2021.
  24. Tay, Y., Dehghani, M., Bahri, D., and Metzler, D. Efficient transformers: A survey. ACM Computing Surveys, 55(6), 2022.
  25. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014.
  26. Metzler, R. and Klafter, J. The random walk's guide to anomalous diffusion: a fractional dynamics approach. Physics Reports, 339(1): 1–77, 2000.
  27. Samko, S. G., Kilbas, A. A., and Marichev, O. I. Fractional Integrals and Derivatives: Theory and Applications. Gordon and Breach, 1993.
  28. Wixted, J. T. and Ebbesen, E. B. Genuine power curves in forgetting: A quantitative analysis of individual subject forgetting functions. Memory & Cognition, 25: 731–739, 1997.
  29. Anderson, J. R. and Schooler, L. J. Reflections of the environment in memory. Psychological Science, 2(6): 396–408, 1991.
  30. Critical behavior and universality classes of a parallel generative neural network. Journal of Statistical Physics.
  31. Choromanski, K. et al. Rethinking attention with Performers. In International Conference on Learning Representations, 2021.
  32. Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, 2020.
  33. Blelloch, G. E. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, 1990.
  34. Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31, 2018.
  35. Monje, C. A., Chen, Y., Vinagre, B. M., Xue, D., and Feliu, V. Fractional-Order Systems and Controls: Fundamentals and Applications. Springer, 2010.
  36. Chien, H.-Y. S., Goh, H., Sandborn, C. M., Li, L., Lu, J., Young, M., Chiang, P.-H. C., Bhattacharya, S., and Bhattacharya, S. Slower is better: Revisiting the forgetting mechanism in LSTM for slower information decay. arXiv preprint, 2021.
  37. Szegő, G. Orthogonal Polynomials. American Mathematical Society, 1939.
  38. Lin, Z., Feng, M., dos Santos, C. N., Yu, M., Xiang, B., Zhou, B., and Bengio, Y. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.
  39. Giles, M. B. An extended collection of matrix derivative results for forward and reverse mode automatic differentiation. Technical report, Oxford University Computing Laboratory, 2008.
  40. Xiong, R. et al. On layer normalization in the transformer architecture. In International Conference on Machine Learning, 2020.
  41. Dauphin, Y. N., Fan, A., Auli, M., and Grangier, D. Language modeling with gated convolutional networks. In International Conference on Machine Learning, 2017.
  42. Differentiation of the eigenvalue decomposition. arXiv preprint.
  43. Anderson, J. R. and Schooler, L. J. Reflections of the environment in memory. Psychological Science, 2(6): 396–408, 1991.
  44. Gu, A., Goel, K., Gupta, A., and Ré, C. On the parameterization and initialization of diagonal state space models. In Advances in Neural Information Processing Systems, volume 35, 2022.
  45. Gupta, A., Gu, A., and Berant, J. Diagonal state spaces are as effective as structured state spaces. In Advances in Neural Information Processing Systems, volume 35, 2022.
  46. Gu, A. and Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.
  47. Dao, T. and Gu, A. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060, 2024.
  48. Lieber, O. et al. Jamba: A hybrid Transformer-Mamba language model. arXiv preprint arXiv:2403.19887, 2024.
  49. Bengio, Y., Simard, P., and Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2): 157–166, 1994.
  50. Voelker, A., Kajić, I., and Eliasmith, C. Legendre memory units: Continuous-time representation in recurrent neural networks. In Advances in Neural Information Processing Systems, volume 32, 2019.
  51. Lundstrom, B. N., Higgs, M. H., Spain, W. J., and Fairhall, A. L. Fractional differentiation by neocortical pyramidal neurons. Nature Neuroscience, 11: 1335–1342, 2008.
  52. Diethelm, K. The Analysis of Fractional Differential Equations. Springer, 2010.
  53. A new fractional order gradient descent optimization algorithm. Science China Information Sciences.
  54. Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8): 1735–1780, 1997.
  55. Mandelbrot, B. B. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk. Selecta Volume E. Springer Science & Business Media, 2013.
  56. West, B. J. Fractal Physiology and Chaos in Medicine, volume 16. World Scientific, 2012.
  57. Leland, W. E., Taqqu, M. S., Willinger, W., and Wilson, D. V. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1): 1–15, 1994.
  58. Podlubny, I. Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of Their Solution and Some of Their Applications. Elsevier, 1998.
  59. Wang, J., Wen, Y., Gou, Y., Ye, Z., and Chen, H. Fractional-order gradient descent learning of BP neural networks with Caputo derivative. Neural Networks, 89: 19–30, 2017.
  60. Pang, G., Lu, L., and Karniadakis, G. E. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41(4): A2603–A2626, 2019.