Autocorrelation Reintroduces Spectral Bias in KANs for Time Series Forecasting
Pith reviewed 2026-05-08 06:24 UTC · model grok-4.3
The pith
Temporal autocorrelation among lagged inputs reintroduces spectral bias in KANs for time series forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that temporal autocorrelation reintroduces spectral bias in KANs, with the low-frequency preference becoming more pronounced as autocorrelation increases. This occurs because the independence assumption underlying existing KAN theory is violated by the lagged inputs used in forecasting. Applying the Discrete Cosine Transform to decorrelate the inputs substantially reduces the bias, confirming that autocorrelation among input variables is the inducing factor.
What carries the argument
Discrete Cosine Transform preprocessing that reduces correlations among lagged input variables, thereby removing the reintroduced spectral bias.
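The mechanism is straightforward to probe numerically. The sketch below is ours, not the paper's: it assumes a synthetic AR(1) series with coefficient phi and a window of p lagged inputs, shows that those inputs are strongly correlated (the load-bearing premise), and shows that an orthonormal DCT along the lag axis largely removes the correlations.

```python
# Minimal sketch (not the paper's code): lagged AR(1) inputs are strongly
# correlated, and an orthonormal DCT along the lag axis reduces those
# correlations. The coefficient `phi` and lag count `p` are illustrative.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
phi, p, n = 0.9, 16, 20_000

# AR(1) series: x_t = phi * x_{t-1} + eps_t
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Rows are lag windows (x_{t-p}, ..., x_{t-1}); columns are the network inputs.
X = np.stack([x[i:n - p + i] for i in range(p)], axis=1)

# DCT-II along the lag axis, orthonormalized so the transform is unitary.
Z = dct(X, type=2, norm="ortho", axis=1)

def mean_abs_offdiag(corr):
    """Average absolute off-diagonal correlation."""
    off = corr - np.diag(np.diag(corr))
    return np.abs(off).sum() / (off.size - len(corr))

print("raw inputs:", mean_abs_offdiag(np.corrcoef(X, rowvar=False)))
print("DCT inputs:", mean_abs_offdiag(np.corrcoef(Z, rowvar=False)))
# Expected: the DCT value is far smaller, since the DCT approximately
# diagonalizes the Toeplitz covariance of a stationary AR(1) process.
```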
Load-bearing premise
Existing theory assumes network inputs are statistically independent, yet lagged observations in time series are autocorrelated.
What would settle it
A controlled experiment in which KANs trained on time series inputs with artificially removed autocorrelation show no spectral bias, while the same networks on inputs with increasing autocorrelation show progressively stronger low-frequency preference.
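A compact sketch of that experiment follows. It is illustrative only: synthetic AR(1) inputs with coefficient phi stand in for real series, and an additive model with per-input RBF bases stands in for a KAN's univariate splines (leaning on the RBF reading of KANs in [28]). Because the model is linear in its parameters, gradient descent converges per eigendirection of the feature Gram matrix, which is where input autocorrelation enters; a closed-form least-squares fit would hide the effect, so training runs for a fixed step budget.

```python
# Hedged sketch of the settling experiment. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
p, n, n_centers, h, steps = 8, 4000, 20, 0.3, 300
centers = np.linspace(-3.0, 3.0, n_centers)

def lagged_ar1(phi):
    """Standardized, clipped AR(1) series arranged into p lagged inputs."""
    x = np.zeros(n + p)
    for t in range(1, n + p):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    x = np.clip(x / x.std(), -3.0, 3.0)
    return np.stack([x[i:n + i] for i in range(p)], axis=1)

def features(X):
    # Per-input RBF expansion, concatenated across the p lags: an additive,
    # linear-in-parameters stand-in for a one-layer KAN.
    F = [np.exp(-(X[:, [i]] - centers) ** 2 / (2 * h * h)) for i in range(p)]
    return np.hstack(F)

for phi in (0.0, 0.5, 0.9):
    X = lagged_ar1(phi)
    F = features(X)
    lam_max = np.linalg.eigvalsh(F.T @ F / n).max()
    lr = 1.0 / lam_max                # stable step size for this Gram matrix
    for omega in (1.0, 6.0):          # low- vs high-frequency target
        y = np.sin(omega * X[:, 0])   # target depends on a single lag
        w = np.zeros(F.shape[1])
        for _ in range(steps):        # full-batch gradient descent
            w -= lr * F.T @ (F @ w - y) / n
        rel_err = np.mean((F @ w - y) ** 2) / np.mean(y ** 2)
        print(f"phi={phi:.1f} omega={omega:.0f} rel_err={rel_err:.3f}")
# Under the paper's claim, the high-omega error after a fixed training budget
# should grow with phi, while phi=0 runs show little frequency preference.
```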
original abstract
Existing theory suggests that Kolmogorov-Arnold Networks (KANs) can overcome the spectral bias commonly observed in neural networks under the assumption that inputs are statistically independent. However, this assumption does not hold in time series forecasting (TSF), where inputs are lagged observations with strong temporal autocorrelation. Through theoretical analysis and empirical validation, we obtain an unexpected finding: temporal autocorrelation reintroduces spectral bias in KANs, and the bias becomes increasingly pronounced as the degree of autocorrelation increases. This suggests that standard KANs may face substantial difficulties in TSF with strongly autocorrelated inputs. To address this problem, we introduce the Discrete Cosine Transform (DCT) to reduce the correlations among the network inputs. As expected, experimental results reveal that DCT preprocessing substantially reduces the observed low-frequency preference in TSF. This result also corroborates that the spectral bias of KANs in TSF tasks is indeed induced by the autocorrelation among input variables.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that Kolmogorov-Arnold Networks (KANs) overcome spectral bias under the assumption of statistically independent inputs, but this assumption is violated in time series forecasting where inputs are lagged observations exhibiting temporal autocorrelation. Theoretical analysis shows that autocorrelation reintroduces spectral bias in KANs, with the bias strengthening as autocorrelation increases. Empirical experiments demonstrate that Discrete Cosine Transform (DCT) preprocessing reduces input correlations and substantially mitigates the observed low-frequency preference, supporting the conclusion that autocorrelation induces the bias.
Significance. If the central claim holds after clarification, the work is significant for highlighting a practical limitation of KANs in dependent-data regimes typical of time series, extending spectral bias analysis beyond standard MLPs. It offers a simple, effective preprocessing mitigation via DCT. Credit is given for combining theoretical analysis with targeted empirical validation to test the hypothesis directly. The result would be of interest to researchers applying KANs or similar architectures to sequential data.
major comments (2)
- [Theoretical Analysis] Theoretical Analysis section: the derivation does not explicitly connect the input covariance matrix (induced by autocorrelation) to the inability of KAN univariate splines to fit high-frequency components. Without this step, it remains unclear whether the reintroduced bias follows directly from autocorrelation or from other properties of lagged time-series inputs.
- [Empirical Validation] Empirical Validation section: DCT preprocessing simultaneously reduces correlations and aligns inputs to a frequency-localized basis. The experiments do not include a control using an alternative decorrelation method (e.g., PCA or whitening) that preserves the original coordinate system; this leaves open the possibility that bias reduction arises from basis alignment rather than decorrelation alone, weakening the causal attribution to autocorrelation.
minor comments (2)
- The abstract and experimental descriptions should specify the exact time-series datasets, autocorrelation levels tested, and quantitative metrics (with error bars) used to measure low-frequency preference.
- Notation for the KAN spline functions and the precise definition of spectral bias (e.g., via frequency decomposition of the learned univariate functions) should be introduced earlier and used consistently.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments, which help clarify and strengthen our presentation of the results. We address each major comment below and indicate the revisions we will make to the manuscript.
point-by-point responses
- Referee: [Theoretical Analysis] Theoretical Analysis section: the derivation does not explicitly connect the input covariance matrix (induced by autocorrelation) to the inability of KAN univariate splines to fit high-frequency components. Without this step, it remains unclear whether the reintroduced bias follows directly from autocorrelation or from other properties of lagged time-series inputs.
Authors: We thank the referee for highlighting this point of clarity. Our theoretical analysis models the KAN as a composition of univariate spline functions on the lagged inputs and incorporates the autocorrelation through the covariance of these inputs when analyzing the expected approximation error. The covariance structure is used to show that the effective contribution of higher-order spline terms (which capture high-frequency content) is attenuated as autocorrelation increases. To make the connection fully explicit, we will revise the Theoretical Analysis section to insert an intermediate derivation step that directly relates the eigenvalues of the input covariance matrix to the frequency content representable by the univariate splines under the KAN architecture. This will confirm that the bias is induced specifically by the autocorrelation-induced dependencies. revision: yes
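For a concrete feel for the missing step, the small check below (our construction, not the authors' derivation) uses the fact that a stationary AR(1) lag window has the Toeplitz covariance Sigma[i][j] = phi^|i-j|: as phi grows the eigenvalues spread, and the small-eigenvalue eigenvectors approximate high-frequency cosine modes, the directions along which gradient signal is weakest.

```python
# Numerical illustration (ours): the AR(1) lag covariance is Toeplitz with
# entries phi**|i-j|; its small-eigenvalue directions align with
# high-frequency cosine modes, one concrete route from autocorrelation to a
# low-frequency preference.
import numpy as np

p = 16
idx = np.arange(p)
for phi in (0.0, 0.5, 0.9):
    sigma = phi ** np.abs(idx[:, None] - idx[None, :])  # Toeplitz covariance
    evals, evecs = np.linalg.eigh(sigma)                # ascending eigenvalues
    # Dominant cosine frequency of each eigenvector, via the real FFT.
    freqs = [np.abs(np.fft.rfft(v)).argmax() for v in evecs.T]
    print(f"phi={phi:.1f}  eigenvalue range: "
          f"{evals.min():.3f} .. {evals.max():.3f}  "
          f"freq of smallest/largest-eigenvalue mode: {freqs[0]} / {freqs[-1]}")
# As phi grows, the spectrum spreads (small eigenvalues shrink toward zero),
# and the shrinking directions are the high-frequency modes.
```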
- Referee: [Empirical Validation] Empirical Validation section: DCT preprocessing simultaneously reduces correlations and aligns inputs to a frequency-localized basis. The experiments do not include a control using an alternative decorrelation method (e.g., PCA or whitening) that preserves the original coordinate system; this leaves open the possibility that bias reduction arises from basis alignment rather than decorrelation alone, weakening the causal attribution to autocorrelation.
Authors: We agree that this is an important consideration for causal attribution. DCT achieves decorrelation while projecting onto a frequency basis, so the observed mitigation could partly stem from the basis change. To isolate the role of decorrelation, we will add a control experiment in the revised Empirical Validation section using PCA whitening on the lagged inputs. PCA decorrelates the inputs via an orthogonal transformation without imposing a frequency-localized basis. We will report the resulting spectral bias metrics for KANs trained on PCA-preprocessed inputs and compare them to both the raw and DCT-preprocessed cases. This will provide additional evidence that the bias reduction is driven by reduced input correlations. revision: yes
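A hedged sketch of that control, with illustrative parameters of our choosing: PCA whitening decorrelates by rotating into the covariance eigenbasis, while ZCA whitening (multiplying by the inverse square root of the covariance) decorrelates and then rotates back, staying closest to the original coordinates. Comparing both against the DCT on identical lagged inputs separates decorrelation from frequency-basis alignment.

```python
# Sketch of the requested control (not from the paper): compare the DCT
# against decorrelation methods that impose no frequency-localized basis.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(2)
phi, p, n = 0.9, 16, 20_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
X = np.stack([x[i:n - p + i] for i in range(p)], axis=1)
X -= X.mean(axis=0)

cov = X.T @ X / (n - p)
evals, evecs = np.linalg.eigh(cov)

transforms = {
    "raw": X,
    "dct": dct(X, type=2, norm="ortho", axis=1),
    "pca_white": X @ evecs / np.sqrt(evals),           # rotate + rescale
    "zca_white": X @ evecs / np.sqrt(evals) @ evecs.T, # rotate back: coordinate-preserving
}
for name, Z in transforms.items():
    corr = np.corrcoef(Z, rowvar=False)
    off = np.abs(corr - np.eye(p)).mean()
    print(f"{name:>10}: mean |off-diagonal corr| = {off:.4f}")
# PCA/ZCA whitening decorrelates exactly (up to sampling noise) without a
# frequency-localized basis, so training KANs on these inputs isolates
# decorrelation from the DCT's basis alignment.
```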
Circularity Check
No circularity; derivation rests on external theory plus independent experiments
full rationale
The paper references an existing external theory (KANs overcome spectral bias only under input independence) and then performs new theoretical analysis plus empirical tests with DCT preprocessing on autocorrelated time-series inputs. The key result, that autocorrelation reintroduces bias, is corroborated by the observed reduction in low-frequency preference after decorrelation, without any quantity being defined in terms of itself or any prediction being statistically forced by a prior fit on the same data. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results appear in the load-bearing steps. The argument therefore rests on external benchmarks rather than on its own conclusions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: KANs overcome spectral bias when inputs are statistically independent.
Reference graph
Works this paper leans on
- [1] M. Praveen, S. Dekka, D. M. Sai, D. P. Chennamsetty, and D. P. Chinta, "Financial time series forecasting: A comprehensive review of signal processing and optimization-driven intelligent models," Computational Economics, vol. 67, no. 2, pp. 963–989, 2026.
- [2] K. Venkatachalam, P. Trojovský, D. Pamucar, N. Bacanin, and V. Simic, "DWFH: An improved data-driven deep weather forecasting hybrid model using transductive long short term memory (T-LSTM)," Expert Systems with Applications, vol. 213, p. 119270, 2023.
- [3] A. Mystakidis, P. Koukaras, N. Tsalikidis, D. Ioannidis, and C. Tjortjis, "Energy forecasting: A comprehensive review of techniques and technologies," Energies, vol. 17, no. 7, p. 1662, 2024.
- [4] Q. Shao, X. Piao, X. Yao, Y. Kong, Y. Hu, B. Yin, and Y. Zhang, "An adaptive composite time series forecasting model for short-term traffic flow," Journal of Big Data, vol. 11, no. 1, p. 102, 2024.
- [5] I. A. Gheyas and L. S. Smith, "A neural network approach to time series forecasting," 2009. [Online]. Available: https://api.semanticscholar.org/CorpusID:2266156
- [6] L. R. Medsker, L. Jain et al., "Recurrent neural networks," Design and Applications, vol. 5, no. 64–67, p. 2, 2001.
- [7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
- [8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
- [9] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, pp. 11106–11115, May 2021.
- [10] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," Mar. 2018. [Online]. Available: https://arxiv.org/abs/1803.01271
- [11] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, "TimesNet: Temporal 2D-variation modeling for general time series analysis," in The Eleventh International Conference on Learning Representations, 2023.
- [12] M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, Y. Liang, Y.-F. Li, S. Pan et al., "Time-LLM: Time series forecasting by reprogramming large language models," in The Twelfth International Conference on Learning Representations, 2024.
- [13] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljacic, T. Y. Hou, and M. Tegmark, "KAN: Kolmogorov–Arnold networks," in The Thirteenth International Conference on Learning Representations, 2025.
- [14] A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," Doklady Akademii Nauk SSSR, vol. 114, no. 5, pp. 953–956, 1957.
- [15] V. I. Arnold, "On functions of three variables," Doklady Akademii Nauk SSSR, vol. 114, no. 4, pp. 679–681, 1957.
- [16] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, "On the spectral bias of neural networks," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, Jun. 2019, pp. 5301–5310.
- [17] Q. Hong, J. W. Siegel, Q. Tan, and J. Xu, "On the activation function dependence of the spectral bias of neural networks," Aug. 2022. [Online]. Available: https://arxiv.org/abs/2208.04924
- [18] Y. Wang, J. W. Siegel, Z. Liu, and T. Y. Hou, "On the expressiveness and spectral bias of KANs," in The Thirteenth International Conference on Learning Representations, 2025.
- [19] H. Shen, C. Zeng, J. Wang, and Q. Wang, "Reduced effectiveness of Kolmogorov–Arnold networks on functions with noise," in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5.
- [20] C. Zeng, J. Wang, H. Shen, and Q. Wang, "KAN versus MLP on irregular or noisy function," in 2025 15th IEEE International Conference on Pattern Recognition Systems (ICPRS). IEEE, 2025, pp. 1–7.
- [21] A. Jacot, F. Gabriel, and C. Hongler, "Neural tangent kernel: Convergence and generalization in neural networks," Advances in Neural Information Processing Systems, vol. 31, 2018.
- [22] S. Arora, S. S. Du, W. Hu, Z. Li, R. R. Salakhutdinov, and R. Wang, "On exact computation with an infinitely wide neural net," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [23] D. J. Bartholomew, "Time series analysis forecasting and control," 1971.
- [24] J. D. Cryer, Time Series Analysis. Duxbury Press, Boston, 1986, vol. 286.
- [25] J. Sun, X. Ling, J. Zou, J. Kang, and K. Zhang, "FREIE: Low-frequency spectral bias in neural networks for time-series tasks," Oct. 2025. [Online]. Available: https://arxiv.org/abs/2510.25800
- [26] K. N. Ackaah-Gyasi, S. Valdez, Y. Gao, and L. Zhang, "Exploring spectral bias in time series long sequence forecasting," 2023.
- [27] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 100, no. 1, pp. 90–93, 1974.
- [28] Z. Li, "Kolmogorov–Arnold networks are radial basis function networks," May 2024. [Online]. Available: https://arxiv.org/abs/2405.06721