Autocorrelation Reintroduces Spectral Bias in KANs for Time Series Forecasting
Pith reviewed 2026-05-08 06:24 UTC · model grok-4.3
The pith
Temporal autocorrelation among lagged inputs reintroduces spectral bias in KANs for time series forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that temporal autocorrelation reintroduces spectral bias in KANs, with the low-frequency preference becoming more pronounced as autocorrelation increases. This occurs because the independence assumption underlying existing KAN theory is violated by the lagged inputs used in forecasting. Applying the Discrete Cosine Transform to decorrelate the inputs substantially reduces the bias, confirming that autocorrelation among input variables is the inducing factor.
What carries the argument
Discrete Cosine Transform preprocessing that reduces correlations among lagged input variables, thereby removing the reintroduced spectral bias.
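The mechanism is straightforward to probe numerically. The sketch below is ours, not the paper's: it assumes a synthetic AR(1) series with coefficient phi and a window of p lagged inputs, shows that those inputs are strongly correlated (the load-bearing premise), and shows that an orthonormal DCT along the lag axis largely removes the correlations.

```python
# Minimal sketch (not the paper's code): lagged AR(1) inputs are strongly
# correlated, and an orthonormal DCT along the lag axis reduces those
# correlations. The coefficient `phi` and lag count `p` are illustrative.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
phi, p, n = 0.9, 16, 20_000

# AR(1) series: x_t = phi * x_{t-1} + eps_t
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Rows are lag windows (x_{t-p}, ..., x_{t-1}); columns are the network inputs.
X = np.stack([x[i:n - p + i] for i in range(p)], axis=1)

# DCT-II along the lag axis, orthonormalized so the transform is unitary.
Z = dct(X, type=2, norm="ortho", axis=1)

def mean_abs_offdiag(corr):
    """Average absolute off-diagonal correlation."""
    off = corr - np.diag(np.diag(corr))
    return np.abs(off).sum() / (off.size - len(corr))

print("raw inputs:", mean_abs_offdiag(np.corrcoef(X, rowvar=False)))
print("DCT inputs:", mean_abs_offdiag(np.corrcoef(Z, rowvar=False)))
# Expected: the DCT value is far smaller, since the DCT approximately
# diagonalizes the Toeplitz covariance of a stationary AR(1) process.
```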
Load-bearing premise
Existing theory assumes network inputs are statistically independent, yet lagged observations in time series are autocorrelated.
What would settle it
A controlled experiment in which KANs trained on time series inputs with artificially removed autocorrelation show no spectral bias, while the same networks on inputs with increasing autocorrelation show progressively stronger low-frequency preference.
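A compact sketch of that experiment follows. It is illustrative only: synthetic AR(1) inputs with coefficient phi stand in for real series, and an additive model with per-input RBF bases stands in for a KAN's univariate splines (leaning on the RBF reading of KANs in [28]). Because the model is linear in its parameters, gradient descent converges per eigendirection of the feature Gram matrix, which is where input autocorrelation enters; a closed-form least-squares fit would hide the effect, so training runs for a fixed step budget.

```python
# Hedged sketch of the settling experiment. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
p, n, n_centers, h, steps = 8, 4000, 20, 0.3, 300
centers = np.linspace(-3.0, 3.0, n_centers)

def lagged_ar1(phi):
    """Standardized, clipped AR(1) series arranged into p lagged inputs."""
    x = np.zeros(n + p)
    for t in range(1, n + p):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    x = np.clip(x / x.std(), -3.0, 3.0)
    return np.stack([x[i:n + i] for i in range(p)], axis=1)

def features(X):
    # Per-input RBF expansion, concatenated across the p lags: an additive,
    # linear-in-parameters stand-in for a one-layer KAN.
    F = [np.exp(-(X[:, [i]] - centers) ** 2 / (2 * h * h)) for i in range(p)]
    return np.hstack(F)

for phi in (0.0, 0.5, 0.9):
    X = lagged_ar1(phi)
    F = features(X)
    lam_max = np.linalg.eigvalsh(F.T @ F / n).max()
    lr = 1.0 / lam_max                # stable step size for this Gram matrix
    for omega in (1.0, 6.0):          # low- vs high-frequency target
        y = np.sin(omega * X[:, 0])   # target depends on a single lag
        w = np.zeros(F.shape[1])
        for _ in range(steps):        # full-batch gradient descent
            w -= lr * F.T @ (F @ w - y) / n
        rel_err = np.mean((F @ w - y) ** 2) / np.mean(y ** 2)
        print(f"phi={phi:.1f} omega={omega:.0f} rel_err={rel_err:.3f}")
# Under the paper's claim, the high-omega error after a fixed training budget
# should grow with phi, while phi=0 runs show little frequency preference.
```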
original abstract
Existing theory suggests that Kolmogorov-Arnold Networks (KANs) can overcome the spectral bias commonly observed in neural networks under the assumption that inputs are statistically independent. However, this assumption does not hold in time series forecasting (TSF), where inputs are lagged observations with strong temporal autocorrelation. Through theoretical analysis and empirical validation, we obtain an unexpected finding: temporal autocorrelation reintroduces spectral bias in KANs, and the bias becomes increasingly pronounced as the degree of autocorrelation increases. This suggests that standard KANs may face substantial difficulties in TSF with strongly autocorrelated inputs. To address this problem, we introduce the Discrete Cosine Transform (DCT) to reduce the correlations among the network inputs. As expected, experimental results reveal that DCT preprocessing substantially reduces the observed low-frequency preference in TSF. This result also corroborates that the spectral bias of KANs in TSF tasks is indeed induced by the autocorrelation among input variables.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that Kolmogorov-Arnold Networks (KANs) overcome spectral bias under the assumption of statistically independent inputs, but this assumption is violated in time series forecasting where inputs are lagged observations exhibiting temporal autocorrelation. Theoretical analysis shows that autocorrelation reintroduces spectral bias in KANs, with the bias strengthening as autocorrelation increases. Empirical experiments demonstrate that Discrete Cosine Transform (DCT) preprocessing reduces input correlations and substantially mitigates the observed low-frequency preference, supporting the conclusion that autocorrelation induces the bias.
Significance. If the central claim holds after clarification, the work is significant for highlighting a practical limitation of KANs in dependent-data regimes typical of time series, extending spectral bias analysis beyond standard MLPs. It offers a simple, effective preprocessing mitigation via DCT. Credit is given for combining theoretical analysis with targeted empirical validation to test the hypothesis directly. The result would be of interest to researchers applying KANs or similar architectures to sequential data.
major comments (2)
- [Theoretical Analysis] Theoretical Analysis section: the derivation does not explicitly connect the input covariance matrix (induced by autocorrelation) to the inability of KAN univariate splines to fit high-frequency components. Without this step, it remains unclear whether the reintroduced bias follows directly from autocorrelation or from other properties of lagged time-series inputs.
- [Empirical Validation] Empirical Validation section: DCT preprocessing simultaneously reduces correlations and aligns inputs to a frequency-localized basis. The experiments do not include a control using an alternative decorrelation method (e.g., PCA or whitening) that preserves the original coordinate system; this leaves open the possibility that bias reduction arises from basis alignment rather than decorrelation alone, weakening the causal attribution to autocorrelation.
minor comments (2)
- The abstract and experimental descriptions should specify the exact time-series datasets, autocorrelation levels tested, and quantitative metrics (with error bars) used to measure low-frequency preference.
- Notation for the KAN spline functions and the precise definition of spectral bias (e.g., via frequency decomposition of the learned univariate functions) should be introduced earlier and used consistently.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments, which help clarify and strengthen our presentation of the results. We address each major comment below and indicate the revisions we will make to the manuscript.
point-by-point responses
- Referee: [Theoretical Analysis] Theoretical Analysis section: the derivation does not explicitly connect the input covariance matrix (induced by autocorrelation) to the inability of KAN univariate splines to fit high-frequency components. Without this step, it remains unclear whether the reintroduced bias follows directly from autocorrelation or from other properties of lagged time-series inputs.
Authors: We thank the referee for highlighting this point of clarity. Our theoretical analysis models the KAN as a composition of univariate spline functions on the lagged inputs and incorporates the autocorrelation through the covariance of these inputs when analyzing the expected approximation error. The covariance structure is used to show that the effective contribution of higher-order spline terms (which capture high-frequency content) is attenuated as autocorrelation increases. To make the connection fully explicit, we will revise the Theoretical Analysis section to insert an intermediate derivation step that directly relates the eigenvalues of the input covariance matrix to the frequency content representable by the univariate splines under the KAN architecture. This will confirm that the bias is induced specifically by the autocorrelation-induced dependencies. revision: yes
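For a concrete feel for the missing step, the small check below (our construction, not the authors' derivation) uses the fact that a stationary AR(1) lag window has the Toeplitz covariance Sigma[i][j] = phi^|i-j|: as phi grows the eigenvalues spread, and the small-eigenvalue eigenvectors approximate high-frequency cosine modes, the directions along which gradient signal is weakest.

```python
# Numerical illustration (ours): the AR(1) lag covariance is Toeplitz with
# entries phi**|i-j|; its small-eigenvalue directions align with
# high-frequency cosine modes, one concrete route from autocorrelation to a
# low-frequency preference.
import numpy as np

p = 16
idx = np.arange(p)
for phi in (0.0, 0.5, 0.9):
    sigma = phi ** np.abs(idx[:, None] - idx[None, :])  # Toeplitz covariance
    evals, evecs = np.linalg.eigh(sigma)                # ascending eigenvalues
    # Dominant cosine frequency of each eigenvector, via the real FFT.
    freqs = [np.abs(np.fft.rfft(v)).argmax() for v in evecs.T]
    print(f"phi={phi:.1f}  eigenvalue range: "
          f"{evals.min():.3f} .. {evals.max():.3f}  "
          f"freq of smallest/largest-eigenvalue mode: {freqs[0]} / {freqs[-1]}")
# As phi grows, the spectrum spreads (small eigenvalues shrink toward zero),
# and the shrinking directions are the high-frequency modes.
```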
- Referee: [Empirical Validation] Empirical Validation section: DCT preprocessing simultaneously reduces correlations and aligns inputs to a frequency-localized basis. The experiments do not include a control using an alternative decorrelation method (e.g., PCA or whitening) that preserves the original coordinate system; this leaves open the possibility that bias reduction arises from basis alignment rather than decorrelation alone, weakening the causal attribution to autocorrelation.
Authors: We agree that this is an important consideration for causal attribution. DCT achieves decorrelation while projecting onto a frequency basis, so the observed mitigation could partly stem from the basis change. To isolate the role of decorrelation, we will add a control experiment in the revised Empirical Validation section using PCA whitening on the lagged inputs. PCA decorrelates the inputs via an orthogonal transformation without imposing a frequency-localized basis. We will report the resulting spectral bias metrics for KANs trained on PCA-preprocessed inputs and compare them to both the raw and DCT-preprocessed cases. This will provide additional evidence that the bias reduction is driven by reduced input correlations. revision: yes
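A hedged sketch of that control, with illustrative parameters of our choosing: PCA whitening decorrelates by rotating into the covariance eigenbasis, while ZCA whitening (multiplying by the inverse square root of the covariance) decorrelates and then rotates back, staying closest to the original coordinates. Comparing both against the DCT on identical lagged inputs separates decorrelation from frequency-basis alignment.

```python
# Sketch of the requested control (not from the paper): compare the DCT
# against decorrelation methods that impose no frequency-localized basis.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(2)
phi, p, n = 0.9, 16, 20_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
X = np.stack([x[i:n - p + i] for i in range(p)], axis=1)
X -= X.mean(axis=0)

cov = X.T @ X / (n - p)
evals, evecs = np.linalg.eigh(cov)

transforms = {
    "raw": X,
    "dct": dct(X, type=2, norm="ortho", axis=1),
    "pca_white": X @ evecs / np.sqrt(evals),           # rotate + rescale
    "zca_white": X @ evecs / np.sqrt(evals) @ evecs.T, # rotate back: coordinate-preserving
}
for name, Z in transforms.items():
    corr = np.corrcoef(Z, rowvar=False)
    off = np.abs(corr - np.eye(p)).mean()
    print(f"{name:>10}: mean |off-diagonal corr| = {off:.4f}")
# PCA/ZCA whitening decorrelates exactly (up to sampling noise) without a
# frequency-localized basis, so training KANs on these inputs isolates
# decorrelation from the DCT's basis alignment.
```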
Circularity Check
No circularity; derivation rests on external theory plus independent experiments
full rationale
The paper references an existing external theory (KANs overcome spectral bias only under input independence) and then performs new theoretical analysis plus empirical tests with DCT preprocessing on autocorrelated time-series inputs. The key result, that autocorrelation reintroduces bias, is corroborated by the observed reduction in low-frequency preference after decorrelation, without any quantity being defined in terms of itself or any prediction being statistically forced by a prior fit on the same data. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results appear in the load-bearing steps. The argument therefore rests on external benchmarks rather than on its own conclusions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: KANs overcome spectral bias when inputs are statistically independent.
Reference graph
Works this paper leans on
- [1] M. Praveen, S. Dekka, D. M. Sai, D. P. Chennamsetty, and D. P. Chinta, "Financial time series forecasting: A comprehensive review of signal processing and optimization-driven intelligent models," Computational Economics, vol. 67, no. 2, pp. 963–989, 2026.
- [2] K. Venkatachalam, P. Trojovský, D. Pamucar, N. Bacanin, and V. Simic, "DWFH: An improved data-driven deep weather forecasting hybrid model using transductive long short term memory (T-LSTM)," Expert Systems with Applications, vol. 213, p. 119270, 2023.
- [3] A. Mystakidis, P. Koukaras, N. Tsalikidis, D. Ioannidis, and C. Tjortjis, "Energy forecasting: A comprehensive review of techniques and technologies," Energies, vol. 17, no. 7, p. 1662, 2024.
- [4] Q. Shao, X. Piao, X. Yao, Y. Kong, Y. Hu, B. Yin, and Y. Zhang, "An adaptive composite time series forecasting model for short-term traffic flow," Journal of Big Data, vol. 11, no. 1, p. 102, 2024.
- [5] I. A. Gheyas and L. S. Smith, "A neural network approach to time series forecasting," 2009. [Online]. Available: https://api.semanticscholar.org/CorpusID:2266156
- [6] L. R. Medsker, L. Jain et al., "Recurrent neural networks," Design and Applications, vol. 5, no. 64–67, p. 2, 2001.
- [7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
- [8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
- [9] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, pp. 11106–11115, May 2021.
- [10] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," Mar. 2018. [Online]. Available: https://arxiv.org/abs/1803.01271
- [11] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, "TimesNet: Temporal 2D-variation modeling for general time series analysis," in The Eleventh International Conference on Learning Representations, 2023.
- [12] M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, Y. Liang, Y.-F. Li, S. Pan et al., "Time-LLM: Time series forecasting by reprogramming large language models," in The Twelfth International Conference on Learning Representations, 2024.
- [13] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljacic, T. Y. Hou, and M. Tegmark, "KAN: Kolmogorov–Arnold networks," in The Thirteenth International Conference on Learning Representations, 2025.
- [14] A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," Doklady Akademii Nauk SSSR, vol. 114, no. 5, pp. 953–956, 1957.
- [15] V. I. Arnold, "On functions of three variables," Doklady Akademii Nauk SSSR, vol. 114, no. 4, pp. 679–681, 1957.
- [16] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, "On the spectral bias of neural networks," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, Jun. 2019, pp. 5301–5310.
- [17] Q. Hong, J. W. Siegel, Q. Tan, and J. Xu, "On the activation function dependence of the spectral bias of neural networks," Aug. 2022. [Online]. Available: https://arxiv.org/abs/2208.04924
- [18] Y. Wang, J. W. Siegel, Z. Liu, and T. Y. Hou, "On the expressiveness and spectral bias of KANs," in The Thirteenth International Conference on Learning Representations, 2025.
- [19] H. Shen, C. Zeng, J. Wang, and Q. Wang, "Reduced effectiveness of Kolmogorov–Arnold networks on functions with noise," in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5.
- [20] C. Zeng, J. Wang, H. Shen, and Q. Wang, "KAN versus MLP on irregular or noisy function," in 2025 15th IEEE International Conference on Pattern Recognition Systems (ICPRS). IEEE, 2025, pp. 1–7.
- [21] A. Jacot, F. Gabriel, and C. Hongler, "Neural tangent kernel: Convergence and generalization in neural networks," Advances in Neural Information Processing Systems, vol. 31, 2018.
- [22] S. Arora, S. S. Du, W. Hu, Z. Li, R. R. Salakhutdinov, and R. Wang, "On exact computation with an infinitely wide neural net," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [23] D. J. Bartholomew, "Time series analysis forecasting and control," 1971.
- [24] J. D. Cryer, Time Series Analysis. Duxbury Press, Boston, 1986, vol. 286.
- [25] J. Sun, X. Ling, J. Zou, J. Kang, and K. Zhang, "FREIE: Low-frequency spectral bias in neural networks for time-series tasks," Oct. 2025. [Online]. Available: https://arxiv.org/abs/2510.25800
- [26] K. N. Ackaah-Gyasi, S. Valdez, Y. Gao, and L. Zhang, "Exploring spectral bias in time series long sequence forecasting," 2023.
- [27] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 100, no. 1, pp. 90–93, 1974.
- [28] Z. Li, "Kolmogorov–Arnold networks are radial basis function networks," May 2024. [Online]. Available: https://arxiv.org/abs/2405.06721