Autorelevance function and other feature relevance measures for univariate time series
Pith reviewed 2026-07-03 06:08 UTC · model grok-4.3
The pith
Autorelevance functions recover expected lag structures in univariate time series forecasts via Shapley values.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the autorelevance and partial autorelevance functions, computed from additive importance measures and Shapley values on ghost variables, successfully capture the lag structure of forecasting models for univariate time series when absent features in coalitions are replaced by one-step forecasts from the same model.
What carries the argument
The autorelevance function, which assigns an importance score to each lag by averaging its marginal contribution over all coalitions while replacing missing lags with the model's own one-step forecast.
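In symbols, the score described above could be written in the standard Shapley form over the lag set $L = \{1, \dots, p\}$. The notation ($f$ for the fitted model, $\hat{y}$ for its one-step forecast, $\ell$ for the loss, $v$ for the coalition value, $\phi_k$ for the relevance of lag $k$) is introduced here for illustration and is not taken from the paper. Defining $v$ as an expected loss reduction is an assumption borrowed from additive importance measures; the paper's exact definition may differ.

```latex
\[
\tilde{x}^{S}_{t,k} =
\begin{cases}
  y_{t-k},       & k \in S,\\
  \hat{y}_{t-k}, & k \notin S,
\end{cases}
\qquad
v(S) = \mathbb{E}\big[\ell\big(y_t, f(\tilde{x}^{\varnothing}_t)\big)\big]
     - \mathbb{E}\big[\ell\big(y_t, f(\tilde{x}^{S}_t)\big)\big]
\]
\[
\phi_k = \sum_{S \subseteq L \setminus \{k\}}
  \frac{|S|!\,(p - |S| - 1)!}{p!}\,
  \big[v(S \cup \{k\}) - v(S)\big]
\]
```

Under this reading the scores sum to $v(L) - v(\varnothing)$, the total loss reduction from observing every lag.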
If this is right
- The calculated relevance measures recover the expected lag structure in almost all simulation and real-data cases.
- The replacement method for absent features works across both linear ARMA-family models and nonlinear recurrent neural networks.
- The autorelevance function and its partial version can be applied directly to any univariate forecasting model without retraining.
- The framework combines ghost variables, Shapley values, and additive importance measures into a single time-series-specific procedure.
Where Pith is reading between the lines
- Forecasters could use the functions to prune irrelevant lags and obtain simpler yet still accurate models.
- The same replacement rule might be tested on multivariate series to measure cross-series lag importance.
- Comparing autorelevance scores between linear and nonlinear models on identical data could reveal where nonlinearity changes which lags matter.
Load-bearing premise
Replacing absent features in Shapley coalitions with a one-step forecast from the same model validly captures feature contributions without introducing bias in the relevance calculation for time series data.
What would settle it
A simulation on an ARMA process with known significant lags where the computed autorelevance function assigns near-zero importance to those lags.
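A minimal sketch of such a test follows. It rests on several assumptions that do not come from the paper: the replacement rule is read as "swap an absent lag for the model's own one-step forecast of that lagged value", the coalition value is the reduction in mean squared error, a least-squares AR(3) fit stands in for the forecasting model, and the 10% threshold for "near-zero" is arbitrary. All function names are hypothetical.

```python
import itertools
import math
import numpy as np

def lag_matrix(y, p):
    """Row i holds [y[t-1], ..., y[t-p]] for target t = p + i."""
    n = len(y)
    return np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])

def coalition_loss(predict, y, p, present):
    """MSE when lags outside `present` are swapped for the model's own
    one-step forecast of that lagged value (ASSUMED replacement rule)."""
    X = lag_matrix(y, p)
    fitted = predict(X)                  # fitted[i] forecasts y[p + i]
    X_use = X[p:].copy()                 # targets t = 2p .. n-1
    for k in range(1, p + 1):
        if k not in present:
            # forecasts of y[t-k] aligned with targets t = 2p .. n-1
            X_use[:, k - 1] = fitted[p - k : len(fitted) - k]
    return float(np.mean((y[2 * p :] - predict(X_use)) ** 2))

def shapley_lag_relevance(predict, y, p):
    """Exact Shapley value of each lag for v(S) = loss(empty) - loss(S)."""
    lags = range(1, p + 1)
    loss = {frozenset(S): coalition_loss(predict, y, p, set(S))
            for r in range(p + 1) for S in itertools.combinations(lags, r)}
    phi = np.zeros(p)
    for k in lags:
        others = [j for j in lags if j != k]
        for r in range(p):
            w = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                # marginal contribution of lag k = drop in loss when it is added
                phi[k - 1] += w * (loss[S] - loss[S | {k}])
    return phi, loss

# Simulate a process whose only true lag is 2: y_t = 0.8 * y_{t-2} + e_t.
rng = np.random.default_rng(0)
n, p = 600, 3
y = np.zeros(n)
noise = rng.standard_normal(n)
for t in range(2, n):
    y[t] = 0.8 * y[t - 2] + noise[t]

# Least-squares AR(p) forecaster, a stand-in for any fitted model.
coef, *_ = np.linalg.lstsq(lag_matrix(y, p), y[p:], rcond=None)
predict = lambda X: X @ coef

phi, loss = shapley_lag_relevance(predict, y, p)
known_lag = 2
# Arbitrary illustrative criterion: the known lag scores under 10% of the maximum.
falsified = phi[known_lag - 1] <= 0.1 * np.abs(phi).max()
print("relevance per lag:", np.round(phi, 3), "| claim falsified:", falsified)
```

The exact enumeration visits all 2^p coalitions, so it is only practical for a handful of lags; longer lag windows would need a sampling approximation.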
Original abstract
We propose a model-agnostic methodology to measure lag relevance in machine learning forecasting models applied to univariate time series. Particularly, we are working in the context of time series using the frameworks of Ghost variables and Shapley values, together with additive importance measures, to introduce the auto-relevance and partial auto-relevance functions as the lag importance values. Additionally, we propose a novel method to replace absent features in coalition-based methods with a one-step forecast from the same model. We evaluate these proposals under different simulations and real data cases. This combined framework perspective is particularly suitable for time series. In addition, to show our discoveries we use a pool of models from the seasonal ARMA family and recurrent neural networks. We found that the calculated relevance measures successfully demonstrate the expected lag structure in almost all cases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a model-agnostic framework for measuring lag relevance in univariate time series forecasting models. It combines ghost variables, Shapley values, and additive importance measures to define autorelevance and partial autorelevance functions. A novel replacement rule substitutes absent features in Shapley coalitions with a one-step forecast generated by the same model. The method is tested on seasonal ARMA and RNN models across simulations and real data, with the central claim that the resulting relevance measures recover the expected lag structure in almost all cases.
Significance. If the central claim holds after validation, the autorelevance functions would supply a practical, model-agnostic tool for interpreting lag importance in black-box time-series forecasters, extending Shapley-based attribution to respect temporal dependence via ghost variables. The forecast-replacement heuristic could simplify coalition evaluation, but its validity is load-bearing for any downstream claims.
major comments (2)
- [novel replacement method for absent features] The section describing the novel replacement of absent coalition features by a one-step forecast from the fitted model provides no derivation showing that the substitution preserves the marginal-contribution property of Shapley values, nor any test that forecast error is uncorrelated with the target lags. This substitution is load-bearing for the empirical success claim, because systematic bias in the replacement would artifactually reinforce the very lag structure the method is asserted to recover.
- [evaluation description and abstract claim] No quantitative results, error bars, exclusion criteria, or simulation details are supplied to support the statement that the relevance measures 'successfully demonstrate the expected lag structure in almost all cases.' Without these, the soundness of the central empirical claim cannot be evaluated.
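The correlation test requested in the first comment could be sketched as below. This is an illustration only: the function name, the AR(1) simulation, and the two stand-in forecasters are assumptions made here, not material from the paper.

```python
import numpy as np

def residual_lag_correlations(predict, y, p):
    """Pearson correlation between one-step forecast errors and each lag."""
    n = len(y)
    # Column k-1 holds y[t-k] for targets t = p .. n-1.
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    resid = y[p:] - predict(X)
    return np.array([np.corrcoef(resid, X[:, k])[0, 1] for k in range(p)])

# Simulated AR(1) series: y_t = 0.6 * y_{t-1} + e_t.
rng = np.random.default_rng(1)
n, p = 2000, 2
y = np.zeros(n)
noise = rng.standard_normal(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + noise[t]

# Two hypothetical forecasters: one matching the data-generating process,
# one that ignores the lags entirely and always predicts zero.
well_specified = lambda X: X @ np.array([0.6, 0.0])
misspecified = lambda X: np.zeros(len(X))

corr_good = residual_lag_correlations(well_specified, y, p)
corr_bad = residual_lag_correlations(misspecified, y, p)
print("well specified:", np.round(corr_good, 3))
print("misspecified:  ", np.round(corr_bad, 3))
```

Correlations near zero for a fitted model would be consistent with unbiased replacement values; a clearly nonzero correlation at some lag would flag the bias the referee describes. Pearson correlation only detects linear dependence, so a nonlinear model might need a stronger test.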
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major comments point by point below, indicating planned revisions where appropriate.
- Referee: [novel replacement method for absent features] The section describing the novel replacement of absent coalition features by a one-step forecast from the fitted model provides no derivation showing that the substitution preserves the marginal-contribution property of Shapley values, nor any test that forecast error is uncorrelated with the target lags. This substitution is load-bearing for the empirical success claim, because systematic bias in the replacement would artifactually reinforce the very lag structure the method is asserted to recover.
  Authors: We agree with the referee that the manuscript does not include a formal derivation demonstrating that the one-step forecast replacement exactly preserves the marginal contribution property of Shapley values. This replacement is presented as a practical, model-consistent heuristic for handling absent features in time series contexts, where traditional replacements (e.g., feature means) would break temporal dependencies. We will revise the paper to include a clearer explanation of the heuristic's motivation and its potential limitations, including the risk of bias if forecast errors correlate with lags. We will also add empirical checks for such correlations in the simulation studies. This addresses the concern without claiming exact preservation.
  revision: partial
- Referee: [evaluation description and abstract claim] No quantitative results, error bars, exclusion criteria, or simulation details are supplied to support the statement that the relevance measures 'successfully demonstrate the expected lag structure in almost all cases.' Without these, the soundness of the central empirical claim cannot be evaluated.
  Authors: The referee is correct that the abstract and summary lack quantitative details supporting the claim. While the full manuscript includes simulation setups and results, we acknowledge that more precise reporting is needed to substantiate 'almost all cases.' In the revised version, we will expand the evaluation section with quantitative metrics (e.g., recovery rates across simulations), include error bars from repeated runs, specify any exclusion criteria, and update the abstract accordingly to provide a more rigorous presentation of the empirical findings.
  revision: yes
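One way the promised reporting could be computed is sketched below. The definition of "recovery" used here (the top-scoring lags coincide exactly with the known lag set) is an assumption, as is the function name, and the input scores are made-up placeholders chosen only to exercise the code. They are not results from the paper.

```python
import numpy as np

def recovery_summary(relevance_runs, true_lags):
    """Recovery rate and per-lag error bars over repeated simulation runs."""
    scores = np.asarray(relevance_runs, dtype=float)   # shape (runs, p)
    runs, m = scores.shape[0], len(true_lags)
    # 1-based indices of the m highest-scoring lags in each run.
    top = np.argsort(-scores, axis=1)[:, :m] + 1
    hits = np.array([{int(v) for v in row} == set(true_lags) for row in top])
    rate = hits.mean()
    rate_se = np.sqrt(rate * (1.0 - rate) / runs)      # binomial standard error
    mean = scores.mean(axis=0)                         # mean relevance per lag
    sem = scores.std(axis=0, ddof=1) / np.sqrt(runs)   # standard error per lag
    return rate, rate_se, mean, sem

# PLACEHOLDER scores for 4 runs and 3 lags (illustrative only, not real data).
placeholder_scores = [
    [0.10, 0.90, 0.00],
    [0.00, 0.70, 0.20],
    [0.50, 0.40, 0.10],
    [0.05, 0.80, 0.10],
]
rate, rate_se, mean, sem = recovery_summary(placeholder_scores, true_lags={2})
print(f"recovery rate = {rate:.2f} +/- {rate_se:.2f}")
```

Reporting the rate with its standard error, alongside per-lag means and error bars, would let a reader judge how far "almost all cases" is from "all cases".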
Circularity Check
No significant circularity; derivation relies on external frameworks with independent evaluation
The paper defines autorelevance and partial autorelevance via Shapley values and ghost variables applied to time series lags, then proposes a one-step forecast replacement for absent coalition features as a novel but non-tautological ansatz. Evaluation proceeds by fitting ARMA/RNN models on simulations with known lag structures and real data, then checking recovery of those structures; this is an empirical test rather than a reduction of the output to the input by construction. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided description. The method is self-contained against external benchmarks (Shapley axioms, simulation ground truth) and does not reduce the claimed success to a definitional identity.