Autorelevance function and other feature relevance measures for univariate time series

Jamie Arjona; Julian Cardenas; Pedro Delicado

arxiv: 2607.01959 · v1 · pith:5CR4XVFSnew · submitted 2026-07-02 · stat.ML · cs.LG · stat.ME

Pith reviewed 2026-07-03 06:08 UTC · model grok-4.3

keywords autorelevance function · partial autorelevance · lag relevance · Shapley values · time series forecasting · feature importance · univariate series

The pith

Autorelevance functions recover expected lag structures in univariate time series forecasts via Shapley values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model-agnostic method to quantify how much each past lag contributes to the next forecast in a univariate time series. It adapts Shapley value calculations by treating lags as features and replacing any lag that is absent from a coalition with a one-step forecast produced by the same model. From this it defines an autorelevance function that reports the direct importance of each lag and a partial autorelevance function that reports its conditional importance. When applied to seasonal ARMA models on simulated series and to recurrent neural networks on real series, the functions recover the known lag patterns in almost every tested case. The approach is presented as especially natural for time series because the replacement rule respects the temporal ordering of the data.
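
The replacement rule described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name `lag_shapley`, the use of minus mean squared error as the coalition value, and the choice to forecast an absent lag from the observed values that precede it are all assumptions made for this example. It enumerates every coalition exactly, so it is only practical for a handful of lags.

```python
import itertools
import math
import numpy as np


def lag_shapley(model, y, p):
    """Exact Shapley relevance of lags 1..p for a one-step forecaster.

    model: callable mapping an (m, p) array with columns
           (y[t-1], ..., y[t-p]) to an (m,) array of forecasts of y[t].
    Assumption (not from the paper): an absent lag y[t-k] is replaced by the
    model's own one-step forecast of y[t-k] from the observed values before
    it, and a coalition is scored by minus its mean squared forecast error.
    """
    y = np.asarray(y, dtype=float)
    ts = np.arange(2 * p, len(y))      # need p values before the oldest lag
    X = np.column_stack([y[ts - k] for k in range(1, p + 1)])  # observed lags
    target = y[ts]
    # G[:, k-1] = forecast of y[t-k] built from y[t-k-1], ..., y[t-k-p]
    G = np.column_stack([
        model(np.column_stack([y[ts - k - j] for j in range(1, p + 1)]))
        for k in range(1, p + 1)
    ])

    def value(coalition):
        present = np.array([j in coalition for j in range(p)])
        Z = np.where(present, X, G)    # observed if present, forecast if absent
        return -np.mean((target - model(Z)) ** 2)

    phi = np.zeros(p)
    for k in range(p):
        others = [j for j in range(p) if j != k]
        for r in range(p):
            weight = (math.factorial(r) * math.factorial(p - r - 1)
                      / math.factorial(p))
            for S in itertools.combinations(others, r):
                phi[k] += weight * (value(S + (k,)) - value(S))
    return phi


# Toy check: y[t] = 0.5*y[t-1] + 0.3*y[t-3] + noise, scored with the true model.
rng = np.random.default_rng(0)
coef = np.array([0.5, 0.0, 0.3])
y = np.zeros(20_000)
noise = rng.standard_normal(len(y))
for t in range(3, len(y)):
    y[t] = coef @ y[t - 3:t][::-1] + noise[t]

phi = lag_shapley(lambda Z: Z @ coef, y, p=3)
print(np.round(phi, 3))  # one relevance score per lag 1, 2, 3
```

In this toy setup the model ignores lag 2, so its score is zero by construction, while lags 1 and 3 should receive positive scores, with lag 1 the larger of the two.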

Core claim

The central claim is that the autorelevance and partial autorelevance functions, computed from additive importance measures and Shapley values on ghost variables, successfully capture the lag structure of forecasting models for univariate time series when absent features in coalitions are replaced by one-step forecasts from the same model.

What carries the argument

The autorelevance function, which assigns an importance score to each lag by averaging its marginal contribution over all coalitions while replacing missing lags with the model's own one-step forecast.
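
For reference, the average over coalitions mentioned above is the standard Shapley value. In the notation below, which is assumed here rather than taken from the paper, $L$ is the set of candidate lags, $k$ is the lag being scored, and $v(S)$ is the value of coalition $S$ when every lag outside $S$ is replaced by the model's one-step forecast.

```latex
\phi_k \;=\; \sum_{S \subseteq L \setminus \{k\}}
  \frac{|S|!\,\bigl(|L| - |S| - 1\bigr)!}{|L|!}
  \,\Bigl[\, v\bigl(S \cup \{k\}\bigr) - v(S) \,\Bigr]
```

The weights sum to one for each $k$, and the scores satisfy $\sum_{k \in L} \phi_k = v(L) - v(\emptyset)$, so the total relevance equals the gain from using all observed lags instead of none.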

If this is right

The calculated relevance measures recover the expected lag structure in almost all simulation and real-data cases.
The replacement method for absent features works across both linear ARMA-family models and nonlinear recurrent neural networks.
The autorelevance function and its partial version can be applied directly to any univariate forecasting model without retraining.
The framework combines ghost variables, Shapley values, and additive importance measures into a single time-series-specific procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Forecasters could use the functions to prune irrelevant lags and obtain simpler yet still accurate models.
The same replacement rule might be tested on multivariate series to measure cross-series lag importance.
Comparing autorelevance scores between linear and nonlinear models on identical data could reveal where nonlinearity changes which lags matter.

Load-bearing premise

Replacing absent features in Shapley coalitions with a one-step forecast from the same model validly captures feature contributions without introducing bias in the relevance calculation for time series data.

What would settle it

A simulation on an ARMA process with known significant lags where the computed autorelevance function assigns near-zero importance to those lags.
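
A falsification test of this kind needs a series with known lag structure and a rule for deciding when a known lag has been missed. The sketch below is one hypothetical way to set that up: `simulate_ar`, `missed_lags`, the coefficients, and the 5% cutoff are illustrative choices, not values from the paper. The relevance scores themselves would come from whatever implementation of the autorelevance function is under test.

```python
import numpy as np


def simulate_ar(coefs, n, seed=0, burn_in=500):
    """Simulate y[t] = sum_k coefs[k] * y[t-k] + e[t] with standard normal e.

    coefs maps a lag (positive int) to its coefficient.
    """
    rng = np.random.default_rng(seed)
    p = max(coefs)                       # largest lag in the model
    y = np.zeros(n + burn_in + p)
    e = rng.standard_normal(len(y))
    for t in range(p, len(y)):
        y[t] = sum(c * y[t - k] for k, c in coefs.items()) + e[t]
    return y[-n:]                        # drop the burn-in segment


def missed_lags(relevance, true_lags, tol=0.05):
    """Known lags whose score is below tol times the largest absolute score.

    relevance[k-1] is the score of lag k.
    """
    relevance = np.asarray(relevance, dtype=float)
    cutoff = tol * np.max(np.abs(relevance))
    return [k for k in true_lags if abs(relevance[k - 1]) < cutoff]


# Seasonal-style AR series whose only nonzero lags are 1 and 12 (assumed values).
series = simulate_ar({1: 0.5, 12: 0.3}, n=2000)
```

If `missed_lags` returned lag 1 or lag 12 for scores computed on `series` with a well-fitted model, that outcome would count against the method.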

Original abstract

We propose a model-agnostic methodology to measure lag relevance in machine learning forecasting models applied to univariate time series. In particular, we work in the context of time series using the frameworks of ghost variables and Shapley values, together with additive importance measures, to introduce the auto-relevance and partial auto-relevance functions as the lag importance values. Additionally, we propose a novel method to replace absent features in coalition-based methods with a one-step forecast from the same model. We evaluate these proposals under different simulations and real data cases. This combined framework perspective is particularly suitable for time series. In addition, to show our discoveries we use a pool of models from the seasonal ARMA family and recurrent neural networks. We found that the calculated relevance measures successfully demonstrate the expected lag structure in almost all cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New lag relevance measures via adapted Shapley values with a forecast replacement trick, but the abstract supplies no numbers or checks on whether that trick biases the results.

The paper's main contribution is the autorelevance and partial autorelevance functions, which extend Shapley values and ghost variables to score lag importance in univariate time series models. It also introduces replacing absent features in coalitions with a one-step forecast from the fitted model. That replacement is the concrete technical step they add.

The framing makes sense for time series, where standard feature importance does not map cleanly onto ordered lags. They test on seasonal ARMA and RNN models, which is a reasonable choice for the domain.

The evaluation is the clear weak point. The abstract claims the measures recover expected lag structure in almost all cases, yet it gives no tables, error rates, or even basic counts of success versus failure. Without those numbers it is impossible to tell whether the method works or whether the forecast substitution simply feeds the model's own predictions back into the relevance scores. The stress-test concern about bias is therefore live: if forecast error correlates with the target lags, the empirical success becomes non-diagnostic.

The citation pattern is straightforward and does not overclaim prior work. No machine-checked proofs or released code are mentioned.

This is aimed at practitioners who already fit forecasting models and want a post-hoc lag diagnostic. A reader already working on time-series interpretability could extract the replacement idea and try it, but would have to implement and validate the rest themselves.

It is worth sending to peer review so the authors can supply the missing quantitative results and test the replacement step against bias. The idea is narrow enough that a referee could give focused feedback without a large time investment.

Referee Report

2 major / 0 minor

Summary. The paper proposes a model-agnostic framework for measuring lag relevance in univariate time series forecasting models. It combines ghost variables, Shapley values, and additive importance measures to define autorelevance and partial autorelevance functions. A novel replacement rule substitutes absent features in Shapley coalitions with a one-step forecast generated by the same model. The method is tested on seasonal ARMA and RNN models across simulations and real data, with the central claim that the resulting relevance measures recover the expected lag structure in almost all cases.

Significance. If the central claim holds after validation, the autorelevance functions would supply a practical, model-agnostic tool for interpreting lag importance in black-box time-series forecasters, extending Shapley-based attribution to respect temporal dependence via ghost variables. The forecast-replacement heuristic could simplify coalition evaluation, but its validity is load-bearing for any downstream claims.

major comments (2)

[novel replacement method for absent features] The section describing the novel replacement of absent coalition features by a one-step forecast from the fitted model provides no derivation showing that the substitution preserves the marginal-contribution property of Shapley values, nor any test that forecast error is uncorrelated with the target lags. This substitution is load-bearing for the empirical success claim, because systematic bias in the replacement would artifactually reinforce the very lag structure the method is asserted to recover.
[evaluation description and abstract claim] No quantitative results, error bars, exclusion criteria, or simulation details are supplied to support the statement that the relevance measures 'successfully demonstrate the expected lag structure in almost all cases.' Without these, the soundness of the central empirical claim cannot be evaluated.
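
One simple form the requested check could take is the sample correlation between one-step forecast errors and each lagged value. The sketch below is an illustration under assumptions: the helper `residual_lag_correlations` and the AR(1) toy series are hypothetical, and a near-zero correlation is only a necessary condition, not a proof that the replacement is unbiased.

```python
import numpy as np


def residual_lag_correlations(y, forecasts, max_lag):
    """Correlate e[t] = y[t] - forecasts[t] with y[t-k] for k = 1..max_lag.

    y and forecasts must be aligned arrays of equal length.
    """
    y = np.asarray(y, dtype=float)
    errors = y - np.asarray(forecasts, dtype=float)
    return {
        k: float(np.corrcoef(errors[k:], y[:-k])[0, 1])
        for k in range(1, max_lag + 1)
    }


# Toy AR(1) series y[t] = 0.7*y[t-1] + noise (coefficient chosen for illustration).
rng = np.random.default_rng(1)
y = np.zeros(5_000)
noise = rng.standard_normal(len(y))
for t in range(1, len(y)):
    y[t] = 0.7 * y[t - 1] + noise[t]

observed = y[1:]
# Forecasts from the true coefficient versus a deliberately wrong one.
well_specified = residual_lag_correlations(observed, 0.7 * y[:-1], max_lag=3)
misspecified = residual_lag_correlations(observed, 0.2 * y[:-1], max_lag=3)
print(well_specified[1], misspecified[1])
```

With the true coefficient the errors are essentially uncorrelated with lag 1, whereas the misspecified forecaster leaves a clearly positive correlation, which is the pattern the referee's concern is about.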

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major comments point by point below, indicating planned revisions where appropriate.

Referee: [novel replacement method for absent features] The section describing the novel replacement of absent coalition features by a one-step forecast from the fitted model provides no derivation showing that the substitution preserves the marginal-contribution property of Shapley values, nor any test that forecast error is uncorrelated with the target lags. This substitution is load-bearing for the empirical success claim, because systematic bias in the replacement would artifactually reinforce the very lag structure the method is asserted to recover.

Authors: We agree with the referee that the manuscript does not include a formal derivation demonstrating that the one-step forecast replacement exactly preserves the marginal contribution property of Shapley values. This replacement is presented as a practical, model-consistent heuristic for handling absent features in time series contexts, where traditional replacements (e.g., feature means) would break temporal dependencies. We will revise the paper to include a clearer explanation of the heuristic's motivation and its potential limitations, including the risk of bias if forecast errors correlate with lags. We will also add empirical checks for such correlations in the simulation studies. This addresses the concern without claiming exact preservation.

revision: partial

Referee: [evaluation description and abstract claim] No quantitative results, error bars, exclusion criteria, or simulation details are supplied to support the statement that the relevance measures 'successfully demonstrate the expected lag structure in almost all cases.' Without these, the soundness of the central empirical claim cannot be evaluated.

Authors: The referee is correct that the abstract and summary lack quantitative details supporting the claim. While the full manuscript includes simulation setups and results, we acknowledge that more precise reporting is needed to substantiate 'almost all cases.' In the revised version, we will expand the evaluation section with quantitative metrics (e.g., recovery rates across simulations), include error bars from repeated runs, specify any exclusion criteria, and update the abstract accordingly to provide a more rigorous presentation of the empirical findings.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external frameworks with independent evaluation

The paper defines autorelevance and partial autorelevance via Shapley values and ghost variables applied to time series lags, then proposes a one-step forecast replacement for absent coalition features as a novel but non-tautological ansatz. Evaluation proceeds by fitting ARMA/RNN models on simulations with known lag structures and real data, then checking recovery of those structures; this is an empirical test rather than a reduction of the output to the input by construction. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided description. The method is self-contained against external benchmarks (Shapley axioms, simulation ground truth) and does not reduce the claimed success to a definitional identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; assessment requires full text.

pith-pipeline@v0.9.1-grok · 5667 in / 1146 out tokens · 32821 ms · 2026-07-03T06:08:19.668883+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 7 canonical work pages

[1] Molnar, C.: Interpretable Machine Learning, 3rd edn. (2025). https://christophm.github.io/interpretable-ml-book
[2] Biecek, P., Burzykowski, T.: Explanatory Model Analysis. Chapman and Hall/CRC, New York (2021). https://pbiecek.github.io/ema/
[3] Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324
[4] Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R.J., Wasserman, L.: Distribution-free predictive inference for regression. Journal of the American Statistical Association 113(523), 1094–1111 (2018). https://doi.org/10.1080/01621459.2017.1307116
[5] Delicado, P., Peña, D.: Understanding complex predictive models with ghost variables. TEST 32(1), 107–145 (2023). https://doi.org/10.1007/s11749-022-00826-x
[6] Shapley, L.S.: A value for n-person games. In: Kuhn, H.W., Tucker, A.W. (eds.) Contributions to the Theory of Games II. Annals of Mathematics Studies, vol. 28, pp. 307–317. Princeton University Press, Princeton, New Jersey (1953)
[7] Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17, pp. 4768–4777. Curran Associates Inc., California (2017)
[8] Covert, I., Lundberg, S., Lee, S.-I.: Understanding global feature contributions with additive importance measures. Advances in Neural Information Processing Systems 33, 17212–17223 (2020). https://doi.org/10.5555/3495724.3497168
[9] Janzing, D., Minorics, L., Bloebaum, P.: Feature relevance quantification in explainable AI: A causal problem. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 108, pp. 2907–2916. PMLR, Online (2020). https://proceedings....
[10] Williamson, B., Feng, J.: Efficient nonparametric statistical inference on population feature importance using Shapley values. Proceedings of Machine Learning Research 119, 10282–10291 (2020). https://doi.org/10.5555/3524938.3525890
[11] Bento, J., Saleiro, P., Cruz, A.F., Figueiredo, M.A.T., Bizarro, P.: TimeSHAP: Explaining recurrent models through sequence perturbations. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD ’21, pp. 2565–2573. Association for Computing Machinery, Singapore (2021)
[12] Nayebi, A., Tipirneni, S., Reddy, C.K., Foreman, B., Subbian, V.: WindowSHAP: An efficient framework for explaining time-series classifiers based on Shapley values. Journal of Biomedical Informatics 144, 104438 (2023). https://doi.org/10.1016/j.jbi.2023.104438
[13] Raykar, V.C., Jati, A., Mukherjee, S., Aggarwal, N., Sarpatwar, K., Ganapavarapu, G., Vaculin, R.: TsSHAP: Robust model agnostic feature-based explainability for time series forecasting (2023). https://arxiv.org/abs/2303.12316
[14] Villani, M.J., Lockhart, J., Magazzeni, D.: Feature importance for time series data: Improving kernelSHAP. CoRR abs/2210.02176 (2022). https://doi.org/10.48550/arxiv.2210.02176
[15] Arboleda-Florez, M., Castro-Zuluaga, C.: Interpreting direct sales’ demand forecasts using SHAP values. Production 33, 20220035 (2023)
[16] Smith, T.G., et al.: pmdarima: ARIMA estimators for Python. [Online; accessed 04-03-2026] (2017). http://www.alkaline-ml.com/pmdarima
