arxiv: 2604.17616 · v1 · submitted 2026-04-19 · 💻 cs.LG

Recognition: unknown

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

Cedric Schockaert, Didier Stricker, Jason Rambach, Karan Patil, Shashank Mishra


Pith reviewed 2026-05-10 06:12 UTC · model grok-4.3

classification 💻 cs.LG

keywords root cause analysis · time-series anomaly detection · conditional attribution · variational autoencoder · UMAP embeddings · explainable AI · SWaT benchmark · MSDS benchmark


The pith

Explaining anomalies by retrieving similar normal states in learned latent spaces yields more reliable root cause attributions for time-series data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a conditional attribution method that accounts for anomalies in time series by retrieving representative normal instances matched to the specific anomalous observation, rather than relying on random or marginal baselines. Retrieval happens in low-dimensional spaces from variational autoencoders and UMAP to scale to high-dimensional data while keeping temporal and cross-feature relations intact. This produces explanations that stay faithful to the system's actual behavior and avoid out-of-distribution artifacts. A reader would care because existing perturbation-based methods often break dependencies and generate attributions that do not correspond to how the real system operates, which can lead to incorrect diagnoses in monitoring applications.

Core claim

We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency.

What carries the argument

Conditional attribution via retrieval of normal instances in VAE latent space and UMAP manifold embeddings, which supplies context-specific baselines that preserve temporal and cross-feature dependencies.
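The retrieval step can be illustrated as a k-nearest-neighbour lookup in a low-dimensional embedding. This is a minimal sketch under stated assumptions, not the authors' code: windows are flattened, a PCA projection computed with NumPy's SVD stands in for the VAE encoder or UMAP embedding, Euclidean distance is assumed, and the helper names (`make_windows`, `fit_linear_encoder`, `retrieve_conditional_baselines`) are hypothetical. The latent dimension `d` plays the role of the one the SWaT manifold figure varies over {4, 8, 16, 32}.

```python
import numpy as np


def make_windows(series: np.ndarray, length: int) -> np.ndarray:
    """Slice a (T, F) multivariate series into overlapping (N, length, F) windows."""
    n = series.shape[0] - length + 1
    return np.stack([series[i:i + length] for i in range(n)])


def fit_linear_encoder(windows: np.ndarray, d: int):
    """Hypothetical stand-in for a VAE encoder or UMAP: PCA to d dimensions."""
    flat = windows.reshape(len(windows), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centred, flattened windows.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    components = vt[:d]

    def encode(w: np.ndarray) -> np.ndarray:
        return (w.reshape(len(w), -1) - mean) @ components.T

    return encode


def retrieve_conditional_baselines(query, normal_windows, encode, k=5):
    """Return the k normal windows whose embeddings lie closest to the query's.

    The search runs in the embedding, but the returned baselines are real
    normal windows in the original input space.
    """
    z_normal = encode(normal_windows)
    z_query = encode(query[None])[0]
    dists = np.linalg.norm(z_normal - z_query, axis=1)
    idx = np.argsort(dists)[:k]
    return normal_windows[idx], idx


# Example usage on synthetic data (not a benchmark).
rng = np.random.default_rng(0)
normal = make_windows(rng.normal(size=(500, 6)), length=20)  # (481, 20, 6)
encode = fit_linear_encoder(normal, d=8)
query = normal[100] + 3.0 * np.eye(6)[2]  # offset feature 2 across the window
baselines, idx = retrieve_conditional_baselines(query, normal, encode, k=5)
print(baselines.shape)  # (5, 20, 6)
```

Because the baselines are observed normal windows rather than per-feature perturbations, their temporal and cross-feature structure is that of real data; whether the embedding neighbourhood is the right notion of "context" is exactly the load-bearing premise flagged on this page.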

If this is right

Root-cause identification accuracy rises consistently across multiple anomaly detection models on the SWaT and MSDS benchmarks.
Temporal localization of anomalies improves because the baselines respect the original sequence dependencies.
Explanations become more robust to changes in the underlying anomaly detector.
Computational efficiency is maintained by operating in low-dimensional embeddings rather than the full high-dimensional space.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same retrieval idea could be tested on other high-dimensional dependent data such as multivariate sensor streams from manufacturing or network traffic.
If the manifold retrieval step is replaced by a different embedding technique, the fidelity of attributions might degrade, offering a way to isolate the contribution of VAE and UMAP.
The confidence-aware metrics introduced here could be adopted as standard evaluation tools for any attribution method on time-series data.

Load-bearing premise

That retrieval of normal instances in the learned VAE latent space and UMAP manifold embeddings preserves temporal and cross-feature dependencies and yields operationally meaningful explanations without introducing out-of-distribution artifacts.

What would settle it

A direct comparison on the SWaT benchmark showing that the conditional retrieval method produces lower root-cause identification accuracy or worse temporal localization than standard random-baseline attribution methods across the tested anomaly detectors.

Figures

Figures reproduced from arXiv: 2604.17616 by Cedric Schockaert, Didier Stricker, Jason Rambach, Karan Patil, Shashank Mishra.

**Figure 2.** SWaT Manifold Topology (d ∈ {4, 8, 16, 32}). Normal (green) and anomalous (red) regions overlap significantly, indicating that anomalies can be locally indistinguishable from normal operating modes. This highlights the need for a conditional framework.

**Figure 3.** Root cause feature heatmaps for representative Paul Wurth samples. The left panel shows a real test sample with feature26 identified as the root cause, while the right panel shows a synthetic anomaly with perturbations injected in feature2 and feature3 after a given time point.

**Figure 1.** Comparison of retrieval performance across spaces. Representation-space …

**Figure 2.** Unconditional versus conditional retrieval. Conditional retrieval produces cleaner and more localized attributions by focusing on more relevant reference patterns.

**Figure 3.** Examples of synthetic anomaly types used in our experiments: spike, …

**Figure 4.** Spike anomaly on Feature 58. The figure shows a representative synthetic spike anomaly, where Feature 58 exhibits large abnormal values at randomly selected time steps, simulating a sudden transient disturbance or brief sensor surge.

**Figure 5.** Shift anomaly on Feature 26. Feature 26 is affected by a synthetic shift anomaly, where a constant offset is added across the window, altering the baseline level of the signal.

**Figure 6.** Noise anomaly on Feature 22. Feature 22 is perturbed by a mild synthetic noise anomaly, and the corresponding attribution pattern shows that the anomalous signal is still correctly detected despite the relatively small disturbance.

**Figure 7.** Drift anomaly on Feature 45. Feature 45 exhibits a gradual temporal change corresponding to a synthetic drift anomaly, which the attribution method correctly identifies.

**Figure 8.** Saturation anomaly on Feature 3. Feature 3 exhibits a synthetic saturation anomaly, in which the signal is clipped at a limit; the anomaly is correctly identified.

**Figure 9.** Signal dropout anomaly on Feature 11. Feature 11 is affected by a synthetic signal dropout anomaly, where the signal abruptly drops to zero over the affected interval, and the attribution pattern correctly highlights the anomalous feature.
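The appendix captions describe six synthetic anomaly types. A minimal injection sketch follows, assuming (L, F) float windows. The magnitudes, the fraction of spiked time steps, the saturation limit (the 70th percentile of the clean signal) and the function name `inject` are illustrative assumptions, not values taken from the paper; the optional `start` argument mirrors the Paul Wurth heatmap caption's description of perturbations injected after a given time point.

```python
import numpy as np


def inject(window, feature, kind, rng, start=0, magnitude=3.0):
    """Return a copy of an (L, F) window with a synthetic anomaly on one feature.

    Only time steps start..L-1 of the chosen feature are modified.
    """
    x = np.array(window, dtype=float)  # np.array copies, so the input is untouched
    clean = x[:, feature].copy()
    scale = float(clean.std()) + 1e-8
    sig = x[start:, feature]  # basic slicing returns a view, so edits modify x
    n = len(sig)
    if kind == "spike":
        # Large values at a few randomly selected time steps.
        idx = rng.choice(n, size=max(1, n // 10), replace=False)
        sig[idx] += 3.0 * magnitude * scale
    elif kind == "shift":
        sig += magnitude * scale  # constant offset changes the baseline level
    elif kind == "noise":
        sig += rng.normal(0.0, 0.5 * scale, size=n)  # mild additive noise
    elif kind == "drift":
        sig += np.linspace(0.0, magnitude * scale, n)  # gradual linear ramp
    elif kind == "saturation":
        limit = np.quantile(clean, 0.7)
        np.minimum(sig, limit, out=sig)  # clip the signal at an upper limit
    elif kind == "dropout":
        sig[:] = 0.0  # signal drops to zero over the interval
    else:
        raise ValueError(f"unknown anomaly kind: {kind!r}")
    return x


# Example usage: every kind should alter only feature 2, and only from t = 10 on.
rng = np.random.default_rng(0)
w = rng.normal(size=(50, 4))
for kind in ("spike", "shift", "noise", "drift", "saturation", "dropout"):
    x = inject(w, feature=2, kind=kind, rng=rng, start=10)
    print(kind, np.flatnonzero(np.any(x != w, axis=0)))
```

Keeping every perturbation confined to one known feature and interval is what makes such anomalies useful for scoring root-cause and temporal-localization accuracy, since the ground truth is known by construction.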

Original abstract

Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models will be publicly released.
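To make the contrast between randomly sampled and conditioned baselines concrete, the sketch below computes a replacement-style attribution: each feature of the anomalous window is swapped for the same feature of a reference window, and the resulting drop in anomaly score is averaged over the reference set. Everything here is an assumption for illustration: the toy two-regime data, the hypothetical `cross_feature_score` detector, raw-space nearest neighbours in place of VAE/UMAP retrieval, and the replacement rule itself, which the abstract does not specify.

```python
import numpy as np


def cross_feature_score(window: np.ndarray) -> float:
    """Toy detector (hypothetical): all features are expected to move together,
    so score the mean squared deviation from the per-time-step feature mean."""
    resid = window - window.mean(axis=1, keepdims=True)
    return float(np.mean(resid ** 2))


def replacement_attribution(x, references, score_fn):
    """Average score drop when one feature of x is replaced by a reference's."""
    base = score_fn(x)
    attr = np.zeros(x.shape[1])
    for f in range(x.shape[1]):
        drops = []
        for ref in references:
            x_rep = x.copy()
            x_rep[:, f] = ref[:, f]  # swap in the whole time series of feature f
            drops.append(base - score_fn(x_rep))
        attr[f] = np.mean(drops)
    return attr


def unconditional_references(normal, k, rng):
    """Global baseline: k normal windows sampled at random."""
    return normal[rng.choice(len(normal), size=k, replace=False)]


def conditional_references(x, normal, k):
    """Conditional baseline: the k normal windows nearest to x (raw space here)."""
    dists = np.linalg.norm((normal - x).reshape(len(normal), -1), axis=1)
    return normal[np.argsort(dists)[:k]]


# Two operating regimes (level 0, then level 4) shared by all features.
rng = np.random.default_rng(1)
T, F, L, k = 400, 5, 16, 8
level = np.repeat([0.0, 4.0], T // 2)[:, None]
series = level + rng.normal(size=(T, F))
normal = np.stack([series[i:i + L] for i in range(0, T - L + 1, 4)])

# Anomalous window in the second regime with a shift injected on feature 3.
x = 4.0 + rng.normal(size=(L, F))
x[:, 3] += 6.0

attr_cond = replacement_attribution(x, conditional_references(x, normal, k), cross_feature_score)
attr_unc = replacement_attribution(x, unconditional_references(normal, k, rng), cross_feature_score)
print("conditional:  ", np.round(attr_cond, 2))
print("unconditional:", np.round(attr_unc, 2))
```

In this setup the conditional references come from the matching regime, so only feature 3 receives a large positive score. Random references that mix in the other regime tend to push the non-root-cause features negative, because swapping in a feature from the wrong regime breaks the cross-feature relation the detector relies on: the kind of out-of-distribution artifact the abstract warns about.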

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Conditional retrieval from VAE and UMAP spaces for attributions is a sensible practical idea, but the reported gains need full methods and stats to judge their size.


The core contribution is a retrieval step that pulls normal instances from the learned VAE latent space and UMAP manifold, conditioned on the anomalous point, rather than using marginal or random baselines for attribution. This is meant to keep temporal and cross-feature structure intact and avoid out-of-distribution artifacts when explaining time-series anomalies. They add confidence-aware and temporal metrics on top of standard root-cause accuracy and localization measures. Experiments are run on the usual SWaT and MSDS benchmarks across several detectors and claim consistent improvements in identification accuracy, localization, and robustness. Code release is promised, which helps.

Referee Report

3 major / 2 minor

Summary. The paper proposes a conditional attribution framework for root cause analysis (RCA) in time-series anomaly detection. Instead of marginal or random baselines, it retrieves contextually similar normal instances conditioned on the anomalous observation, performed in VAE latent spaces and UMAP manifold embeddings to preserve temporal and cross-feature dependencies while avoiding OOD artifacts. New confidence-aware and temporal metrics are introduced for evaluating explanation reliability. Experiments on the SWaT and MSDS benchmarks report consistent gains in root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models.

Significance. If the empirical results hold, the work provides a practical improvement over perturbation-based RCA methods by grounding explanations in the system's learned manifold, which is particularly relevant for industrial time-series systems. The public code release commitment supports reproducibility, a clear strength.

major comments (3)

§3.2 (Conditional Retrieval): the claim that nearest-neighbor retrieval in VAE/UMAP space preserves temporal and cross-feature dependencies without introducing OOD artifacts is central to the method but lacks quantitative validation (e.g., distribution-shift metrics between retrieved and original instances); this directly affects whether the explanations are operationally meaningful.
§5.1 (Results on SWaT/MSDS): the reported consistent improvements lack error bars, statistical significance tests, or ablation tables isolating the contribution of conditional retrieval versus standard VAE/UMAP training; without these, the robustness claim across detectors cannot be fully assessed.
§4.3 (Evaluation Metrics): the new confidence-aware and temporal metrics are introduced but neither formalized with equations nor compared to existing RCA metrics (e.g., precision@K or temporal IoU); this weakens the ability to interpret the reported gains.
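For reference, the two existing metrics named in the §4.3 comment can be written in a few lines. This sketch uses one common convention for each (precision@K as the fraction of the top-K ranked features that are true root causes; temporal IoU over boolean per-time-step masks). It does not reproduce the paper's own confidence-aware or temporal metrics, whose definitions are not given on this page, and the function names are hypothetical.

```python
import numpy as np


def precision_at_k(scores, true_causes, k):
    """Fraction of the k highest-scoring features that are true root causes."""
    top_k = np.argsort(-np.asarray(scores, dtype=float))[:k]
    return len(set(top_k.tolist()) & set(true_causes)) / k


def temporal_iou(pred_mask, true_mask):
    """Intersection over union of predicted and ground-truth anomalous time steps."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # convention: two empty masks count as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)


# Example usage with hand-made inputs.
scores = [0.1, 0.9, 0.05, 0.7]  # per-feature attribution scores
print(precision_at_k(scores, true_causes={1, 2}, k=2))  # 0.5
print(temporal_iou([0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0]))  # 0.5
```

In practice the predicted mask would come from thresholding a per-time-step attribution profile, and that threshold is itself a free choice worth reporting.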

minor comments (2)

Abstract: the phrase 'consistently improves' would benefit from a brief quantitative summary of the gains (e.g., average percentage improvement) to better convey the strength of the empirical results.
§3 (Notation): the distinction between VAE latent codes and UMAP embeddings in the retrieval step could be clarified with a single diagram or pseudocode to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our work. We address each major comment point by point below, indicating the revisions we will make to the manuscript.


Referee: §3.2 (Conditional Retrieval): the claim that nearest-neighbor retrieval in VAE/UMAP space preserves temporal and cross-feature dependencies without introducing OOD artifacts is central to the method but lacks quantitative validation (e.g., distribution-shift metrics between retrieved and original instances); this directly affects whether the explanations are operationally meaningful.

Authors: We agree that explicit quantitative validation would strengthen the central claim. In the revised manuscript we will add distribution-shift metrics (MMD and Wasserstein distance) computed between the retrieved normal instances and the original normal data in both VAE latent space and UMAP embedding space. We will also include qualitative visualizations of retrieved time-series segments to illustrate preservation of temporal structure and cross-feature correlations. These additions will appear in an expanded §3.2 and in the experimental analysis. revision: yes
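The distribution-shift check promised in this response could look like the sketch below, which compares a set of retrieved embeddings against a set of reference normal embeddings. Assumptions: a biased squared-MMD estimate with a Gaussian (RBF) kernel whose bandwidth comes from the median heuristic (one common variant), and a per-dimension 1-D Wasserstein-1 distance averaged over latent dimensions as a simplification. The equal-size shortcut (mean absolute difference of sorted samples) is exact for W1 between two empirical distributions of the same size; `scipy.stats.wasserstein_distance` covers the general 1-D case. Function names are hypothetical.

```python
import numpy as np


def _sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clamp tiny negatives caused by rounding


def rbf_mmd2(x: np.ndarray, y: np.ndarray) -> float:
    """Biased squared MMD with an RBF kernel; bandwidth from the median heuristic."""
    pooled = np.vstack([x, y])
    d2 = _sq_dists(pooled, pooled)
    sigma2 = np.median(d2[d2 > 0])  # median pairwise squared distance

    def kern(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.exp(-_sq_dists(a, b) / (2.0 * sigma2))

    return float(kern(x, x).mean() + kern(y, y).mean() - 2.0 * kern(x, y).mean())


def mean_w1_equal_size(x: np.ndarray, y: np.ndarray) -> float:
    """Average over dimensions of the 1-D Wasserstein-1 distance (equal sample sizes)."""
    if x.shape != y.shape:
        raise ValueError("this shortcut needs equally sized samples")
    return float(np.mean(np.abs(np.sort(x, axis=0) - np.sort(y, axis=0))))


# Example usage with synthetic embeddings (placeholders, not paper data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 8))          # e.g. held-out normal embeddings
retrieved = rng.normal(size=(200, 8))          # drawn from the same distribution
shifted = rng.normal(loc=1.5, size=(200, 8))   # a distribution-shifted set
print(rbf_mmd2(reference, retrieved), rbf_mmd2(reference, shifted))
print(mean_w1_equal_size(reference, retrieved), mean_w1_equal_size(reference, shifted))
```

Low values relative to a deliberately shifted control set are what would support the "no out-of-distribution artifacts" claim; absolute MMD values depend on the bandwidth choice, so a control comparison like this one is more informative than a single number.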
Referee: §5.1 (Results on SWaT/MSDS): the reported consistent improvements lack error bars, statistical significance tests, or ablation tables isolating the contribution of conditional retrieval versus standard VAE/UMAP training; without these, the robustness claim across detectors cannot be fully assessed.

Authors: We acknowledge the absence of these statistical elements in the current version. The revised manuscript will report mean performance with standard-deviation error bars over five independent runs, include paired t-test p-values for all reported gains, and add an ablation table that isolates the effect of conditional retrieval against random sampling, marginal baselines, and non-conditional VAE/UMAP embeddings. These results will be placed in §5.1 together with the existing tables. revision: yes
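The paired comparison described in this response can be sketched with SciPy's `scipy.stats.ttest_rel`, alongside the same statistic computed by hand from the per-run differences. The five score pairs below are hypothetical placeholders, not results from the paper. One caveat: with only five runs the test has four degrees of freedom, so its p-values lean heavily on the assumption that the per-run differences are roughly normal.

```python
import numpy as np
from scipy import stats

# Hypothetical per-run root-cause accuracies over five seeds; NOT results from the paper.
conditional = np.array([0.81, 0.79, 0.84, 0.80, 0.82])
random_baseline = np.array([0.74, 0.75, 0.77, 0.73, 0.76])

# Paired t statistic by hand: mean difference over its standard error (n - 1 dof).
diff = conditional - random_baseline
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

result = stats.ttest_rel(conditional, random_baseline)
print(f"t = {t_manual:.3f} (scipy: {result.statistic:.3f}), p = {result.pvalue:.4g}")
print(f"conditional: {conditional.mean():.3f} ± {conditional.std(ddof=1):.3f} (mean ± std)")
```

Pairing by run (same seed and data split for both methods) is what justifies `ttest_rel` rather than an independent two-sample test, since it removes run-to-run variance shared by both methods.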
Referee: §4.3 (Evaluation Metrics): the new confidence-aware and temporal metrics are introduced but neither formalized with equations nor compared to existing RCA metrics (e.g., precision@K or temporal IoU); this weakens the ability to interpret the reported gains.

Authors: We thank the referee for highlighting this presentational gap. Section 4.3 will be expanded with formal mathematical definitions of both the confidence-aware and temporal metrics. We will also add a short comparative discussion (and a small table) relating our metrics to precision@K and temporal IoU, clarifying the additional information each provides for time-series RCA. These changes will be made without altering the experimental numbers. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims independent of inputs


The paper introduces a conditional attribution method that retrieves normal instances via VAE latent codes and UMAP embeddings rather than marginal baselines, then evaluates root-cause accuracy, temporal localization, and robustness on the external SWaT and MSDS benchmarks across multiple detectors. No derivation chain, equation, or self-citation is shown to reduce the reported improvements to a quantity defined by the same fitted parameters or data used for evaluation. The central result is presented as an empirical outcome of the design choice, not a tautological renaming or self-referential prediction. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that VAE and UMAP embeddings faithfully capture the dependencies needed for meaningful retrieval. No new physical entities are introduced. Hyperparameters of the embedding models are free parameters whose specific values are not reported in the abstract.

free parameters (2)

VAE latent dimension
Controls the compression level of the time-series representation; value not stated in abstract.
UMAP hyperparameters (n_neighbors, min_dist)
Determine the manifold structure used for retrieval; values not stated in abstract.

axioms (1)

Domain assumption: learned low-dimensional representations preserve the temporal and cross-feature dependencies present in the original high-dimensional time series.
Invoked when claiming that retrieval in latent/UMAP space avoids out-of-distribution artifacts and produces dependency-preserving explanations.

pith-pipeline@v0.9.0 · 5509 in / 1424 out tokens · 49480 ms · 2026-05-10T06:12:40.915453+00:00 · methodology

