arXiv: 2602.04120 · v2 · submitted 2026-02-04 · 💻 cs.LG · cs.AI · cs.DC · cs.SE

Recognition: 1 Lean theorem link

Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

Samaresh Kumar Singh, Joyjit Roy


Pith reviewed 2026-05-16 07:30 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.DC · cs.SE

keywords Explainable AI · Edge AI · XaaS · Distributed Caching · Semantic Similarity · Latency Reduction · IoT Systems · Verification Protocol


The pith

XaaS decouples explanation generation from inference so edge devices can cache and reuse explanations, cutting latency by 38 percent while keeping quality high.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most current explainable AI methods generate an explanation together with each model inference, which repeats work and adds latency when the same or similar models run on many different edge devices. XaaS moves explanation into a separate distributed service: devices request explanations on demand, retrieve them from a semantic-similarity cache, and check them with a lightweight verification step. An adaptive engine picks the explanation method to match the device's resources and the user's needs. Experiments on manufacturing quality control, autonomous-vehicle perception, and healthcare diagnostics report 38 percent lower latency while, according to the authors, maintaining high explanation quality. If the approach holds, transparent AI becomes practical to run at scale across large, heterogeneous IoT networks instead of remaining an ad-hoc add-on.

Core claim

The paper presents XaaS as a distributed architecture that treats explainability as a first-class system service rather than a model-coupled feature. By decoupling inference from explanation, introducing a semantic-similarity cache for reuse, a verification protocol for fidelity, and an adaptive engine that selects methods by device capability, the system achieves 38 percent lower latency across three real-world edge-AI deployments while preserving explanation quality.
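To see what the headline number implies, consider a simple expected-latency model. This is an illustrative assumption, not the paper's formulation: t_inf is inference time, t_gen the cost of generating an explanation, t_lookup the cache query cost, t_ver the verification cost (paid on hits and misses alike, since the protocol checks both cached and new explanations), and h the cache hit rate.

```latex
\begin{aligned}
L_{\text{coupled}} &= t_{\text{inf}} + t_{\text{gen}} \\
L_{\text{XaaS}}    &= t_{\text{inf}} + t_{\text{lookup}} + t_{\text{ver}} + (1 - h)\,t_{\text{gen}} \\
\Delta             &= 1 - \frac{L_{\text{XaaS}}}{L_{\text{coupled}}}
\end{aligned}
```

Under this model, XaaS is faster only when h·t_gen > t_lookup + t_ver, that is, when the generation work saved by cache hits exceeds the fixed lookup and verification overhead. A 38% reduction corresponds to L_XaaS = 0.62 L_coupled; for a hypothetical 100 ms coupled pipeline, that would be 62 ms.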

What carries the argument

The distributed explanation cache with semantic similarity retrieval that identifies and reuses prior explanations instead of regenerating them for every inference.
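A minimal sketch of similarity-based reuse, assuming each input is summarized by an embedding vector and that a stored explanation may be reused when cosine similarity clears a fixed threshold. The class name, the 0.95 threshold, and the linear scan are illustrative assumptions; the paper's actual retrieval method is not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


@dataclass
class ExplanationCache:
    """Hypothetical semantic-similarity cache of (input embedding, explanation) pairs."""

    threshold: float = 0.95  # assumed reuse threshold, not taken from the paper
    entries: list[tuple[list[float], dict]] = field(default_factory=list)

    def insert(self, embedding: list[float], explanation: dict) -> None:
        """Store a freshly generated explanation under its input embedding."""
        self.entries.append((embedding, explanation))

    def lookup(self, query: list[float]) -> dict | None:
        """Return the nearest stored explanation if similar enough, else None (a miss)."""
        best, best_sim = None, -1.0
        for embedding, explanation in self.entries:
            sim = cosine(query, embedding)
            if sim > best_sim:
                best, best_sim = explanation, sim
        return best if best_sim >= self.threshold else None


cache = ExplanationCache()
cache.insert([1.0, 0.0, 0.0], {"method": "grad-cam", "region": "weld seam"})
print(cache.lookup([0.99, 0.05, 0.0]))  # near-duplicate input: cache hit
print(cache.lookup([0.0, 1.0, 0.0]))    # unrelated input: miss, so generate afresh
```

A linear scan keeps the sketch short; a deployed cache would more likely use an approximate nearest-neighbor index of the kind described in reference [18]. The threshold is exactly where the load-bearing premise bites: lowering it raises the hit rate but admits less similar inputs.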

If this is right

Edge devices can share and reuse explanations across heterogeneous hardware without regenerating them each time.
Verification ensures both cached and new explanations meet a fidelity threshold before use.
Adaptive method selection lets the system match explanation cost to available device resources.
Large-scale IoT deployments of accountable AI become feasible without prohibitive added latency.
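The adaptive-selection idea can be sketched as a constrained choice: keep the explanation methods that fit the device's latency and memory budgets, then take the highest-quality one. The method catalogue, its cost and quality numbers, and the linear slowdown scaling are placeholders, not measurements or rules from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    cost_ms: float    # assumed latency on a reference device
    memory_mb: float  # assumed peak memory footprint
    quality: float    # assumed relative fidelity score in [0, 1]


# Placeholder catalogue: the numbers are illustrative, not measurements from the paper.
CATALOGUE = [
    Method("grad-cam", cost_ms=15.0, memory_mb=50.0, quality=0.70),
    Method("lime", cost_ms=400.0, memory_mb=120.0, quality=0.80),
    Method("kernel-shap", cost_ms=1500.0, memory_mb=300.0, quality=0.90),
]


def select_method(
    latency_budget_ms: float,
    memory_budget_mb: float,
    min_quality: float = 0.0,
    slowdown: float = 1.0,
) -> Method | None:
    """Pick the highest-quality method that fits the device's budgets.

    `slowdown` > 1 models a weaker device by scaling cost linearly (an assumption).
    Returns None when no method fits.
    """
    feasible = [
        m
        for m in CATALOGUE
        if m.cost_ms * slowdown <= latency_budget_ms
        and m.memory_mb <= memory_budget_mb
        and m.quality >= min_quality
    ]
    return max(feasible, key=lambda m: m.quality, default=None)


print(select_method(2000, 512).name)               # well-resourced gateway
print(select_method(500, 512, slowdown=2.0).name)  # slower device, same budget
```

Returning None leaves room for the decoupled design: a device with no feasible local method could request the explanation from the shared service instead.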

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same caching pattern could apply to other compute-heavy AI services such as uncertainty estimation or continual learning on edge hardware.
Domain-specific similarity metrics might be needed to keep fidelity high when explanations cross task boundaries.
Standardized explanation services could eventually be offered by cloud or edge providers in the same way inference APIs are offered today.

Load-bearing premise

Semantic similarity can locate reusable explanations without losing important fidelity when the same explanation is applied across different devices and tasks.

What would settle it

Compare the fidelity of cached explanations retrieved by semantic similarity against freshly generated explanations for the same held-out inputs; a consistent gap in favor of the fresh explanations would show that the latency savings come at the expense of quality.
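The comparison above can be made concrete with a paired, distribution-free test. A minimal sketch, assuming one scalar fidelity score per held-out input for both the cached and the freshly generated explanation (how that score is computed is left open); the exact sign test is our choice, not a procedure from the paper.

```python
import math


def sign_test(cached: list[float], fresh: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired fidelity scores; ties are dropped.

    Returns (inputs where cached < fresh, inputs where cached > fresh, p-value).
    """
    if len(cached) != len(fresh):
        raise ValueError("scores must be paired input-by-input")
    worse = sum(1 for c, f in zip(cached, fresh) if c < f)
    better = sum(1 for c, f in zip(cached, fresh) if c > f)
    n = worse + better
    if n == 0:
        return 0, 0, 1.0  # all ties: no evidence of a gap
    k = min(worse, better)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return worse, better, min(1.0, 2 * tail)
```

A small p-value with `worse` much larger than `better` is the "consistent gap" that would undercut the quality claim; a large p-value alone does not prove equivalence, only the absence of detectable evidence against it.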

Figures

Figures reproduced from arXiv: 2602.04120 by Joyjit Roy, Samaresh Kumar Singh.

**Figure 1.** XaaS System Architecture. The framework decouples inference from explanation generation, enabling edge devices to request, cache, and verify …

**Figure 2.** Comparison of XaaS with Baseline Methods. Average Values of Primary Performance Metrics Across Three Scenarios. XaaS demonstrated a 38% …

**Figure 3.** Cache performance dynamics. (a) Hit rate stabilizes after the initial …

**Figure 4.** Scalability analysis. (a) XaaS latency grows sublinearly with device …

**Figure 5.** Ablation analysis demonstrating the impact of XaaS components. (a) …

Original abstract

Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad-hoc and inefficient. Most current methods are "coupled": they generate explanations simultaneously with model inferences. As a result, these approaches incur redundant computation, high latency, and poor scalability when deployed across heterogeneous sets of edge devices. In this work, we propose Explainability-as-a-Service (XaaS), a distributed architecture that treats explainability as a first-class system service (as opposed to a model-specific feature). The key innovation of the XaaS architecture is that it decouples inference from explanation generation, allowing edge devices to request, cache, and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main innovations: (1) a distributed explanation cache with a semantic-similarity-based explanation retrieval method, which significantly reduces redundant computation; (2) a lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) an adaptive explanation engine that chooses explanation methods based on device capability and user requirements. We evaluated XaaS on three real-world edge-AI use cases: (i) manufacturing quality control, (ii) autonomous vehicle perception, and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across all three deployments. Overall, this work enables the deployment of transparent and accountable AI across large-scale, heterogeneous IoT systems and bridges the gap between XAI research and edge practicality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Explainability-as-a-Service (XaaS), a distributed architecture that decouples model inference from explanation generation in edge AI and IoT systems. It introduces a distributed explanation cache using semantic similarity retrieval to reduce redundant computation, a lightweight verification protocol for fidelity, and an adaptive explanation engine that selects methods based on device capabilities. Evaluation on three real-world use cases (manufacturing quality control, autonomous vehicle perception, and healthcare diagnostics) claims a 38% latency reduction while preserving high explanation quality.

Significance. If the performance and fidelity claims hold under detailed scrutiny, the work could meaningfully advance practical XAI deployment in heterogeneous edge environments by treating explainability as a scalable service rather than a per-model overhead, potentially enabling accountable AI at IoT scale.

major comments (2)

[Abstract and Evaluation] The central claim of a 38% latency reduction is reported without error bars, baseline system details, statistical significance tests, or exclusion criteria for the three use cases, which prevents independent verification of the result and of its robustness across deployments.
[§3.1] Distributed explanation cache: the semantic similarity retrieval method is presented without reported fidelity metrics, cross-device mismatch experiments, or quantified loss rates under heterogeneous sensor resolutions or task constraints. The assumption that cache hits preserve explanation quality is therefore unanchored, yet it is load-bearing for both the latency and the quality claims.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly contrast XaaS against prior coupled XAI methods with a brief table of latency overheads from related work.
[§3.2] Notation for the lightweight verification protocol (e.g., any pseudocode or equations defining the fidelity check) should be introduced earlier to improve readability of the architecture description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation rigor and the distributed cache mechanism. We address both major comments by agreeing to incorporate the requested details and metrics in the revised manuscript.

Point-by-point responses

Referee: [Abstract and Evaluation] The central claim of a 38% latency reduction is reported without error bars, baseline system details, statistical significance tests, or exclusion criteria for the three use cases, which prevents independent verification of the result and of its robustness across deployments.

Authors: We agree that the current presentation of the 38% latency reduction lacks sufficient statistical detail for independent verification. In the revised manuscript we will add error bars computed over multiple independent runs for each use case, explicitly describe the baseline systems (coupled per-device XAI implementations), report results of statistical significance tests (paired t-tests with p-values), and state the exclusion criteria applied when selecting the three deployments. These additions will directly support the robustness claim. revision: yes
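For readers who want to check the promised statistics, a paired analysis of per-run latencies might look like the following. It is a sketch assuming matched runs (same inputs, same device) for the coupled baseline and XaaS; the function names are hypothetical.

```python
import math
import statistics


def relative_reduction(baseline_ms: list[float], xaas_ms: list[float]) -> float:
    """Fractional drop in mean latency, e.g. 0.38 for a 38% reduction."""
    return 1.0 - statistics.mean(xaas_ms) / statistics.mean(baseline_ms)


def paired_t(baseline_ms: list[float], xaas_ms: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-run latency differences."""
    if len(baseline_ms) != len(xaas_ms) or len(baseline_ms) < 2:
        raise ValueError("need at least two matched runs")
    diffs = [b - x for b, x in zip(baseline_ms, xaas_ms)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The statistic is compared against a t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(baseline_ms, xaas_ms)` returns the same statistic together with a p-value.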
Referee: [§3.1] Distributed explanation cache: the semantic similarity retrieval method is presented without reported fidelity metrics, cross-device mismatch experiments, or quantified loss rates under heterogeneous sensor resolutions or task constraints. The assumption that cache hits preserve explanation quality is therefore unanchored, yet it is load-bearing for both the latency and the quality claims.

Authors: We acknowledge that fidelity metrics for the semantic similarity retrieval were not reported in §3.1. In the revision we will insert a dedicated evaluation subsection that reports fidelity metrics (e.g., explanation similarity scores between cached and freshly generated outputs), results from cross-device mismatch experiments across heterogeneous sensor resolutions, and quantified loss rates under the task constraints of the three use cases. This will provide the missing empirical grounding for the cache-hit quality assumption. revision: yes
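One way to report "explanation similarity scores" and "loss rates" is sketched below, assuming each explanation is a per-feature attribution vector over the same features (for example SHAP values or pooled Grad-CAM cells). The function names, the top-k Jaccard metric, and the 0.6 threshold are illustrative choices, not the paper's.

```python
def top_k_features(attribution: list[float], k: int) -> set[int]:
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attribution)), key=lambda i: abs(attribution[i]), reverse=True)
    return set(order[:k])


def top_k_overlap(cached: list[float], fresh: list[float], k: int = 5) -> float:
    """Jaccard overlap between the top-k features of two explanations (1.0 = same set)."""
    if len(cached) != len(fresh):
        raise ValueError("attribution vectors must cover the same features")
    k = min(k, len(cached))
    a, b = top_k_features(cached, k), top_k_features(fresh, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def loss_rate(
    pairs: list[tuple[list[float], list[float]]], k: int = 5, min_overlap: float = 0.6
) -> float:
    """Fraction of cache hits whose overlap with a fresh explanation falls below min_overlap."""
    if not pairs:
        return 0.0
    lost = sum(1 for cached, fresh in pairs if top_k_overlap(cached, fresh, k) < min_overlap)
    return lost / len(pairs)
```

Reporting `loss_rate` separately per device class or sensor resolution would speak directly to the cross-device mismatch concern.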

Circularity Check

0 steps flagged

No circularity: XaaS is an architectural proposal evaluated experimentally, not a derivation reducing to self-defined inputs

Full rationale

The paper presents XaaS as a distributed system architecture that decouples inference from explanation generation via three innovations: semantic-similarity caching, lightweight verification, and adaptive engine selection. The 38% latency claim is reported from direct measurements on three heterogeneous real-world deployments (manufacturing, AV perception, healthcare) rather than from any equation or parameter fit that reduces to quantities defined in the authors' prior work. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled in, and no predictions are statistically forced by construction. The evaluation therefore rests on external measurements rather than on quantities the authors define for themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The proposal rests on two domain assumptions: that explanations can be treated as reusable services, and that semantic similarity preserves fidelity. No numerical free parameters are introduced, and the two invented entities are architectural components rather than physical ones.

axioms (2)

Domain assumption: Explanations generated for one input can be reused for semantically similar inputs without unacceptable fidelity loss.
Invoked to justify the distributed explanation cache.
Domain assumption: A lightweight verification protocol can confirm explanation fidelity across heterogeneous edge devices.
Required for the verification component to function.

invented entities (2)

Distributed explanation cache (no independent evidence)
Purpose: store and retrieve explanations to avoid redundant computation.
Core component of the proposed architecture.
Lightweight verification protocol (no independent evidence)
Purpose: ensure the fidelity of cached and newly generated explanations.
Core component of the proposed architecture.
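The verification protocol itself is not spelled out in this summary. As one illustration of what "lightweight" could mean, the sketch below swaps in a simple deletion check: mask the features an explanation ranks highest and confirm that the model's score drops. The function name, the masking value, k, and the 0.1 drop threshold are all assumptions, and the toy linear model stands in for a real classifier.

```python
from typing import Callable, Sequence


def verify_by_deletion(
    model: Callable[[list[float]], float],
    x: Sequence[float],
    attribution: Sequence[float],
    k: int = 3,
    mask_value: float = 0.0,
    min_drop: float = 0.1,
) -> bool:
    """Accept an explanation if masking its top-k features lowers the model score by >= min_drop.

    Needs only two forward passes, typically far cheaper than regenerating
    a perturbation-based explanation such as LIME or SHAP.
    """
    if len(x) != len(attribution):
        raise ValueError("attribution must align with the input features")
    top = sorted(range(len(x)), key=lambda i: abs(attribution[i]), reverse=True)[:k]
    masked = list(x)
    for i in top:
        masked[i] = mask_value  # replace the supposedly important features
    return model(list(x)) - model(masked) >= min_drop


# Toy linear model standing in for a real classifier score.
WEIGHTS = [0.5, 0.3, 0.0, 0.0]


def toy_model(v: list[float]) -> float:
    return sum(w * a for w, a in zip(WEIGHTS, v))


x = [1.0, 1.0, 1.0, 1.0]
print(verify_by_deletion(toy_model, x, [0.5, 0.3, 0.0, 0.0], k=2))  # faithful: True
print(verify_by_deletion(toy_model, x, [0.0, 0.0, 0.5, 0.3], k=2))  # misleading: False
```

The same check applies unchanged to cached and freshly generated explanations, which matches the stated purpose of the protocol, though a real deployment would need a masking baseline suited to its input modality.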



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Modules: IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness), Foundation/AlexanderDuality.lean (D=3), Foundation/ArithmeticFromLogic.lean (LogicNat orbit)
Theorems: reality_from_one_distinction, washburn_uniqueness_aczel, alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "distributed explanation cache with a semantic similarity based explanation retrieval method... lightweight verification protocol... adaptive explanation engine"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Persistent and Conversational Multi-Method Explainability for Trustworthy Financial AI
cs.AI · 2026-05 · unverdicted · novelty 4.0

An architecture stores XAI explanations persistently in searchable storage and uses RAG to synthesize multiple methods conversationally, cutting hallucination rates by 36% in a FinBERT financial sentiment demo.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.

[2] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144.

[3] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 4765–4774.

[4] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

[5] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” 2017.

[6] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 6105–6114.

[7] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv preprint arXiv:1602.07360, 2016.

[8] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “Distributed deep neural networks over the cloud, the edge and end devices,” in 2017 IEEE 37th International Conference on Distributed Computing Systems. IEEE, 2017, pp. 328–339.

[9] K. Srinivasan, G. Ananthanarayanan, and R. Mahajan, “Efficient explainability for edge AI systems,” in Proceedings of the ACM Workshop on Hot Topics in Networks. ACM, 2021, pp. 1–7.

[10] M. J. Islam, M. A. Rahman, and M. Z. A. Bhuiyan, “Edge-XAI: Lightweight explainable AI for edge devices,” in Proceedings of the IEEE International Conference on Edge Computing. IEEE, 2021, pp. 100–107.

[11] U. Bhatt, A. Xiang, S. Sharma, A. Weller, A. Taly, Y. Jia, J. Ghosh, R. Puri, J. M. Moura, and P. Eckersley, “Explainable machine learning in deployment,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 648–657.

[12] E. Nygren, R. K. Sitaraman, and J. Sun, “The Akamai network: A platform for high-performance internet applications,” ACM SIGOPS Operating Systems Review, vol. 44, no. 3, pp. 2–19, 2010.

[13] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, “RFC 2616: Hypertext transfer protocol – HTTP/1.1,” Internet Engineering Task Force (IETF), Tech. Rep., 1999.

[14] U. Bhatt, M. Andrus, A. Weller, and A. Xiang, “Machine learning explainability for external stakeholders,” arXiv preprint arXiv:2007.05408, 2020.

[15] C. Meske and E. Bunde, “Transparency and trust in human-AI interaction: The role of model-agnostic explanations in computer vision-based decision support,” Artificial Intelligence in HCI, pp. 54–69, 2020.

[16] C. Olston, N. Fiedel, K. Gorovoy, J. Harmsen, L. Lao, F. Li, V. Rajashekhar, S. Ramesh, and J. Soyke, “TensorFlow-Serving: Flexible, high-performance ML serving,” in Workshop on ML Systems at NIPS, 2017.

[17] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins et al., “Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI,” Information Fusion, vol. 58, pp. 82–115, 2020.

[18] J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2019.

[19] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 8748–8763.

[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 5998–6008.