Architecture Dependent Temporal Observability Under Deployment Interference in Edge Inference Systems

Akul Swami; Nikhil Chougule

arXiv: 2605.17701 · v1 · pith:US442HRZ · submitted 2026-05-17 · 📡 eess.SY · cs.SY


Pith reviewed 2026-05-19 22:01 UTC · model grok-4.3

keywords: edge inference · timing observability · deployment interference · TensorRT · ONNX Runtime · Jetson Orin · GPIO monitoring · latency measurement

The pith

Deployment interference can corrupt both inference timing and the software that measures it, independently.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that software-reported latencies in edge inference systems can appear normal even while external hardware measurements detect timing failures caused by deployment stress. Experiments on an NVIDIA Jetson Orin Nano compare TensorRT FP16 GPU and ONNX Runtime CPU configurations running MobileNetV2 under baseline, light memory pressure, and storage writeback conditions, pairing internal timing logs with GPIO intervals captured by a logic analyzer. The two architectures show distinct distributional changes under stress, and storage stress triggers external timing failures while the software logs report full success. The authors conclude, narrowly, that observability is itself vulnerable to the interference it is meant to track.

Core claim

Timing observability is itself an interference-sensitive resource, and summary statistics from a single timing source can hide failure modes that an independent external observer makes visible. Across 35 paired runs, TensorRT baselines cluster tightly while ONNX Runtime baselines are multimodal; memory pressure inflates TensorRT P99 and collapses one of five ONNX Runtime memory-stress runs into a fixed 198 ms regime. In three further TensorRT storage-stress runs, the software logs are complete while the external captures show three distinct timing failures that the runtime never reports.
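A minimal Python sketch of that claim, using synthetic numbers rather than the paper's data. It computes a per-run mean, a nearest-rank P99, and a hypothetical `dominant_fraction` statistic (the share of samples inside the densest window of width 2 × `tol_ms`, ±1 ms by default). None of these names or thresholds come from the paper; they only illustrate how a run collapsed into a fixed regime can share its mean with a spread-out run, and even report the lower P99, while a shape statistic separates the two.

```python
import math
import statistics


def percentile_nearest_rank(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    if not samples or not 0 < q <= 100:
        raise ValueError("need non-empty samples and 0 < q <= 100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def dominant_fraction(samples, tol_ms=1.0):
    """Largest fraction of samples that fall inside any window of width 2 * tol_ms."""
    ordered = sorted(samples)
    best, start = 0, 0
    for end in range(len(ordered)):
        # Advance the window start until the span fits within 2 * tol_ms.
        while ordered[end] - ordered[start] > 2 * tol_ms:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ordered)


def summarize_run(samples_ms, tol_ms=1.0):
    """Per-run mean, nearest-rank P99, and share of samples in the densest narrow band."""
    return {
        "mean": statistics.fmean(samples_ms),
        "p99": percentile_nearest_rank(samples_ms, 99),
        "dominant_fraction": dominant_fraction(samples_ms, tol_ms),
    }


# Synthetic runs (not the paper's data), both with a mean of 198 ms.
collapsed = [198.0] * 100
spread = [150.0 + i for i in range(97)]
print(summarize_run(collapsed))
print(summarize_run(spread))
```

The window statistic is deliberately crude; it is one of many possible shape measures and is not the authors' method.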

What carries the argument

Paired comparison of software-reported inference timing against external GPIO interval captures from a Saleae Logic Pro 8 logic analyzer on an NVIDIA Jetson Orin Nano.
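The pairing can be pictured with a short Python sketch. It assumes, hypothetically, that each inference call is wrapped by driving one GPIO pin high before the call and low after it, and that the logic-analyzer capture has already been exported as a time-ordered list of `(timestamp_s, level)` edge tuples. The export format, the pulse convention, and the function names are illustrative assumptions, not the authors' tooling or a Saleae API. With one pulse per iteration, a complete capture holds two transitions per software-logged iteration, so missing edges show up directly as a transition-loss fraction.

```python
def pulse_widths_ms(edges):
    """Turn time-ordered (timestamp_s, level) edges into high-pulse widths in ms.

    A rising edge (level 1) opens a pulse and the next falling edge (level 0)
    closes it; unmatched edges are dropped rather than guessed.
    """
    widths = []
    rise_t = None
    for t, level in edges:
        if level == 1:
            rise_t = t  # a second rising edge replaces an unmatched earlier one
        elif level == 0 and rise_t is not None:
            widths.append((t - rise_t) * 1e3)
            rise_t = None
    return widths


def transition_loss(edges, logged_iterations):
    """Fraction of expected edges (two per software-logged iteration) absent from the capture."""
    if logged_iterations <= 0:
        raise ValueError("logged_iterations must be positive")
    expected = 2 * logged_iterations
    return 1.0 - min(len(edges), expected) / expected


# Hypothetical capture: three iterations logged by software, two pulses observed.
edges = [(0.0, 1), (0.00123, 0), (0.010, 1), (0.01124, 0)]
print(pulse_widths_ms(edges))     # two widths, near 1.23 ms and 1.24 ms
print(transition_loss(edges, 3))  # one third of the expected edges is missing
```

Counting edges rather than complete pulses keeps a half-captured pulse visible in the loss figure.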

If this is right

Software-only latency summaries are insufficient to certify correct behavior under realistic deployment interference.
TensorRT and ONNX Runtime respond to the same stresses with qualitatively different timing structures, so architecture-specific observability checks are required.
Complete software logs can coexist with total external timing loss, meaning runtime success reports alone do not guarantee observable execution.
Light memory pressure and storage writeback each surface distinct hidden failure modes that internal metrics miss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production edge deployments may need independent hardware timing channels as a standard safeguard rather than optional diagnostics.
Benchmark suites that rely solely on internal timers risk publishing optimistic results that do not survive contact with real interference.
The same independence between reported and observed timing could appear in other monitoring layers such as network or power telemetry.

Load-bearing premise

The logic analyzer's GPIO captures supply a reliable external ground truth unaffected by the deployment stresses that corrupt software timing reports.

What would settle it

A replication under storage or memory stress in which every external GPIO interval exactly matches the corresponding software-reported latency with no missing transitions, no fixed-regime collapses, and no acquisition failures.
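That settling test can be written as a check. The Python sketch below rests on simplifying assumptions: software latencies and external intervals are already aligned by iteration index, and a shorter external list means trailing iterations were never observed. Because GPIO toggling adds its own overhead (the subject of reference [11]), the sketch compares against a hypothetical tolerance `tol_ms` rather than exact equality; `check_agreement` and `AgreementReport` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class AgreementReport:
    paired: int      # iterations with both a software and an external value
    missing: int     # software iterations with no external interval
    mismatched: int  # paired intervals differing by more than tol_ms

    @property
    def agrees(self):
        return self.missing == 0 and self.mismatched == 0


def check_agreement(software_ms, gpio_ms, tol_ms):
    """Compare iteration-aligned software latencies with external GPIO intervals."""
    paired = min(len(software_ms), len(gpio_ms))
    missing = len(software_ms) - paired
    # zip stops at the shorter list, so only paired iterations are compared.
    mismatched = sum(
        1 for sw, ext in zip(software_ms, gpio_ms) if abs(sw - ext) > tol_ms
    )
    return AgreementReport(paired, missing, mismatched)


report = check_agreement([1.23, 1.25, 1.24], [1.24, 1.25], tol_ms=0.05)
print(report, report.agrees)
```

Aligning by index is only safe when no edges are lost mid-run; with transition loss, pairing by timestamp would be needed.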

Figures

Figures reproduced from arXiv: 2605.17701 by Akul Swami, Nikhil Chougule.

**Figure 2.** GPIO-wrapped synchronization methodology.

**Figure 3.** Cross-architecture run-level latency profile.

**Figure 4.** TensorRT run-level tail latency under baseline and memory pressure.

**Figure 5.** ONNX Runtime CPU run-level latency profile under baseline and memory pressure.

**Figure 6.** Synchronization degradation under storage writeback stress (Run 001 shown).

**Figure 7.** Timing observability failure taxonomy.

5 Discussion (excerpt): The experiments support a narrow but specific argument. Different inference architectures on the same hardware exhibit qualitatively different temporal behavior, and that difference is not captured by mean latency. TensorRT and ORT differ in baseline distribution shape; they differ further in how their distributions respond to memory pressure (tail amplif…

Original abstract

Edge inference systems are typically evaluated with software-reported latency collected under controlled conditions. We argue, and demonstrate empirically, that deployment interference can corrupt not only the inference timing being measured but the timing observability infrastructure that measures it, and that the two failures can occur independently. We pair software-reported timing with externally observable GPIO intervals captured by a Saleae Logic Pro 8 logic analyzer on an NVIDIA Jetson Orin Nano, running MobileNetV2 under two inference architectures (TensorRT FP16 GPU and ONNX Runtime CPU) across baseline, light memory pressure, and storage writeback stress. Across 35 paired capture runs (3500 samples) plus 3 storage-stress runs where external pairing failed (300 software-only samples), we observe three findings the software-only view does not surface. (1) The two architectures differ not only in mean latency but in distributional structure: TensorRT baseline clusters tightly near 1.23 ms (run-mean SD 15 us) while ORT CPU baseline is multimodal with run-mean SD 31.8 ms. (2) Light memory pressure inflates TensorRT P99 from 1.28 ms to 1.61 ms, while one of five ORT memory-stress runs collapses into a deterministic 198 ms regime rather than uniformly inflating variance. (3) All three TensorRT storage-stress runs produce complete software timing logs (100/100 iterations) alongside externally observable timing failures of three different kinds (full post-marker collapse, ~40% transition loss, and complete acquisition failure) -- while the runtime reports normal completion in every case. We claim, narrowly, that timing observability is itself an interference-sensitive resource, and that summary statistics from a single timing source can hide failure modes an independent external observer makes visible.
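Finding (1) contrasts a tight cluster with a multimodal distribution. One simple, assumption-laden way to make that contrast operational is to sort the latencies and count the groups separated by gaps wider than a chosen threshold. The Python sketch below does exactly that; the `gap_ms` threshold, the function name, and the samples are hypothetical, and the paper's own characterization of multimodality may differ (a kernel density estimate or a mixture model would be more robust).

```python
def count_modes(samples_ms, gap_ms):
    """Count groups of sorted samples separated by gaps larger than gap_ms."""
    if not samples_ms:
        return 0
    ordered = sorted(samples_ms)
    modes = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap_ms:
            modes += 1  # a wide gap starts a new group
    return modes


# Synthetic latencies in ms (not the paper's data).
tight = [1.22, 1.23, 1.23, 1.24, 1.25]
multimodal = [40.1, 40.5, 41.0, 95.2, 96.0, 150.3, 151.1]
print(count_modes(tight, gap_ms=5.0))       # 1
print(count_modes(multimodal, gap_ms=5.0))  # 3
```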

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Software timing can report full success while external GPIO captures show distinct failures under storage stress, and TensorRT and ONNX Runtime respond differently to the same stresses. The external measure, however, still needs isolation checks.

The main thing to know is that under storage writeback stress the software timing logs show 100 percent completion in all three TensorRT runs while the external Saleae captures reveal three separate failure modes, and that the two inference architectures respond differently to the same stresses. This is presented as a narrow empirical claim that observability itself is interference-sensitive, not as a broad theory of edge systems.

Referee Report

2 major / 2 minor

Summary. The manuscript empirically demonstrates that deployment interference in edge inference systems can corrupt timing observability independently of the inference timing itself. On an NVIDIA Jetson Orin Nano running MobileNetV2 under TensorRT FP16 GPU and ONNX Runtime CPU architectures, the authors pair software-reported latency with external GPIO interval captures from a Saleae Logic Pro 8 logic analyzer across baseline, light memory pressure, and storage writeback conditions. From 35 paired runs (3500 samples) and 3 additional storage-stress runs (300 software-only samples), they report three findings invisible to software-only views: (1) architecture-dependent distributional structure in baseline latency, (2) non-uniform effects of memory pressure including a deterministic collapse in one ORT run, and (3) complete software timing logs (100/100 iterations) alongside three distinct external failure modes (post-marker collapse, ~40% transition loss, acquisition failure) under storage stress.

Significance. If the central empirical observations hold, the work provides concrete evidence that single-source software timing metrics can mask interference-induced observability failures in edge systems, with direct implications for reliable benchmarking and monitoring of deployed inference. The study is strengthened by its use of paired captures, explicit sample counts, and identification of multiple distinct external failure modes rather than relying on fitted models or self-referential predictions.

major comments (2)

[Experimental methodology (storage-stress runs)] The claim that external GPIO intervals constitute an independent, uncorrupted ground truth is load-bearing for the central finding that software reports normal completion while external captures reveal failures. No control isolating the capture path (e.g., non-inference GPIO toggles under identical storage writeback) is described, leaving open the possibility that USB bus contention or GPIO controller delays on the Jetson could induce correlated artifacts in the Saleae captures rather than revealing independent observability corruption.
[Results (storage-stress runs)] While the paper reports complete software logs alongside external failures in all three TensorRT runs, the absence of statistical tests or controls for confounding variables (given the small sample counts stated in the abstract) weakens the support for the claim that these are distinct, interference-specific failure modes rather than artifacts of the measurement pairing.
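The capture-path control requested in the first comment can be analysed with no inference in the loop. A minimal Python sketch, assuming a hypothetical control in which the Jetson toggles a pin at a fixed nominal period under the same storage writeback stress and the analyzer's rising-edge timestamps are exported as a list of seconds: if the capture path is sound, successive rising edges sit close to the nominal period, and a gap spanning several periods indicates edges lost somewhere between the pin and the recorded trace. The function name and the 1.5-period threshold for a skipped edge are illustrative choices, not part of the paper.

```python
import statistics


def control_toggle_report(rising_s, period_s):
    """Summarise a fixed-period control toggle captured by an external analyzer.

    rising_s: time-ordered rising-edge timestamps in seconds.
    period_s: nominal toggle period commanded on the device.
    """
    if len(rising_s) < 2:
        raise ValueError("need at least two rising edges")
    gaps = [b - a for a, b in zip(rising_s, rising_s[1:])]
    # A gap longer than 1.5 nominal periods is counted as one or more lost edges.
    skipped = sum(round(g / period_s) - 1 for g in gaps if g > 1.5 * period_s)
    regular = [g for g in gaps if g <= 1.5 * period_s]
    jitter_us = [(g - period_s) * 1e6 for g in regular]
    return {
        "skipped_edges": skipped,
        "max_abs_jitter_us": max((abs(j) for j in jitter_us), default=0.0),
        "jitter_sd_us": statistics.stdev(jitter_us) if len(jitter_us) > 1 else 0.0,
    }


# Hypothetical 10 ms toggle with two rising edges missing between 20 ms and 50 ms.
print(control_toggle_report([0.0, 0.010, 0.020, 0.050, 0.060], period_s=0.010))
```

If such a control showed skipped edges or large jitter under storage stress, the external failure modes could not be attributed to the inference workload alone.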

minor comments (2)

[Abstract] The abstract announces three findings, but the boundary between the architecture-dependent distributional structure (finding 1) and the memory-pressure effects (finding 2) could be drawn more clearly for readability.
[Abstract] The notation 'run-mean SD' (e.g., 'run-mean SD 15 us') is used without defining whether it denotes the standard deviation of per-run means or a statistic computed over individual iterations.
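The ambiguity is easy to see numerically. In this minimal Python sketch with synthetic numbers (not the paper's data), the standard deviation of per-run means can be zero even when iterations within each run are widely spread, so the two readings answer different questions. Whether the abstract uses sample or population SD is not stated; `statistics.stdev` (sample SD) is an assumption here, and both function names are illustrative.

```python
import statistics


def run_mean_sd(runs):
    """Sample SD across runs of each run's mean latency (one value per run)."""
    return statistics.stdev([statistics.fmean(r) for r in runs])


def pooled_sd(runs):
    """Sample SD over every iteration from every run, ignoring run boundaries."""
    return statistics.stdev([x for r in runs for x in r])


# Synthetic example: identical run means, wide within-run spread (ms).
runs = [[1.0, 3.0], [1.0, 3.0]]
print(run_mean_sd(runs))  # 0.0
print(pooled_sd(runs))    # about 1.155
```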

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful and constructive comments on our manuscript. We address each of the major comments below and indicate the revisions we will make to improve the clarity and rigor of our empirical claims.

Point-by-point responses

Referee: Experimental methodology (storage-stress runs description): the claim that external GPIO intervals constitute an independent, uncorrupted ground truth is load-bearing for the central finding that software reports normal completion while external captures reveal failures. No control isolating the capture path (e.g., non-inference GPIO toggles under identical storage writeback) is described, leaving open the possibility that USB bus contention or GPIO controller delays on the Jetson could induce correlated artifacts in the Saleae captures rather than revealing independent observability corruption.

Authors: We agree that this is an important methodological point. Although our paired measurements in baseline and memory pressure conditions showed no evidence of capture artifacts, we did not explicitly test the GPIO/Saleae path in isolation under storage writeback. In the revised manuscript, we will include a control experiment with non-inference GPIO toggles under the same storage stress conditions to confirm that the external captures remain reliable and independent of the inference workload.

Revision: yes.
Referee: Results on storage-stress runs (the three distinct external failure modes): while the paper reports complete software logs alongside external failures in all three TensorRT runs, the absence of statistical tests or controls for confounding variables (given the small sample counts stated in the abstract) weakens the support for the claim that these are distinct, interference-specific failure modes rather than artifacts of the measurement pairing.

Authors: The three failure modes are presented as qualitatively distinct based on direct observation of the external capture traces across the three independent runs. We acknowledge that with only three runs and no formal statistical tests, the evidence for distinct modes is primarily descriptive. We will revise the manuscript to provide additional quantitative characterization of each mode (e.g., transition counts and timing deviations where measurable), explicitly discuss the small sample size as a limitation, and note that these observations are exploratory. This will better contextualize the findings without overstating their statistical support.

Revision: partial.

Circularity Check

0 steps flagged

No circularity: purely empirical observations with no derivations or self-referential predictions

Full rationale

The paper reports direct experimental comparisons of software-reported inference latencies against external GPIO interval captures under controlled interference conditions on a Jetson platform. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. All claims rest on observed discrepancies across 35 paired runs and three additional storage-stress runs, with the central argument being that a single timing source can miss failure modes visible to an independent observer. Because the conclusions are checked against an external measurement rather than derived from the paper's own inputs, the argument contains none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an empirical measurement study and introduces no new mathematical entities or free parameters. It rests on the domain assumption that external hardware capture serves as an independent reference.

axioms (1)

Domain assumption: the Saleae Logic Pro 8 GPIO captures provide an accurate, independent timing reference unaffected by the software stack or deployment stress.
This premise is required to interpret software-external discrepancies as evidence of observability corruption rather than measurement error.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · relation: unclear

We pair software-reported timing with externally observable GPIO intervals captured by a Saleae Logic Pro 8 logic analyzer... three distinct external failure modes (post-marker collapse, ~40% transition loss, acquisition failure)

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

timing observability is itself an interference-sensitive resource

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · 1 internal anchor

[1] Malony, Allen D.; Reed, Daniel A.; Wijshoff, Harry A. G. IEEE Transactions on Parallel and Distributed Systems.

[2] Mytkowicz, Todd; Diwan, Amer; Hauswirth, Matthias; Sweeney, Peter F. Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems.

[3] Mytkowicz, Todd; Diwan, Amer; Hauswirth, Matthias. IEEE International Parallel and Distributed Processing Symposium.

[4] Ratul, I. J.; Zhou, Y.; Yang, K. Electronics.

[5] V. Don't Buy the Pig in a Poke: Benchmarking DNNs Inference Performance before Development.

[6] Jeong, Eunjin; Kim, Jangryul; Ha, Soonhoi. ACM Transactions on Embedded Computing Systems.

[7] Jeong, Eunjin; Kim, Jangryul; Tan, S.; Lee, J.; Ha, Soonhoi. IEEE Embedded Systems Letters.

[8] Mohror, Kathryn; Karavanic, Karen L.

[9] Sch. Fundamental Issues in Testing Distributed Real-Time Systems.

[10] Mizrahi, Tal; Schapira, Michael; Moses, Yoram. Proceedings of the ACM Applied Networking Research Workshop.

[11] Swami, Akul; Chougule, Nikhil. Per-Platform GPIO Overhead in Hardware-Validated Edge ML Inference Timing. arXiv:2605.02835, 2026. doi:10.48550/arXiv.2605.02835.
