
arxiv: 2605.25298 · v1 · pith:NKH6LFTCnew · submitted 2026-05-24 · 💻 cs.DC · cs.PF

Beyond Thread States: Diagnosing Performance Degradation with eBPF and Thread Dynamics

Pith reviewed 2026-06-29 23:15 UTC · model grok-4.3

classification 💻 cs.DC cs.PF
keywords eBPF · thread dynamics · performance diagnosis · thread state analysis · contention detection · inter-thread dependencies · kernel tracing · data-intensive applications

The pith

An eBPF method extends thread state analysis by tracing inter-thread dependencies to diagnose sources of performance degradation like CPU and lock contention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that standard thread state analysis misses how performance problems spread between threads through shared resources. By adding sixteen eBPF metrics on kernel subsystems such as scheduling, futexes, and block IO, the approach follows degradation from entry threads to the actual constrained resource. A selective tracking algorithm limits the scope to a useful subset of interactions, keeping overhead low while still identifying CPU, disk, lock, and external service issues. If correct, this gives operators a practical way to locate internal application bottlenecks without application-specific instrumentation.

Core claim

By extending Thread State Analysis (TSA) with fine-grained thread dynamics, captured through sixteen eBPF metrics across six kernel subsystems, and with a selective thread tracking algorithm, the method diagnoses CPU, disk, lock, and external service contention with minimal overhead and also reveals internal application constraints.

What carries the argument

The selective thread tracking algorithm that traces performance issues from entry-point threads to constrained resources using the sixteen eBPF metrics.
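
To make the idea of tracing from entry-point threads to a constrained resource concrete, here is a minimal sketch in Python. It is not the paper's Algorithm 1. The data layout (`WAITS`), the `min_increase` threshold, and the thread and resource names are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical input: thread -> list of (resource, wait-time increase in seconds, holder).
# A holder of None marks a terminal resource such as a disk or the CPU run queue.
WAITS = {
    "t9": [("epoll:e1", 0.2, None), ("futex:f1", 3.5, "t4")],
    "t4": [("futex:f2", 2.9, "t2")],
    "t2": [("disk:8:0", 4.1, None)],
}

def track(entry_threads, waits, min_increase=1.0):
    """Follow only the edges whose wait time grew by at least min_increase."""
    paths = []
    queue = deque((t, [t]) for t in entry_threads)
    visited = set(entry_threads)
    while queue:
        thread, path = queue.popleft()
        for resource, increase, holder in waits.get(thread, []):
            if increase < min_increase:
                continue  # not degraded enough to be worth tracking
            if holder is None:
                paths.append(path + [resource])  # reached a constrained resource
            elif holder not in visited:
                visited.add(holder)
                queue.append((holder, path + [resource, holder]))
    return paths

paths = track(["t9"], WAITS)
print(paths)
```

With this toy input the only path that survives the threshold runs from `t9` through two futexes to the disk. Raising `min_increase` prunes more edges, which is the trade-off between overhead and coverage that the selective approach has to manage.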

If this is right

  • The approach identifies both the constrained subsystem and the path of propagation through thread interactions.
  • It works across diverse applications under variable workloads without requiring changes to the application code.
  • Overhead remains low enough for production use while still exposing internal constraints not visible in basic thread states.
  • It covers contention from CPU, disk, locks, and external services in one unified tracing setup.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tracing structure could be extended to new kernel subsystems as they appear without redesigning the core algorithm.
  • Production monitoring systems could feed the thread-dependency graphs into automated remediation scripts that adjust resource allocation.
  • The method's focus on entry-point threads suggests it may scale to microservice architectures where degradation crosses process boundaries.

Load-bearing premise

Performance degradation propagates along inter-thread dependencies, so tracking a subset of thread-resource interactions is enough to capture the common degradation patterns.

What would settle it

An experiment in which the selective tracker misses the true source of degradation because the dependency chain involves more threads or resources than the chosen subset.
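
One hypothetical way such a miss could arise is sketched below: a tracker that follows only some kinds of interaction loses the trail when the chain passes through a kind it does not track. The chain, the interaction kinds, and the `trace` helper are illustrative assumptions, not taken from the paper.

```python
# Hypothetical acyclic dependency chain: (waiting thread, interaction kind, next thread or resource).
CHAIN = [
    ("t1", "futex", "t2"),
    ("t2", "pipe", "t3"),
    ("t3", "block_io", "disk"),
]

def trace(entry, chain, tracked_kinds):
    """Walk the chain from entry and stop at the first hop whose kind is not tracked."""
    hops = {waiter: (kind, nxt) for waiter, kind, nxt in chain}
    current = entry
    while current in hops:
        kind, nxt = hops[current]
        if kind not in tracked_kinds:
            return current  # the tracker loses the trail here
        current = nxt
    return current

full = trace("t1", CHAIN, {"futex", "pipe", "block_io"})
partial = trace("t1", CHAIN, {"futex", "block_io"})
print(full, partial)
```

With every kind tracked the walk ends at the disk. With pipes left out it stops at `t2`, which would then be reported as the apparent source.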

Figures

Figures reproduced from arXiv: 2605.25298 by Diogo Landau, Jorge G. Barbosa, Nishant Saurabh.

Figure 1
Figure 1. Figure 1a illustrates this through a database experiencing degraded performance caused by the serialization of read and write operations on a shared row. TSA results in Figure 1b indicate that the increased time spent by thread t4 waiting on locks strongly correlates with the observed performance degradation. However, the thread dynamics graph in Figure 1c further reveals that the only other thread interacting wit…
Figure 2
Figure 2. Instrumentation architecture: Illustration of the required components to monitor a target application. Our instrumentation also accounts for IPC and therefore thread groups that have communicated with the target application will also be monitored. […] mediated by kernel resources such as locks, sockets, pipes, and other inter-process communication mechanisms, which are essential for diagnosing the propagation …
Figure 3
Figure 3. Diagnostic Workflow: overview of the diagnostic workflow for application performance degradation. Stages highlighted in red and green denote our method’s participation and include a summary of the associated operations, indicating whether they are automated or require user intervention. TABLE I: Enumeration of the instrumented metrics based on the Thread States enumerated by TSA [8], and the resource granu…
Figure 4
Figure 4. Comparison of application interaction observability provided by different approaches, where purple lines denote TCP connections: (a) Coroot [39]; (b) xCapture [43]; (c) Our approach. Thread interactions are illustrated by directional arrows. This section clarifies the distinctions between two existing observability approaches and our metric collection methodology. For comparison, we selected Coroot [39],…
Figure 5
Figure 5. Illustration of an application’s thread dynamics derived from resource-level metric collection. Selective Thread Tracking: Building on the thread dynamics, our performance degradation analysis employs a selective thread tracking procedure outlined in Algorithm 1. The algorithm takes a target application and time-series of its performance metrics (e.g., response time or throughput) as input, and outp…
Figure 6
Figure 6. MySQL target metric observation and performance degradation deconstruction: (a) TPCC workload throughput; (b) YCSB read-intensive workload 95th percentile response time; (c) Sector requests per device (identified by their major:minor values); (d) Device request share for MySQL threads; (e) Per thread futexes’ (f1, f2) wait time; (f) Per thread futexes’ (f1, f2) wake activity; (g) Thread t2’s iowait time; (…
Figure 8
Figure 8. Kafka target metric and performance degradation deconstruction: (a) Ideal/measured throughput; (b) Kafka threads’ runtime; (c) Kafka threads’ block time; (d) Combined connection wait time for epoll e1; (e) Thread t9 epoll e1 wait time; (f) Pipe p1 write activity. […] the production load until the broker could no longer sustain the incoming rate. Figure 8a compares the ideal and actual production rates, showin…
Figure 9
Figure 9. Kafka thread dynamics for the degradation path: blue threads denote entry-point threads. Deconstructing Performance Degradation: We start by identifying the entrypoint threads responsible for external communication (t9-t11) using the socket wait time,count and epoll wait time,count metrics. These threads each manage dedicated epoll resources to handle incoming producer requests and send acknowledgmen…
Figure 13
Figure 13. Teastore target metric observation and performance degradation deconstruction: (a) Teastore load pattern; (b) End-to-end 95th percentile response time; (c) Webui’s threads’ rq time; (d) Webui’s threads’ runtime; (e) Time-series histogram of webui’s wait time for the constrained service. […] to measure 95th percentile latency (Figure 12a). As an intervention, the official Redis benchmark [61] runs from 76 s f…
Figure 12
Figure 12. Redis target metric observation and performance degradation deconstruction: (a) Memtier benchmark 95th percentile response time; (b) Redis runtime and rq time activity. […] four worker processes. A variable load pattern (Figure 11a) was generated using Locust, ranging from 2 to 60 concurrent users to induce short bursts of very high workload. The 95th percentile response time (Figure 11b) serves as our targe…
Figure 14
Figure 14. Alternative compare and baseline distribution representation for: (a) Fig 8c; (b) Fig 8f; (c) Fig 6h. Listing 1: Templated query that extracts baseline and compare distribution data for an application’s thread block time. SELECT ts, blkio_share, 'baseline' FROM taskstats_view WHERE {{ pid_filter }} AND {{ baseline_filter }} AND blkio_share > 0 UNION ALL SELECT ts, blkio_share, 'compare' FROM taskstats_vi…
Figure 1
Figure 1. User interface screenshots: (a) KPI page; (b) Ripple page; (c) Debug page. On the interface’s homepage http://localhost:8501, upload the *.db3 database file from the ./results/<experiment> directory. Then, on the KPI page, upload the target metric file. For the Redis experiment, selecting baseline and compare periods displays a graph with highlighted periods, as in Figure 1a. The baseline and compare period…
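
The figure excerpts above refer to a baseline period and a compare period. A minimal sketch of that split is shown below, assuming a list of timestamped samples and two half-open time windows. The sample values, window bounds, and the use of a median shift as the summary are illustrative assumptions, not the paper's procedure.

```python
from statistics import median

# Hypothetical samples: (timestamp in seconds, block-IO share of one thread).
SAMPLES = [(0, 0.02), (10, 0.03), (20, 0.02), (80, 0.30), (90, 0.35), (100, 0.28)]

def split(samples, baseline, compare):
    """Return the values that fall into each half-open window [start, end)."""
    def in_window(ts, window):
        return window[0] <= ts < window[1]
    base = [value for ts, value in samples if in_window(ts, baseline)]
    comp = [value for ts, value in samples if in_window(ts, compare)]
    return base, comp

base, comp = split(SAMPLES, baseline=(0, 60), compare=(70, 120))
shift = median(comp) - median(base)  # positive when the compare period is worse
print(base, comp, round(shift, 2))
```

Comparing the two distributions rather than single points keeps one noisy sample from dominating the result. Any robust statistic could stand in for the median here.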
Original abstract

Online Data-Intensive applications face performance degradation from load variability and resource interference. While Thread State Analysis (TSA) based approaches enable identifying constrained subsystems, they lack the granularity to reveal the inter-thread dependencies that propagate degradation. In this paper, we present an application-agnostic performance degradation analysis method that extends TSA by capturing fine-grained thread dynamics. We implemented $16$ eBPF-based metrics across six kernel subsystems, including scheduling, VFS, networking, futex, multiplexing IO, and block IO which enables tracing thread interactions with specific resources like futexes, sockets, and disks. Our method leverages the fact that performance degradation propagates along inter-thread dependencies, and a subset of thread-resource interactions can enable capturing common degradation patterns. To this end, we employ a selective thread tracking algorithm that traces performance issues from entry-point threads to constrained resources. Experimentation with diverse applications under variable workloads and resource contention shows our method successfully diagnoses CPU, disk, lock, and external service contention with minimal overhead, while also revealing internal application constraints.
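
As an illustration of what tracing thread interactions with a specific resource can yield, the sketch below derives waiter-to-waker dependency edges from per-thread futex activity. The record layout and thread names are assumptions. The only fact relied on is that a thread blocked on a futex is released by another thread issuing a wake on the same futex.

```python
from collections import defaultdict

# Hypothetical records: (thread, futex, kind, value).
# "wait" carries seconds spent blocked; "wake" carries the number of wake calls.
EVENTS = [
    ("t4", "f1", "wait", 2.5),
    ("t4", "f1", "wait", 1.0),
    ("t2", "f1", "wake", 3),
    ("t7", "f2", "wait", 0.1),
    ("t3", "f2", "wake", 1),
]

def futex_edges(events):
    """Link each waiting thread to the threads that wake the same futex."""
    waiters = defaultdict(float)  # (thread, futex) -> total seconds blocked
    wakers = defaultdict(int)     # (thread, futex) -> total wake calls
    for thread, futex, kind, value in events:
        if kind == "wait":
            waiters[(thread, futex)] += value
        elif kind == "wake":
            wakers[(thread, futex)] += int(value)
    edges = []
    for (waiter, futex), seconds in waiters.items():
        for (waker, woken_futex) in wakers:
            if woken_futex == futex and waker != waiter:
                edges.append((waiter, futex, waker, seconds))
    # Longest waits first, so the most suspicious dependency is at the top.
    return sorted(edges, key=lambda edge: edge[3], reverse=True)

edges = futex_edges(EVENTS)
print(edges)
```

Plain thread state analysis would report only that `t4` spends time blocked. The edge list adds which thread it is waiting on.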

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to extend Thread State Analysis (TSA) with an application-agnostic eBPF-based method that captures fine-grained thread dynamics via 16 metrics across six kernel subsystems (scheduling, VFS, networking, futex, multiplexing IO, block IO). It introduces a selective thread tracking algorithm that traces from entry-point threads to constrained resources, justified by the propagation of degradation along inter-thread dependencies and the sufficiency of a subset of thread-resource interactions. Experiments on diverse applications under variable workloads and contention are said to show successful diagnosis of CPU, disk, lock, and external service contention with minimal overhead, plus revelation of internal application constraints.

Significance. If validated, the approach could provide finer-grained diagnosis of inter-thread dependency propagation than standard TSA, enabling better identification of contention sources in data-intensive systems while maintaining low overhead through selective tracing.

major comments (2)
  1. [Abstract] The central empirical claim that the method 'successfully diagnoses CPU, disk, lock, and external service contention' is asserted without any quantitative results, error bars, success metrics (e.g., precision/recall, diagnosis accuracy), workload details, or measurement methodology. The paper's contribution rests on this claim, and without such evidence it cannot be verified.
  2. [Abstract, paragraph on the selective thread tracking algorithm] The method's applicability rests on the unvalidated assumption that 'a subset of thread-resource interactions can enable capturing common degradation patterns' because degradation 'propagates along inter-thread dependencies.' No formal argument, completeness proof, failure-mode enumeration, or representativeness argument for the chosen applications is provided; if the subset misses a propagation path, the diagnosis is incomplete despite the 'minimal overhead.'
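
The success metrics requested in the first comment could be computed along the following lines. This is a sketch with made-up labels. The experiment names and outcomes are hypothetical and are not results from the paper.

```python
def precision_recall(diagnosed, actual):
    """Precision and recall of diagnosed (experiment, resource) pairs against ground truth."""
    diagnosed, actual = set(diagnosed), set(actual)
    true_positives = len(diagnosed & actual)
    precision = true_positives / len(diagnosed) if diagnosed else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical ground truth: the contention injected in each experiment.
actual = {("exp1", "disk"), ("exp1", "lock"), ("exp2", "cpu"), ("exp3", "cpu")}
# Hypothetical tool output for the same experiments.
diagnosed = {("exp1", "disk"), ("exp2", "cpu"), ("exp3", "cpu"), ("exp3", "lock")}

precision, recall = precision_recall(diagnosed, actual)
print(precision, recall)
```

Reporting both numbers matters because a tool can reach high recall by flagging every resource, and high precision by flagging only the obvious ones.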

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below and will revise the manuscript to improve clarity and support for the claims.

  1. Referee: [Abstract] The central empirical claim that the method 'successfully diagnoses CPU, disk, lock, and external service contention' is asserted without any quantitative results, error bars, success metrics (e.g., precision/recall, diagnosis accuracy), workload details, or measurement methodology. The paper's contribution rests on this claim, and without such evidence it cannot be verified.

    Authors: We agree that the abstract would be strengthened by including summary quantitative results. The full manuscript reports specific experimental outcomes, including diagnosis success across workloads, overhead measurements (typically below 5%), and workload details in Sections 5 and 6. We will revise the abstract to incorporate key metrics such as average diagnosis accuracy and overhead ranges to make the empirical claim more verifiable. revision: yes

  2. Referee: [Abstract, paragraph on the selective thread tracking algorithm] The method's applicability rests on the unvalidated assumption that 'a subset of thread-resource interactions can enable capturing common degradation patterns' because degradation 'propagates along inter-thread dependencies.' No formal argument, completeness proof, failure-mode enumeration, or representativeness argument for the chosen applications is provided; if the subset misses a propagation path, the diagnosis is incomplete despite the 'minimal overhead.'

    Authors: The selective tracking approach is presented as an empirical heuristic justified by the propagation of degradation along inter-thread dependencies, which our experiments on diverse applications demonstrate by successfully identifying contention sources. While no formal completeness proof or exhaustive failure-mode analysis is included, the paper validates the method through results on multiple workloads. We will add a dedicated discussion subsection on the rationale, limitations of the subset selection, and representativeness of the evaluated applications. revision: partial

Circularity Check

0 steps flagged

No circularity: method is a direct engineering construction without reduction to fitted inputs or self-citations


The paper describes an eBPF-based tracing implementation and selective thread tracking algorithm justified by an explicit modeling assumption (degradation propagates along inter-thread dependencies; a subset of interactions captures patterns). No equations, parameter fitting, or derivations appear. The assumption is stated as input rather than derived from the method's outputs. No self-citations are invoked to support the core claim. The approach is self-contained as a new tracing technique evaluated on applications; it does not reduce any prediction or result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described in the provided text.



Reference graph

Works this paper leans on

68 extracted references

  1. [1]

    Power management of online data-intensive services,

    D. Meisner, C. M. Sadler, L. A. Barroso, W.-D. Weber, and T. F. Wenisch, “Power management of online data-intensive services,” in Proceedings of the 38th annual international symposium on Computer architecture, 2011, pp. 319–330

  2. [2]

    µTune: Auto-Tuned threading for OLDI microservices,

    A. Sriraman and T. F. Wenisch, “µTune: Auto-Tuned threading for OLDI microservices,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 177–194

  3. [3]

    End-to-end quality of service management for distributed real-time embedded applications,

    P. Manghwani, J. Loyall, P. Sharma, M. Gillen, and J. Ye, “End-to-end quality of service management for distributed real-time embedded applications,” in 19th IEEE International Parallel and Distributed Processing Symposium. IEEE, 2005, pp. 8–pp

  4. [4]

    Amoeba: Qos-awareness and reduced resource usage of microservices with serverless computing,

    Z. Li, Q. Chen, S. Xue, T. Ma, Y. Yang, Z. Song, and M. Guo, “Amoeba: Qos-awareness and reduced resource usage of microservices with serverless computing,” in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020, pp. 399–408

  5. [5]

    Impact of response latency on user behavior in web search,

    I. Arapakis, X. Bai, and B. B. Cambazoglu, “Impact of response latency on user behavior in web search,” in Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014, pp. 103–112

  6. [6]

    Quality is in the eye of the beholder: Meeting users’ requirements for internet quality of service,

    A. Bouch, A. Kuchinsky, and N. Bhatti, “Quality is in the eye of the beholder: Meeting users’ requirements for internet quality of service,” in Proceedings of the SIGCHI conference on Human factors in computing systems, 2000, pp. 297–304

  7. [7]

    PerfCompass: Toward runtime performance anomaly fault localization for Infrastructure-as-a-Service clouds,

    D. J. Dean, H. Nguyen, P. Wang, and X. Gu, “PerfCompass: Toward runtime performance anomaly fault localization for Infrastructure-as-a-Service clouds,” in 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), 2014

  8. [8]

    Systems performance: enterprise and the cloud

    B. Gregg, Systems performance: enterprise and the cloud. Pearson Education, 2014

  9. [9]

    R. L. Sites, Understanding software dynamics. Addison-Wesley Professional, 2021

  10. [10]

    Monitorless: Predicting performance degradation in cloud applications with machine learning,

    J. Grohmann, P. K. Nicholson, J. O. Iglesias, S. Kounev, and D. Lugones, “Monitorless: Predicting performance degradation in cloud applications with machine learning,” in Proceedings of the 20th international middleware conference, 2019, pp. 149–162

  11. [11]

    Fedge: An interference-aware qos prediction framework for black-box scenario in iaas clouds with domain generalization,

    Y. Cheng, X. Huang, Z. Liu, J. Chen, X. Gao, Z. Fang, and Y. Yang, “Fedge: An interference-aware qos prediction framework for black-box scenario in iaas clouds with domain generalization,” in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2024, pp. 128–138

  12. [12]

    Predicting real-time service-level metrics from device statistics,

    R. Yanggratoke, J. Ahmed, J. Ardelius, C. Flinta, A. Johnsson, D. Gillblad, and R. Stadler, “Predicting real-time service-level metrics from device statistics,” in 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). IEEE, 2015, pp. 414–422

  13. [13]

    Automated diagnostic of virtualized service performance degradation,

    J. Ahmed, T. Josefsson, A. Johnsson, C. Flinta, F. Moradi, R. Pasquini, and R. Stadler, “Automated diagnostic of virtualized service performance degradation,” in NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2018, pp. 1–9

  14. [14]

    Correlating instrumentation data to system states: A building block for automated diagnosis and control

    I. Cohen, J. S. Chase, M. Goldszmidt, T. Kelly, and J. Symons, “Correlating instrumentation data to system states: A building block for automated diagnosis and control.” in OSDI, vol. 4, 2004, pp. 16–16

  15. [15]

    Ensembles of models for automated diagnosis of system performance problems,

    S. Zhang, I. Cohen, M. Goldszmidt, J. Symons, and A. Fox, “Ensembles of models for automated diagnosis of system performance problems,” in 2005 International Conference on Dependable Systems and Networks (DSN’05). IEEE, 2005, pp. 644–653

  16. [16]

    Sage: practical and scalable ml-driven performance debugging in microservices,

    Y. Gan, M. Liang, S. Dev, D. Lo, and C. Delimitrou, “Sage: practical and scalable ml-driven performance debugging in microservices,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, pp. 135–151

  17. [17]

    Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices,

    Y. Gan, Y. Zhang, K. Hu, D. Cheng, Y. He, M. Pancholi, and C. Delimitrou, “Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices,” in Proceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems, 2019, pp. 19–33

  18. [18]

    Characterizing in-kernel observability of latency-sensitive request-level metrics with ebpf,

    M. Rezvani, A. Jahanshahi, and D. Wong, “Characterizing in-kernel observability of latency-sensitive request-level metrics with ebpf,” in 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 2024, pp. 24–35

  19. [19]

    Holistic runtime performance and security-aware monitoring in public cloud environment,

    D. N. Jha, G. Lenton, J. Asker, D. Blundell, and D. Wallom, “Holistic runtime performance and security-aware monitoring in public cloud environment,” in 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE, 2022, pp. 1052–1059

  20. [20]

    Nosql database performance diagnosis through system call-level introspection,

    C. Seo, Y. Chae, J. Lee, E. Seo, and B. Tak, “Nosql database performance diagnosis through system call-level introspection,” in NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2022, pp. 1–9

  21. [21]

    Prism,

    D. Landau and N. Saurabh, “Prism,” Feb. 2026. [Online]. Available: https://github.com/EC-labs/prism

  22. [22]

    Where is your application stuck?

    S. Nagar, B. Singh, V. Kashyap, C. Seetharaman, N. Sharoff, and P. Banerjee, “Where is your application stuck?” in Linux Symposium. Citeseer, 2007, p. 71

  23. [23]

    Performance monitoring tools for linux,

    D. Gavin, “Performance monitoring tools for linux,” Linux Journal, vol. 1998, no. 56es, pp. 1–es, 1998

  24. [24]

    Runtime-adaptable selective performance instrumentation,

    S. Kreutzer, C. Iwainsky, M. Garcia-Gasulla, V. Lopez, and C. Bischof, “Runtime-adaptable selective performance instrumentation,” in 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2023, pp. 423–432

  25. [25]

    Diagnosing application-network anomalies for millions of IPs in production clouds,

    Z. Wang, H. Hu, L. Kong, X. Kang, Q. Xiang, J. Li, Y. Lu, Z. Song, P. Yang, J. Wu et al., “Diagnosing application-network anomalies for millions of IPs in production clouds,” in 2024 USENIX Annual Technical Conference (USENIX ATC 24), 2024, pp. 885–899

  26. [26]

    Locating system problems using dynamic instrumentation,

    V. Prasad, W. Cohen, F. Eigler, M. Hunt, J. Keniston, and J. Chen, “Locating system problems using dynamic instrumentation,” in 2005 Ottawa Linux Symposium. New York, NY: IEEE, 2005, pp. 49–64

  27. [27]

    The lttng tracer: A low impact performance and behavior monitor for gnu/linux,

    M. Desnoyers and M. R. Dagenais, “The lttng tracer: A low impact performance and behavior monitor for gnu/linux,” in OLS (Ottawa Linux Symposium), vol. 2006. Citeseer, 2006, pp. 209–224

  28. [28]

    Dynamic instrumentation of production systems

    B. Cantrill, M. W. Shapiro, A. H. Leventhal et al., “Dynamic instrumentation of production systems.” in USENIX Annual Technical Conference, General Track, 2004, pp. 15–28

  29. [29]

    eBPF, “ebpf,” https://docs.kernel.org/bpf/, 2025, accessed: 2025-07-20

  30. [30]

    Bpf compiler collection (bcc),

    Iovisor, “Bpf compiler collection (bcc),” https://github.com/iovisor/bcc, 2025, accessed: 2025-07-20

  31. [31]

    bpftrace,

    Bpftrace, “bpftrace,” https://github.com/bpftrace/bpftrace, 2025, accessed: 2025-07-20

  32. [32]

    libbpf source library,

    Libbpf, “libbpf source library,” https://github.com/libbpf/libbpf, 2025, accessed: 2025-07-20

  33. [33]

    libbpf-rs: A Rust wrapper around libbpf,

    Libbpf-rs, “libbpf-rs: A Rust wrapper around libbpf,” https://github.com/libbpf/libbpf-rs, 2025, accessed: 2025-07-20

  34. [34]

    Canario: Sounding the alarm on io-related performance degradation,

    M. R. Wyatt, S. Herbein, K. Shoga, T. Gamblin, and M. Taufer, “Canario: Sounding the alarm on io-related performance degradation,” in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020, pp. 73–83

  35. [35]

    Baro: Robust root cause analysis for microservices via multivariate bayesian online change point detection,

    L. Pham, H. Ha, and H. Zhang, “Baro: Robust root cause analysis for microservices via multivariate bayesian online change point detection,” Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 2214–2237, 2024

  36. [36]

    Causal inference-based root cause analysis for online service systems with intervention recognition,

    M. Li, Z. Li, K. Yin, X. Nie, W. Zhang, K. Sui, and D. Pei, “Causal inference-based root cause analysis for online service systems with intervention recognition,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 3230–3240

  37. [37]

    Microscope: Pinpoint performance issues with causal graphs in micro-service environments,

    J. Lin, P. Chen, and Z. Zheng, “Microscope: Pinpoint performance issues with causal graphs in micro-service environments,” in Service-Oriented Computing: 16th International Conference, ICSOC 2018, Hangzhou, China, November 12-15, 2018, Proceedings 16. Springer, 2018, pp. 3–20

  38. [38]

    FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices,

    H. Qiu, S. S. Banerjee, S. Jha, Z. T. Kalbarczyk, and R. K. Iyer, “FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices,” in 14th USENIX symposium on operating systems design and implementation (OSDI 20), 2020, pp. 805–825

  39. [39]

    Coroot, “Coroot,” https://docs.coroot.com/, 2025, accessed: 2025-10-09

  40. [40]

    Pixie, “Pixie,” https://docs.px.dev/, 2025, accessed: 2025-10-09

  41. [41]

    dynatrace runtime support,

    dynatrace, “dynatrace runtime support,” https://docs.dynatrace.com/docs/ingest-from/technology-support/application-software, accessed: 2026-01-16

  42. [42]

    application performance monitoring,

    new relic, “application performance monitoring,” https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started/introduction-apm/, accessed: 2026-01-16

  43. [43]

    xcapture,

    xCapture, “xcapture,” https://github.com/tanelpoder/0xtools, 2025, accessed: 2025-10-09.

  44. [44]

    Reduce the infrastructure agent: CPU footprint — New Relic Documentation,

    “Reduce the infrastructure agent: CPU footprint — New Relic Documentation,” https://docs.newrelic.com/docs/infrastructure/infrastructure-troubleshooting/troubleshoot-infrastructure/reduce-infrastructure-agents-cpu-footprint/, [Accessed 14-01-2026]

  45. [45]

    Troubleshooting large memory usage (Node.js): New Relic Documentation

    “Troubleshooting large memory usage (Node.js): New Relic Documentation.” https://docs.newrelic.com/docs/apm/agents/nodejs-agent/troubleshooting/troubleshooting-large-memory-usage-nodejs/, [Accessed 10-01-2026]

  46. [46]

    Controlling Measurement Overhead: Dynatrace

    “Controlling Measurement Overhead: Dynatrace.” https://www.dynatrace.com/resources/ebooks/javabook/controlling-measurement-overhead/, [Accessed 10-01-2026]

  47. [47]

    Process deep monitoring: Dynatrace,

    “Process deep monitoring: Dynatrace,” https://docs.dynatrace.com/docs/observe/infrastructure-observability/process-groups/configuration/pg-monitoring, [Accessed 10-01-2026]

  48. [48]

    The tsa method,

    B. Gregg, “The tsa method,” https://www.brendangregg.com/tsamethod.html, 2025, accessed: 2025-10-07

  49. [49]

    Duckdb,

    M. Raasveldt, H. Muehleisen et al., “Duckdb,” in Proceedings of the 2019 International Conference on Management of Data. ACM, 2019

  50. [50]

    Futexes are tricky,

    U. Drepper, “Futexes are tricky,” Futexes are Tricky, Red Hat Inc, Japan, vol. 4, 2005

  51. [51]

    Rust parker implementation,

    R. Lang, “Rust parker implementation,” https://github.com/rust-lang/rust/blob/eb33b43bab08223fa6b46abacc1e95e859fe375d/library/std/src/sys/sync/thread_parking/futex.rs, 2025, accessed: 2025-07-20

  52. [52]

    Evaluating network processing efficiency with processor partitioning and asynchronous i/o,

    T. Brecht, G. Janakiraman, B. Lynn, V. Saletore, and Y. Turner, “Evaluating network processing efficiency with processor partitioning and asynchronous i/o,” ACM SIGOPS Operating Systems Review, vol. 40, no. 4, pp. 265–278, 2006

  53. [53]

    The Linux programming interface: a Linux and UNIX system programming handbook

    M. Kerrisk, The Linux programming interface: a Linux and UNIX system programming handbook. No Starch Press, 2010

  54. [54]

    Teastore: A micro-service reference application for benchmarking, modeling and resource management research,

    J. Von Kistowski, S. Eismann, N. Schmitt, A. Bauer, J. Grohmann, and S. Kounev, “Teastore: A micro-service reference application for benchmarking, modeling and resource management research,” in 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 2018, pp. 223–236

  55. [55]

    pysentimiento: A python toolkit for opinion mining and social nlp tasks,

    J. M. Pérez, M. Rajngewerc, J. C. Giudici, D. A. Furman, F. Luque, L. A. Alemany, and M. V. Martínez, “pysentimiento: A python toolkit for opinion mining and social nlp tasks,” 2023

  56. [56]

    A hybrid system call profiling approach for container protection,

    Y. Xing, X. Wang, S. Torabi, Z. Zhang, L. Lei, and K. Sun, “A hybrid system call profiling approach for container protection,” IEEE Transactions on Dependable and Secure Computing, 2023

  57. [57]

    Demystifying cloud benchmarking,

    T. Palit, Y. Shen, and M. Ferdman, “Demystifying cloud benchmarking,” in 2016 IEEE international symposium on performance analysis of systems and software (ISPASS). IEEE, 2016, pp. 122–132

  58. [58]

    Characterizing and optimizing kernel resource isolation for containers,

    K. Wang, S. Wu, K. Suo, Y. Liu, H. Huang, Z. Huang, and H. Jin, “Characterizing and optimizing kernel resource isolation for containers,” Future Generation Computer Systems, vol. 141, pp. 218–229, 2023

  59. [59]

    Benchmarking cloud serving systems with ycsb,

    B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with ycsb,” in Proceedings of the 1st ACM symposium on Cloud computing, 2010, pp. 143–154

  60. [60]

    A modeling study of the tpc-c benchmark,

    S. T. Leutenegger and D. Dias, “A modeling study of the tpc-c benchmark,” ACM Sigmod Record, vol. 22, no. 2, pp. 22–31, 1993

  61. [61]

    Redis benchmark,

    Redis, “Redis benchmark,” https://redis.io/docs/latest/operate/oss-and-stack/management/optimization/benchmarks/, 2025, accessed: 2025-07-20

  62. [62]

    Memtier benchmark,

    RedisLabs, “Memtier benchmark,” https://github.com/RedisLabs/memtier_benchmark, 2025, accessed: 2025-07-20

  63. [63]

    An open source load testing tool

    Locust, “An open source load testing tool.” https://locust.io, accessed: 2025-07-20

  64. [64]

    Twitter sentiment analysis dataset,

    Kaggle, “Twitter sentiment analysis dataset,” https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis, 2025, accessed: 2025-07-20

  65. [65]

    Thinking methodically about performance,

    B. Gregg, “Thinking methodically about performance,” Communications of the ACM, vol. 56, no. 2, pp. 45–51, 2013

  66. [66]

    stress next generation,

    Stressng, “stress next generation,” https://github.com/ColinIanKing/stress-ng/tree/master, accessed: 2025-07-20

  67. [67]

    Innodb and the acid model,

    MySQL, “Innodb and the acid model,” https://dev.mysql.com/doc/refman/8.4/en/mysql-acid.html, accessed: 2025-07-20

  68. [68]

    Cassandra storage engine documentation,

    Cassandra, “Cassandra storage engine documentation,” https://cassandra.apache.org/doc/4.1/cassandra/architecture/storage_engine.html, accessed: 2025-07-20.

    Appendix: Artifact Description/Artifact Evaluation. This two-page appendix contains the Artifact Description (AD) and Artifact Evaluation (AE). The complete source code for the metric collector and ana...