Recognition: no theorem link
Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study
Pith reviewed 2026-05-12 02:54 UTC · model grok-4.3
The pith
Two time-series decomposition techniques reveal obscured trends and seasonal cycles in cloud performance traces, enabling accurate predictions and improved resource allocation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose two time-series decomposition techniques for cloud performance engineering: a hybrid/manual method and a fully automatic method. Through a case study of 11 serverless functions, we show that both approaches can successfully and consistently reveal trends and seasonal cycles, such as weekly and quarterly patterns, which are otherwise obscured. As an evaluation and application of the decomposition, we used the decomposed components to predict future performance, yielding mean absolute percentage error (MAPE) values of only 1.8% (hybrid) and 2.1% (automatic), significantly outperforming basic time-series methods and deep learning. We further show that decomposition insights can guide practical resource allocation.
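The accuracy metric quoted above, MAPE, is simple to state; a minimal sketch of how it is computed (illustrative only, not the paper's evaluation code):

```python
# Mean absolute percentage error: average relative deviation of predictions
# from actuals, expressed as a percentage. Lower is better; the paper reports
# 1.8% (hybrid) and 2.1% (automatic).
def mape(actual, predicted):
    assert len(actual) == len(predicted) and len(actual) > 0
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Example: predictions off by 2% on average.
print(mape([100.0, 200.0], [98.0, 204.0]))  # 2.0
```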
What carries the argument
The hybrid/manual and fully automatic time-series decomposition techniques that isolate trend, seasonal, and residual components in performance traces.
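The general shape of such a decomposition can be sketched with the classical additive scheme (moving-average trend, phase-averaged seasonality). This is a deliberately simple stand-in; the paper's hybrid and automatic methods are more sophisticated, and the function below is an assumption-laden illustration, not their implementation:

```python
# Classical additive decomposition sketch: series = trend + seasonal + residual.
# Assumes an odd seasonal period (e.g. 7 for daily samples with a weekly cycle).
def decompose_additive(series, period):
    n = len(series)
    half = period // 2
    # Centered moving average over one full period estimates the trend and
    # cancels any zero-mean seasonal component.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    # Average the detrended values at each phase to get the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    cycle = [sum(b) / len(b) for b in buckets]
    mean_c = sum(cycle) / period
    cycle = [c - mean_c for c in cycle]  # force zero-mean seasonality
    seasonal = [cycle[i % period] for i in range(n)]
    residual = [series[i] - trend[i] - seasonal[i] if trend[i] is not None
                else None for i in range(n)]
    return trend, seasonal, residual
```

On a noise-free trace built from a linear trend plus a weekly cycle, the three components are recovered exactly; on real cloud traces the residual absorbs noise and any unmodeled variation.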
Load-bearing premise
That the proposed decomposition techniques accurately isolate the underlying performance factors without introducing artifacts or missing intermittent patterns, and that the results from the 11-function case study and the AWS benchmarks generalize to diverse cloud deployments.
What would settle it
A cloud performance trace from another service or provider on which the decomposition fails to identify known seasonal patterns, or produces prediction errors higher than those of basic time-series methods, would challenge the central claim.
Original abstract
Cloud performance fluctuates due to factors such as resource contention and workload changes. These factors can be short-term, seasonal, or long-term. Their effects are often intertwined in performance traces, making performance management difficult. Prior work on cloud performance engineering used time-series decomposition to separate these factors. However, existing approaches rely on basic decomposition methods that may miss key variation patterns and fail on traces with complex or intermittent patterns, limiting their usefulness across diverse cloud deployments. To address this limitation, we propose two time-series decomposition techniques for cloud performance engineering: a hybrid/manual method and a fully automatic method. Through a case study of 11 serverless functions, we show that both approaches can successfully and consistently reveal trends and seasonal cycles, such as weekly and quarterly patterns, which are otherwise obscured. As an evaluation and application of the decomposition, we used the decomposed components to predict future performance, yielding mean absolute percentage error (MAPE) values of only 1.8% (hybrid) and 2.1% (automatic), significantly outperforming basic time-series methods and deep learning. We further show that decomposition insights can guide practical resource allocation. Using decomposition-informed scaling on AWS, we reduced latency variability by over 60% and maximum latency by 10%. Similar experiments on benchmarks on AWS confirmed that seasonal patterns and performance gains generalize beyond our case study. Notably, our findings demonstrate that even a single performance trace contains rich actionable information for guiding cloud management decisions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes two time-series decomposition techniques (a hybrid/manual method and a fully automatic method) to separate short-term, seasonal, and long-term factors in cloud performance traces, which are often intertwined due to resource contention and workload changes. In a case study of 11 serverless functions, both methods are shown to reveal obscured trends and seasonal cycles such as weekly and quarterly patterns. The decomposed components are then applied to predict future performance, achieving MAPE values of 1.8% (hybrid) and 2.1% (automatic) that outperform basic time-series methods and deep learning. The insights are further used to guide AWS resource scaling, reducing latency variability by over 60% and maximum latency by 10%, with generalization confirmed via additional AWS benchmarks.
Significance. If the decomposition techniques prove accurate in isolating performance factors without introducing artifacts, the work could meaningfully advance long-term performance engineering in cloud and serverless systems by extracting actionable information from individual traces for improved prediction and resource allocation. The empirical AWS results and reported scaling gains provide a practical demonstration of potential impact.
major comments (2)
- [Abstract and §5] Abstract and §5 (Evaluation): The reported MAPE values (1.8% hybrid, 2.1% automatic) and claims of outperforming baselines are presented without details on the decomposition algorithms, data preprocessing, or quantitative validation (e.g., tests on synthetic traces with known intermittent patterns or metrics confirming no artifacts). This is load-bearing for the central claim that the methods 'successfully and consistently reveal trends and seasonal cycles' and enable low-error prediction.
- [§4] §4 (Case Study): The generalization from 11 serverless functions and AWS benchmarks to 'diverse cloud deployments' with complex patterns rests on visual inspection and prediction MAPE alone; no quantitative assessment (such as recovery error on injected patterns or cross-validation against known events) is provided to confirm faithful isolation of factors.
minor comments (2)
- Figure captions and legends in the decomposition and scaling result plots could be expanded to explicitly label components (trend, seasonal, residual) and scaling policies for easier interpretation.
- [§5] The paper would benefit from a brief comparison table in §5 summarizing MAPE and variability metrics against all baselines (basic time-series, deep learning, and non-decomposition scaling) for direct readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to improve clarity and rigor.
read point-by-point responses
- Referee: [Abstract and §5] Abstract and §5 (Evaluation): The reported MAPE values (1.8% hybrid, 2.1% automatic) and claims of outperforming baselines are presented without details on the decomposition algorithms, data preprocessing, or quantitative validation (e.g., tests on synthetic traces with known intermittent patterns or metrics confirming no artifacts). This is load-bearing for the central claim that the methods 'successfully and consistently reveal trends and seasonal cycles' and enable low-error prediction.
Authors: The decomposition algorithms are presented in Section 3, with the hybrid method combining manual pattern identification and automated component fitting, and the automatic method relying on statistical detection of seasonal and trend components. Data preprocessing steps, including normalization and outlier handling for the performance traces, are described briefly but will be expanded with pseudocode and parameter settings in the revised manuscript. Our validation relies on real AWS traces from 11 serverless functions plus additional benchmarks, where low MAPE and scaling gains serve as evidence of effective decomposition without introducing obvious artifacts. We acknowledge that synthetic traces with injected patterns would provide stronger quantitative confirmation of no artifacts; however, such controlled experiments are outside the current scope as our focus is on practical cloud workloads where ground truth factors are unavailable. We will add a dedicated subsection in §5 discussing preprocessing details, potential artifacts, and why the real-world results support the claims. revision: partial
- Referee: [§4] §4 (Case Study): The generalization from 11 serverless functions and AWS benchmarks to 'diverse cloud deployments' with complex patterns rests on visual inspection and prediction MAPE alone; no quantitative assessment (such as recovery error on injected patterns or cross-validation against known events) is provided to confirm faithful isolation of factors.
Authors: Section 4 presents results from 11 serverless functions and confirms generalization via separate AWS benchmark experiments showing similar seasonal patterns and scaling benefits. Visual inspection of decomposed components combined with predictive accuracy (MAPE) and practical latency reductions (>60% variability, 10% max latency) serve as our primary evidence for faithful isolation, as real traces lack known ground-truth factors for metrics like recovery error. We agree this limits strong claims about all diverse deployments and will revise the text in §4 and the abstract to qualify the generalization scope, emphasize the serverless/AWS context, and add a limitations paragraph on the absence of injected-pattern validation. revision: yes
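The rebuttal mentions outlier handling as a preprocessing step without specifying the method. One common choice for performance traces, shown here purely as an assumed illustration and not as the authors' procedure, is a Hampel-style rolling-median filter:

```python
# Hampel-style outlier filter: replace a sample with the local median when it
# deviates from that median by more than n_sigmas robust standard deviations.
def hampel_filter(series, window=3, n_sigmas=3.0):
    out = list(series)
    k = 1.4826  # scales the median absolute deviation to a Gaussian sigma
    for i in range(window, len(series) - window):
        neighborhood = sorted(series[i - window:i + window + 1])
        med = neighborhood[len(neighborhood) // 2]
        mad = sorted(abs(x - med) for x in neighborhood)[len(neighborhood) // 2]
        if abs(series[i] - med) > n_sigmas * k * mad:
            out[i] = med  # replace the detected spike with the local median
    return out
```

A lone latency spike in an otherwise flat trace is replaced by the surrounding median, while the remaining samples pass through unchanged.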
Circularity Check
No circularity: empirical case study and out-of-sample evaluation are self-contained
full rationale
The paper proposes two decomposition techniques (hybrid and automatic) and applies them to 11 serverless function traces. It reports that the methods reveal trends/seasonal cycles, then uses the resulting components for future-performance forecasting evaluated by MAPE against held-out actual data (1.8% hybrid, 2.1% automatic), plus AWS scaling experiments that measure latency reduction. These steps rely on standard time-series methods, direct comparison to baselines, and external benchmarks rather than any self-definition, fitted-parameter renaming, or self-citation chain. The central claims are falsifiable via the reported quantitative metrics and do not reduce to their inputs by construction.
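The evaluation loop described above, decompose a history, extrapolate the components into a held-out window, and score the forecast, can be sketched as follows. The linear-trend extrapolation and the split are illustrative assumptions, not the paper's exact protocol:

```python
# Forecast a held-out horizon from decomposed components: least-squares linear
# trend plus a phase-averaged seasonal cycle, both fit on the history only.
def forecast_from_components(history, period, horizon):
    n = len(history)
    xs = range(n)
    mx = (n - 1) / 2
    my = sum(history) / n
    # Least-squares linear trend over the history.
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, history))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # Phase-average the detrended series to estimate the seasonal cycle.
    detrended = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    cycle = [0.0] * period
    counts = [0] * period
    for i, d in enumerate(detrended):
        cycle[i % period] += d
        counts[i % period] += 1
    cycle = [c / k for c, k in zip(cycle, counts)]
    # Forecast = extrapolated trend + repeated seasonal cycle.
    return [intercept + slope * (n + h) + cycle[(n + h) % period]
            for h in range(horizon)]
```

Because the forecast is scored against held-out actuals (e.g. with MAPE), the evaluation is out-of-sample rather than circular: the components are never compared against themselves.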
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Time-series performance data can be decomposed into additive or multiplicative trend, seasonal, and residual components that capture distinct variation sources.
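The two model forms in this assumption are closely related: a multiplicative model reduces to an additive one under a log transform, which is valid for strictly positive traces such as latencies. A one-line check:

```python
import math

# Multiplicative model y = T * S * R becomes additive in log space:
# log y = log T + log S + log R.
T, S, R = 100.0, 1.1, 0.98       # trend, seasonal, residual at one sample
y_mult = T * S * R
log_additive = math.log(T) + math.log(S) + math.log(R)
assert abs(math.log(y_mult) - log_additive) < 1e-12
```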