BACC: Budget-Aware Calibration and Control for Horizontal Autoscaling

Behrooz Farkiani; Fan Liu; Guanqi Li; Patrick Crowley

arxiv: 2606.20575 · v1 · pith:B6N6BWQKnew · submitted 2026-05-01 · 💻 cs.NI

Pith reviewed 2026-07-01 07:45 UTC · model grok-4.3

keywords: horizontal autoscaling · budget-aware control · Adaptive Conformal Inference · proportional-integral control · Kubernetes HPA · reliability budgets · workload forecasting · violation compliance

The pith

BACC adjusts autoscaling aggressiveness via a PI controller driven by observed budget-consumption pace to track violation targets within 0.5 percentage points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BACC as a model-agnostic way to keep horizontal autoscaling inside fixed-period reliability budgets. It wraps any forecaster with Adaptive Conformal Inference for online uncertainty calibration, then feeds the rate at which the violation budget is being spent into a proportional-integral controller that raises or lowers provisioning aggressiveness. Trace-driven simulations on Azure Functions data show the method keeps realized violation rates close to the chosen targets for both ARIMA and Chronos forecasters. Kubernetes replay experiments further indicate the same controller improves threshold compliance relative to native HPA once measurement delays and replica readiness are present.

Core claim

BACC separates workload prediction, online uncertainty calibration with Adaptive Conformal Inference, and budget-paced capacity control. A proportional-integral controller continuously modulates how aggressively to add or remove replicas according to the observed pace of budget consumption. Across five traces, three compliance levels, and two forecasters, the mean absolute compliance gaps are 0.44 percentage points with ARIMA and 0.42 with Chronos; the same controller also raises CPU-threshold compliance over native HPA in cluster experiments that include deployment effects.

What carries the argument

The proportional-integral controller that raises or lowers provisioning aggressiveness in proportion to the observed rate of violation-budget consumption, layered on top of an ACI-calibrated forecaster.
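
As a rough illustration of that mechanism, and not the authors' implementation, the sketch below compares the fraction of the violation budget already spent with the fraction of the period elapsed, then shifts a provisioning quantile accordingly. The class name `BudgetPacedPI`, the gains `kp` and `ki`, the error definition, and the clamping bounds are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class BudgetPacedPI:
    """Hypothetical PI loop mapping budget-consumption pace to a provisioning quantile."""

    base_quantile: float      # nominal quantile, e.g. 0.95 for a 5% violation target
    kp: float = 0.5           # proportional gain (illustrative value, not from the paper)
    ki: float = 0.05          # integral gain (illustrative value, not from the paper)
    q_min: float = 0.50       # lower clamp on the quantile
    q_max: float = 0.999      # upper clamp on the quantile
    integral: float = 0.0     # accumulated pace error

    def step(self, violations: int, budget: int, elapsed: int, period: int) -> float:
        """Return the quantile to provision against in the next interval.

        Assumes budget > 0 and period > 0.
        """
        spent = violations / budget       # fraction of the budget consumed so far
        expected = elapsed / period       # fraction of the period that has elapsed
        error = spent - expected          # > 0 means the budget is being spent too fast
        self.integral += error
        q = self.base_quantile + self.kp * error + self.ki * self.integral
        return min(self.q_max, max(self.q_min, q))


# Ahead of pace: 60% of the budget gone after 20% of the period, so the
# quantile is pushed up to the clamp (more conservative provisioning).
q_fast = BudgetPacedPI(base_quantile=0.95).step(30, 50, 200, 1000)

# Behind pace: nothing spent after 20% of the period, so the quantile drops
# (less conservative provisioning, fewer replicas).
q_slow = BudgetPacedPI(base_quantile=0.95).step(0, 50, 200, 1000)
```

A production controller would likely also need anti-windup on the integral term and a reset at period boundaries; both are omitted here to keep the sketch short.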

If this is right

When budget consumption is slow, the controller relaxes provisioning (it accepts more risk) and thereby trims unnecessary replicas.
When consumption accelerates, the controller tightens provisioning to protect the remaining budget.
The same controller logic applies unchanged to any forecaster because calibration and control are kept separate.
In real Kubernetes deployments the controller compensates for measurement delay and replica readiness better than threshold-only HPA.
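
To make the contrast with threshold-only scaling concrete, the sketch below places the reactive replica formula documented for the Kubernetes Horizontal Pod Autoscaler next to a hypothetical predictive rule that sizes the deployment for a calibrated upper bound on next-interval load. The function `predictive_replicas` and its parameters (`load_upper_bound`, `per_replica_capacity`) are assumptions for illustration only and are not taken from the paper.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """Reactive rule from the Kubernetes HPA docs: scale by the utilization ratio.

    The real HPA also applies a tolerance band and stabilization windows,
    which this sketch leaves out.
    """
    return math.ceil(current_replicas * current_util / target_util)


def predictive_replicas(load_upper_bound: float, per_replica_capacity: float,
                        target_util: float, min_replicas: int = 1) -> int:
    """Hypothetical predictive rule: size for an upper bound on the coming load."""
    needed = load_upper_bound / (per_replica_capacity * target_util)
    return max(min_replicas, math.ceil(needed))


# Reactive: 4 replicas at 75% CPU with a 50% target.
reactive = hpa_desired_replicas(4, 0.75, 0.5)

# Predictive: an upper bound of 950 req/s, 200 req/s per replica, 50% target.
proactive = predictive_replicas(950.0, 200.0, 0.5)
```

The reactive rule can only respond after utilization has already moved, which is where measurement delay and replica readiness hurt; the predictive rule acts on a forecast bound, so how conservative that bound is becomes the quantity a controller can tune.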

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The separation of calibration and control could be reused for other resource types if a comparable period-level budget metric is defined.
The approach may lower average over-provisioning in services whose SLOs are expressed as period violation budgets rather than instantaneous thresholds.
Extending the controller with an explicit model of replica spin-up time could further reduce the compliance gap under high churn.

Load-bearing premise

That the observed pace of budget consumption supplies enough information for the proportional-integral controller to set the right aggressiveness level for any forecaster and any deployment dynamics without further system modeling.

What would settle it

A controlled experiment in which the controller's aggressiveness adjustments produce realized violation rates more than one percentage point away from the target despite accurate real-time budget tracking would falsify the central claim.
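
The check described above can be written down in a few lines. This is a minimal sketch that assumes the compliance gap is the absolute difference between realized and target violation rates, expressed in percentage points; the paper's exact metric definition is not given in the abstract, and the function names are made up for this example.

```python
def violation_rate(utilization, threshold):
    """Fraction of intervals whose utilization exceeds the threshold."""
    return sum(u > threshold for u in utilization) / len(utilization)


def gap_pp(realized_rate, target_rate):
    """Absolute gap between realized and target violation rates, in percentage points."""
    return abs(realized_rate - target_rate) * 100.0


def mean_abs_gap_pp(realized_rates, target_rate):
    """Mean absolute gap over several runs that share one target."""
    return sum(gap_pp(r, target_rate) for r in realized_rates) / len(realized_rates)


def exceeds_tolerance(realized_rate, target_rate, tolerance_pp=1.0):
    """True when a run misses the target by more than the stated tolerance."""
    return gap_pp(realized_rate, target_rate) > tolerance_pp


# Placeholder numbers for illustration; these are NOT results from the paper.
rate = violation_rate([0.5, 0.9, 0.7, 0.95], threshold=0.8)
avg_gap = mean_abs_gap_pp([0.045, 0.056], target_rate=0.05)
```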

Figures

Figures reproduced from arXiv: 2606.20575 by Behrooz Farkiani, Fan Liu, Guanqi Li, Patrick Crowley.

**Figure 1.** System overview of the proposed budget-aware autoscaler. Solid arrows denote the forward decision path; dashed …

**Figure 2.** Resource–compliance tradeoff (𝑆𝑣𝑟 vs 𝑅avg) across fixed-period compliance levels and traces. Dashed line = target threshold; shaded region = compliant zone.

… 35.5%, 21.4%, 9.0% with ARIMA and 39.1%, 24.3%, 10.7% with Chronos across P90/P95/P99. In our controlled setup, OptScaler uses the same forecast backends as BACC, so some of this gap is attributable to forecast quality. We do not fine-tune these predic…

Original abstract

Cloud services must continuously adapt replica counts to fluctuating demand while respecting fixed-period reliability budgets. Many horizontal autoscalers either react to instantaneous utilization or provision against a fixed predictive risk target. These policies do not explicitly account for how much of the period-level violation budget has already been consumed, so they can be overly conservative when the budget is healthy and insufficiently conservative when the budget is being depleted. We present BACC, a model-agnostic framework for budget-aware horizontal autoscaling. BACC separates three concerns that are often entangled in prior systems: workload prediction, online uncertainty calibration, and budget-paced capacity control. It wraps an arbitrary forecaster with Adaptive Conformal Inference (ACI) to calibrate workload uncertainty online, then uses a proportional-integral controller to adjust provisioning aggressiveness based on the observed pace of budget consumption. We instantiate BACC for CPU-threshold-based horizontal autoscaling in Kubernetes and evaluate it through trace-driven simulation and cluster replay experiments. Across five Azure Functions traces, three compliance levels, and two forecasting backends, BACC tracks the requested violation target closely, achieving mean absolute compliance gaps of 0.44 and 0.42 percentage points with ARIMA and Chronos, respectively. The Kubernetes experiments further show that the same controller improves CPU-threshold compliance over native HPA under deployment effects such as measurement delay and replica readiness.
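
For readers unfamiliar with ACI, the following sketch shows the online update from Gibbs and Candès (2021), α_{t+1} = α_t + γ(α − err_t), applied to a one-sided upper bound on workload. How BACC actually wires ACI around its forecasters is not described in the abstract, so the class `OneSidedACI`, the residual score `observed - forecast`, the sliding window, and the step size are assumptions for illustration.

```python
import math
from collections import deque


class OneSidedACI:
    """Sketch of ACI for an upper bound on workload (update rule after Gibbs and Candes)."""

    def __init__(self, alpha: float, gamma: float = 0.01, window: int = 200):
        self.alpha = alpha                    # target miss rate
        self.alpha_t = alpha                  # adaptive working level
        self.gamma = gamma                    # step size (illustrative value)
        self.scores = deque(maxlen=window)    # recent residuals: observed - forecast

    def upper_bound(self, forecast: float) -> float:
        """Forecast plus the empirical (1 - alpha_t) quantile of recent residuals."""
        if not self.scores:
            return forecast
        # Clamp the level to [0, 1]; the original method returns an infinite
        # bound when alpha_t drops to zero or below, which this sketch avoids.
        level = min(1.0, max(0.0, 1.0 - self.alpha_t))
        ordered = sorted(self.scores)
        idx = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
        return forecast + ordered[idx]

    def update(self, forecast: float, observed: float) -> None:
        """Score the latest interval, then adapt the working level."""
        err = 1.0 if observed > self.upper_bound(forecast) else 0.0
        # A miss lowers alpha_t (wider bound); a hit raises it (tighter bound).
        self.alpha_t += self.gamma * (self.alpha - err)
        self.scores.append(observed - forecast)


aci = OneSidedACI(alpha=0.1, gamma=0.05)
aci.update(forecast=100.0, observed=120.0)   # miss: no history yet, so the bound was 100
bound = aci.upper_bound(100.0)               # now 100 + the single stored residual
```

Because the update only needs the forecast and the observed value, the same wrapper applies to any forecaster, which is the sense in which the framework is described as model-agnostic.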

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BACC cleanly separates ACI calibration from PI control on budget-consumption pace, delivering tight compliance in the reported Azure and Kubernetes runs.

BACC's core move is to treat workload prediction, online uncertainty calibration via ACI, and budget-paced control as separate pieces. The PI controller reacts to how quickly the violation budget is being spent rather than just instantaneous load or a fixed risk target. This lets the system stay conservative only when the budget is running low and relax when it is healthy.

The paper reports that this works in practice. Across five Azure Functions traces, three compliance levels, and two forecasters, the mean absolute gaps to the target are 0.44 percentage points with ARIMA and 0.42 with Chronos. The Kubernetes replay experiments also report better CPU-threshold compliance than native HPA once measurement delay and replica readiness are factored in. The design is model-agnostic, which is useful if teams want to swap forecasters without rewriting the controller.

The main limitation visible from the abstract is the lack of error bars, statistical tests, or a full description of how traces were chosen and pre-processed. That makes it harder to judge how stable the gaps are across different random seeds or workload shifts. The assumption that observed budget-consumption pace supplies enough signal for the PI loop also looks plausible but would benefit from more stress cases, such as abrupt demand spikes or changes in the underlying forecaster accuracy.

This is solid incremental work for the cloud resource-management community. It is worth sending to peer review because the separation of concerns is clear, the numbers are specific, and the experiments cover both simulation and real deployment effects. A referee can check the missing protocol details and robustness claims without needing to rewrite the central idea.

Referee Report

2 major / 3 minor

Summary. The paper presents BACC, a model-agnostic framework for budget-aware horizontal autoscaling in cloud systems. It decouples workload forecasting from online uncertainty calibration via Adaptive Conformal Inference (ACI) and from capacity control via a proportional-integral (PI) controller that modulates provisioning aggressiveness according to the observed pace of violation-budget consumption. Trace-driven simulations on five Azure Functions traces across three compliance targets and two forecasters (ARIMA, Chronos) report mean absolute compliance gaps of 0.44 and 0.42 percentage points; Kubernetes replay experiments show improved CPU-threshold compliance relative to native HPA under measurement delay and replica-readiness effects.

Significance. If the empirical results hold, the work supplies a practical, forecaster-agnostic mechanism for explicitly managing fixed-period reliability budgets, which prior reactive or fixed-risk autoscalers do not address. The clean separation of prediction, ACI calibration, and budget-paced PI control is a coherent architectural contribution, and the combination of trace-driven simulation with real-cluster replay provides relevant evidence for deployment relevance.

major comments (2)

[Evaluation section] The central claim of close target tracking (mean absolute gaps 0.44/0.42 pp) is presented without error bars, per-trace standard deviations, or statistical significance tests across the five traces and three compliance levels; this weakens the ability to judge consistency of the result.
[Design / Controller subsection] The PI controller's reliance on observed budget-consumption pace as the sole feedback signal is load-bearing for the claim of robustness across forecasters and deployment dynamics, yet no sensitivity analysis or ablation on controller gains or alternative feedback signals is reported.
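
As a sketch of the kind of uncertainty reporting the first comment asks for, the snippet below computes a sample standard deviation and a percentile-bootstrap interval over per-run gaps using only the Python standard library. The gap values are placeholders invented for this example, not figures from the paper, and the helper `bootstrap_ci` is hypothetical.

```python
import random
import statistics


def bootstrap_ci(gaps, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the mean of per-run compliance gaps."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(gaps, k=len(gaps))) for _ in range(n_resamples)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Placeholder gaps in percentage points; NOT values from the paper.
example_gaps = [0.2, 0.5, 0.3, 0.7, 0.4]

mean_gap = statistics.fmean(example_gaps)     # aggregate mean
spread = statistics.stdev(example_gaps)       # sample standard deviation
low, high = bootstrap_ci(example_gaps)        # interval for the mean
```

With only a handful of traces the interval will be wide, so such numbers are better read as a consistency check than as a significance claim.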

minor comments (3)

[§3] Notation for the ACI nonconformity score and the budget-consumption rate should be defined once in a single table or equation block rather than reintroduced in multiple sections.
[Evaluation figures] Figure captions for the Kubernetes replay results should explicitly state the number of independent runs and the exact compliance target used in each panel.
[Introduction / Related Work] The manuscript would benefit from a short related-work paragraph contrasting BACC with prior budget-aware or risk-aware autoscalers that also employ conformal methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. We address each major comment below.

Referee: [Evaluation section] The central claim of close target tracking (mean absolute gaps 0.44/0.42 pp) is presented without error bars, per-trace standard deviations, or statistical significance tests across the five traces and three compliance levels; this weakens the ability to judge consistency of the result.

Authors: We agree that additional statistical detail would strengthen the evaluation. In the revised manuscript we will report per-trace compliance gaps, standard deviations across the five traces and three compliance targets, and error bars on the aggregate means. We will also include statistical significance tests (e.g., Wilcoxon signed-rank) with the explicit caveat that the small number of traces limits statistical power.
Revision: yes

Referee: [Design / Controller subsection] The PI controller's reliance on observed budget-consumption pace as the sole feedback signal is load-bearing for the claim of robustness across forecasters and deployment dynamics, yet no sensitivity analysis or ablation on controller gains or alternative feedback signals is reported.

Authors: We acknowledge the value of sensitivity analysis for the PI gains. The revised manuscript will add a dedicated subsection that varies the proportional and integral coefficients over a reasonable range and reports the resulting compliance gaps for both forecasters. We retain the position that budget-consumption pace is the natural feedback signal because it directly encodes the remaining reliability budget, but the new analysis will quantify robustness to gain choices.
Revision: yes
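
One way such a gain sweep could be organized is sketched below on a deliberately simple toy process: standardized forecast errors are drawn from a Gaussian, and a PI term on budget pace shifts a safety margin. Nothing here reproduces the paper's simulator; the `simulate` and `sweep` functions, the nominal margin of 1.645, and the gain grid are assumptions chosen only to show the shape of the experiment.

```python
import random


def simulate(kp, ki, target=0.05, steps=2000, seed=0):
    """Toy closed loop: returns the realized violation rate for one gain pair."""
    rng = random.Random(seed)
    integral = 0.0
    violations = 0
    for t in range(1, steps + 1):
        pace_error = violations / t - target     # > 0 means over budget so far
        integral += pace_error
        # Nominal one-sided 95% z-score, shifted by the PI terms.
        margin = 1.645 + kp * pace_error + ki * integral
        forecast_error = rng.gauss(0.0, 1.0)     # standardized demand surprise
        if forecast_error > margin:
            violations += 1
    return violations / steps


def sweep(kps, kis, target=0.05):
    """Absolute gap to the target, in percentage points, for every gain pair."""
    return {
        (kp, ki): abs(simulate(kp, ki, target=target) - target) * 100.0
        for kp in kps
        for ki in kis
    }


results = sweep(kps=[0.0, 1.0, 5.0], kis=[0.0, 0.1, 0.5])
```

Reporting the full grid, rather than a single tuned pair, is what would let a reader judge whether the tracking result depends on careful gain selection.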

Circularity Check

0 steps flagged

No significant circularity; empirical controller design is self-contained

The paper describes BACC as a model-agnostic framework that applies Adaptive Conformal Inference (ACI) for online uncertainty calibration around an arbitrary forecaster, followed by a standard proportional-integral controller driven by observed budget-consumption pace. No derivation chain reduces a claimed prediction or result to a fitted quantity defined from the same evaluation data; the reported compliance gaps (0.44/0.42 pp) are measured outcomes from trace-driven and Kubernetes experiments rather than outputs forced by construction. ACI and PI control are standard techniques invoked without load-bearing self-citation chains or uniqueness theorems from the authors' prior work. The separation of prediction, calibration, and budget-paced control is presented as a design choice whose performance is externally validated against native HPA and multiple forecasters, leaving the central claims independent of the inputs used for evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

axioms (1)

Domain assumption: Adaptive Conformal Inference can be wrapped around an arbitrary forecaster to produce valid online uncertainty sets for workload prediction.
Invoked as the calibration mechanism in the framework description.
