pith. machine review for the scientific record.

arxiv: 2604.16457 · v1 · submitted 2026-04-08 · 💻 cs.DC

Recognition: 2 theorem links


Spot-and-Scoot: Peeking Into Spot Instance Availability

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:36 UTC · model grok-4.3

classification 💻 cs.DC
keywords spot instances · availability modeling · cloud computing · spot request probing · interruption prediction · AWS · Azure · workload simulation

The pith

Spot-and-Scoot collects spot instance availability signals at near-zero cost by submitting requests and canceling them before instances start running.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Spot-and-Scoot as a method to monitor cloud spot instance availability without the expense of running the instances themselves. It works by submitting spot requests and canceling them once the provider accepts them for provisioning, which yields binary signals of whether capacity is present. Submitting several requests together at the same time gives a rough count of available capacity. The collected signals are turned into features that support models for describing current availability and forecasting it ahead in time. Simulations with standard database workloads show that following these forecasts can lower the amount of work lost when interruptions occur.
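The submit-and-cancel loop described above can be sketched against a toy provider. The class and method names below are illustrative stand-ins, not a real cloud SDK, and the accept-then-cancel timing is collapsed into a single synchronous call; the point is only that concurrent acceptances yield a binary signal plus a rough capacity count.

```python
class StubSpotProvider:
    """Toy stand-in for a cloud spot API (hypothetical names, not boto3).

    The accept/reject decision happens before any instance enters the
    running state, which is what makes cancel-on-acceptance nearly free.
    """
    def __init__(self, capacity):
        self.capacity = capacity

    def request(self):
        if self.capacity > 0:
            self.capacity -= 1
            return "accepted"
        return "capacity-not-available"

    def cancel(self):
        # Cancelled before running: the slot returns to the pool.
        self.capacity += 1


def probe(provider, n_concurrent=5):
    """Submit n concurrent spot requests, then cancel every acceptance.

    Returns (binary availability, rough count of available capacity).
    """
    outcomes = [provider.request() for _ in range(n_concurrent)]  # submit together
    accepted = outcomes.count("accepted")
    for _ in range(accepted):
        provider.cancel()  # "scoot": release before anything starts billing
    return accepted > 0, accepted
```

With capacity 3 and five concurrent probes, the count saturates at 3, which is the quantitative capacity estimate the abstract describes.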

Core claim

Spot-and-Scoot gathers binary availability signals through canceled spot requests across many instance types and regions. Analysis of real interruption events shows that instances of the same type in the same zone tend to fail together within three minutes, which supports treating availability as a binary state. Three features extracted from the signals are combined into models that describe current availability and predict it at future times. A separate simulation using standard workloads indicates that acting on these predictions reduces lost computation relative to running without guidance.
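The three-minute co-interruption statistic can be reproduced in miniature. The pairwise-within-window definition below is an assumption; the paper may instead count each event against its nearest neighbor, so treat this as a sketch of the shape of the analysis rather than its exact formula.

```python
from itertools import combinations

def co_interruption_fraction(events, window_s=180):
    """events: (instance_type, zone, unix_ts) interruption records.

    Returns the fraction of same-type, same-zone event pairs that fall
    within window_s seconds of each other (180 s = the paper's 3 minutes).
    """
    groups = {}
    for itype, zone, ts in events:
        groups.setdefault((itype, zone), []).append(ts)
    close = total = 0
    for ts_list in groups.values():
        for a, b in combinations(sorted(ts_list), 2):
            total += 1
            close += abs(a - b) <= window_s
    return close / total if total else 0.0
```

A fraction near 1.0 under this definition is what licenses collapsing per-instance capacity into a single binary state per (type, zone).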

What carries the argument

The Spot-and-Scoot (SnS) probing technique that submits spot requests and cancels them upon acceptance to obtain binary availability signals without incurring instance runtime costs.

Load-bearing premise

The assumption that the outcome of a canceled spot request accurately reflects the capacity that a running instance would experience on the same type and zone.

What would settle it

A direct comparison in which SnS signals report availability but actual long-running instances on the matching types and zones are interrupted at high rates, or the reverse pattern.

Figures

Figures reproduced from arXiv: 2604.16457 by Kyumin Kim, Kyungyong Lee, Moohyun Song, Taeyoon Kim.

Figure 1. Spot instance request lifecycle with capacity check [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2. SnS signals and actual running instance traces [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 3. Cumulative distribution of co-interrupt proximity [PITH_FULL_IMAGE:figures/full_fig_p005_3.png]
Figure 4. Architecture of the SnS system. The SnS Collector periodically probes spot availability across multiple instance types and regions; the Data Pipeline processes collection records into input features for prediction; the Interrupt Predictor produces real-time interruption forecasts from the computed features. The Collector is implemented on a serverless architecture using AWS Lambda. To assess the generality of SnS-derived features, six classification models are compared: Logistic Regression, Support Vector Machine, Random Forest, and XGBoost operate on a single data point per prediction, while two models operate on a sequence of consecutive data points, Long Short-Term Memory (LSTM) [23] and …
Figure 5. 24-hour spot instance monitoring cost comparison [PITH_FULL_IMAGE:figures/full_fig_p008_5.png]
Figure 6 (caption missing). Panels recovered from residue: CDF of correlation coefficients for (a) Success Rate, (b) Unfulfilled Ratio, (c) Contiguous Unfulfilled Time
Figure 7. Availability modeling performance with SnS-derived features [PITH_FULL_IMAGE:figures/full_fig_p008_7.png]
Figure 8. F1 macro score by prediction model across varying window sizes [PITH_FULL_IMAGE:figures/full_fig_p009_8.png]
Figure 9. Total lost computation by strategy. The accompanying text notes that the guided strategy idled during periods that were in fact available, and that as the prediction horizon lengthened the reduction in lost computation grew while idle time increased, indicating a trade-off between loss avoidance and resource utilization. …
read the original abstract

Spot instances offer significant cost savings of up to 90% over on-demand prices, making them an attractive resource for large-scale computing workloads. However, understanding their availability dynamics is essential for building systems that tolerate interruptions, and observing this availability directly requires keeping instances running, which incurs costs that scale with the number of monitored instance types and their per-instance price. We propose Spot-and-Scoot (SnS), a cost-efficient method that collects spot instance availability signals by leveraging the cloud provider's provisioning lifecycle. Since the outcome of a spot request is determined before the instance enters the running state, SnS submits requests and cancels them upon provisioning acceptance, collecting binary availability signals at near-zero instance cost. Submitting multiple concurrent requests per measurement point further yields a quantitative estimate of available capacity. We validate SnS through simultaneous collection of probing signals and actual running instance traces across 68 instance types and 15 regions on both AWS and Azure, totaling 336,033 spot requests. Analysis of 2,635 real-world interruption events reveals that co-interruptions within the same instance type and availability zone occur within three minutes in over 92% of cases, motivating a binary availability formulation. Based on this formulation, we derive three complementary features from SnS signals and demonstrate that their combination achieves an F1-macro score of up to 0.90 for current availability modeling and maintains 0.85 at a 60-minute prediction horizon. A trace-driven simulation using TPC-DS workloads further demonstrates the potential of SnS-based prediction to reduce lost computation compared to an unguided baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Spot-and-Scoot (SnS), a cost-efficient probing technique for spot instance availability that submits requests and cancels them upon provisioning acceptance, collecting binary signals without incurring running costs. It validates this approach through simultaneous collection alongside running instance traces across 68 instance types, 15 regions, and 336,033 requests on AWS and Azure. Analysis of 2,635 interruption events shows that over 92% of co-interruptions within the same instance type and availability zone occur within 3 minutes, justifying a binary availability model per type and zone. Three features derived from SnS signals model current availability with an F1-macro of up to 0.90 and maintain 0.85 at a 60-minute prediction horizon, and a TPC-DS trace-driven simulation shows reduced lost computation relative to an unguided baseline.

Significance. If the results hold, this provides a practical, scalable way to monitor and predict spot instance availability at low cost, which is significant for cloud users relying on spot instances for cost savings. The large-scale empirical validation across many types and regions, combined with real interruption data and workload simulation, offers concrete evidence of utility and could enable more robust interruption-tolerant systems.

major comments (3)
  1. [Validation / Experiments] The equivalence between SnS probe signals (submit-then-cancel) and actual running-instance capacity is load-bearing for all downstream claims; the simultaneous collection across 68 types is cited as validation, but quantitative agreement metrics (e.g., per-type precision/recall or discrepancy rates between probe outcomes and trace availability) must be reported explicitly to rule out systematic bias from the probing method.
  2. [Interruption Analysis] The binary availability formulation rests on the observation that 92% of 2,635 co-interruption events occur within 3 minutes; while this supports the model in the collected traces, the paper must address generalizability (e.g., sensitivity to request duration, account history, or provider allocation changes) with either additional analysis or a concrete test, as this premise directly enables the feature derivation and F1 results.
  3. [Availability Modeling / Prediction] The three complementary features derived from SnS signals achieve the reported F1-macro scores of 0.90 (current) and 0.85 (60 min); their exact definitions, computation from the quantitative capacity estimates obtained via concurrent requests, and the training procedure (algorithm, hyperparameters, cross-validation) need fuller specification in the modeling section for reproducibility and to confirm the scores are not sensitive to implementation details.
minor comments (2)
  1. [Abstract] The abstract states that multiple concurrent requests yield a quantitative capacity estimate, but the precise mapping from number of acceptances to the capacity value used in features should be clarified for readers.
  2. [Simulation] In the TPC-DS simulation, specify the exact unguided baseline policy and the definition of 'lost computation' (e.g., work completed before interruption) to allow direct comparison with other schedulers.
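One plausible reading of the quantities this comment asks to pin down: an unguided job runs whenever it can and loses all uncheckpointed progress at each interruption, while a forecast-guided job sits out predicted-bad steps at the cost of idle time. These definitions are illustrative, not the paper's.

```python
def simulate(avail, predict=None):
    """avail[t]: whether capacity actually exists at step t.
    predict[t]: forecast used to decide whether to run (None = unguided).

    Returns (lost, idle): work steps wiped by interruptions, steps idled.
    """
    lost = idle = progress = 0
    for t, up in enumerate(avail):
        run = predict[t] if predict is not None else True
        if not run:
            idle += 1           # forecast says unavailable: pause, keep progress
            continue
        if up:
            progress += 1       # one unit of useful work completes
        else:
            lost += progress    # interruption wipes uncheckpointed progress
            progress = 0
    return lost, idle
```

With a perfect forecast the guided run trades one idle step for the two work units the unguided run loses, which mirrors the loss-versus-utilization trade-off the simulation section reports.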

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each of the major comments in detail below, providing clarifications and committing to specific revisions that will enhance the manuscript's rigor and reproducibility.

read point-by-point responses
  1. Referee: The equivalence between SnS probe signals (submit-then-cancel) and actual running-instance capacity is load-bearing for all downstream claims; the simultaneous collection across 68 types is cited as validation, but quantitative agreement metrics (e.g., per-type precision/recall or discrepancy rates between probe outcomes and trace availability) must be reported explicitly to rule out systematic bias from the probing method.

    Authors: We acknowledge that explicit quantitative metrics would strengthen the validation of the equivalence between SnS probes and running instances. Although the manuscript describes the simultaneous collection across 68 types and 336,033 requests, we did not include per-type agreement statistics. In the revision, we will add a new table and accompanying text in the validation section reporting quantitative metrics such as per-type and per-region precision, recall, F1 scores, and discrepancy rates between probe signals and trace availability. This will be computed directly from the existing dataset to rule out systematic bias. revision: yes

  2. Referee: The binary availability formulation rests on the observation that 92% of 2,635 co-interruption events occur within 3 minutes; while this supports the model in the collected traces, the paper must address generalizability (e.g., sensitivity to request duration, account history, or provider allocation changes) with either additional analysis or a concrete test, as this premise directly enables the feature derivation and F1 results.

    Authors: We agree that generalizability of the binary model is an important consideration. Our current analysis is based on 2,635 events from diverse instance types and regions, but we will revise the interruption analysis section to include additional discussion on potential sensitivities. Specifically, we will present a sensitivity analysis by recomputing the co-interruption statistics for different time windows (e.g., 1, 3, and 5 minutes) and discuss how factors like request duration might influence the 92% figure. We will also add a limitations paragraph noting that while our multi-provider, multi-region data provides broad coverage, account history and future allocation changes are beyond the scope of this study and represent opportunities for future work. revision: partial

  3. Referee: The three complementary features derived from SnS signals achieve the reported F1-macro scores of 0.90 (current) and 0.85 (60 min); their exact definitions, computation from the quantitative capacity estimates obtained via concurrent requests, and the training procedure (algorithm, hyperparameters, cross-validation) need fuller specification in the modeling section for reproducibility and to confirm the scores are not sensitive to implementation details.

    Authors: We concur that fuller specification of the modeling approach is required for reproducibility. We will substantially expand the modeling section to provide: exact definitions of the three features with formulas showing how they are derived from the quantitative capacity estimates (obtained via concurrent requests); the machine learning algorithm employed along with all hyperparameters and their selection process; and details of the training and evaluation procedure, including the cross-validation strategy (e.g., temporal splits to avoid data leakage). Additionally, we will include a sensitivity analysis demonstrating that the reported F1-macro scores are robust to reasonable variations in these parameters. revision: yes
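A minimal sketch of what the promised specification might contain, using the three feature names visible in the paper's figure panels (success rate, unfulfilled ratio, contiguous unfulfilled time). The windowed definitions and the from-scratch F1-macro below are assumptions, not the authors' exact formulas.

```python
def sns_features(signals, window=6):
    """signals: chronological binary probe outcomes (1 = accepted).

    Three illustrative features over a trailing window; the exact
    definitions are what the promised revision would spell out.
    """
    w = signals[-window:]
    success_rate = sum(w) / len(w)
    unfulfilled_ratio = 1.0 - success_rate
    contiguous_unfulfilled = 0
    for s in reversed(w):            # length of the current failure streak
        if s:
            break
        contiguous_unfulfilled += 1
    return success_rate, unfulfilled_ratio, contiguous_unfulfilled


def f1_macro(y_true, y_pred):
    """F1 averaged over both classes, the metric behind the 0.90 / 0.85 scores."""
    scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / 2
```

Evaluating with a temporal split, fitting on the chronologically earlier portion and scoring on the later one, is the leakage-avoiding procedure the rebuttal commits to.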

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent empirical validation and measured model performance

full rationale

The derivation collects SnS probe signals and simultaneous running-instance traces (336k requests across 68 types/15 regions) to empirically validate probe-to-running equivalence. Co-interruption statistics (92% within 3 min from 2635 events) motivate but do not define the binary formulation. Three features are extracted from the signals; an ML model is trained and evaluated to produce the reported F1-macro scores (0.90 current, 0.85 at 60 min) on the collected data. The TPC-DS trace-driven simulation applies the model to external workloads and measures lost computation against an unguided baseline. None of these steps reduce by construction to fitted parameters or self-referential definitions; the performance numbers are measured outcomes, not tautological outputs of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach is presented as an empirical measurement technique validated against real traces.

pith-pipeline@v0.9.0 · 5595 in / 1251 out tokens · 24420 ms · 2026-05-10T18:36:36.419041+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 10 canonical work pages

  1. [1]

    Usage patterns and the economics of the public cloud,

    C. Kilcioglu, J. M. Rao, A. Kannan, and R. P. McAfee, “Usage patterns and the economics of the public cloud,” in Proceedings of the 26th International Conference on World Wide Web, ser. WWW ’17. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, pp. 83–91. [Online]. Available: https://doi.org/10.1145/30389...

  2. [2]

    From cloud computing to sky computing,

    I. Stoica and S. Shenker, “From cloud computing to sky computing,” in Proceedings of the Workshop on Hot Topics in Operating Systems, ser. HotOS ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 26–32. [Online]. Available: https://doi.org/10.1145/3458336.3465301

  3. [3]

    SkyPilot: An intercloud broker for sky computing,

    Z. Yang, Z. Wu, M. Luo, W.-L. Chiang, R. Bhardwaj, W. Kwon, S. Zhuang, F. S. Luan, G. Mittal, S. Shenker, and I. Stoica, “SkyPilot: An intercloud broker for sky computing,” in 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 437–455. [Online]. Available: https://www.usenix.org/co...

  4. [4]

    Can’t be late: Optimizing spot instance savings under deadlines,

    Z. Wu, W.-L. Chiang, Z. Mao, Z. Yang, E. Friedman, S. Shenker, and I. Stoica, “Can’t be late: Optimizing spot instance savings under deadlines,” in 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). Santa Clara, CA: USENIX Association, Apr. 2024, pp. 185–203. [Online]. Available: https://www.usenix.org/conference/nsdi24/present...

  5. [5]

    Parcae: Proactive, Liveput-Optimized DNN training on preemptible instances,

    J. Duan, Z. Song, X. Miao, X. Xi, D. Lin, H. Xu, M. Zhang, and Z. Jia, “Parcae: Proactive, Liveput-Optimized DNN training on preemptible instances,” in 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). Santa Clara, CA: USENIX Association, Apr. 2024, pp. 1121–1139. [Online]. Available: https://www.usenix.org/conference/nsdi24/p...

  6. [6]

    Making cloud spot instance interruption events visible,

    K. Kim and K. Lee, “Making cloud spot instance interruption events visible,” in Proceedings of the ACM on Web Conference 2024, ser. WWW ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 2998–3009. [Online]. Available: https://doi.org/10.1145/3589334.3645548

  7. [7]

    Autobot: Resilient and cost-effective scheduling of a bag of tasks on spot vms,

    P. Varshney and Y. Simmhan, “Autobot: Resilient and cost-effective scheduling of a bag of tasks on spot vms,” IEEE Transactions on Parallel & Distributed Systems, vol. 30, no. 7, pp. 1512–1527, Jul. 2019

  8. [8]

    Skyserve: Serving ai models across regions and clouds with spot instances,

    Z. Mao, T. Xia, Z. Wu, W.-L. Chiang, T. Griggs, R. Bhardwaj, Z. Yang, S. Shenker, and I. Stoica, “Skyserve: Serving ai models across regions and clouds with spot instances,” in Proceedings of the Twentieth European Conference on Computer Systems, ser. EuroSys ’25. New York, NY, USA: Association for Computing Machinery, 2025, pp. 159–175. [Online]. Availabl...

  9. [9]

    Deconstructing amazon ec2 spot instance pricing,

    O. Agmon Ben-Yehuda, M. Ben-Yehuda, A. Schuster, and D. Tsafrir, “Deconstructing amazon ec2 spot instance pricing,” ACM Trans. Econ. Comput., vol. 1, no. 3, Sep. 2013. [Online]. Available: https://doi.org/10.1145/2509413.2509416

  10. [10]

    The price is (not) right: Reflections on pricing for transient cloud servers,

    D. Irwin, P. Shenoy, P. Ambati, P. Sharma, S. Shastri, and A. Ali-Eldin, “The price is (not) right: Reflections on pricing for transient cloud servers,” in 2019 28th International Conference on Computer Communication and Networks (ICCCN), 2019, pp. 1–9

  11. [11]

    Deepspotcloud: Leveraging cross-region gpu spot instances for deep learning,

    K. Lee and M. Son, “Deepspotcloud: Leveraging cross-region gpu spot instances for deep learning,” in 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 2017, pp. 98–105

  12. [12]

    Bamboo: Making preemptible instances resilient for affordable training of large DNNs,

    J. Thorpe, P. Zhao, J. Eyolfson, Y. Qiao, Z. Jia, M. Zhang, R. Netravali, and G. H. Xu, “Bamboo: Making preemptible instances resilient for affordable training of large DNNs,” in 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 497–513. [Online]. Available: https://www.usenix.or...

  13. [13]

    Spotserve: Serving generative large language models on preemptible instances,

    X. Miao, C. Shi, J. Duan, X. Xi, D. Lin, B. Cui, and Z. Jia, “Spotserve: Serving generative large language models on preemptible instances,” in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ser. ASPLOS ’24. New York, NY, USA: Association for Computing Machinery, 202...

  14. [14]

    Skynomad: On using multi-region spot instances to minimize ai batch job cost (arXiv preprint arXiv:2601.06520),

    Z. Li, T. Xia, Z. Mao, Z. Zhou, E. J. Jackson, J. Kerney, Z. Wu, P. Mishra, Y. Xu, Y. Qiao, S. Shenker, and I. Stoica, “Skynomad: On using multi-region spot instances to minimize ai batch job cost,” 2026. [Online]. Available: https://arxiv.org/abs/2601.06520

  15. [15]

    Spotlake: Diverse spot instance dataset archive service,

    S. Lee, J. Hwang, and K. Lee, “Spotlake: Diverse spot instance dataset archive service,” in 2022 IEEE International Symposium on Workload Characterization (IISWC). Los Alamitos, CA, USA: IEEE Computer Society, Nov. 2022, pp. 242–255. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/IISWC55918.2022.00029

  16. [16]

    Performance and behavior characterization of amazon ec2 spot instances,

    T.-P. Pham, S. Ristov, and T. Fahringer, “Performance and behavior characterization of amazon ec2 spot instances,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), 2018, pp. 73–81

  17. [17]

    Spot instance interruption notices,

    Amazon Web Services, “Spot instance interruption notices,” https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance- termination-notices.html, 2026, accessed: 2026-03-22

  18. [18]

    Use azure spot virtual machines,

    Microsoft, “Use azure spot virtual machines,” https://learn.microsoft.com/en-us/azure/virtual-machines/spot-vms, 2026, accessed: 2026-03-22

  19. [19]

    Spot vms,

    Google, “Spot vms,” https://docs.cloud.google.com/compute/docs/instances/spot, 2026, accessed: 2026-03-22

  20. [20]

    Deepvm: Integrating spot and on-demand vms for cost-efficient deep learning clusters in the cloud,

    Y. Kim, K. Kim, Y. Cho, J. Kim, A. Khan, K.-D. Kang, B.-S. An, M.-H. Cha, H.-Y. Kim, and Y. Kim, “Deepvm: Integrating spot and on-demand vms for cost-efficient deep learning clusters in the cloud,” in 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2024, pp. 227–235

  21. [21]

    Snape: Reliable and low-cost computing with mixture of spot and on-demand vms,

    F. Yang, L. Wang, Z. Xu, J. Zhang, L. Li, B. Qiao, C. Couturier, C. Bansal, S. Ram, S. Qin, Z. Ma, Í. Goiri, E. Cortez, T. Yang, V. Rühle, S. Rajmohan, Q. Lin, and D. Zhang, “Snape: Reliable and low-cost computing with mixture of spot and on-demand vms,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming ...

  22. [22]

    Xgboost: A scalable tree boosting system,

    T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794. [Online]. Available: https://doi.org/10.1145/2939672.2939785

  23. [23]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997

  24. [24]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://pr...

  25. [25]

    The making of tpc-ds,

    R. O. Nambiar and M. Poess, “The making of tpc-ds,” in Proceedings of the 32nd International Conference on Very Large Data Bases, ser. VLDB ’06. VLDB Endowment, 2006, pp. 1049–1058