Pith · machine review for the scientific record

arxiv: 2604.20021 · v1 · submitted 2026-04-21 · 💻 cs.LG · cs.CL

Recognition: unknown

Continuous Semantic Caching for Low-Cost LLM Serving


Pith reviewed 2026-05-10 02:29 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords: semantic caching · LLM response caching · continuous query space · online learning · regret bounds · kernel ridge regression · epsilon-net · cost optimization

The pith

A new framework allows semantic caching of LLM responses in continuous query spaces with provable sublinear regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a rigorous theoretical framework for caching LLM responses when queries exist in an infinite continuous embedding space rather than a known discrete set. It introduces dynamic epsilon-net discretization paired with kernel ridge regression to measure uncertainty in cost estimates and to extend observations from served queries to semantically similar ones. Online algorithms built on this foundation adapt the cache over time while controlling the cost of changes, and the analysis shows sublinear regret relative to the best possible continuous policy. This matters because it removes the need to predefine all possible queries, enabling efficient reuse of responses as the range of user inputs grows without bound.

Core claim

By combining dynamic ε-net discretization with Kernel Ridge Regression, the system quantifies uncertainty in continuous query spaces and generalizes partial feedback across semantic neighborhoods. This supports both offline and online algorithms that reduce switching costs, with the online version achieving a sublinear regret bound against an optimal continuous oracle that reduces to prior discrete-model bounds.
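The paper's own construction is not reproduced in this review, but the discretization idea can be sketched. A greedy ε-net keeps a query embedding as a new center only if it is more than ε away from every existing center, so the centers simultaneously cover the stream and stay well separated. The names and parameters below are illustrative, not taken from the paper.

```python
import numpy as np

def greedy_epsilon_net(points, eps):
    """Greedy epsilon-net over a stream of embeddings: every point ends up
    within eps of some center, and centers stay pairwise more than eps apart."""
    centers = []
    for p in points:
        if all(np.linalg.norm(p - c) > eps for c in centers):
            centers.append(p)
    return np.array(centers)

# 200 two-dimensional "query embeddings" collapse to a handful of centers.
rng = np.random.default_rng(0)
queries = rng.uniform(0.0, 5.0, size=(200, 2))
centers = greedy_epsilon_net(queries, eps=1.0)
```

The covering and packing properties hold by construction; in the dynamic setting described by the paper, the net would grow only when genuinely novel query regions appear in the stream.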

What carries the argument

dynamic ε-net discretization coupled with Kernel Ridge Regression, which adaptively discretizes the continuous embedding space to quantify estimation uncertainty and generalize cost feedback across nearby queries
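The KRR machinery described here can be sketched as follows: a posterior mean predicts the cost of an unseen query from costs observed at served queries, and a width σ(z) quantifies how far that query sits from the observed data. This is a generic kernelized-regression sketch under an assumed RBF kernel, not the paper's implementation.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls**2))

def krr_posterior(Z, y, Zq, lam=0.1):
    """Mean cost estimate and uncertainty width at query points Zq:
    sigma(z)^2 = k(z, z) - k_z^T (K + lam*I)^{-1} k_z."""
    K_inv = np.linalg.inv(rbf(Z, Z) + lam * np.eye(len(Z)))
    kq = rbf(Zq, Z)
    mean = kq @ K_inv @ y
    var = rbf(Zq, Zq).diagonal() - np.einsum("ij,jk,ik->i", kq, K_inv, kq)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Costs observed at three served queries generalize to a nearby query (z=1.1)
# with low uncertainty, while a far-away query (z=10) stays highly uncertain.
Z = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])
mean, sigma = krr_posterior(Z, y, np.array([[1.1], [10.0]]))
```

The shrinking σ near served queries is what lets partial cost feedback propagate across a semantic neighborhood instead of requiring an observation for every distinct query.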

If this is right

  • The online algorithm achieves a sublinear regret bound against an optimal continuous oracle.
  • The regret bound reduces to existing bounds when the query space is discrete.
  • Empirical evaluations show the framework approximates the continuous optimal cache while lowering computational and switching overhead.
  • Partial feedback on query costs can be generalized across semantic neighborhoods without full observation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the discretization approach scales, it could be adapted to continuous spaces in related problems like dynamic resource allocation for model inference.
  • The generalization via kernels implies that embedding similarity can substitute for exact query matches in caching decisions.
  • Real deployment would benefit from testing whether user query streams naturally form low-dimensional manifolds that simplify the epsilon-net construction.
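The last bullet is directly testable. One hedged way to do it, not drawn from the paper: estimate the intrinsic dimension of the query-embedding stream with the two-nearest-neighbor estimator (the MLE form d ≈ N / Σ ln(r₂/r₁)); a value far below the ambient embedding dimension would support the low-dimensional-manifold hypothesis.

```python
import numpy as np

def twonn_dimension(X):
    """Two-NN intrinsic-dimension estimate: d = N / sum(ln(r2 / r1)),
    where r1, r2 are each point's first and second nearest-neighbor distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    r = np.sort(D, axis=1)[:, :2]
    return len(X) / np.log(r[:, 1] / r[:, 0]).sum()

# A 2-D sheet embedded in a 10-D ambient space reports dimension near 2,
# even though the raw embedding vectors are 10-dimensional.
rng = np.random.default_rng(1)
X = np.zeros((800, 10))
X[:, :2] = rng.uniform(size=(800, 2))
```

A low estimate on real query streams would mean the ε-net's covering number grows with the intrinsic, not ambient, dimension, which is the regime where the construction stays cheap.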

Load-bearing premise

Real-world LLM queries lie in an infinite continuous embedding space where dynamic ε-net discretization with Kernel Ridge Regression can quantify uncertainty and generalize partial feedback across semantic neighborhoods.

What would settle it

Running the online algorithm on a long sequence of real or simulated continuous LLM queries and observing linear growth in cumulative regret instead of sublinear growth would falsify the theoretical bound.
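A concrete version of this falsification test, sketched here rather than taken from the paper: fit the growth exponent of cumulative regret on a log-log scale. An exponent near 1 indicates linear growth and would contradict the bound; an exponent clearly below 1 is consistent with it.

```python
import numpy as np

def regret_growth_exponent(cum_regret, burn_in=10):
    """Slope of log(cumulative regret) vs log(t); < 1 means sublinear growth."""
    t = np.arange(1, len(cum_regret) + 1)
    slope, _ = np.polyfit(np.log(t[burn_in:]), np.log(cum_regret[burn_in:]), 1)
    return slope

# Synthetic curves standing in for a real regret trace:
t = np.arange(1, 5001)
sublinear = np.sqrt(t)   # O(sqrt(T)) regret, exponent ~ 0.5: consistent
linear = 0.3 * t         # linear regret, exponent ~ 1.0: would falsify
```

On a real trace the exponent is only meaningful once transient early rounds are excluded, hence the burn-in parameter.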

Figures

Figures reproduced from arXiv: 2604.20021 by Baran Atalar, Carlee Joe-Wong, Jinhang Zuo, Siwei Wang, Wei Chen, Xutong Liu.

Figure 1. Continuous semantic caching system for low-cost …
Figure 2. (a) Variation of the suboptimality gap versus stream length in the offline setting; (b) sensitivity analysis showing the …
Figure 3. PCA on the synthetic dataset; UMAP projection of NQ vs TriviaQA clustering and stream arrivals.
Figure 4. UMAP on the NQ+TriviaQA dataset …
Figure 5. Cache size vs. suboptimality gap ablation for the …
Figure 7. (a) Variation of the suboptimality gap versus stream length in the offline setting; (b) sensitivity analysis showing the …
Figure 8. Distribution of pairwise distances among stream …
Figure 10. Nearest-neighbor distance distribution within the …
Figure 14. CDF of the token length distribution for the syn…
Figure 12. Cost prediction error of the KRR model …
Figure 19. Nearest-neighbor distance distribution in embed…
Figure 20. Nearest-neighbor distance CDF in embedding …
Figure 16. Token length CDF for Natural Questions, TriviaQA, …
Original abstract

As Large Language Models (LLMs) become increasingly popular, caching responses so that they can be reused by users with semantically similar queries has become a vital strategy for reducing inference costs and latency. Existing caching frameworks have proposed to decide which query responses to cache by assuming a finite, known universe of discrete queries and learning their serving costs and arrival probabilities. As LLMs' pool of users and queries expands, however, such an assumption becomes increasingly untenable: real-world LLM queries reside in an infinite, continuous embedding space. In this paper, we establish the first rigorous theoretical framework for semantic LLM response caching in continuous query space under uncertainty. To bridge the gap between discrete optimization and continuous representation spaces, we introduce dynamic $\epsilon$-net discretization coupled with Kernel Ridge Regression. This design enables the system to formally quantify estimation uncertainty and generalize partial feedback on LLM query costs across continuous semantic query neighborhoods. We develop both offline learning and online adaptive algorithms optimized to reduce switching costs incurred by changing the cached responses. We prove that our online algorithm achieves a sublinear regret bound against an optimal continuous oracle, which reduces to existing bounds for discrete query models. Extensive empirical evaluations demonstrate that our framework approximates the continuous optimal cache well while also reducing computational and switching overhead compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces the first rigorous theoretical framework for semantic LLM response caching in continuous, infinite embedding spaces. It bridges discrete optimization with continuous representations via dynamic ε-net discretization combined with Kernel Ridge Regression (KRR) to quantify uncertainty and generalize partial cost feedback across semantic neighborhoods. Offline and online adaptive algorithms are developed to minimize switching costs when updating the cache; the online algorithm is proven to achieve sublinear regret against an optimal continuous oracle, with the bound reducing to existing discrete-query results. Empirical evaluations are claimed to show that the framework approximates the continuous optimum while lowering computational and switching overhead relative to prior methods.

Significance. If the central regret analysis holds, the work would supply the first formal sublinear-regret guarantee for continuous semantic caching, directly addressing the scalability limits of discrete-query assumptions as LLM user bases grow. The reduction to known discrete bounds and the use of KRR for neighborhood generalization are technically attractive strengths. However, the result's generality hinges on unstated conditions on the cost function, so the practical significance for arbitrary LLM workloads remains conditional on verification of those conditions.

major comments (1)
  1. [Abstract and theoretical results section] The claimed sublinear regret bound for the online algorithm against the continuous oracle is obtained via dynamic ε-net discretization + KRR. This analysis requires that the unknown query-cost function lie in the RKHS of the chosen kernel with norm bounded independently of the discretization parameter ε; without this, the covering numbers of the ε-net can grow with query diversity or dimension and the regret ceases to be sublinear. The manuscript does not state or verify this bounded-norm condition, yet the abstract asserts the bound 'reduces to existing bounds for discrete query models' without supplying the argument that the continuous-to-discrete reduction preserves sublinearity absent the extra assumption.
minor comments (1)
  1. The abstract refers to 'extensive empirical evaluations' and 'reducing computational and switching overhead compared to existing methods' but supplies no dataset descriptions, baseline implementations, or quantitative tables in the provided text; adding these details would strengthen the empirical claims without altering the theoretical contribution.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address the major comment point by point below and will incorporate the necessary clarifications in the revised version.

Point-by-point responses
  1. Referee: [Abstract and theoretical results section] The claimed sublinear regret bound for the online algorithm against the continuous oracle is obtained via dynamic ε-net discretization + KRR. This analysis requires that the unknown query-cost function lie in the RKHS of the chosen kernel with norm bounded independently of the discretization parameter ε; without this, the covering numbers of the ε-net can grow with query diversity or dimension and the regret ceases to be sublinear. The manuscript does not state or verify this bounded-norm condition, yet the abstract asserts the bound 'reduces to existing bounds for discrete query models' without supplying the argument that the continuous-to-discrete reduction preserves sublinearity absent the extra assumption.

    Authors: We agree that the sublinear regret bound relies on the query-cost function belonging to the RKHS with a norm bounded independently of ε; this is a standard condition in kernel methods to ensure controlled covering numbers and sublinear regret. While our theoretical analysis implicitly uses this assumption from the KRR literature, we acknowledge that it is not explicitly stated in the abstract or theoretical results section, and the reduction argument is not detailed there. In the revised manuscript, we will explicitly state the bounded-norm condition and add a concise argument demonstrating that the continuous-to-discrete reduction preserves sublinearity under this assumption, thereby clarifying how the bound reduces to existing discrete-query results. revision: yes
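For readers, the condition this exchange turns on can be written out. The notation below is the standard kernelized-bandit form (in the style of Chowdhury and Gopalan, 2017) and is assumed rather than copied from the paper: the cost function f must have RKHS norm bounded by a constant B that does not depend on the discretization parameter ε.

```latex
% Bounded-norm condition (assumed; standard in kernelized-bandit analyses):
\|f\|_{\mathcal{H}_k} \le B, \qquad B \text{ independent of } \epsilon .

% It yields an any-time confidence band around the KRR estimate \mu_t,
% with posterior width \sigma_t and information gain \gamma_t:
|f(z) - \mu_t(z)| \le \beta_t \, \sigma_t(z), \qquad
\beta_t = B + R\sqrt{2\left(\gamma_t + 1 + \ln(1/\delta)\right)} .

% Summing the widths over T rounds gives the sublinear scaling at stake:
\sum_{t=1}^{T} \beta_t \, \sigma_{t-1}(z_t)
  = O\!\left(\sqrt{T\,\gamma_T}\,\big(B + \sqrt{\gamma_T}\big)\right).
```

If B instead grew with the ε-net's covering number, β_t would inflate with query diversity and the final sum would no longer be sublinear in T, which is precisely the referee's concern.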

Circularity Check

0 steps flagged

No significant circularity; derivation extends discrete models via standard techniques without self-referential reduction

full rationale

The paper's central result is a sublinear regret bound for an online algorithm in continuous embedding space, derived from dynamic ε-net discretization combined with Kernel Ridge Regression to quantify uncertainty and generalize feedback. This construction reduces to discrete bounds by design of the discretization, but the continuous case adds independent content through RKHS-based neighborhood generalization and handling of switching costs. No steps match the enumerated circularity patterns: the regret analysis is not self-definitional, does not rename a fitted parameter as a prediction, and does not rely on load-bearing self-citations or imported uniqueness theorems. The framework applies standard online learning and kernel methods to the new continuous setting under stated assumptions, remaining self-contained against external benchmarks in discrete caching literature.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based on abstract only; full parameter lists, proof assumptions, and any invented constructs are unavailable.

axioms (2)
  • domain assumption LLM queries reside in an infinite, continuous embedding space
    Explicitly stated as the reason existing discrete models become untenable.
  • ad hoc to paper Dynamic ε-net discretization coupled with Kernel Ridge Regression quantifies uncertainty and generalizes partial feedback across neighborhoods
    Introduced in the abstract as the core bridge between discrete optimization and continuous spaces.

pith-pipeline@v0.9.0 · 5536 in / 1252 out tokens · 37749 ms · 2026-05-10T02:29:26.250008+00:00 · methodology

