arxiv: 2604.19057 · v1 · submitted 2026-04-21 · 💻 cs.DB · cs.DC

Recognition: unknown

Heuristic Search Space Partitioning for Low-Latency Multi-Tenant Cloud Queries

Chandra Biksheswaran Mouleeswaran, Prashant Kumar Pathak, Rama Teja Repaka

Pith reviewed 2026-05-10 01:55 UTC · model grok-4.3

classification 💻 cs.DB cs.DC

keywords multi-tenant databasesquery partitioningbuffer cache pressurelatency reductiondynamic predicatesheuristic plan selectioncloud query optimizationproduction database scaling

0 comments

The pith

A heuristic system partitions broad multi-tenant queries on the fly to keep each piece in memory, cutting P95 latency from 61 seconds to 2 seconds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that broad queries spanning many tenants in large cloud databases suffer from high latency mainly because they evict each other's data from the buffer cache. By splitting each query into smaller pieces that each fit in memory using a heuristic to choose how to split and what plan to use, the system avoids this problem while still getting all the results. This matters because it lets operators handle more traffic without buying bigger hardware or changing their data layout. The approach works at query time so it needs no permanent changes to the tables or indexes.

Core claim

The authors demonstrate that the dominant cause of slow queries is buffer cache pressure from concurrent broad scans, and introduce a system that partitions the query space on the fly by adding temporary filters, chooses the best splits and execution plans with a two-phase heuristic, and tracks progress across partitions using tokens on the client side. In tests this yields large drops in response time and resource use, and in real production traffic the P95 latency falls from 61 seconds to 2 seconds for eligible queries.

What carries the argument

The Heuristic Search Space Partitioning System (HSSPS) that uses dynamic predicate injection to logically divide the search space, a two-phase heuristic engine to select partition keys and score plans, and a client-side page token to maintain cross-partition state without server sessions.

If this is right

P95 latency falls 50-97 percent across query types, with 95-97 percent gains on high-cardinality queries.
Throughput rises 8-10 times while average active sessions drop by a factor of 41.
Production traffic sustains the 61-second to 2-second improvement across more than 14,000 eligible queries per window.
The method applies to any multi-tenant system running broad queries against large shared tables where schema changes are impractical.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Client-side tokens for state could let the same technique scale across replica sets without requiring session stickiness on the database servers.
If the heuristic remains stable, operators might rely less on per-tenant indexing or materialized views to control cache behavior.
The approach might extend to analytics workloads on shared data lakes where similar cache-pressure patterns appear during wide scans.

Load-bearing premise

The heuristic can consistently pick partition keys and plans that keep each sub-query's working set memory-resident without missing results or adding prohibitive overhead from repeated executions.

What would settle it

Run the same broad queries on a loaded multi-tenant database both with and without the partitioning layer and check whether P95 latency stays near 2 seconds while returning complete, duplicate-free results.

Figures

Figures reproduced from arXiv: 2604.19057 by Chandra Biksheswaran Mouleeswaran, Prashant Kumar Pathak, Rama Teja Repaka.

**Figure 2.** Figure 2: Database resource comparison. HSSPS processes 4.7 [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Production rollout results. P95 and average latency decrease from 61s [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

read the original abstract

Large-scale cloud security platforms must continuously query millions of structured cloud resource records distributed across thousands of tenant accounts. Broad, account-spanning queries saturate database infrastructure, producing P95 latencies exceeding 60 seconds. We identify buffer cache pressure as the dominant latency driver: in a controlled experiment, the same query executing with the same plan completed in 3.7 seconds when its working set was memory-resident and 94 seconds when concurrent load had evicted those pages. No query plan optimization can address this; the only effective intervention is reducing the number of pages each query must touch. We present the Heuristic Search Space Partitioning System (HSSPS), a query-time optimization layer that logically partitions the search space through dynamic predicate injection, without schema modification. A two-phase heuristic engine selects partition key values and scores candidate query plans before execution. A client-side page token maintains cross-partition traversal state without server-side sessions, enabling horizontal scalability. Controlled evaluation across representative query types demonstrates 50-97% P95 latency reduction (95-97% on high-cardinality queries), 8-10x throughput improvement, and 41x reduction in average active sessions. Production rollout across live multi-tenant traffic reduced P95 latency from 61s to 2s across successive releases, sustained over 14,000 eligible queries per measurement window. The technique generalizes to any multi-tenant system where broad queries execute against large shared databases and physical schema modification is impractical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HSSPS gives a practical runtime partitioning method for broad multi-tenant queries that cuts latency in controlled tests, but the big production win is hard to credit cleanly to the new technique.

read the letter

The paper shows a workable heuristic for splitting broad queries across partitions in multi-tenant cloud databases to keep data in cache and lower latency. The controlled results look credible, but the production numbers are harder to trust because of how they were measured. What stands out is the focus on buffer cache pressure as the main issue rather than query plans. They demonstrate that the same query runs much faster when pages stay in memory, which makes sense. The HSSPS approach injects predicates at runtime to limit each subquery's scope, picks partitions with a two-phase heuristic, and uses page tokens on the client to track progress without server sessions. That last part helps with scaling. The controlled evaluation reports 50-97% P95 latency cuts, especially on high-cardinality queries, plus big gains in throughput and fewer active sessions. Those numbers come from representative workloads and seem like a fair test of the idea. The production rollout is the weak part. Latency dropped from 61 seconds to 2 seconds over successive releases on more than 14,000 queries, but since other things likely changed at the same time, you can't pin the improvement on the partitioning alone. The stress test note is right on this. A controlled toggle or A/B test would have made the claim stronger. Details on the exact heuristic logic, how it avoids missing data, or handling skewed tenant sizes are thin. If the full paper has more on those, it would help, but based on what's here the method might need tuning per workload. This work is aimed at engineers and DBAs dealing with large shared databases where schema changes aren't an option. Readers working on similar cloud query problems could pick up useful tactics from the partitioning and state management approach. I'd recommend sending it for peer review. The idea is practical and the controlled data supports it, even if the production evidence needs tightening up.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Heuristic Search Space Partitioning System (HSSPS), a query-time optimization layer for broad, account-spanning queries over large multi-tenant cloud resource databases. It attributes high P95 latencies primarily to buffer cache pressure and proposes logical partitioning of the search space via dynamic predicate injection, driven by a two-phase heuristic that selects partition keys and scores plans, combined with client-side page tokens for stateless cross-partition traversal. Controlled experiments on representative workloads report 50-97% P95 latency reductions (95-97% for high-cardinality queries), 8-10x throughput gains, and 41x fewer active sessions; a production rollout across successive releases is reported to have lowered P95 latency from 61s to 2s on over 14,000 eligible queries per window.

Significance. If the empirical claims hold under controlled conditions, the work provides a practical, schema-agnostic technique for mitigating buffer-cache-induced latency in shared multi-tenant databases, which could generalize to other large-scale query systems where physical partitioning is impractical. The emphasis on client-side state management for horizontal scalability and the identification of cache pressure as the dominant factor (supported by the 3.7s vs. 94s controlled contrast) are notable strengths for systems-oriented database research.

major comments (2)

[Abstract / Production rollout description] Abstract and production evaluation: the headline claim that rollout 'across successive releases' reduced P95 latency from 61s to 2s on >14,000 queries does not isolate the contribution of HSSPS. Successive releases typically include concurrent changes (schema tweaks, index additions, query rewrites, or infrastructure updates), so the before/after observational comparison cannot confidently attribute the 30x improvement to the two-phase heuristic and predicate injection alone. The controlled experiments (50-97% reductions) use synthetic or representative workloads rather than the exact live traffic mix and lack a toggle/rollback design.
[Heuristic engine / two-phase selection] Heuristic description (likely §3 or §4): the two-phase heuristic engine is central to the approach yet the manuscript provides insufficient detail on partition-key selection logic, the exact scoring function for candidate plans, data-skew handling, and mechanisms ensuring result completeness without repeated executions or missed tuples. These omissions are load-bearing because the core claim is that the heuristic consistently keeps each sub-query's working set memory-resident; without them, reproducibility and verification of the 95-97% high-cardinality gains are limited.

minor comments (1)

[Evaluation sections] The abstract and evaluation sections would benefit from explicit statistical controls or variance reporting for the latency measurements (e.g., number of runs, confidence intervals) to strengthen the buffer-cache contrast (3.7s vs. 94s).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. We address each major comment below, agreeing where the manuscript requires clarification or expansion and outlining the corresponding revisions.

read point-by-point responses

Referee: [Abstract / Production rollout description] Abstract and production evaluation: the headline claim that rollout 'across successive releases' reduced P95 latency from 61s to 2s on >14,000 queries does not isolate the contribution of HSSPS. Successive releases typically include concurrent changes (schema tweaks, index additions, query rewrites, or infrastructure updates), so the before/after observational comparison cannot confidently attribute the 30x improvement to the two-phase heuristic and predicate injection alone. The controlled experiments (50-97% reductions) use synthetic or representative workloads rather than the exact live traffic mix and lack a toggle/rollback design.

Authors: We agree that the production rollout description is observational and cannot isolate HSSPS from other concurrent changes across releases. The controlled experiments constitute the primary quantitative evidence. In the revised manuscript we will revise the abstract and production section to explicitly distinguish the observational before/after numbers from the controlled results, note the lack of a toggle/rollback design as a limitation of the live deployment data, and avoid implying direct attribution of the full 30x reduction solely to HSSPS. revision: yes
Referee: [Heuristic engine / two-phase selection] Heuristic description (likely §3 or §4): the two-phase heuristic engine is central to the approach yet the manuscript provides insufficient detail on partition-key selection logic, the exact scoring function for candidate plans, data-skew handling, and mechanisms ensuring result completeness without repeated executions or missed tuples. These omissions are load-bearing because the core claim is that the heuristic consistently keeps each sub-query's working set memory-resident; without them, reproducibility and verification of the 95-97% high-cardinality gains are limited.

Authors: We acknowledge that the current description of the two-phase heuristic lacks the level of detail needed for reproducibility. In the revised manuscript we will expand the relevant sections to specify: the partition-key selection criteria (including cardinality thresholds and skew detection via histogram analysis), the precise scoring function (including the cost model and weighting of memory residency estimates), skew-handling logic (adaptive bucket sizing and fallback to finer-grained keys), and completeness guarantees (deterministic ordering combined with client-side page tokens that ensure exhaustive, duplicate-free traversal without re-execution). revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical system description with no derivation chain

full rationale

The paper presents HSSPS as a heuristic query optimization layer relying on dynamic predicate injection and a two-phase engine. All claims rest on controlled experiments (50-97% P95 reductions) and an observational production rollout (61s to 2s P95). No equations, parameters fitted to data, self-definitional constructs, or load-bearing self-citations appear. The central results are direct measurements rather than any chain that reduces to its own inputs by construction. This is the expected non-finding for an applied systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper describes an applied heuristic system rather than a formal model. No free parameters, axioms, or invented entities are explicitly defined in the abstract.

pith-pipeline@v0.9.0 · 5579 in / 1027 out tokens · 47854 ms · 2026-05-10T01:55:38.417010+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Pre-Execution Query Slot-Time Prediction in Cloud Data Warehouses: A Feature-Scoped Machine Learning Approach
cs.DB 2026-04 unverdicted novelty 4.0

A HistGradientBoosting model predicts pre-execution BigQuery slot-time from query complexity, volume, and text features, cutting error 30-37% versus baselines on significant queries while failing on long-tail outliers.

Reference graph

Works this paper leans on

13 extracted references · cited by 1 Pith paper

[1]

Eddies: Continuously adaptive query processing,

R. Avnur and J. M. Hellerstein, “Eddies: Continuously adaptive query processing,” inProc. ACM SIGMOD, 2000, pp. 261–272

2000
[2]

Spanner: Google’s globally-distributed database,

J. C. Corbett et al., “Spanner: Google’s globally-distributed database,” inProc. USENIX OSDI, 2012, pp. 251–264

2012
[3]

Schism: A workload- driven approach to database replication and partitioning,

C. Curino, E. Jones, Y . Zhang, and S. Madden, “Schism: A workload- driven approach to database replication and partitioning,”PVLDB, vol. 3, no. 1, pp. 48–57, 2010

2010
[4]

Towards multi-tenant performance SLOs,

W. Lang, S. Shankar, J. M. Patel, and A. Kalhan, “Towards multi-tenant performance SLOs,” inProc. IEEE ICDE, 2012, pp. 702–713

2012
[5]

Access path selection in a relational database management system,

P. G. Selinger et al., “Access path selection in a relational database management system,” inProc. ACM SIGMOD, 1979, pp. 23–34

1979
[6]

Neo: A learned query optimizer,

R. Marcus et al., “Neo: A learned query optimizer,”PVLDB, vol. 12, no. 11, pp. 1705–1718, 2019

2019
[7]

F1: A distributed SQL database that scales,

J. Shute et al., “F1: A distributed SQL database that scales,”PVLDB, vol. 6, no. 11, pp. 1068–1079, 2013

2013
[8]

Opti- mizing queries with materialized views,

S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim, “Opti- mizing queries with materialized views,” inProc. IEEE ICDE, 1995, pp. 190–200

1995
[9]

Bao: Making learned query optimization practical,

R. Marcus et al., “Bao: Making learned query optimization practical,” inProc. ACM SIGMOD, 2021, pp. 1275–1288

2021
[10]

Learned cardinalities: Estimating correlated joins with deep learning,

A. Kipf, T. Kipf, B. Radke, V . Leis, P. Boncz, and A. Kemper, “Learned cardinalities: Estimating correlated joins with deep learning,” inProc. CIDR, 2019

2019
[11]

Amazon Aurora: Design considerations for high throughput cloud-native relational databases,

A. Verbitski et al., “Amazon Aurora: Design considerations for high throughput cloud-native relational databases,” inProc. ACM SIGMOD, 2017, pp. 1041–1052

2017
[12]

Heuristic database querying with dynamic partitioning,

P. K. Pathak and C. B. Mouleeswaran, “Heuristic database querying with dynamic partitioning,” U.S. Patent US11941006B2, Mar. 26, 2024

2024
[13]

Heuristic database querying with dynamic partitioning,

C. B. Mouleeswaran, A. Agarwal, P. K. Pathak, and X. Wang, “Heuristic database querying with dynamic partitioning,” U.S. Patent US12373434B2, Jul. 29, 2025

2025