arxiv: 2604.15861 · v1 · submitted 2026-04-17 · 💻 cs.DB


Compliance in Databases: A Study of Structural Policies and Query Optimization

Ahana Pradhan , Srinivas Karthik , Imtiyazuddin Shaik , Srinivas Vivek


Pith reviewed 2026-05-10 07:25 UTC · model grok-4.3


keywords content-based access control · compliance policies · query optimization · database performance · policy structure · access control enforcement · analytical benchmarks


The pith

The structure of content-based compliance policies can force query optimizers into inefficient plans and change end-to-end database performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a structural framework and policy grammar to describe content-based access control rules that depend on the actual data values in a query. It adds controlled policy workloads to an analytical benchmark so that the same queries can be run under different policy structures. Experiments then measure how the optimizer rewrites and executes those queries. The results indicate that some policy structures lead to more restrictive or more expensive plans than others. This matters because privacy regulations are increasing the use of fine-grained, data-dependent rules inside production databases.
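To make "content-based" concrete, here is a minimal sketch, assuming a toy `orders` table and a hypothetical policy under which a user sees only orders from their own nation. It uses Python's built-in `sqlite3`, which has no row-level security, so it contrasts only two of the enforcement styles named in the figure captions: a view and an inline query rewrite. The table, column names, and policy are illustrative assumptions, not taken from the paper.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (o_id INTEGER PRIMARY KEY, nation TEXT, total REAL);
        INSERT INTO orders VALUES
            (1, 'IN', 120.0), (2, 'FR', 80.0), (3, 'IN', 45.5), (4, 'DE', 300.0);
        -- View-based enforcement: the policy predicate is baked into a view.
        CREATE VIEW orders_in AS SELECT * FROM orders WHERE nation = 'IN';
    """)
    return conn

def query_via_view(conn, min_total):
    # The user query targets the view; the policy is applied by view expansion.
    sql = "SELECT o_id FROM orders_in WHERE total >= ? ORDER BY o_id"
    return [r[0] for r in conn.execute(sql, (min_total,))]

def query_via_rewrite(conn, min_total, user_nation):
    # Rewrite-based enforcement: the policy predicate is appended to the user query.
    sql = "SELECT o_id FROM orders WHERE total >= ? AND nation = ? ORDER BY o_id"
    return [r[0] for r in conn.execute(sql, (min_total, user_nation))]

conn = build_db()
view_rows = query_via_view(conn, 50.0)
rewrite_rows = query_via_rewrite(conn, 50.0, "IN")
print(view_rows, rewrite_rows)
```

Both mechanisms return the same rows here. The paper's question is whether the optimizer reaches them through equally good plans once the policy predicate becomes structurally richer than a single comparison.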

Core claim

The paper introduces a structural framework and expressive policy grammar for modelling content-based compliance policies. It augments an analytical benchmark with structured policy workloads and shows through experiments that policy structure has a decisive impact on optimizer behaviour and end-to-end performance. The work concludes that database systems and optimizers need to become policy-aware in their design.

What carries the argument

The structural framework and expressive policy grammar, which classify policies by their form and let the authors attach them to queries for controlled measurement of optimizer responses.

If this is right

Optimizers will need new rewrite rules or cost models that account for the structural properties of attached policies.
Database benchmarks must incorporate policy workloads to produce realistic performance numbers.
Performance differences will grow as policies become richer and more context-sensitive.
System designers should expose policy structure information to the optimizer rather than treating enforcement as a post-optimization filter.
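The last point can be illustrated with a small sketch, assuming a hypothetical `lineitem` table and a region-based policy. It compares enforcement as a filter applied after the engine has answered an unrestricted query with enforcement as a predicate the engine sees. The row counts stand in for work done and are not a measurement from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_id INTEGER PRIMARY KEY, region TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", i % 10) for i in range(1, 1001)],
)

def post_filter(conn, allowed_region):
    # Enforcement outside the engine: the query runs unrestricted,
    # then disallowed rows are dropped by the caller.
    fetched = conn.execute(
        "SELECT l_id, region FROM lineitem WHERE qty > 5 ORDER BY l_id"
    ).fetchall()
    kept = [l_id for l_id, region in fetched if region == allowed_region]
    return kept, len(fetched)

def integrated(conn, allowed_region):
    # Enforcement inside the query: the engine can use the policy predicate
    # when choosing how to evaluate the query.
    fetched = conn.execute(
        "SELECT l_id FROM lineitem WHERE qty > 5 AND region = ? ORDER BY l_id",
        (allowed_region,),
    ).fetchall()
    return [r[0] for r in fetched], len(fetched)

post_rows, post_fetched = post_filter(conn, "EU")
int_rows, int_fetched = integrated(conn, "EU")
print(post_fetched, int_fetched, post_rows == int_rows)
```

The two paths agree on the visible rows, but the post-filter path pulls every qualifying row out of the engine before discarding most of them.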

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Policy authors could deliberately select structures that keep optimizer choices close to the unconstrained case.
The same structural lens might apply to other forms of data-dependent constraints beyond compliance.
Extending the framework to distributed or cloud databases could reveal whether policy effects compound across nodes.

Load-bearing premise

The structured policy workloads added to the benchmark are representative of real-world content-based compliance policies and their interactions with actual query optimizers.

What would settle it

Measure the same set of analytical queries under a collection of real production compliance policies from an operational database and check whether the performance gaps and plan changes match the patterns observed with the synthetic workloads.
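A comparison of this kind could start from a harness like the sketch below, which records the plan and the result of one query with and without a policy predicate. It assumes SQLite's `EXPLAIN QUERY PLAN` as a stand-in for the plan inspection tools of the engines studied; the schema, the index, and the policy are hypothetical, and the plan text varies across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (o_id INTEGER PRIMARY KEY, nation TEXT, total REAL);
    CREATE INDEX idx_orders_nation ON orders (nation);
""")
conn.executemany(
    "INSERT INTO orders (nation, total) VALUES (?, ?)",
    [("IN" if i % 5 == 0 else "FR", float(i)) for i in range(500)],
)

def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

base_sql = "SELECT COUNT(*) FROM orders WHERE total >= 100"
policy_sql = base_sql + " AND nation = ?"  # hypothetical policy predicate

base_plan = plan(base_sql)
policy_plan = plan(policy_sql, ("IN",))
base_count = conn.execute(base_sql).fetchone()[0]
policy_count = conn.execute(policy_sql, ("IN",)).fetchone()[0]

print("without policy:", base_count, base_plan)
print("with policy:   ", policy_count, policy_plan)
```

Running the same harness over production policies and the synthetic workloads, then diffing the recorded plans, is one way to check whether the two sets of policies push the optimizer in the same direction.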

Figures

Figures reproduced from arXiv: 2604.15861 by Ahana Pradhan, Imtiyazuddin Shaik, Srinivas Karthik, Srinivas Vivek.

**Figure 2.** RLS vs. View vs. Query Rewrite for Acyclic Policies

**Figure 3.** RLS vs. View vs. Rewrite on scale factor 10 data

**Figure 5.** Black-box RLS vs. View for Cyclic Policy

**Figure 6.** RLS Policy performance in Comm-Engine A. Next, we study a composite policy that involves negations. The component predicates are listed above. In the composition, we test EXISTS 𝑝(𝑜) ∧ NOT EXISTS 𝑞(𝑜), where both orders of the predicates are tested for each mechanism. We also used policy-coverage indexes here. The results are listed in ...
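The composite form EXISTS 𝑝(𝑜) ∧ NOT EXISTS 𝑞(𝑜) can be sketched as follows. The component predicates are not included in this excerpt, so `P` and `Q` below are hypothetical stand-ins: 𝑝 holds when an order has a line item from an 'IN' supplier, and 𝑞 holds when it has a flagged line item. The sketch writes the conjunction in both orders, mirroring the setup described in the caption text, using Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (o_id INTEGER PRIMARY KEY);
    CREATE TABLE lineitem (o_id INTEGER, supp_nation TEXT, flagged INTEGER);
    INSERT INTO orders VALUES (1), (2), (3), (4);
    INSERT INTO lineitem VALUES
        (1, 'IN', 0), (1, 'FR', 0),
        (2, 'IN', 1),
        (3, 'FR', 0),
        (4, 'IN', 0);
""")

# Hypothetical component predicates p(o) and q(o), correlated on the order id.
P = "EXISTS (SELECT 1 FROM lineitem l WHERE l.o_id = o.o_id AND l.supp_nation = 'IN')"
Q = "EXISTS (SELECT 1 FROM lineitem l WHERE l.o_id = o.o_id AND l.flagged = 1)"

def visible_orders(first, second):
    # The policy is the conjunction of the two predicates in the given textual order.
    sql = f"SELECT o.o_id FROM orders o WHERE {first} AND {second} ORDER BY o.o_id"
    return [r[0] for r in conn.execute(sql)]

p_then_not_q = visible_orders(P, "NOT " + Q)
not_q_then_p = visible_orders("NOT " + Q, P)
print(p_then_not_q, not_q_then_p)
```

The two orderings are logically equivalent and return the same orders. Whether an engine also evaluates them at the same cost is the kind of question the experiment is designed to probe.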

Original abstract

Growing privacy regulations and internal governance mandates are driving demand for fine-grained, context-sensitive access control in data management systems. Among competing approaches, content-based access control -- where access decisions depend on the data values referenced by a query -- is becoming particularly prominent, and is supported directly in modern database engines. While simple content-based predicates often incur negligible overhead, increasingly rich policies can interact in subtle ways with query optimization, leading to significant and poorly understood performance variability. This paper investigates this gap by introducing a structural framework and expressive policy grammar for modelling content-based compliance policies and analysing their impact on query planning and execution in database systems. Building on this framework, we augment an analytical benchmark with structured policy workloads, enabling controlled evaluation of enforcement mechanisms and optimization strategies under combined query-policy workloads. Our experimental results show that policy structure has a decisive impact on optimizer behaviour and end-to-end performance, underscoring the need for policy-aware database and optimizer design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces a structural framework and expressive policy grammar for modeling content-based compliance policies in databases. It augments an analytical benchmark with structured policy workloads to enable controlled evaluation of enforcement mechanisms and optimization strategies. Experimental results are presented showing that policy structure has a decisive impact on optimizer behavior and end-to-end performance, leading to the conclusion that policy-aware database and optimizer design is needed.

Significance. If the experimental findings hold under representative workloads, the paper addresses a timely and practically relevant gap at the intersection of access control and query optimization, driven by privacy regulations. The framework and grammar provide a structured way to analyze policy-query interactions, and the benchmark augmentation supports systematic evaluation. These elements represent a constructive contribution to understanding performance variability in compliance scenarios.

major comments (1)

[Experimental Evaluation] The central experimental claim (that policy structure has decisive impact) rests on the augmented benchmark workloads. However, the description of how the structured policy workloads were selected, their structural features, predicate complexity, and validation against real-world content-based compliance policies is insufficient to establish generalizability. This directly affects whether the observed optimizer behavior and performance effects support the broader call for policy-aware design, as opposed to being artifacts of the synthetic construction.

minor comments (1)

[Abstract] The abstract and introduction could more explicitly name the analytical benchmark being augmented and report quantitative measures (e.g., magnitude of performance variability or specific optimizer plan changes) to strengthen the presentation of results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the relevance of policy-aware query optimization. We address the major comment on experimental evaluation below, agreeing that expanded details will improve clarity and support for our claims.

Point-by-point responses

Referee: [Experimental Evaluation] The central experimental claim (that policy structure has decisive impact) rests on the augmented benchmark workloads. However, the description of how the structured policy workloads were selected, their structural features, predicate complexity, and validation against real-world content-based compliance policies is insufficient to establish generalizability. This directly affects whether the observed optimizer behavior and performance effects support the broader call for policy-aware design, as opposed to being artifacts of the synthetic construction.

Authors: We acknowledge that the current description of workload construction could be more explicit to better establish generalizability. The workloads are generated systematically from the policy grammar defined in Section 3, varying structural parameters such as predicate count (1-5), connective types (conjunctions and disjunctions), and selectivity ranges drawn from TPC-H data distributions. In the revised version we will add a new subsection (5.1) that: (i) details the selection methodology, including how structures were chosen to cover representative compliance patterns; (ii) provides quantitative characterization of features including average predicate complexity (measured by expression depth and operator count) and data-type dependencies; and (iii) includes an explicit mapping table linking synthetic policies to real-world examples from GDPR, HIPAA, and financial regulations as referenced in the introduction. While the workloads remain synthetic to permit controlled isolation of structural effects, this expansion will demonstrate that the observed optimizer behaviors align with documented policy-query interactions rather than being construction artifacts. We will also add a sensitivity analysis varying one structural dimension at a time.

revision: yes
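The systematic generation described in the simulated rebuttal can be pictured with the sketch below, which enumerates policy predicates by predicate count and connective type. This is an assumption-laden illustration: the atomic predicates, the `generate_policies` helper, and the flat (non-nested) combination are hypothetical, and the paper's actual grammar is not reproduced here.

```python
# Hypothetical atomic predicates over TPC-H-style column names.
ATOMS = [
    "c_nationkey = :user_nation",
    "o_orderstatus = 'F'",
    "l_quantity < 25",
    "s_acctbal > 0",
    "p_size <= 10",
]

def generate_policies(max_predicates=5, connectives=("AND", "OR")):
    """Yield (predicate_count, connective, policy_sql) for each structure."""
    for count in range(1, max_predicates + 1):
        atoms = ATOMS[:count]
        if count == 1:
            # A single predicate has no connective to vary.
            yield count, None, atoms[0]
            continue
        for connective in connectives:
            joined = f" {connective} ".join(f"({a})" for a in atoms)
            yield count, connective, joined

policies = list(generate_policies())
for count, connective, sql in policies:
    print(count, connective, sql)
```

Varying one argument at a time while holding the other fixed corresponds to the one-dimension-at-a-time sensitivity analysis mentioned in the response.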

Circularity Check

0 steps flagged

No circularity detected in experimental framework and evaluation


This is an empirical paper that introduces a structural framework and policy grammar, augments an existing analytical benchmark with synthetic policy workloads, and reports experimental measurements of optimizer behavior and performance. No mathematical derivations, first-principles predictions, or equations are present that could reduce to their own inputs by construction. The provided text contains no self-citations, fitted parameters renamed as predictions, uniqueness theorems, or ansatzes smuggled via prior work. The central claim rests on observed experimental variability rather than any self-referential logic, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5467 in / 929 out tokens · 26853 ms · 2026-05-10T07:25:56.511959+00:00 · methodology


Appendix

Examples of Atomic Policy Predicates wrt Section 2.2

(1) Attribute Predicate: In TPC-H setting, a customer may be allowed to see only those lineitems that belong to her orders and are supplied by suppliers from the same nation as the ...