
arxiv: 2511.07663 · v3 · submitted 2025-11-10 · 💻 cs.DB · cs.AI · cs.LG

Cortex AISQL: A Production SQL Engine for Unstructured Data

Pith reviewed 2026-05-17 23:22 UTC · model grok-4.3

keywords: semantic SQL, unstructured data, query optimization, model cascades, semantic joins, LLM inference, production deployment

The pith

Cortex AISQL integrates semantic operations into SQL with cost-aware planning, model cascades, and join rewriting to deliver major speedups on unstructured data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Cortex AISQL as a production SQL engine that lets users write declarative queries mixing traditional relational operations with semantic reasoning over unstructured data. Semantic operations create new difficulties because they cost more than standard SQL, exhibit different performance profiles, and have costs and selectivities that cannot be known at compile time. Existing engines lack the machinery to optimize around these traits. The work shows how three techniques drawn from real Snowflake customer workloads overcome the problems and produce consistent speedups while preserving result quality.

Core claim

AISQL addresses the efficiency barriers of semantic operations by treating LLM inference cost as a first-class objective in query planning, by routing the bulk of rows through fast proxy models and escalating only uncertain cases to a stronger oracle model, and by reformulating semantic joins as multi-label classification tasks. The first technique yields 2-8× speedups, the second 2-6× speedups at 90-95 percent of oracle quality, and the third 15-70× speedups with often higher prediction quality. The resulting engine is deployed in production at Snowflake and supports customer workloads in analytics, search, and content understanding.
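
As a rough illustration of what it can mean to treat inference cost as a planning objective, the sketch below orders filter predicates by the classic rank metric cost / (1 - selectivity) from the expensive-predicate literature. It is not the AISQL optimizer: the predicate names, per-row costs, and selectivities are hypothetical, and a real planner would have to estimate them rather than read them off constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str
    cost_per_row: float  # hypothetical cost units per evaluated row
    selectivity: float   # assumed fraction of rows that pass (0..1)


def rank(p: Predicate) -> float:
    """Lower rank runs first: cheap predicates that drop many rows win."""
    dropped = 1.0 - p.selectivity
    return float("inf") if dropped <= 0 else p.cost_per_row / dropped


def order_predicates(preds):
    """Return predicates sorted by ascending rank."""
    return sorted(preds, key=rank)


def expected_cost(preds, n_rows):
    """Total cost of applying predicates in the given order to n_rows."""
    total, remaining = 0.0, float(n_rows)
    for p in preds:
        total += remaining * p.cost_per_row
        remaining *= p.selectivity
    return total


# Hypothetical numbers: a cheap relational filter and a costly LLM filter.
cheap = Predicate("year >= 2020", cost_per_row=0.001, selectivity=0.2)
llm = Predicate("semantic_filter (LLM)", cost_per_row=1.0, selectivity=0.5)

naive_cost = expected_cost([llm, cheap], 1000)
planned = order_predicates([llm, cheap])
planned_cost = expected_cost(planned, 1000)
print([p.name for p in planned], naive_cost, planned_cost)
```

With these made-up numbers, running the cheap filter first means the LLM predicate sees 200 rows instead of 1000, which is where the saving comes from.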

What carries the argument

Three cooperating techniques: AI-aware query optimization that reasons directly about LLM costs, adaptive model cascades that combine a fast proxy with an oracle model, and semantic join query rewriting that converts quadratic joins into linear multi-label classification.
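
A minimal sketch of the proxy-then-oracle routing idea follows. It assumes a proxy that returns a probability-like score and uses two fixed thresholds; the summary describes the cascade as adaptive, so the fixed `low` and `high` values, the function names, and the toy rows here are illustrative assumptions only.

```python
def cascade(rows, proxy, oracle, low=0.2, high=0.8):
    """Label rows with the proxy when it is confident, else ask the oracle.

    proxy(row)  -> score in [0, 1] that the row is a positive
    oracle(row) -> bool, standing in for the stronger, slower model
    Returns (labels, number_of_oracle_calls).
    """
    labels, oracle_calls = [], 0
    for row in rows:
        score = proxy(row)
        if score >= high:
            labels.append(True)       # confident positive, no oracle call
        elif score <= low:
            labels.append(False)      # confident negative, no oracle call
        else:
            oracle_calls += 1         # uncertain band: escalate
            labels.append(oracle(row))
    return labels, oracle_calls


# Toy data: (text, precomputed proxy score). Both models are stand-ins.
rows = [
    ("love it", 0.95),
    ("not bad at all", 0.55),
    ("fine, I guess", 0.45),
    ("terrible", 0.05),
]


def toy_proxy(row):
    return row[1]


def toy_oracle(row):
    return row[0] in {"love it", "not bad at all"}


labels, oracle_calls = cascade(rows, toy_proxy, toy_oracle)
print(labels, oracle_calls)
```

Widening the band between `low` and `high` sends more rows to the oracle, trading speed for quality; narrowing it does the opposite.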

If this is right

  • Query planners can now treat model inference cost as an explicit, quantifiable objective alongside traditional metrics such as cardinality.
  • Most rows in a semantic pipeline can be handled by cheap proxy models while quality remains close to that of the full oracle model.
  • Semantic joins no longer scale quadratically because the problem is recast as a linear-time classification task.
  • Production SQL engines can support mixed structured-unstructured workloads without requiring users to write custom glue code.
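
The quadratic-to-linear point about semantic joins can be made concrete by counting model calls. The sketch below is an assumption-laden toy: `match` and `classify` are substring stand-ins for LLM calls, and it assumes the full label set from the right-hand table fits in a single classification call.

```python
def pairwise_join(left, right, match):
    """Baseline: one model call per (left, right) pair, so |L| * |R| calls."""
    calls, out = 0, []
    for l in left:
        for r in right:
            calls += 1
            if match(l, r):
                out.append((l, r))
    return out, calls


def classify_join(left, right, classify):
    """Rewrite: one multi-label call per left row, so |L| calls."""
    calls, out = 0, []
    labels = list(right)
    for l in left:
        calls += 1
        for r in classify(l, labels):  # returns the subset of labels that apply
            out.append((l, r))
    return out, calls


# Stand-ins for model calls: simple case-insensitive substring checks.
def toy_match(l, r):
    return r in l.lower()


def toy_classify(l, labels):
    return [r for r in labels if r in l.lower()]


left = ["Query optimizer tuning", "LLM cascades for query cost"]
right = ["query", "llm", "index"]

pairs_a, calls_a = pairwise_join(left, right, toy_match)
pairs_b, calls_b = classify_join(left, right, toy_classify)
print(calls_a, calls_b, pairs_a == pairs_b)
```

Both functions return the same pairs on this toy input; only the number of calls differs, growing with the product of the table sizes in the first case and with the left table alone in the second.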

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same cost-modeling and cascade ideas could be adapted to other database systems that embed large language models for data cleaning or enrichment.
  • Collecting telemetry from live deployments appears to be a practical way to calibrate cost and selectivity estimates for semantic operators.
  • Extending the classification reformulation to other set-oriented semantic operations might further reduce the gap between relational and AI-driven query performance.

Load-bearing premise

The three techniques, tuned on Snowflake customer workloads and chosen models, will continue to deliver comparable speedups and quality when applied to different workloads or different underlying language models.

What would settle it

Run the same semantic SQL queries on a fresh set of customer data and models never seen during the original development and measure whether the reported speedups and quality retention still hold.

Figures

Figures reproduced from arXiv: 2511.07663 by Aaron Lin, Anupam Datta, Benjamin Han, Bowei Chen, Boxin Jiang, Dimitris Tsirogiannis, Jay Tayade, Kyle Schmaus, Nathan Wiegand, Nitish Jindal, Paritosh Aggarwal, Paweł Liskowski, Weicheng Zhao, Zihan Li.

Figure 1: Snowflake architecture with the Cortex Platform
Figure 2: Percentage composition of AISQL workloads by …
Figure 5: Distribution of tables used in AISQL queries. Most …
Figure 6: Schema of research papers application. … figure with results using the well-known TPC-H benchmark. Finally, the user is interested in compiling a summary of the corresponding abstracts. This query requires inspecting both the textual and image contents of research papers. Today, the user can write the following AISQL query in Snowflake as follows: SELECT AI_SUMMARIZE_AGG(p.abstract) FROM papers p JOIN paper_…
Figure 7: Different execution plans. Plan A optimizes for join …
Figure 8: Example of two Snowflake tables with sample data.
Figure 10: Effect of optimizing AI predicates with respect to …
Figure 11: Performance comparison of adaptive model cascades on six benchmark datasets. The cascade uses adaptive threshold …
Figure 12: Performance comparison of AI_CLASSIFY rewrite optimization on eight semantic join benchmarks. Left: Execution time (in seconds) with speedup factors annotated. Right: F1 scores with exact values labeled. The rewrite reduces execution time by 15.2–69.5× with mean F1 improving by 44.7%. … dataset characteristics, i.e., tasks where the proxy model’s uncertainty correlates well with prediction difficulty benef…
Original abstract

Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL operations, possess distinct latency and throughput characteristics, and their cost and selectivity are unknown during query compilation. Furthermore, existing query engines are not designed to optimize semantic operations. The AISQL query execution engine addresses these challenges through three novel techniques informed by production deployment data from Snowflake customers. First, AI-aware query optimization treats AI inference cost as a first-class optimization objective, reasoning about large language model (LLM) cost directly during query planning to achieve 2-8× speedups. Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model while escalating uncertain cases to a powerful oracle model, achieving 2-6× speedups while maintaining 90-95% of oracle model quality. Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear through reformulation as multi-label classification tasks, achieving 15-70× speedups with often improved prediction quality. AISQL is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents Cortex AISQL, a production SQL engine integrating native semantic operations into SQL for querying structured and unstructured data. It introduces three techniques informed by Snowflake customer data: AI-aware query optimization (treating LLM costs as first-class, 2-8× speedups), adaptive model cascades (routing via proxy models with escalation to oracle models, 2-6× speedups at 90-95% quality), and semantic join query rewriting (reformulating joins as multi-label classification to reduce quadratic to linear complexity, 15-70× speedups), with the system deployed in production at Snowflake for analytics, search, and content workloads.

Significance. If the empirical claims hold under broader validation, the work could meaningfully advance database systems by making semantic/AI operations first-class citizens in query optimization and execution, potentially enabling scalable hybrid queries over unstructured data and informing cost models for LLM integration in production engines.

major comments (2)
  1. [Abstract] The central claims of 2-8×, 2-6×, and 15-70× speedups with quality retention are stated without reference to any tables, figures, workload characterizations (e.g., selectivity or cardinality distributions), error bars, or ablation studies, leaving the load-bearing performance assertions unsupported by visible evidence.
  2. [Deployment section] The techniques are described as informed by and validated on Snowflake customer data, yet no quantitative details are supplied on LLM latency variance, model pair choices, or join cardinalities; this makes it impossible to assess whether the cost models, routing thresholds, and classification reformulations transfer beyond the specific observed distributions.
minor comments (1)
  1. [Abstract] Consider adding one sentence clarifying the exact semantic primitives exposed (e.g., semantic similarity, classification, or join predicates) to help readers map the techniques to concrete SQL extensions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's potential impact. We address each major comment below with specific plans for revision.

  1. Referee: [Abstract] The central claims of 2-8×, 2-6×, and 15-70× speedups with quality retention are stated without reference to any tables, figures, workload characterizations (e.g., selectivity or cardinality distributions), error bars, or ablation studies, leaving the load-bearing performance assertions unsupported by visible evidence.

    Authors: We agree that the abstract would be strengthened by explicit links to supporting evidence. In the revised manuscript we will update the abstract to reference the evaluation sections, figures, and tables that present workload characterizations (including selectivity and cardinality distributions), error bars, and ablation studies for each technique. This will make the performance claims directly traceable without altering the abstract's length or focus.

    Revision: yes

  2. Referee: [Deployment section] The techniques are described as informed by and validated on Snowflake customer data, yet no quantitative details are supplied on LLM latency variance, model pair choices, or join cardinalities; this makes it impossible to assess whether the cost models, routing thresholds, and classification reformulations transfer beyond the specific observed distributions.

    Authors: The deployment section is based on aggregated production data subject to customer confidentiality constraints, so we cannot release exact per-customer values for latency variance, specific model pairs, or join cardinalities. We will revise the section to include additional aggregated characterizations of the observed distributions (e.g., typical ranges for selectivity, cardinality, and model quality metrics) and a discussion of how the chosen thresholds and reformulations were validated across diverse analytics, search, and content workloads. This will improve assessment of transferability while respecting privacy requirements.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical performance claims from production deployment

The paper presents an implemented SQL engine with three optimization techniques whose reported speedups (2-8×, 2-6×, 15-70×) and quality metrics are stated as observed outcomes from Snowflake production workloads. No equations, parameter fits, or derivations are described that reduce by construction to the reported results themselves. The abstract and description contain no self-citations, uniqueness theorems, or ansatzes that serve as load-bearing premises; the central claims rest on direct deployment measurements rather than self-referential definitions or fitted inputs renamed as predictions. This satisfies the criteria for a self-contained empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated or derivable from the provided text.




Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

    cs.DB 2026-04 conditional novelty 7.0

    PLOP is a cost-based optimizer that finds optimal placements for semantic LLM operators in hybrid query plans via dynamic programming, delivering up to 1.5x speedup and 4.29x cost reduction on 44 benchmark queries whi...

  2. Agent-Aided Design for Dynamic CAD Models

    cs.AI 2026-04 unverdicted novelty 6.0

    AADvark extends agent-aided CAD design to dynamic 3D assemblies with movable parts by integrating constraint solvers and visual feedback to create a verification signal for the agent.

  3. Access Paths for Efficient Ordering with Large Language Models

    cs.DB 2025-08 unverdicted novelty 6.0

    Introduces the LLM ORDER BY semantic operator with algorithmic improvements, a semantic-aware external merge sort, and a budget-aware optimizer that selects near-optimal access paths for LLM-based ordering.
