Recognition: unknown
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
Original abstract
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task in which different complex SQL queries and databases appear in the train and test sets; the task therefore requires a model to generalize to both new SQL queries and new database schemas. Spider is distinct from most previous semantic parsing tasks, which all use a single database and the exact same programs in the train and test sets. We experiment with various state-of-the-art models, and the best model achieves only 12.4% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available at https://yale-lily.github.io/spider.
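To make the task concrete, here is a minimal sketch of Spider-style exact-match scoring, assuming the dataset's published JSON layout (a list of records with question, query, and db_id fields, as in train_spider.json) and a placeholder predict function. Note that the official evaluation script compares queries clause by clause as sets rather than as raw strings, so the normalized string comparison below is only an illustration.

```python
# Minimal sketch of exact-match scoring on Spider-style data.
# Assumes records shaped like those in train_spider.json; predict()
# is a placeholder for a real text-to-SQL model.
import json
import re

def normalize(sql: str) -> str:
    """Crude canonicalization: lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", sql.strip().lower())

def exact_match_accuracy(examples, predict) -> float:
    """Fraction of questions whose predicted SQL matches the gold query."""
    hits = 0
    for ex in examples:
        pred = predict(ex["question"], ex["db_id"])
        hits += normalize(pred) == normalize(ex["query"])
    return hits / len(examples)

if __name__ == "__main__":
    with open("train_spider.json") as f:  # file from the Spider release
        examples = json.load(f)
    # Trivial constant baseline, for illustration only.
    predict = lambda question, db_id: "SELECT count(*) FROM singer"
    print(f"exact match: {exact_match_accuracy(examples, predict):.3f}")
```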
This paper has not been read by Pith yet.
Forward citations
Cited by 13 Pith papers
- PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
  PragLocker protects agent prompts as intellectual property by building non-portable obfuscated versions that function only on the intended LLM, using code-symbol semantic anchoring followed by noise injection guided by target-model feedback.
- Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding
  TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency by 2.2x on matched queries, by extracting templates from historical question-SQL pairs and enforcing them with constrained decoding (see the sketch after this list).
- LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL
  Modular curriculum learning with tier-specific adapters outperforms standard fine-tuning on complex Text-to-SQL queries in the Spider and BIRD benchmarks by avoiding catastrophic forgetting.
- InCoder-32B-Thinking: Industrial Code World Model for Thinking
  InCoder-32B-Thinking uses error-feedback-synthesized thinking traces and a code world model to reach top open-source scores on general and industrial code benchmarks, including 81.3% on LiveCodeBench and 84.0% on CAD-Coder.
- Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions
  A literature survey that taxonomizes methods, datasets, and evaluation practices for natural language interfaces to geospatial and temporal databases while identifying recurring trends and future directions.
- LLMs Get Lost In Multi-Turn Conversation
  LLMs drop 39% in performance in multi-turn conversations due to premature assumptions and an inability to recover from early errors.
- Measuring Coding Challenge Competence With APPS
  The APPS benchmark shows that models like GPT-Neo pass roughly 20% of test cases on introductory problems, indicating that machine learning models are beginning to learn to code.
- An Alternate Agentic AI Architecture (It's About the Data)
  RUBICON replaces opaque LLM-based tool orchestration in agentic AI with an explicit query algebra (AQL: Find, From, Where) executed via wrappers to deliver traceable, deterministic access to heterogeneous enterprise data.
- Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications
  Blue DIL is a new architecture that unifies structured enterprise data, LLM world knowledge, and personal context through declarative query plans and agents for multi-source, multi-modal applications.
- Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
  Phi-4-Mini achieves strong math and coding performance with only 3.8B parameters via high-quality synthetic data, while Phi-4-Multimodal uses a Mixture-of-LoRAs design to integrate modalities and tops speech-recognition leaderboards.
- StarCoder: may the source be with you!
  StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining performance on other languages.
- Supplement Generation Training for Enhancing Agentic Task Performance
  SGT trains a lightweight model to generate task-specific supplemental text that improves the performance of a larger frozen LLM on agentic tasks, without modifying the large model.
- Qwen2.5-Coder Technical Report
  Qwen2.5-Coder models claim state-of-the-art results on over 10 code benchmarks, outperforming even larger models.
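The template-constrained decoding idea in the TeCoD entry can be made concrete with a small sketch. Nothing below is from the TeCoD paper itself; the extract_template and fill_template helpers and the <VAL> slotting scheme are hypothetical stand-ins illustrating the general recipe the summary describes: mine a SQL skeleton from a historical query, then constrain generation so that only the skeleton's slots are filled.

```python
# Hypothetical sketch of template-constrained Text-to-SQL, in the spirit
# of the TeCoD summary above; helper names and slotting are invented.
import re

def extract_template(sql: str) -> str:
    """Mine a skeleton from a historical SQL query by masking literals."""
    sql = re.sub(r"'[^']*'", "<VAL>", sql)           # mask string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "<VAL>", sql)   # mask numeric literals
    return sql

def fill_template(template: str, values: list[str]) -> str:
    """Constrained 'decoding': only the <VAL> slots may be generated."""
    out = template
    for v in values:
        out = out.replace("<VAL>", v, 1)
    return out

# A recurring historical pair yields a reusable skeleton...
hist_sql = "SELECT name FROM singer WHERE age > 30"
template = extract_template(hist_sql)   # SELECT name FROM singer WHERE age > <VAL>
# ...and a new question matching the template only needs its literal filled in.
print(fill_template(template, ["40"]))  # SELECT name FROM singer WHERE age > 40
```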