ExCyTIn-Bench is the first benchmark of 7542 questions from Microsoft Sentinel threat investigation graphs, where the best LLM agent achieves a reward of 0.606.
15 Pith papers cite this work.
citation-role summary
roles: background 3
citation-polarity summary
polarities: background 3
representative citing papers
Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.
ROSE is an intent-centered NL2SQL metric using an adversarial Prover-Refuter cascade that achieves higher human-expert agreement than prior metrics on a new validation set.
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.
EGRefine optimizes column renamings via execution-grounded verification and view materialization to recover Text-to-SQL accuracy lost to schema naming issues while guaranteeing query equivalence.
PiLLar is the first LLM-guided Monte-Carlo Tree Search framework for joint schema-value matching on pivot tables, achieving 87.94% average accuracy on a new benchmark PTbench derived from real-world domains.
A self-healing LLM pipeline for natural language to PostgreSQL translation achieves up to 9.3 percentage point accuracy gains on benchmarks through error diagnosis and anti-regression mechanisms.
AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.
KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
RAS conditions each new Cypher query attempt on prior execution errors through ICL and reduces execution error rate by 41-50% at n=5 versus 32-38% for independent scaling across three Neo4j datasets and five models.
SecureMCP integrates RBAC with five sequential defense modules in an MCP server to achieve 82.3% policy compliance against adversarial LLM SQL queries in AIoT while preserving execution accuracy.
MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.
CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.
citing papers explorer
- Residual Skill Optimization for Text-to-SQL Ensembles
  Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.
- Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation
  KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
- RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation
  RAS conditions each new Cypher query attempt on prior execution errors through ICL and reduces execution error rate by 41-50% at n=5 versus 32-38% for independent scaling across three Neo4j datasets and five models.
- MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
  MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
- XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL
  XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.