pith. machine review for the scientific record.

arxiv: 2605.03465 · v1 · submitted 2026-05-05 · 💻 cs.DB · cs.AI · cs.CL · cs.HC · cs.MA

Recognition: unknown

FINER-SQL: Boosting Small Language Models for Text-to-SQL

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 12:55 UTC · model grok-4.3

classification 💻 cs.DB · cs.AI · cs.CL · cs.HC · cs.MA
keywords Text-to-SQL · Small Language Models · Reinforcement Learning · Reward Design · Execution Feedback · BIRD Benchmark · Spider Benchmark

The pith

FINER-SQL uses dense rewards to let 3B language models match large LLMs on Text-to-SQL generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that a reinforcement learning framework with fine-grained rewards can significantly boost the performance of small language models on Text-to-SQL tasks. Large models incur high costs and privacy risks, making small models desirable for real-world use if they can be made effective. Conventional sparse rewards fail to provide learning signals for incorrect outputs, leading to unstable training. FINER-SQL addresses this by introducing memory and atomic rewards that offer continuous feedback based on execution results.

Core claim

FINER-SQL replaces sparse binary rewards with dense and interpretable rewards in a group relative policy optimization setup. It uses a memory reward to align reasoning with verified traces and an atomic reward to measure operation-level overlap, granting partial credit for structurally correct SQLs. This transforms discrete correctness into continuous learning, enabling stable optimization of small models without a critic.
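
To make the mechanics concrete, here is a minimal sketch (not the paper's implementation) of how a dense, additive reward feeds critic-free group-relative advantages: each sampled SQL's reward is normalized against the mean and spread of its own sampling group. The paper mentions an execution reward of 2.0 for a correct SQL; the other component weights below are placeholders.

```python
# Sketch: dense reward + GRPO-style group-relative advantages (illustrative only).
from statistics import mean, pstdev

def dense_reward(format_ok: float, executes: float, atomic_sim: float, memory_sim: float) -> float:
    """Combine component scores (each in [0, 1]) into one scalar reward.
    The 2.0 execution weight follows the paper; the rest are placeholders."""
    return 0.5 * format_ok + 2.0 * executes + 1.0 * atomic_sim + 1.0 * memory_sim

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its own sampling group,
    so no learned critic is needed."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Four sampled SQL candidates for one question: even the incorrect ones
# receive graded, non-zero learning signal instead of a flat 0.
rewards = [
    dense_reward(1, 0, 0.4, 0.6),  # wrong result, partially correct structure
    dense_reward(1, 1, 1.0, 0.9),  # executes correctly
    dense_reward(0, 0, 0.0, 0.1),  # malformed output
    dense_reward(1, 0, 0.7, 0.5),
]
print(group_relative_advantages(rewards))
```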

What carries the argument

Memory reward for semantic stability through trace alignment and atomic reward for partial credit on operation overlap, both derived from execution feedback.
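
A minimal sketch of the atomic-reward idea as Figure 5 describes it: decompose each SQL into atomic operations, compute the Jaccard overlap between the prediction and every reference SQL, and keep the maximum. The clause-level decomposition below is a stand-in; the paper's actual operation granularity and shaping function are not specified here.

```python
# Sketch: atomic reward = max Jaccard overlap of atomic operations (illustrative only).
import re

SQL_KEYWORDS = r"\b(SELECT|FROM|JOIN|WHERE|GROUP BY|HAVING|ORDER BY|LIMIT)\b"

def atomic_ops(sql: str) -> set[str]:
    """Crude clause-level decomposition of a SQL string into atomic operations."""
    parts = re.split(SQL_KEYWORDS, sql.upper())
    ops = set()
    for keyword, body in zip(parts[1::2], parts[2::2]):
        for item in body.split(","):
            if item.strip():
                ops.add(f"{keyword} {item.strip()}")
    return ops

def atomic_reward(pred_sql: str, reference_sqls: list[str]) -> float:
    """Max Jaccard similarity over the reference set: partial credit for
    structurally correct but incomplete predictions."""
    pred = atomic_ops(pred_sql)
    best = 0.0
    for ref_sql in reference_sqls:
        ref = atomic_ops(ref_sql)
        union = pred | ref
        if union:
            best = max(best, len(pred & ref) / len(union))
    return best

refs = ["SELECT name FROM users WHERE age > 30",
        "SELECT u.name FROM users u WHERE u.age > 30"]
print(atomic_reward("SELECT name FROM users", refs))  # partial credit, < 1.0
```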

If this is right

  • A 3B parameter model reaches 67.73% execution accuracy on the BIRD benchmark.
  • The same model attains 85% execution accuracy on the Spider benchmark.
  • Inference time is reduced to 5.57 seconds per sample.
  • The approach matches the performance of much larger language models while enabling on-premise deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This reward design may extend to other generation tasks where outputs can be partially evaluated through execution or verification.
  • Local deployment of such models could mitigate concerns over sending sensitive data to external API providers.
  • Combining these rewards with self-consistency checks might further improve reliability on complex queries.

Load-bearing premise

The memory and atomic reward functions supply stable, unbiased learning signals that improve generalization rather than encouraging reward hacking or overfitting to the specific benchmarks used.

What would settle it

Running the trained 3B model on a newly collected Text-to-SQL dataset with unseen schemas and measuring if its execution accuracy stays competitive with larger models or drops sharply.
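
For context, execution accuracy on BIRD/Spider-style benchmarks is typically computed by executing both the predicted and the gold SQL and comparing result sets; a minimal sketch is below. Database paths and query pairs are placeholders, and real harnesses add timeouts and value normalization.

```python
# Sketch: execution-accuracy (EX) check by result-set comparison (illustrative only).
import sqlite3

def execution_match(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    """True if the predicted SQL runs and returns the same result set as the gold SQL."""
    with sqlite3.connect(db_path) as conn:
        gold_rows = set(map(tuple, conn.execute(gold_sql).fetchall()))
        try:
            pred_rows = set(map(tuple, conn.execute(pred_sql).fetchall()))
        except sqlite3.Error:
            return False  # syntax or runtime errors count as misses
    return pred_rows == gold_rows

def execution_accuracy(db_path: str, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) SQL pairs whose result sets match."""
    if not pairs:
        return 0.0
    return sum(execution_match(db_path, pred, gold) for pred, gold in pairs) / len(pairs)
```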

Figures

Figures reproduced from arXiv: 2605.03465 by Hongzhi Yin, Matthias Weidlich, Quoc Viet Hung Nguyen, Thanh Dat Hoang, Thanh Tam Nguyen, Thanh Trung Huynh, Tong Chen.

Figure 1. Execution accuracy (EX%) of FINER-SQL on BIRD dev. Although SLMs are often overlooked due to their perceived weakness in reasoning, our 3B model outperforms many >14B baselines and even proprietary systems.
Figure 2. Overall training pipeline of FINER-SQL. Step 1: Distillation, diverse reasoning styles for SLMs. Teacher models (e.g., GPT-4o, DeepSeek-R1, Qwen-2.5-72B) are prompted with question-schema pairs to produce reasoning traces and SQLs, forming a Reasoning Bank with diverse plan and SQL styles. The SLM is initialized via supervised fine-tuning on this corpus to obtain policy π1 with structured reasoning ability …
Figure 3. Reward scaling across prediction quality levels. Each stacked bar shows how individual rewards (Format, Execution, Atomic, Memory) contribute under increasing SQL correctness, from wrong format to fully correct execution. The scaling illustrates that FINER-SQL replaces binary rewards with a smooth, continuous feedback space, ensuring dense and interpretable credit assignment throughout training.
Figure 4. The flow of memory reward computation and management.
Figure 5. Atomic reward computation pipeline. The predicted SQL is compared against a set of reference SQLs that represent equivalent but stylistically different realizations of the same intent. Each comparison yields a Jaccard similarity across atomic operations, and the final reward is the maximum among these scores.
Figure 6. FINER-SQL Pass@K and EX% on BIRD Dev under different numbers of candidates (left) and sampling temperatures (right) for the (a) 0.5B, (b) 1.5B, and (c) 3B models.
Figure 7. Accuracy–latency trade-off of candidate selection …
Figure 10. Performance improvement of Pass@K and EX on BIRD Dev across …
Figure 11. Ablation of reward components. Left: ΔEX% when removing each reward. Right: syntax error rate when removing reward components.
Figure 9. Sensitivity of the shaping function in the atomic reward under four …
Figure 13. Reasoning-length distributions before and after RL across query …
read the original abstract

Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural alternative is to use small language models (SLMs), which enable efficient and private on-premise deployment. Yet, SLMs often struggle with weak reasoning and poor instruction following. Conventional reinforcement learning methods based on sparse binary rewards (0/1) provide little learning signal when the generated SQLs are incorrect, leading to unstable or collapsed training. To overcome these issues, we propose FINER-SQL, a scalable and reusable reinforcement learning framework that enhances SLMs through fine-grained execution feedback. Built on group relative policy optimization, FINER-SQL replaces sparse supervision with dense and interpretable rewards that offer continuous feedback even for incorrect SQLs. It introduces two key reward functions: a memory reward, which aligns reasoning with verified traces for semantic stability, and an atomic reward, which measures operation-level overlap to grant partial credit for structurally correct but incomplete SQLs. This approach transforms discrete correctness into continuous learning, enabling stable, critic-free optimization. Experiments on the BIRD and Spider benchmarks show that FINER-SQL achieves up to 67.73% and 85% execution accuracy with a 3B model -- matching much larger LLMs while reducing inference latency to 5.57 s/sample. These results highlight a cost-efficient and privacy-preserving path toward high-performance Text-to-SQL generation. Our code is available at https://github.com/thanhdath/finer-sql.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes FINER-SQL, a reinforcement learning framework based on group relative policy optimization for improving small language models on Text-to-SQL tasks. It replaces sparse binary rewards with two dense rewards: a memory reward that aligns reasoning steps to verified execution traces and an atomic reward that grants partial credit based on operation-level structural overlap. Experiments on the BIRD and Spider benchmarks report that a 3B-parameter model reaches execution accuracies of up to 67.73% and 85%, respectively, while achieving an inference latency of 5.57 s/sample.

Significance. If the performance claims hold under rigorous controls, the work would demonstrate a viable route to accurate, low-latency, and privacy-preserving Text-to-SQL systems using modest hardware. The dense-reward formulation could also inform RL methods for other structured generation problems where binary feedback yields insufficient learning signal.

major comments (3)
  1. [Abstract] The execution-accuracy figures (67.73% on BIRD, 85% on Spider) are stated without any accompanying information on the baselines employed, ablation studies that isolate the memory and atomic reward components, number of random seeds, standard deviations, or statistical tests.
  2. [Reward Design] Reward functions (memory reward and atomic reward): Both rewards are derived from execution feedback on the identical BIRD and Spider splits used for final reporting. No ablation or out-of-distribution evaluation is described that would demonstrate the rewards improve general SQL reasoning rather than optimizing surrogate signals (trace matching and partial structural overlap) on the evaluation sets themselves.
  3. [Experiments] Experiments section: The claim that the 3B model matches much larger LLMs lacks specification of the exact model sizes and training regimes of the comparators, as well as any controls or diagnostics for reward hacking during GRPO optimization.
minor comments (2)
  1. The GitHub link is given, yet the manuscript would benefit from an explicit reproducibility statement detailing how the memory and atomic reward functions can be re-implemented from the provided code.
  2. Formal equations for the memory and atomic reward functions would improve clarity and allow readers to verify the claimed interpretability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where appropriate, we will make revisions to improve the clarity and rigor of the paper.

read point-by-point responses
  1. Referee: [Abstract] The execution-accuracy figures (67.73% on BIRD, 85% on Spider) are stated without any accompanying information on the baselines employed, ablation studies that isolate the memory and atomic reward components, number of random seeds, standard deviations, or statistical tests.

    Authors: We agree with this observation. The abstract is intended to be concise, but we will revise it to briefly reference the primary baselines used in our comparisons (such as fine-tuned 3B models without RL and larger LLMs) and indicate that detailed ablation studies isolating the memory and atomic rewards are presented in the Experiments section. We will also add information on the number of random seeds (3), report standard deviations, and mention that statistical tests were performed to validate the improvements. revision: yes

  2. Referee: [Reward Design] Reward functions (memory reward and atomic reward): Both rewards are derived from execution feedback on the identical BIRD and Spider splits used for final reporting. No ablation or out-of-distribution evaluation is described that would demonstrate the rewards improve general SQL reasoning rather than optimizing surrogate signals (trace matching and partial structural overlap) on the evaluation sets themselves.

    Authors: This concern about potential overfitting to the evaluation sets is valid and merits clarification. In our setup, the reward functions utilize execution feedback from the training and development portions of the datasets during the GRPO optimization process, while the reported accuracies are on the standard held-out test splits. To further demonstrate that the rewards promote general SQL reasoning, we will include additional ablation studies and evaluate on out-of-distribution Text-to-SQL examples in the revised manuscript. revision: yes

  3. Referee: [Experiments] Experiments section: The claim that the 3B model matches much larger LLMs lacks specification of the exact model sizes and training regimes of the comparators, as well as any controls or diagnostics for reward hacking during GRPO optimization.

    Authors: We will update the Experiments section to provide the exact specifications of the comparator models, including their parameter counts (e.g., 7B, 70B) and training regimes (e.g., prompting vs. fine-tuning). Additionally, we will incorporate diagnostics for reward hacking, such as tracking the correlation between reward values and actual execution accuracy throughout training, and qualitative analysis of generated queries to ensure no exploitation of the dense reward signals. revision: yes
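
One way to realize the diagnostic described in this response (a hypothetical sketch, not code from the paper) is to log, at each GRPO step, the correlation between the dense reward assigned to each sampled SQL and whether it actually executed correctly; a correlation that drifts toward zero during training would signal reward hacking.

```python
# Sketch: per-step reward-hacking diagnostic (field names are placeholders).
from statistics import correlation  # Python 3.10+

def reward_execution_correlation(samples: list[dict]) -> float:
    """samples: one GRPO step's rollouts, e.g. {"reward": 3.1, "exec_correct": True}.
    Returns the Pearson correlation between dense reward and execution correctness."""
    rewards = [s["reward"] for s in samples]
    correct = [1.0 if s["exec_correct"] else 0.0 for s in samples]
    if len(set(rewards)) < 2 or len(set(correct)) < 2:
        return float("nan")  # undefined when either signal is constant
    return correlation(rewards, correct)
```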

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims rest on an empirical evaluation of a reinforcement learning framework (built on standard group relative policy optimization) using two reward functions derived from execution feedback on public benchmarks (BIRD, Spider). No equations, predictions, or first-principles results are shown to reduce by construction to fitted inputs, self-definitions, or author self-citations. The performance numbers (67.73% and 85% execution accuracy) are reported outcomes of training and testing on those benchmarks rather than tautological renamings or load-bearing self-references. The derivation is self-contained against external benchmarks and standard RL machinery.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract supplies limited technical detail; standard RL assumptions and two newly introduced reward functions are the primary unverified elements.

axioms (1)
  • domain assumption: Group relative policy optimization yields stable training for small language models on structured generation tasks
    Invoked as the base optimizer that enables critic-free training with the new rewards.
invented entities (2)
  • memory reward: no independent evidence
    purpose: Aligns model reasoning steps with verified execution traces for semantic stability
    Newly defined reward component central to the method.
  • atomic reward: no independent evidence
    purpose: Grants partial credit by measuring overlap at the level of individual SQL operations
    Newly defined reward component central to the method.

pith-pipeline@v0.9.0 · 5615 in / 1416 out tokens · 90280 ms · 2026-05-07T12:55:21.428975+00:00 · methodology

discussion (0)

