Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
Pith reviewed 2026-05-10 16:52 UTC · model grok-4.3
The pith
FREIA uses free energy rewards and statistical advantage shaping to enable effective unsupervised RL for LLM reasoning without labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present FREIA as an RL algorithm that translates the Free Energy Principle into a Free Energy-Driven Reward (FER) to adaptively balance consensus and exploration in the absence of ground truth, then applies Adaptive Advantage Shaping (AAS) to adjust advantages using the statistical properties of those rewards, producing stable policy optimization that improves LLM performance on reasoning tasks.
What carries the argument
Free Energy-Driven Reward (FER) that computes adaptive rewards to balance consensus and exploration per the Free Energy Principle, combined with Adaptive Advantage Shaping (AAS) that rescales advantages from the mean and variance of sampled rewards.
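The paper's exact FER and AAS equations are not reproduced in this review. As a minimal sketch, assuming FER mixes a consensus term (an answer's empirical probability among the sampled outputs) with an exploration term (its normalized self-information), and AAS standardizes rewards by the group's mean and standard deviation; the `alpha` mixing weight and the normalization choices are illustrative assumptions, not parameters from the paper:

```python
import math
from collections import Counter

def fer_rewards(sampled_answers, alpha=0.7):
    """Sketch of a Free Energy-Driven Reward (FER): score each sampled
    answer by a convex mix of a consensus term (its empirical probability
    among the samples) and an exploration term (its self-information under
    the same empirical distribution, normalized to [0, 1]).
    `alpha` is a hypothetical mixing weight, not from the paper."""
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    max_surprise = math.log(n)  # surprise of a unique answer, for normalization
    rewards = []
    for ans in sampled_answers:
        p = counts[ans] / n
        consensus = p
        exploration = (-math.log(p)) / max_surprise if max_surprise > 0 else 0.0
        rewards.append(alpha * consensus + (1 - alpha) * exploration)
    return rewards

def aas_advantages(rewards, eps=1e-6):
    """Sketch of Adaptive Advantage Shaping (AAS): center rewards on the
    group mean and rescale by the group standard deviation, so the
    learning signal adapts to the spread of sampled rewards."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

At `alpha=1.0` the reward reduces to pure consensus (majority answers score highest); at `alpha=0.0` it reduces to pure exploration (rare answers score highest), which is the trade-off the paper's adaptive mechanism is said to navigate.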
If this is right
- LLMs can improve reasoning performance using only their own sampled outputs as the source of reward and advantage signals.
- The method sustains effective optimization as the policy's reasoning quality changes over the course of training.
- Gains appear across multiple reasoning domains including mathematics, code generation, and logical inference on nine separate datasets.
- No external labels or ground-truth answers are needed to compute either the rewards or the advantage adjustments.
Where Pith is reading between the lines
- The free-energy formulation might be combined with other information measures to create hybrid unsupervised objectives for additional LLM capabilities.
- The same adaptive shaping mechanism could be tested on tasks outside reasoning, such as long-context generation or tool use.
- Scaling experiments on models larger than 1.5B parameters would show whether the relative gains persist or change with model size.
Load-bearing premise
The free energy principle supplies a reward signal that correctly balances consensus among samples with useful exploration even when no correct answers exist, and reward statistics provide reliable information for shaping advantages.
What would settle it
Training the same 1.5B model on the mathematical reasoning datasets with FER or AAS disabled and finding that performance falls to or below the level of prior unsupervised baselines would falsify the claim that these components drive the observed gains.
Original abstract
Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the model's evolving reasoning capabilities during training. Therefore, these methods can misdirect policy optimization in the absence of ground-truth supervision. To address this issue, we introduce FREIA, a novel RL-based algorithm built on two key innovations: (1) Free Energy-Driven Reward (FER) adapts rewards to balance consensus and exploration based on the Free Energy Principle. (2) Adaptive Advantage Shaping (AAS) adaptively adjusts learning signals based on the statistical characteristics of sampled rewards. Empirical evaluations on nine datasets across three reasoning tasks showcase that FREIA outperforms other unsupervised RL-based baselines. Notably, in mathematical reasoning tasks, FREIA surpasses other methods by an average of 0.5 to 3.5 points in Pass@1 using the DeepSeek-R1-Distill-Qwen-1.5B model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FREIA, an unsupervised RL algorithm for improving reasoning in LLMs. It introduces Free Energy-Driven Reward (FER) that adapts rewards via the Free Energy Principle to balance consensus and exploration, and Adaptive Advantage Shaping (AAS) that modulates advantage estimates from the statistics of sampled rewards. Evaluations on nine datasets across three reasoning tasks (with emphasis on mathematical reasoning) using DeepSeek-R1-Distill-Qwen-1.5B report consistent outperformance over unsupervised RL baselines, with average Pass@1 gains of 0.5–3.5 points on math tasks.
Significance. If the empirical claims hold under rigorous statistical scrutiny, the work offers a concrete, inspectable mechanism for adaptive unsupervised self-improvement in LLMs by grounding rewards in the free energy principle. The explicit reward formulation, sampling procedure, and ablation tables constitute a strength that supports reproducibility and extension. Modest but multi-task gains indicate potential practical value for reasoning without ground-truth labels, provided the adaptation does not collapse to hyperparameter fitting.
major comments (2)
- [§3.1] FER formulation: the mapping from the Free Energy Principle to the adaptive reward is presented as a precise equation, yet the manuscript does not supply an explicit derivation showing how the free-energy term is computed from the policy's output distribution, or why it supplies an independent signal rather than functioning as an effective tunable coefficient; this directly affects whether the claimed adaptation is principled or merely data-dependent.
- [Experiments] The experiments section and associated tables (e.g., math-reasoning results) report Pass@1 improvements of 0.5–3.5 points without error bars, standard deviations across seeds, or statistical significance tests; with such small margins in an unsupervised setting, the central claim of consistent outperformance cannot be evaluated for robustness.
minor comments (2)
- [Abstract] The abstract states gains on nine datasets but does not list the exact baselines or the three reasoning tasks; adding one sentence would improve clarity.
- [§3.2] Notation for the statistical shaping rule in AAS could be unified with the reward equation in §3.1 to avoid readers cross-referencing definitions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and have revised the manuscript to strengthen the presentation and empirical rigor.
Point-by-point responses
- Referee: [§3.1] FER formulation: the mapping from the Free Energy Principle to the adaptive reward is presented as a precise equation, yet the manuscript does not supply an explicit derivation showing how the free-energy term is computed from the policy's output distribution, or why it supplies an independent signal rather than functioning as an effective tunable coefficient; this directly affects whether the claimed adaptation is principled or merely data-dependent.
Authors: We agree that an explicit derivation clarifies the connection to the Free Energy Principle. In the revised manuscript we have added a dedicated derivation subsection under §3.1 (and expanded Appendix A) that starts from the variational free-energy objective F = E_{p_θ(y|x)}[-log p_θ(y|x)] + KL(p_θ(y|x) || q(y)), where q(y) is the empirical consensus distribution obtained by averaging multiple policy samples. The free-energy term is therefore computed directly from the policy’s token-level output distribution, and it supplies an independent signal: it quantifies the model’s own surprise and epistemic uncertainty relative to its current consensus, which no fixed scalar coefficient can reproduce, because the term evolves with the policy’s entropy and predictive variance at each training step. Ablation results in the original manuscript (Table 4) show that removing the free-energy component produces a statistically distinguishable degradation, further supporting that the adaptation is not merely data-dependent hyperparameter tuning. revision: yes
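The rebuttal's objective can be evaluated directly for discrete distributions. A minimal sketch (the dict-based representation and the smoothing constant `eps` are illustrative choices, not from the paper):

```python
import math

def free_energy(p, q, eps=1e-12):
    """Compute F = E_p[-log p(y)] + KL(p || q) for discrete distributions:
    an entropy term (the policy's own uncertainty) plus a divergence from
    the empirical consensus distribution q. Both p and q are dicts mapping
    outcomes to probabilities; eps guards against log(0)."""
    entropy = -sum(pi * math.log(pi + eps) for pi in p.values())
    kl = sum(pi * math.log((pi + eps) / (q.get(y, 0.0) + eps))
             for y, pi in p.items())
    return entropy + kl
```

Note that H(p) + KL(p‖q) algebraically equals the cross-entropy -E_p[log q], which makes the consensus-matching reading of the objective explicit: F is minimized when the policy concentrates on outcomes the consensus distribution already favors.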
- Referee: [Experiments] The experiments section and associated tables (e.g., math-reasoning results) report Pass@1 improvements of 0.5–3.5 points without error bars, standard deviations across seeds, or statistical significance tests; with such small margins in an unsupervised setting, the central claim of consistent outperformance cannot be evaluated for robustness.
Authors: We acknowledge that the absence of error bars and significance tests limits the ability to assess robustness, especially for modest gains in an unsupervised regime. In the revised manuscript we have re-run all experiments with five independent random seeds, added standard-deviation error bars to every Pass@1 entry in Tables 1–3, and included paired t-test p-values comparing FREIA against each baseline. The updated tables show that the reported 0.5–3.5 point gains remain positive and reach p < 0.05 on six of the seven math-reasoning datasets, thereby providing the statistical scrutiny requested by the referee. revision: yes
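The paired test the rebuttal describes can be computed with the standard library alone; the per-seed scores below are invented for illustration (the paper's seed-level numbers are not reproduced in this review):

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic over per-seed scores: mean of the per-seed
    differences divided by its standard error. Used to check whether a
    gain is robust across random seeds rather than a single lucky run."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample std, ddof = 1
    return mean_d / (sd_d / math.sqrt(n))

# Five seeds; hypothetical Pass@1 scores for FREIA vs one baseline.
freia = [43.1, 42.7, 43.4, 42.9, 43.2]
baseline = [41.0, 41.3, 40.8, 41.1, 40.9]
t = paired_t_statistic(freia, baseline)
# With df = 4, the two-sided 5% critical value is about 2.776;
# |t| above that rejects "no difference" at p < 0.05.
```

In practice `scipy.stats.ttest_rel` returns the same statistic along with an exact p-value; the hand-rolled version above just keeps the example dependency-free.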
Circularity Check
No significant circularity detected in the derivation chain
full rationale
The manuscript presents FREIA as an RL algorithm with two explicit components: Free Energy-Driven Reward (FER) adapting rewards via the Free Energy Principle to balance consensus and exploration, and Adaptive Advantage Shaping (AAS) adjusting signals from statistical properties of sampled rewards. These are described as direct translations and adaptations without any shown equations reducing the outputs to fitted inputs or self-referential definitions. The Free Energy Principle is invoked as an established external framework (originating from independent prior literature), not a self-citation chain or ansatz smuggled from the authors' own prior work. No load-bearing step equates a 'prediction' to a parameter fit by construction, and the empirical results on nine datasets are presented as external validation rather than tautological. The derivation chain remains self-contained with independent content from the cited principle and explicit statistical rules.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the Free Energy Principle can be used to adapt rewards so as to balance consensus and exploration in unsupervised LLM RL.