On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
Pith reviewed 2026-05-12 00:47 UTC · model grok-4.3
The pith
Post-training reweights behaviors within a pretrained model's accessible support to elicit capabilities, or expands that support to create new ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Post-training that reweights behaviors within the accessible support is capability elicitation; changing the support itself is capability creation. Both SFT and RL can be seen as reweighting the pretrained reference distribution, only with different external signals, and when the update remains close to the base model, the main effect is local reweighting, not capability creation.
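The free-energy view named here has a standard closed form in KL-regularized control. A minimal sketch, assuming the usual notation (energy $E$, temperature $\beta$, reference model $\pi_{\mathrm{ref}}$) rather than anything quoted from the paper:

```latex
% Free-energy objective over behaviors y given prompt x:
%   F(\pi) = \mathbb{E}_{y \sim \pi}[E(x,y)] + \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})
% Minimizing over normalized \pi (e.g., via a Lagrange multiplier) gives
% Boltzmann reweighting of the reference model:
\[
\pi^{\ast}(y \mid x)
  = \frac{\pi_{\mathrm{ref}}(y \mid x)\, e^{-E(x,y)/\beta}}{Z(x)},
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\, e^{-E(x,y)/\beta}.
\]
% RL: E(x,y) = -r(x,y); SFT: demonstrated behaviors receive low E(x,y).
% Since \pi_{\mathrm{ref}}(y \mid x) = 0 forces \pi^{\ast}(y \mid x) = 0,
% updates of this form preserve exact support: they reweight within it.
```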
What carries the argument
Accessible support: the set of behaviors that a model can practically produce under finite budgets. Whether post-training reweights within this set or changes it determines whether it elicits or creates capabilities.
If this is right
- The central question for post-training is no longer whether it is framed as SFT or RL, but whether it reweights behaviors already within reach or expands the model's reachable behavioral space through search, interaction, tool use, or new information.
- When the update remains close to the base model, the main effect is local reweighting, not capability creation.
- SFT and RL can both be seen as reweighting a pretrained reference distribution, only with different external signals: demonstrations define low-energy behavior for SFT and rewards define low-energy behavior for RL.
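A minimal numerical sketch of this shared-reweighting claim on a toy discrete behavior space. The five behaviors, the energies, and the temperature are invented here for illustration; nothing below comes from the paper:

```python
import numpy as np

# Toy behavior space and a pretrained reference distribution over it.
behaviors = ["refuse", "small_talk", "proof_sketch", "full_proof", "gibberish"]
pi_ref = np.array([0.30, 0.50, 0.15, 0.04, 0.01])

def reweight(pi_ref, energy, beta=1.0):
    """Boltzmann reweighting: minimizer of E_pi[energy] + beta * KL(pi || pi_ref)."""
    w = pi_ref * np.exp(-np.asarray(energy) / beta)
    return w / w.sum()

# "SFT" signal: demonstrations of full proofs assign them low energy.
sft_energy = np.array([2.0, 2.0, 1.0, 0.0, 3.0])

# "RL" signal: a reward model scores behaviors; energy is negative reward.
rl_energy = -np.array([-1.0, 0.0, 1.0, 2.0, -3.0])

for name, pi in [("ref", pi_ref),
                 ("sft", reweight(pi_ref, sft_energy)),
                 ("rl", reweight(pi_ref, rl_energy))]:
    print(f"{name:>4}", np.round(pi, 3))

# Both updates raise the probability of "full_proof", and neither can assign
# positive mass to a behavior with pi_ref = 0: reweighting, not creation.
```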
Where Pith is reading between the lines
- This view suggests measuring post-training by testing whether new behaviors could have been reached with finite effort before the update.
- Training procedures that incorporate explicit search or external interaction are positioned as more likely to expand accessible support.
- The framework could be used to reinterpret scaling curves as mixtures of elicitation and creation effects at different training stages.
Load-bearing premise
The notion of accessible support can be made precise and measurable enough to distinguish elicitation from creation in practice.
What would settle it
An experiment showing a concrete behavior that a model produces after post-training but could not produce before under a matched finite budget of compute, data, or interaction, separate from mere reweighting of already-reachable outputs.
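One hedged way to cash out such an experiment: probe membership in the accessible support by best-of-N sampling under a fixed budget, before and after the update. The `sample` and `is_success` interfaces, the placeholder models, and the thresholds below are hypothetical scaffolding, not anything the paper specifies:

```python
import math

def reachable(sample, is_success, budget_n, alpha=0.05):
    """Budgeted reachability probe: draw up to budget_n samples and check
    whether any satisfies is_success. If none does, return a (1 - alpha)
    upper confidence bound on the per-sample success probability
    (rule-of-three style: zero hits in n trials implies p <= -ln(alpha)/n)."""
    for _ in range(budget_n):
        if is_success(sample()):
            return True, None
    return False, -math.log(alpha) / budget_n

# Hypothetical usage (base_model, tuned_model, prompt, solves_task are placeholders):
# hit_before, p_ub = reachable(lambda: base_model.sample(prompt, temperature=1.0),
#                              solves_task, budget_n=100_000)
# hit_after, _ = reachable(lambda: tuned_model.sample(prompt, temperature=1.0),
#                          solves_task, budget_n=100_000)
# Evidence for creation rather than reweighting: hit_after is True while
# hit_before is False, with p_ub small at the matched budget.
```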
Original abstract
Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction is too coarse. What matters is whether a training procedure increases the probability of behaviors the pretrained model could already produce, or whether it changes what the model can practically reach. We argue that post-training research should distinguish between capability elicitation and capability creation. We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation. We develop this argument through a free-energy view of post-training. SFT and RL can both be seen as reweighting a pretrained reference distribution, only with different external signals. Demonstration signals define low-energy behavior for SFT, and reward signals define low-energy behavior for RL. When the update remains close to the base model, the main effect is local reweighting, not capability creation. Within this framework, the central question is no longer whether post-training is framed as SFT or RL, but whether it reweights behaviors already within reach, or instead expands the model's reachable behavioral space through search, interaction, tool use, or the incorporation of new information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that debates on LLM post-training oversimplify SFT as imitation and RL as discovery. It introduces 'accessible support' (behaviors a model can practically produce under finite budgets) to distinguish capability elicitation (reweighting probabilities within this support) from capability creation (expanding the support itself). Both SFT and RL are reframed as reweighting a pretrained reference distribution under a free-energy perspective, where demonstration or reward signals define low-energy behaviors; the key question is whether updates remain local to the base model or expand reachable behaviors via search, interaction, or new information.
Significance. If operationalized, the distinction could usefully reorient post-training research toward explicit analysis of whether updates elicit existing behaviors or create new ones, moving beyond coarse SFT/RL labels. The free-energy analogy correctly highlights that both methods optimize energy-like objectives and that proximity to the base model favors reweighting. However, the manuscript is entirely conceptual with no derivations, data, benchmarks, or examples, so its significance remains prospective rather than demonstrated.
Major comments (1)
- [Abstract] The central claim requires that 'accessible support' (behaviors reachable under finite budgets) can be identified precisely enough to classify any post-training update as reweighting inside the support or expansion of the support. The paper defines it only descriptively ('the set of behaviors that a model can practically produce under finite budgets') and states that SFT/RL are reweightings of a reference distribution, but supplies no mathematical characterization (e.g., no measure on behavior space, no budget parameterization, no decision procedure), no algorithm, and no worked example.
Minor comments (2)
- The manuscript would benefit from a brief toy-model illustration showing how one would determine whether a specific update changes the accessible support.
- Clarify whether the free-energy perspective is intended as a strict analogy or as a formal mapping that could yield testable predictions.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful review. The comments accurately highlight the conceptual focus of the manuscript and the need for greater precision around the definition of accessible support. We respond to the major comment below, indicating the revisions we will make.
Point-by-point responses
- Referee: [Abstract] The central claim requires that 'accessible support' (behaviors reachable under finite budgets) can be identified precisely enough to classify any post-training update as reweighting inside the support or expansion of the support. The paper defines it only descriptively ('the set of behaviors that a model can practically produce under finite budgets') and states that SFT/RL are reweightings of a reference distribution, but supplies no mathematical characterization (e.g., no measure on behavior space, no budget parameterization, no decision procedure), no algorithm, and no worked example.
Authors: We agree that the current treatment of accessible support is descriptive rather than equipped with a formal measure on behavior space, explicit budget parameterization, or a decision procedure. The manuscript is a perspective paper whose primary aim is to reframe post-training debates; it does not purport to deliver a complete operational framework. In the revised version we will (i) add a more explicit parameterization of the finite budget (in terms of sampling temperature, sequence length, and computational resources) and (ii) include a short worked example illustrating how one might assess whether a given behavior lies inside or outside the accessible support in a simplified setting. We will also state clearly that a full algorithm or classification procedure lies beyond the scope of this work and remains an open question for future research. These changes will make the central claim more precise while preserving the paper's conceptual character.
Revision: partial
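For illustration, the budget parameterization promised in (i) might be sketched as below; the triple $(\tau, L, N)$ and the threshold $\varepsilon$ are assumptions made here, not notation committed to by the authors:

```latex
% Budget as a triple: sampling temperature \tau, max length L, sample count N.
%   B = (\tau, L, N)
% Accessible support at budget B: behaviors of length at most L that the
% temperature-\tau model emits with non-negligible probability within N draws.
\[
\mathrm{supp}_B(\pi) =
  \bigl\{\, y : |y| \le L,\ \pi_\tau(y \mid x) \ge \varepsilon(N) \,\bigr\},
\qquad \varepsilon(N) \approx 1/N .
\]
% Elicitation: \mathrm{supp}_B(\pi_{\mathrm{post}}) = \mathrm{supp}_B(\pi_{\mathrm{ref}}),
%              with probabilities reweighted inside it.
% Creation:    \mathrm{supp}_B(\pi_{\mathrm{post}}) \setminus \mathrm{supp}_B(\pi_{\mathrm{ref}}) \neq \varnothing.
```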
Circularity Check
Central distinction introduced by definition of 'accessible support' without formalization or external derivation
Specific steps
- Self-definitional [Abstract]:
"We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation."
The elicitation-vs-creation distinction is defined exactly as reweighting inside versus expansion of the newly introduced 'accessible support' term. The term is presented as making the distinction operational, but the definition supplies no independent measure, budget parameterization, or decision procedure; the classification therefore holds by construction of the definition rather than by derivation from prior results or data.
Full rationale
The paper's load-bearing claim—that post-training is elicitation when it reweights inside accessible support and creation when it expands the support—is made by introducing the term 'accessible support' and then defining the distinction directly in terms of it. This reduces the claimed operationalization to a definitional move rather than a derivation from independent equations, benchmarks, or measurable procedures. The free-energy framing is asserted as a perspective under which SFT and RL are both reweightings, but supplies no checked equations or external validation that would make the support concept falsifiable outside the definition itself. No self-citations or fitted parameters appear in the provided text.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Post-training procedures can be viewed as reweighting a pretrained reference distribution using external signals that define low-energy behaviors.
Invented entities (1)
- Accessible support: no independent evidence.