Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Jeanmely Rojas Nunez; Maheep Chaudhary; Nathan Allen; Nomgondalai Amgalanbaatar; Vasu Sharma; Viraj Sawant; Yannis Zongo

arXiv: 2605.28860 · v2 · pith:OTXUMK3Onew · submitted 2026-05-21 · cs.LG · cs.AI · cs.CL

Pith reviewed 2026-06-30 17:12 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CL

keywords catastrophic forgetting · reinforcement learning · supervised fine-tuning · circuit vulnerability · large language models · fine-tuning · attention heads · mechanistic interpretability

The pith

Reinforcement learning preserves more of a language model's original circuits than supervised fine-tuning during task adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares reinforcement learning and supervised fine-tuning to understand why the former resists catastrophic forgetting better in large language models. It introduces differential circuit vulnerability as a head-level metric that tracks how much fine-tuning alters specific internal circuits. Experiments on adapting Qwen2.5-3B-Instruct to scientific question answering show SFT reaches target performance faster yet alters circuits more and erases prior capabilities, while RL changes circuits less at the expense of slower adaptation. The work concludes that greater circuit preservation under RL helps account for its reduced forgetting.

Core claim

SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting.

What carries the argument

Differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning.
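This page does not reproduce the paper's definition of the metric. As a rough illustration only, the sketch below assumes each model's circuit is given as a per-head mask value in [0, 1], computes a per-head mask shift (tuned minus base), and reports the fraction of base-circuit heads that remain in the fine-tuned circuit. The mask representation, the 0.5 threshold, the sign convention, and all function names are assumptions made for this example, not the authors' formulation.

```python
# Illustrative only: the mask representation, threshold, and function names
# below are assumptions, not the paper's definitions.

def mask_shift(base_mask, tuned_mask):
    """Per-head change in mask value (tuned minus base)."""
    return {head: tuned_mask[head] - base_mask[head] for head in base_mask}


def circuit(mask, threshold=0.5):
    """Heads whose mask value is at or above the (assumed) threshold."""
    return {head for head, value in mask.items() if value >= threshold}


def base_retention(base_mask, tuned_mask, threshold=0.5):
    """Fraction of base-circuit heads still present in the fine-tuned circuit."""
    base_heads = circuit(base_mask, threshold)
    if not base_heads:
        raise ValueError("base circuit is empty")
    kept = base_heads & circuit(tuned_mask, threshold)
    return len(kept) / len(base_heads)


# Toy masks keyed by (layer, head); the values are made up.
base = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.1}
tuned = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.6, (1, 1): 0.8}

shifts = mask_shift(base, tuned)
retention = base_retention(base, tuned)
print(f"retention = {retention:.3f}")
```

In this toy case two of the three base-circuit heads survive, so retention is 2/3. A head that newly enters the fine-tuned circuit, such as (1, 1) here, does not raise retention, because retention is measured against the base circuit only.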

If this is right

RL updates remain closer to the base policy, resulting in smaller circuit shifts and better retention of earlier skills.
Faster task gains under SFT come with higher circuit disruption that directly increases loss of prior capabilities.
Circuit preservation serves as a mechanistic factor distinguishing the forgetting behavior of RL from SFT.
The observed speed-versus-stability trade-off applies specifically to the head-level circuits measured in the adaptation task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Fine-tuning recipes could be evaluated or designed by tracking circuit vulnerability to favor retention when needed.
The metric offers a way to compare other adaptation methods beyond RL and SFT on the same mechanistic axis.
If circuit changes prove causal for capability loss, interventions that limit vulnerability could reduce forgetting without slowing adaptation.

Load-bearing premise

The differential circuit vulnerability metric validly quantifies degradation of the computational circuits responsible for prior model capabilities rather than unrelated changes.

What would settle it

Observing no meaningful difference in differential circuit vulnerability between RL and SFT runs despite RL exhibiting clearly less forgetting would undermine the proposed mechanistic link.

Figures

Figures reproduced from arXiv: 2605.28860 by Jeanmely Rojas Nunez, Maheep Chaudhary, Nathan Allen, Nomgondalai Amgalanbaatar, Vasu Sharma, Viraj Sawant, Yannis Zongo.

**Figure 1.** Circuit retention trajectories during high-NTS training. Starting from 100% base-circuit retention, SFT (orange) and RL (blue) diverge sharply over the two training epochs that produce the high new-task score models. SFT drops to 63.5% after Epoch 1 and continues declining to 59.0% by Epoch 2, whereas RL falls to 69.8% after Epoch 1 and recovers to 72.5% by Epoch 2, a 13.5 percentage-point advantage. Foote…
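The retention gap quoted in the Figure 1 caption can be checked directly from the four percentages it reports. The short sketch below only redoes that arithmetic; the dictionary layout and function name are choices made for this example.

```python
# Retention percentages as reported in the Figure 1 caption.
retention = {
    "SFT": {"epoch_1": 63.5, "epoch_2": 59.0},
    "RL": {"epoch_1": 69.8, "epoch_2": 72.5},
}


def rl_advantage(epoch):
    """RL minus SFT retention, in percentage points, rounded to one decimal."""
    return round(retention["RL"][epoch] - retention["SFT"][epoch], 1)


gap_epoch_1 = rl_advantage("epoch_1")
gap_epoch_2 = rl_advantage("epoch_2")
print(gap_epoch_1, gap_epoch_2)
```

The Epoch 2 gap matches the 13.5 percentage points stated in the caption, and the gap roughly doubles between the two epochs because SFT keeps declining while RL partially recovers.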

**Figure 2.** Performance–preservation trade-off across NTS levels. SFT (dashed) exhibits a sharp preservation drop in the high-NTS regime, while RL (solid) declines gradually and preserves 15.8 percentage points more of the base circuit at peak new-task performance.

**Figure 3.** Head Role Distribution Under Base, Supervised, and RL Training. In our setup, SFT produces a cluster of “Critical Specialists” (heads with high necessity and sufficiency), while RL maintains a distributed architecture that overlaps closely with the base model, avoiding the structural compression and specialization observed under the supervised objective.

**Figure 4.** Attention Head Overlap Between Base, SFT, and RL. The plot shows the circuit overlap study for the base, SFT, and RL models. The bars reflect the number of attention heads that are unique to one model, shared by two models, or present at all three levels of training. The gap between the SFT and RL circuit sizes (~265 vs. ~295 heads), depicted in …

**Figure 5.** Layer-wise Circuit Retention in RL. Our RL model shows architectural stability across all 36 transformer layers, with a high count of retained heads and relatively few forgotten components throughout the network depth.

**Figure 6.** Layer-wise Circuit Retention in SFT. Our SFT model shows broader structural change, with forgotten heads scattered throughout all layers and higher concentrations in the mid-to-late transformer layers.

**Figure 7.** Per-Head Necessity vs. Δm_h: SFT vs. RL. We plot per-head necessity against Δm_h (Eq. 4) for both models. The absence of a positive correlation under either objective indicates that mask shifts are not driven by head necessity alone. SFT exhibits a weak negative trend (r = −0.125), suggesting it suppresses heads irrespective of their functional role, while RL’s flat relationship (r = 0.022) is consistent with it…
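The r values in the Figure 7 caption read as Pearson correlation coefficients between per-head necessity and the mask shift, although this page does not say so explicitly. A minimal sketch of that computation follows. The input lists are synthetic placeholders, not the paper's data, and the function name is an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Synthetic per-head values, one entry per attention head (made up).
necessity = [0.10, 0.35, 0.50, 0.75, 0.90]
delta_m = [-0.05, 0.02, -0.30, 0.01, -0.10]

r = pearson_r(necessity, delta_m)
print(f"r = {r:.3f}")
```

A value of r near zero, as reported for RL, means the mask shift is roughly linearly unrelated to necessity. It does not rule out a nonlinear relationship, so a scatter plot like the one the caption describes is still needed alongside the coefficient.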

**Figure 8.** This graph illustrates the number of shared (overlapping) heads among the 'Base', 'SFT' (Supervised Fine-Tuning), and 'RL' (Reinforcement Learning) circuits. Diagonal elements (such as Base-Base, SFT-SFT, and RL-RL) indicate the entire size (number of heads) of each particular circuit. Off-diagonal elements (e.g., Base-SFT, SFT-RL) represent the number of heads shared between two separate circuits. For exa…
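The overlap counts described in the Figure 4 and Figure 8 captions can be computed with plain set operations once each circuit is written as a set of (layer, head) pairs. The sketch below assumes that representation and uses tiny made-up circuits. It is not the released code, and the function names are hypothetical.

```python
def overlap_matrix(circuits):
    """Pairwise shared-head counts; the diagonal is each circuit's size."""
    names = list(circuits)
    return {a: {b: len(circuits[a] & circuits[b]) for b in names} for a in names}


def venn_counts(base, sft, rl):
    """Heads unique to one circuit, shared by exactly two, or in all three."""
    return {
        "base_only": len(base - sft - rl),
        "sft_only": len(sft - base - rl),
        "rl_only": len(rl - base - sft),
        "base_sft_only": len((base & sft) - rl),
        "base_rl_only": len((base & rl) - sft),
        "sft_rl_only": len((sft & rl) - base),
        "all_three": len(base & sft & rl),
    }


# Toy circuits as sets of (layer, head) pairs; the values are made up.
circuits = {
    "Base": {(0, 0), (0, 1), (1, 0), (2, 3)},
    "SFT": {(0, 0), (1, 0), (3, 1)},
    "RL": {(0, 0), (0, 1), (1, 0), (3, 1), (4, 2)},
}

matrix = overlap_matrix(circuits)
counts = venn_counts(circuits["Base"], circuits["SFT"], circuits["RL"])
print(matrix)
print(counts)
```

The matrix view corresponds to the Figure 8 description (diagonal entries are circuit sizes, off-diagonal entries are shared heads), while the seven disjoint counts correspond to the bar categories described for Figure 4. The seven counts always sum to the size of the union of the three circuits, which is a useful sanity check.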

Original abstract

Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy [12]. We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question-answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. We released our code here: https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New head-level vulnerability metric applied to RL vs SFT, but without causal checks on whether flagged heads actually implement the preserved capabilities.

The paper's core move is to define differential circuit vulnerability at the attention-head level and apply it to Qwen2.5-3B-Instruct fine-tuned on scientific QA. They report that SFT changes more heads and produces more forgetting while RL keeps a larger share of the base-model heads intact, at the cost of slower adaptation. This is a direct extension of the earlier behavioral result that RL forgets less, now framed in terms of circuit stability.

The work is straightforward in its setup and releases the code, which is useful for anyone who wants to rerun or adapt the comparison. The trade-off they describe matches what people already observe at the behavioral level, so the mechanistic angle is a natural next question.

The soft spot is the metric's interpretation. It flags heads by how much their behavior shifts under fine-tuning, but the paper gives no ablation, patching, or task-specific circuit identification to show those heads were carrying the capabilities that later get forgotten. Without that link, the changes could reflect general representational drift or optimization side effects rather than loss of the relevant computations. That gap makes the causal claim about circuit preservation rest on an assumption rather than direct evidence.

This is for people working on fine-tuning recipes or mechanistic comparisons of training methods. A reader already following the RL-versus-SFT literature would find the head-level numbers worth seeing, but would also notice the missing validation step.

I would send it to review. The question is timely and the experimental direction is clear, but any referee would need to press on whether the metric actually tracks the circuits that matter for retention.

Referee Report

1 major / 1 minor

Summary. The paper claims that RL fine-tuning of LLMs preserves prior capabilities better than SFT because it induces less disruption to internal computational circuits. Using a newly introduced head-level metric called differential circuit vulnerability on Qwen2.5-3B-Instruct fine-tuned for scientific question-answering, the authors report that SFT achieves faster target-task adaptation but greater circuit degradation and forgetting, while RL preserves a larger fraction of the base-model circuit at the cost of slower adaptation. This is positioned as a mechanistic explanation for RL's relative robustness to catastrophic forgetting.

Significance. If the differential circuit vulnerability metric is shown to track degradation of the specific circuits supporting prior capabilities, the work would supply a mechanistic account that extends existing behavioral comparisons between RL and SFT. The public release of code is a clear strength for reproducibility.

major comments (1)

[Definition of differential circuit vulnerability (methods)] The central claim equates higher differential circuit vulnerability under SFT with greater degradation of the circuits responsible for prior capabilities. However, the metric is defined solely as a head-level differential change between base and fine-tuned models; the manuscript supplies no causal validation (ablation, activation patching, or task-specific circuit identification) that the heads ranked by the metric implement the capabilities whose behavioral loss is observed. Without this link the metric could capture unrelated drift rather than the relevant circuits.

minor comments (1)

[Abstract] The abstract states the main findings without experimental details, controls, statistical tests, or sample sizes, which hinders immediate evaluation of the reported trade-off.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address the major comment below.

Referee: [Definition of differential circuit vulnerability (methods)] The central claim equates higher differential circuit vulnerability under SFT with greater degradation of the circuits responsible for prior capabilities. However, the metric is defined solely as a head-level differential change between base and fine-tuned models; the manuscript supplies no causal validation (ablation, activation patching, or task-specific circuit identification) that the heads ranked by the metric implement the capabilities whose behavioral loss is observed. Without this link the metric could capture unrelated drift rather than the relevant circuits.

Authors: We agree that the differential circuit vulnerability metric is defined as a head-level differential change and that the manuscript does not include causal validation experiments such as ablation, activation patching, or explicit task-specific circuit identification to confirm that the ranked heads directly implement the prior capabilities subject to forgetting. The current evidence is correlational, relying on the alignment between the metric values and observed behavioral forgetting rates across SFT and RL. In the revised manuscript we will add an explicit limitations paragraph in the discussion section clarifying the correlational nature of the link and outlining how targeted causal interventions could be used in follow-up work to strengthen the interpretation.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical metric applied to independent training runs

full rationale

The paper introduces differential circuit vulnerability as a new head-level metric and applies it to observed differences between base, SFT, and RL fine-tuned models on Qwen2.5-3B-Instruct. The central comparison (SFT disrupts more than RL) rests on direct measurement of this metric across training runs rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. The cited prior work (Shenfeld et al., 2025 [12]) addresses behavioral robustness and is not used to justify the metric or force the mechanistic conclusion. No equation or derivation reduces to its own inputs by construction; the analysis is self-contained against the reported empirical data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Ledger constructed from abstract alone; full paper may contain additional fitted parameters or assumptions.

axioms (1)

domain assumption: Attention heads correspond to distinct computational circuits whose degradation can be tracked via activation changes.
The vulnerability metric is defined at the head level.

invented entities (1)

differential circuit vulnerability (no independent evidence)
purpose: Head-level scalar measuring circuit degradation under fine-tuning
Newly defined to enable the RL versus SFT comparison.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation
cs.LG 2026-06 unverdicted novelty 5.0

Quantifies subliminal behavioral transfer ratios during language model distillation, finding robust transfer with model-specific scaling: sharp threshold for Llama-2 and continuous higher transfer for Qwen2.5.

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation
cs.LG 2026-06 unverdicted novelty 5.0

Steering Llama-2-7B-Chat and Qwen2.5-7B-Instruct teachers and distilling students on benign data transfers measurable jailbreak susceptibility, with Llama showing threshold behavior at α = -0.15 and Qwen reaching tran...

Reference graph

Works this paper leans on

17 extracted references · 5 linked inside Pith · cited by 1 Pith paper

[3] Chaudhary, M. and Geiger, A. Evaluating open-source sparse autoencoders on disentangling factual knowledge in GPT-2 small. arXiv preprint arXiv:2409.04478, 2024.

[4] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. d. O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F. P., Cummings, D., Plappert, M., Chantzis... arXiv, 2021.

[5] Davies, X., Nadeau, M., Prakash, N., Shaham, T. R., and Bau, D. Discovering variable binding circuitry with desiderata. arXiv preprint arXiv:2310.02336, 2023.

[6] Feng, K., Shen, X., Wang, W., Zhuang, X., Tang, Y., Zhang, Q., and Ding, K. SciKnowEval: Evaluating multi-level scientific knowledge of large language models. arXiv preprint, 2025.

[7] Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2021.

[8] Hu, J., Zhang, Y., Han, Q., Jiang, D., Zhang, X., and Shum, H.-Y. Open-Reasoner-Zero: An open source approach to scaling up reinforcement learning on the base model. arXiv preprint, 2025.

[9] Lin, S., Hilton, J., and Evans, O. TruthfulQA: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2022.

[10] Prakash, N., Shaham, T. R., Haklay, T., Belinkov, Y., and Bau, D. Fine-tuning enhances existing mechanisms: A case study on entity tracking. In International Conference on Learning Representations (ICLR), 2024.

[11] Sakaguchi, K., Le Bras, R., Bhagavatula, C., and Choi, Y. WinoGrande: An adversarial Winograd schema challenge at scale. In AAAI Conference on Artificial Intelligence, 2020.

[12] Shenfeld, I., Pari, J., and Agrawal, P. RL's razor: Why online reinforcement learning forgets less. arXiv preprint arXiv:2509.04259, 2025.

[13] Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a machine really finish your sentence? In Annual Meeting of the Association for Computational Linguistics (ACL), 2019.

[14] Zhou, J., Lu, T., Mishra, S., Brahma, S., Basu, S., Luan, Y., Zhou, D., and Hou, L. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023.