LoRA Without Regret
8 Pith papers cite this work. Polarity classification is still indexing.
[Citation-role and citation-polarity summary charts: 8 citations in 2026; role classified as background.]
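For readers arriving from the citing papers: the cited work's subject is LoRA, which adapts a frozen weight matrix W with a trainable low-rank update ΔW = (α/r)·BA. A minimal PyTorch sketch of that core idea (illustrative only; the class and variable names are ours, not from the post):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.
    Forward: y = x W^T + (alpha / r) * x A^T B^T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# usage: wrap an existing projection
layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(2, 512))
```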
Citing papers explorer
- Fine-Tuning Small Reasoning Models for Quantum Field Theory
  Small 7B reasoning models are fine-tuned on synthetic and curated QFT problems with SFT and RL; the paper reports performance gains, an error analysis, and a public release of the data and reasoning traces.
- PreFT: Prefill-only finetuning for efficient inference
  Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B, with near-parity performance on RL tasks and recoverable loss on SFT (a toy sketch of the prefill/decode split follows this list).
- Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
  Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes (one possible placement heuristic is sketched after the list).
- Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
  Dr. Post-Training reframes general data as a data-induced regularizer for LLM post-training updates, yielding a family of methods that outperform data-selection baselines on SFT, RLHF, and RLVR tasks (a minimal mixed-objective sketch follows the list).
- CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
  CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics (a toy bandit loop is sketched after the list).
- AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
  AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.
- On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR
  RLVR exhibits implicit reward overfitting to the training data, and its weight updates show heavy-tailed singular spectra whose dominant rank-1 component concentrates on reasoning capability (a small spectral diagnostic follows the list).
- Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines
  A 14B model trained on synthetic data from Brazilian clinical guidelines outperforms larger LLMs on new benchmarks for Brazilian healthcare protocols.
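PreFT's summary points at a prefill/decode split: the adapter touches only the prompt pass that builds the KV cache, while decoding runs on the shared frozen weights, which is what lets hundreds of adapters batch together. A toy sketch of that split, assuming a LoRA-style adapter on a KV projection (all names and the exact mechanism are our assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

def prefill_kv(base_proj: nn.Linear, prompt_h: torch.Tensor,
               A: torch.Tensor, B: torch.Tensor, scale: float):
    """Prefill: build the KV cache with the per-user adapter applied.
    prompt_h: (prompt_len, d_model) hidden states of the prompt."""
    return base_proj(prompt_h) + scale * (prompt_h @ A.T) @ B.T

def decode_kv(base_proj: nn.Linear, h_t: torch.Tensor):
    """Decode: new tokens use only the frozen base projection, so one
    batched matmul can serve requests from many different adapters."""
    return base_proj(h_t)

d, r = 64, 4
proj = nn.Linear(d, d, bias=False)
A, B = torch.randn(r, d) * 0.01, torch.zeros(d, r)
cache = prefill_kv(proj, torch.randn(10, d), A, B, scale=2.0)
new_kv = decode_kv(proj, torch.randn(1, d))
```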
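For the parameter-placement entry, one plausible reading of "gradient-informed placement" is to score candidate weight matrices by gradient magnitude on a probe batch and attach LoRA only to the top scorers. A hedged sketch of that heuristic (the paper's actual criterion, which involves gradient rank and stability, may differ):

```python
import torch
import torch.nn as nn

def score_modules_by_grad(model: nn.Module, loss: torch.Tensor, top_k: int):
    """Rank weight matrices by gradient Frobenius norm on one probe batch.
    This is our illustrative proxy, not necessarily the paper's criterion."""
    model.zero_grad()
    loss.backward()
    scores = {
        name: p.grad.norm().item()
        for name, p in model.named_parameters()
        if p.grad is not None and p.ndim == 2  # weight matrices only
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
loss = model(torch.randn(16, 32)).pow(2).mean()
print(score_modules_by_grad(model, loss, top_k=2))  # names of candidate LoRA targets
```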
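Dr. Post-Training's framing of general data as a data-induced regularizer suggests a mixed objective of the form L = L_task + λ · L_general. A minimal sketch of one such training step; the mixing coefficient and per-batch scheme are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_step(model, opt, task_batch, general_batch, lam=0.1):
    """One update in which the loss on general-domain data acts as a
    data-induced regularizer on the task update (illustrative form)."""
    tx, ty = task_batch
    gx, gy = general_batch
    loss = F.cross_entropy(model(tx), ty) + lam * F.cross_entropy(model(gx), gy)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = nn.Linear(16, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
batch = lambda: (torch.randn(8, 16), torch.randint(0, 4, (8,)))
print(regularized_step(model, opt, batch(), batch()))
```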
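For CASCADE, the pairing of an episodic case memory with a contextual bandit can be illustrated with a toy epsilon-greedy learner that picks a retrieval strategy per task context and updates from observed success; the arm names and reward scheme here are hypothetical:

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Toy contextual bandit: per-context running mean of reward per arm."""
    def __init__(self, arms, eps=0.1):
        self.arms, self.eps = arms, eps
        # context -> arm -> [running mean reward, pull count]
        self.stats = defaultdict(lambda: {a: [0.0, 0] for a in arms})

    def choose(self, context):
        if random.random() < self.eps:
            return random.choice(self.arms)  # explore
        means = self.stats[context]
        return max(self.arms, key=lambda a: means[a][0])  # exploit

    def update(self, context, arm, reward):
        mean, n = self.stats[context][arm]
        self.stats[context][arm] = [(mean * n + reward) / (n + 1), n + 1]

# arms = retrieval strategies over the case memory (names are hypothetical)
bandit = EpsilonGreedyBandit(arms=["no_case", "nearest_case", "top3_cases"])
arm = bandit.choose(context="medicine")
# ... run the LLM with the chosen cases in context, observe success ...
bandit.update("medicine", arm, reward=1.0)
```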
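The RLVR entry's claim about heavy-tailed singular spectra and a rank-1 focus can be probed directly by taking the SVD of a weight update ΔW = W_after − W_before and checking how concentrated its energy is. A small diagnostic sketch:

```python
import torch

def update_spectrum_stats(w_before: torch.Tensor, w_after: torch.Tensor):
    """SVD of the weight update: how much energy the top singular
    direction carries (rank-1 dominance), plus an effective rank."""
    s = torch.linalg.svdvals(w_after - w_before)
    p = s.pow(2) / s.pow(2).sum()  # normalized spectral energy
    return {
        "top1_energy_frac": p[0].item(),
        # exp of spectral entropy; ~1 for a rank-1 update
        "effective_rank": torch.exp(-(p * p.clamp_min(1e-12).log()).sum()).item(),
    }

w0 = torch.randn(256, 256)
w1 = w0 + 0.05 * torch.outer(torch.randn(256), torch.randn(256))  # rank-1 update
print(update_spectrum_stats(w0, w1))  # expect top1_energy_frac close to 1.0
```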