Tina: Tiny reasoning models via LoRA. arXiv preprint arXiv:2504.15777, 2025b.
6 Pith papers cite this work. Polarity classification is still indexing.
Citing Papers Explorer
-
Fine-Tuning Small Reasoning Models for Quantum Field Theory
Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.
-
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
ALTO accelerates LoRA tuning up to 13.8x by monitoring loss trajectories for early stopping, using fused grouped GEMM with rank-local adapter parallelism, and combining intra- and inter-task scheduling for heterogeneous workloads without quality loss.
-
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
RL for LLM reasoning acts as sparse policy selection at high-entropy tokens already present in the base model, enabling ReasonMaxxer, an efficient contrastive method that recovers most RL gains at three orders of magnitude lower cost.
-
Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes
Mixed-complexity procedural datasets provide up to 5x sample efficiency for RLVR on small models in low-data regimes, with low-to-high complexity generalization observed across counting, graph, and spatial tasks.
-
PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.
-
Low-Rank Adaptation Redux for Large Models
An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.
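Several of the entries above, as well as the cited Tina paper itself, build on the core LoRA mechanism: freezing a pretrained weight matrix and learning a small low-rank additive update. The sketch below is a minimal illustration of that idea, not code from any of the listed papers; the symbols `r` and `alpha` follow the conventions of the original LoRA paper, and all dimensions are toy values chosen for clarity.

```python
# Minimal LoRA sketch: effective weight is W + (alpha / r) * B @ A,
# where W is frozen and only the low-rank factors A and B are trained.
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 16  # toy sizes; in practice r << d

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass: frozen base path plus scaled low-rank adapter path."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapter contributes nothing at the start,
# so the adapted model exactly reproduces the frozen base model.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B receive gradients: r * (d_in + d_out) trainable parameters
# instead of d_in * d_out for full fine-tuning of this layer.
print(r * (d_in + d_out), "adapter params vs", d_in * d_out, "full params")
```

The zero-initialization of `B` is what makes LoRA a safe starting point: training begins exactly at the base model and the adapter is folded in gradually, which is also why merged inference (adding `(alpha / r) * B @ A` into `W`) incurs no runtime overhead.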