hub Tool reference

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan · 2023 · cs.CL · arXiv 2304.01373

Tool reference. 70% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

39 Pith papers citing it

Method reference 70% of classified citations

open full Pith review browse 39 citing papers arXiv PDF

abstract

How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \url{https://github.com/EleutherAI/pythia}.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 5 background 3 method 2

citation-polarity summary

use dataset 5 background 3 use method 2

representative citing papers

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

cs.AI · 2026-06-04 · conditional · novelty 8.0

Derives an exact telescoping decomposition of the naive RLVR reward-design estimator into null, elicitation, and reward-design terms on a tabular-GRPO simulator, measures the components across prior strengths, and validates via pre-registered factorial experiments plus re-audits of prior papers.

Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

The same composed tasks are realized by different attention-head patterns in different models when the same selectivity-plus-ablation protocol is applied.

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Archetypal SAEs appear stable only because of shared deterministic initialization; removing it eliminates any stabilization benefit from the archetypal constraint.

Low-Resource Safety Failures Are Action Failures, Not Representation Failures

cs.CL · 2026-05-31 · conditional · novelty 7.0

Low-resource safety failures are action failures because the harmfulness representation transfers but the decision calibration does not; this is fixed by recalibrating a high-resource gate with 1-4 target-language examples.

Large Language Model Selection with Limited Annotations

cs.CL · 2026-05-24 · unverdicted · novelty 7.0

SELECT-LLM is the first active model selection framework for LLMs that uses expected information gain from pairwise output similarities to minimize required annotations, reporting up to 84.78% cost reduction across 23 datasets and 156 models.

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

Transformers trained from different random seeds exhibit residual-stream polymorphism that is exactly a uniform random rotation, which a Procrustes alignment removes to transfer SAEs and steering vectors.

Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Temporarily reducing the learning rate on upper-layer query and key projections during early GPT pretraining prevents premature attention specialization and improves model performance.

Theoretical Limits of Language Model Alignment

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.

Layer Collapse in Diffusion Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.

DDO-RM: Distribution-Level Policy Improvement after Reward Learning

stat.ML · 2026-04-13 · unverdicted · novelty 7.0

DDO-RM turns reward scores into a target distribution and applies KL-regularized mirror-descent projection on finite candidates to improve policies, outperforming DPO on Pythia-410M.

From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums

cs.AI · 2026-02-04 · unverdicted · novelty 7.0

A new sequential interaction framework lets LLMs propose questions to forums, with simulations on real Stack Exchange data showing players can reach roughly half the utility of an ideal full-information scenario despite incentive misalignment.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · novelty 7.0

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

QLoRA: Efficient Finetuning of Quantized LLMs

cs.LG · 2023-05-23 · conditional · novelty 7.0

QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

cs.AI · 2026-06-02 · conditional · novelty 6.0

Systematic tests of LLM contamination detectors across 27 models show frequent failures from distribution shift and scale, concluding statistical methods cannot replace transparent data provenance.

Child-directed speech facilitates production, not comprehension, in BabyLMs

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

idSCD uses semantic correlation descriptors to perform dataset membership inference by comparing learned semantic structures, outperforming baselines in NLI, emotion, and medical text experiments.

Motion-Compensated Weight Compression

cs.CV · 2026-05-23 · unverdicted · novelty 6.0

MCWC aligns permutation-symmetric blocks across layers to enable sequential prediction and residual entropy coding, improving rate-accuracy tradeoffs versus quantization and prior codecs on language and vision models.

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

The Shannon Scaling Law treats LLM training as noisy-channel transmission and predicts U-shaped performance degradation when signal-to-noise ratio falls below a threshold, outperforming monotonic scaling laws on Pythia and OLMo2 data.

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Causal dimensionality kappa of transformer layers grows sub-linearly with SAE width, remains invariant to model scale, and stays constant across depth while attribution thresholds drop sharply.

Feature Starvation as Geometric Instability in Sparse Autoencoders

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global feature support under mild assumptions.

Finite-Size Gradient Transport in Large Language Model Pretraining: From Cascade Size to Intensive Transport Efficiency

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

A gradient-transport framework with observables D, z, β, δ, v_rel applied to Pico-LM and Pythia datasets shows distinct scaling regimes in duration and efficiency while sharing a near-unity cascade-size backbone.

Diversity in Large Language Models under Supervised Fine-Tuning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.

Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

A unified incentive-score decomposition of preference optimization reveals the disentanglement band condition and reward calibration method that enables suppressing losers while preserving winners in LLM training.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation cs.LG · 2026-04-17 · unverdicted · none · ref 6 · internal anchor
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling cs.LG · 2024-07-31 · unverdicted · none · ref 9 · internal anchor
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer