Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization

Ari Holtzman; Kunal Samanta; Peter West

arxiv: 2606.29933 · v1 · pith:LVX44CKBnew · submitted 2026-06-29 · 💻 cs.CL

Pith reviewed 2026-06-30 06:37 UTC · model grok-4.3

keywords: LLM alignment · phase transitions · crystallization · supervised fine-tuning · reinforcement learning · random number generation · post-training dynamics · sampling distributions

The pith

For random generation tasks, supervised fine-tuning collapses model behavior onto a single seed distribution already present in the pretrained model, and reinforcement learning then largely only redistributes probability among that distribution's options.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that alignment dynamics can be usefully described through the physical analogy of crystallization, a thermodynamic phase transition. Pretrained models begin in a high-entropy state in which different prompts elicit many distinct sampling distributions for random tasks. Supervised fine-tuning then nucleates around one seed distribution already present in the pretrained model. Reinforcement learning afterward redistributes probability among the seed's options but largely does not expand the set of options. A reader would care because this view focuses attention on what alignment can and cannot change, rather than on final benchmark scores alone. The authors propose intuitive metrics to verify the transitions between phases and validate the idea across a range of random tasks.

Core claim

For tasks like random number generation, alignment breaks into three phases modeled on material crystallization: the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and the settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution.

What carries the argument

The three-phase crystallization model, which maps LLM sampling distributions to a liquid phase, a nucleation collapse during supervised fine-tuning, and a settling redistribution under reinforcement learning.

If this is right

The set of behaviors available after complete alignment is largely fixed by the seed distribution chosen during supervised fine-tuning.
Entropy and distribution-similarity metrics can track when a model crosses from one phase to the next during post-training.
Reinforcement learning mainly tunes probabilities inside the nucleated set and does not create new options outside it for these tasks.
The source of final aligned structure lies in pretraining content and the choice of fine-tuning seed rather than in later optimization steps.
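
The second implication can be made concrete with a small sketch. The snippet below is illustrative only: the function names `shannon_entropy` and `mse_to_seed` and the toy digit distributions are assumptions made for this example, not the paper's own metric definitions (those are in its §2.4 and are not reproduced on this page).

```python
import math

def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def mse_to_seed(p, seed):
    """Mean squared error between a distribution and a candidate seed distribution."""
    return sum((a - b) ** 2 for a, b in zip(p, seed)) / len(seed)

# Hypothetical distributions over the digits 0-9 (toy numbers, not taken from the paper).
uniform = [0.1] * 10                                               # a high-entropy, "liquid"-like output
seed = [0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]         # assumed seed distribution
after_sft = [0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.75, 0.0, 0.0]  # assumed post-SFT output

entropy_base = shannon_entropy(uniform)
entropy_sft = shannon_entropy(after_sft)

print(f"entropy base={entropy_base:.3f} bits, after SFT={entropy_sft:.3f} bits")
print(f"MSE to seed: base={mse_to_seed(uniform, seed):.4f}, after SFT={mse_to_seed(after_sft, seed):.4f}")
```

Under these toy numbers, a drop in entropy together with a drop in distance to the seed is the kind of joint signal such metrics would look for when a model moves from the liquid phase to the nucleated one.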

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the phases hold, alignment methods might improve by deliberately selecting or broadening the seed distribution during supervised fine-tuning instead of relying on later reinforcement learning adjustments.
The crystallization view could extend to other tasks where models select from a large pretrained repertoire, suggesting alignment is more selection than invention.
A direct test would involve forcing different seed distributions through modified supervised fine-tuning and checking whether reinforcement learning can still overcome the resulting concentration.

Load-bearing premise

That the sampling distributions of an LLM during random generation tasks can be meaningfully mapped onto thermodynamic phases of crystallization in a way that reveals causal structure about alignment dynamics rather than serving as a loose metaphor.

What would settle it

If reinforcement learning after supervised fine-tuning is observed to substantially expand or replace the set of options generated by the model, rather than keeping probability concentrated on the seed distribution's options, the three-phase description would not hold.
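
This falsification condition can be sketched as a check on how much probability a post-RL distribution keeps on the options the seed already covered. The helper `mass_on_seed_support`, the `eps` threshold, and the toy distributions are hypothetical; the sketch is written in the spirit of the ProbMass metric named in Figure 5 but is not claimed to match the paper's definition.

```python
def mass_on_seed_support(p, seed, eps=1e-12):
    """Total probability that p assigns to options the seed distribution covers.

    An option counts as covered when its seed probability exceeds eps (an assumed threshold).
    """
    return sum(pi for pi, si in zip(p, seed) if si > eps)

# Toy distributions over four options (made-up numbers for illustration).
seed = [0.0, 0.5, 0.0, 0.5]
rl_settled = [0.01, 0.80, 0.01, 0.18]   # mass redistributed, still concentrated on the seed's options
rl_expanded = [0.30, 0.20, 0.30, 0.20]  # mass moved onto options outside the seed's support

print(mass_on_seed_support(rl_settled, seed))
print(mass_on_seed_support(rl_expanded, seed))
```

A value near 1 after reinforcement learning would be consistent with settling; a value well below 1 would be the kind of support expansion that counts against the three-phase description.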

Figures

Figures reproduced from arXiv: 2606.29933 by Ari Holtzman, Kunal Samanta, Peter West.

**Figure 1.** Crystallization, illustrated on the digit task. Pretrained base LLMs exhibit a high-entropy liquid phase: varied prompts elicit varied output distributions, much like atoms in a liquid moving in many directions at once (left). Supervised fine-tuning (SFT) acts as a nucleation event: regardless of prompt variation, the model’s output distribution collapses rapidly onto a single latent seed distribution alre…

**Figure 2.** Across models and tasks, we see a huge variety of the base model distributions over prefixes. SFT latches …

**Figure 3.** The metrics defined in §2.4, averaged for the 15 stochastic tasks defined in …

**Figure 4.** Per-task view of the two crystallization metrics across alignment stages for a subset of tasks. (a) MSE …

**Figure 5.** ProbMass using the average SFT distribution as a proxy seed. Settling holds even when the base-model …

**Figure 6.** Inter-family crystallization. A seed identified from one model family’s base predicts nucleation in a completely independent alignment pipeline, suggesting seed selection is a property of the task and data, not the model.

**Figure 7.** Crystallization metrics for OLMo 2

**Figure 8.** Crystallization metrics for Tulu 3

**Figure 9.** Task Histograms (OLMo 2)

**Figure 10.** Task Histograms (Tulu 3)

Original abstract

The alignment of language models is typically studied through the lens of capability benchmarks, but the dynamics of how models change during post-training remain poorly understood. We argue that the physical sciences, and thermodynamic phase-transition theory in particular, offer a principled and underexplored vocabulary for reasoning about these dynamics. As a case study, we instantiate this position through the lens of material Crystallization, which is a well-studied thermodynamic phase transition. For tasks like random number generation, this breaks into 3 phases: (1) the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; (2) the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and (3) a settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution. We propose intuitive metrics to verify the transitions between these phases, and validate the idea across a range of random tasks. Crystallization is one instance of a broader class of physical frameworks we believe alignment research should import to answer questions about where alignment-induced structure comes from, why it converges where it does, and what it fundamentally cannot change.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper offers a three-phase crystallization analogy for how SFT and RL reshape sampling on random tasks, but the mapping stays descriptive without shown signatures of actual phase transitions.

The main takeaway is that the authors map alignment on random number tasks to crystallization: pretrained models start in a high-entropy liquid state with many promptable distributions, SFT nucleates a collapse onto one seed distribution, and RL then settles probability mass while keeping it concentrated on the same options.

The useful part is the reminder that post-training dynamics are under-studied and that physical analogies might help frame questions about what structure alignment adds and what it cannot alter. Framing SFT as the step that locks in a seed and RL as redistribution within that seed is a clean way to separate the two stages.

The soft spot is the lack of evidence that these are genuine phase transitions rather than gradual shifts. The abstract mentions intuitive metrics validated across tasks, yet nothing indicates checks for discontinuous changes in order parameters, critical scaling, or latent-heat equivalents at the claimed boundaries. If the metrics mainly track entropy reduction or mode concentration, the phases risk being a post-hoc naming of SFT-then-RL rather than an independent physical model.
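
One crude way to probe the gradual-versus-abrupt distinction is to ask how much of a metric's total change across training checkpoints happens in a single step. The function `largest_jump_fraction` and the checkpoint values below are assumptions for illustration; this heuristic is not a test for a genuine phase transition, which would need evidence such as critical scaling.

```python
def largest_jump_fraction(values):
    """Fraction of the total absolute change that occurs in the single largest step.

    Near 1.0 suggests an abrupt change; near 1 / (number of steps) suggests a gradual drift.
    Returns 0.0 when the series never changes (or has fewer than two points).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    total = sum(diffs)
    if total == 0:
        return 0.0
    return max(diffs) / total

# Hypothetical entropy readings (bits) at successive checkpoints; not data from the paper.
gradual = [3.0, 2.5, 2.0, 1.5, 1.0]
abrupt = [3.0, 2.95, 1.05, 1.0]

print(largest_jump_fraction(gradual))
print(largest_jump_fraction(abrupt))
```

Both toy series lose the same 2 bits overall; only the second concentrates almost all of that loss in one step, which is the pattern a nucleation-like event would need to show.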

This is for alignment researchers who want conceptual imports from physics to organize thinking about dynamics. It will interest readers looking for new questions more than those needing predictive or falsifiable results right away.

I would send it to peer review. The analogy is worth testing if the full paper can show the metrics capture something non-monotonic or structurally distinctive.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that thermodynamic phase-transition theory, instantiated via a crystallization analogy, offers a principled vocabulary for alignment dynamics in language models. Using random number generation tasks as a case study, it posits three phases: (1) a high-entropy liquid phase in the pretrained model with many distinct promptable sampling distributions, (2) a nucleation phase induced by supervised finetuning that collapses behavior onto a single seed distribution present in the pretrained model, and (3) a settling phase under reinforcement learning that redistributes probability while preserving concentration on the same options. Intuitive metrics are proposed to verify the transitions, with validation claimed across a range of random tasks. The work positions crystallization as one example of physical frameworks that alignment research should import to address questions about the origins, convergence, and limits of alignment-induced structure.

Significance. If the proposed metrics demonstrate thermodynamic-like signatures such as discontinuous order-parameter changes or critical scaling at the claimed transition points rather than monotonic distributional shifts, the framework could supply new causal structure for understanding where alignment-induced behavior originates and what it cannot alter. The manuscript's explicit attempt to import concepts from the physical sciences is a constructive direction if substantiated with reproducible evidence.

major comments (2)

[Abstract] The phases are defined directly in terms of observed changes during SFT (nucleation/collapse) and RL (settling/redistribution), without reference to independent order parameters or external benchmarks; this makes the mapping vulnerable to circularity, where the description re-labels training stages rather than deriving an independent physical analogy.
[Abstract] The claim that intuitive metrics are proposed and validated across random tasks is not accompanied by any description of the metrics, experimental controls, error bars, or quantitative results, rendering it impossible to assess whether the framework captures non-analytic phase-transition behavior or merely tracks gradual entropy reduction and mode concentration.

minor comments (1)

[Abstract] The abstract would benefit from a brief statement of the specific random tasks used and the form of the proposed metrics to allow readers to evaluate the validation claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important issues of clarity in the abstract regarding phase definitions and the presentation of supporting evidence. We address each point below and indicate where revisions will strengthen the manuscript.

Referee: [Abstract] The phases are defined directly in terms of observed changes during SFT (nucleation/collapse) and RL (settling/redistribution), without reference to independent order parameters or external benchmarks; this makes the mapping vulnerable to circularity, where the description re-labels training stages rather than deriving an independent physical analogy.

Authors: We agree that the abstract's phrasing ties the phase labels closely to the SFT and RL stages, which risks appearing circular. The crystallization analogy is meant to supply an independent conceptual structure (drawing on thermodynamic concepts of nucleation from a seed and subsequent settling), with the training stages serving only as the empirical mechanisms in this case study. The proposed metrics (entropy of the output distribution, concentration on a small set of modes, and the persistence of other promptable distributions) are intended as measurable order parameters that can be tracked independently of the training procedure. We will revise the abstract to foreground these metrics and their independence from the training stages, making the non-circular nature of the analogy explicit. revision: yes

Referee: [Abstract] The claim that intuitive metrics are proposed and validated across random tasks is not accompanied by any description of the metrics, experimental controls, error bars, or quantitative results, rendering it impossible to assess whether the framework captures non-analytic phase-transition behavior or merely tracks gradual entropy reduction and mode concentration.

Authors: The abstract is intentionally brief and therefore omits the concrete metric definitions, controls, and results that appear in the body of the manuscript. We acknowledge that this omission makes it difficult for a reader to evaluate the strength of the evidence from the abstract alone. In revision we will expand the abstract to include a concise description of the metrics (entropy, mode concentration, and cross-prompt distribution overlap), note the use of multiple random-generation tasks as controls, and summarize the observed transitions with reference to quantitative patterns, while preserving length constraints. revision: yes
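
The two less familiar metrics named in this response, mode concentration and cross-prompt distribution overlap, can be sketched as follows. The function names, the choice of the overlap coefficient (sum of elementwise minima), and the toy per-prompt distributions are all assumptions for illustration; the manuscript's actual definitions are not given on this page.

```python
from itertools import combinations

def overlap(p, q):
    """Overlap coefficient: sum of elementwise minima (1.0 if identical, 0.0 if disjoint)."""
    return sum(min(a, b) for a, b in zip(p, q))

def mean_cross_prompt_overlap(dists):
    """Average pairwise overlap between the output distributions elicited by different prompts."""
    pairs = list(combinations(dists, 2))
    return sum(overlap(p, q) for p, q in pairs) / len(pairs)

def top_k_mass(p, k):
    """Mode concentration: probability mass on the k most likely options."""
    return sum(sorted(p, reverse=True)[:k])

# Hypothetical output distributions for three prompts over three options (toy numbers).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # each prompt elicits something different
sft = [[0.6, 0.4, 0.0], [0.5, 0.5, 0.0], [0.6, 0.4, 0.0]]   # prompts elicit nearly the same thing

print(mean_cross_prompt_overlap(base))
print(mean_cross_prompt_overlap(sft))
print(top_k_mass(sft[0], 2))
```

Low overlap across prompts corresponds to the liquid phase described in the abstract, while overlap near 1 corresponds to a collapse onto one distribution regardless of prompt.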

Circularity Check

1 step flagged

Phases explicitly defined by correspondence to standard training stages (pretrain/SFT/RL) rather than independent thermodynamic signatures

specific steps

renaming known result [Abstract]
"For tasks like random number generation, this breaks into 3 phases: (1) the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; (2) the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and (3) a settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution."

The phases are introduced by direct identification with the training procedures (pretraining, supervised finetuning, reinforcement learning) and with the distributional changes those procedures are already known to produce. The 'crystallization' framework therefore organizes existing observations under new labels without an independent derivation that would distinguish it from a descriptive metaphor.

full rationale

The paper's central case study defines its three crystallization phases directly in terms of the conventional post-training pipeline and the distributional effects already known to occur at each stage. No equations, order parameters, or critical-phenomena tests are shown in the provided text that would derive the phase structure from physical first principles; the mapping therefore functions as a relabeling of observed SFT-induced mode collapse and RL redistribution. This matches the renaming_known_result pattern and produces partial circularity for the claimed 'principled vocabulary,' though the proposal of intuitive metrics could in principle supply independent content if those metrics were shown to detect non-analytic behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that thermodynamic phase transitions provide a non-metaphorical vocabulary for LLM behavior change; no free parameters or invented physical entities are introduced, but the phases themselves function as new descriptive constructs without independent falsifiable handles shown.

axioms (1)

domain assumption: Thermodynamic phase-transition theory offers a principled vocabulary for reasoning about alignment dynamics in language models
Invoked in the opening argument of the abstract as the basis for the case study.

invented entities (1)

Crystallization phases (liquid, nucleation, settling) in alignment · no independent evidence
purpose: To categorize the progression of model output distributions during post-training
New descriptive categories introduced to map onto SFT and RL stages; no independent evidence or falsifiable prediction outside the analogy is provided in the abstract.

Reference graph

Works this paper leans on

23 extracted references · 9 canonical work pages · 5 internal anchors

[1] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia. Attention is All you Need.
[2] Base models beat aligned models at randomness and creativity. arXiv preprint arXiv:2505.00047.
[3] 2 OLMo 2 Furious. arXiv preprint arXiv:2501.00656.
[4] Tulu 3: Pushing Frontiers in Open Language Model Post-Training. arXiv preprint arXiv:2411.15124.
[5] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond). arXiv preprint arXiv:2510.22954.
[6] Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.
[7] LLM Probability Concentration: How Alignment Shrinks the Generative Horizon. arXiv preprint arXiv:2506.17871.
[8] LIMA: Less Is More for Alignment. Advances in Neural Information Processing Systems.
[9] Hallgren, Jonas. December 2025.
[10] The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. arXiv preprint arXiv:2312.01552, 2023.
[11] Attributing mode collapse in the fine-tuning of large language models. ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2024.
[12] A mathematical theory of communication. The Bell System Technical Journal, 1948.
[13] Language models as agent models. Findings of the Association for Computational Linguistics: EMNLP 2022.
[14] The Curious Case of Neural Text Degeneration. arXiv preprint arXiv:1904.09751.
[15] Towards understanding grokking: An effective theory of representation learning. Advances in Neural Information Processing Systems.
[16] Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems.
[17] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv preprint arXiv:2204.05862.
[18] A tutorial on energy-based learning. Predicting Structured Data.
[19] Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
[20] Crystal nucleation in liquids: Open questions and future challenges in molecular dynamics simulations. Chemical Reviews, 2016.
[21] Supercooled liquids and the glass transition. Nature, 2001.
[22] The kinetics of precipitation from supersaturated solid solutions. Journal of Physics and Chemistry of Solids.
[23] Role of additives in crystal nucleation from solutions: a review. Crystal Growth & Design, 2021.
