pith.

arxiv: 2606.29933 · v1 · pith:LVX44CKBnew · submitted 2026-06-29 · 💻 cs.CL

Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization

Pith reviewed 2026-06-30 06:37 UTC · model grok-4.3

classification 💻 cs.CL
keywords: LLM alignment · phase transitions · crystallization · supervised fine-tuning · reinforcement learning · random number generation · post-training dynamics · sampling distributions

The pith

For random-generation tasks, supervised fine-tuning collapses model behavior onto a single seed distribution already present in the pretrained model; reinforcement learning then only redistributes probability within that set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that alignment dynamics can be usefully described using the physical analogy of crystallization from thermodynamics. Pretrained models begin in a high-entropy state with many possible sampling distributions for random tasks. Supervised fine-tuning then nucleates around one seed distribution present in the original model. Reinforcement learning afterward adjusts the odds but does not expand the set of options. A reader would care because this view focuses attention on what alignment can and cannot change, rather than on final benchmark scores alone. The authors validate the phases with entropy-based metrics across several random tasks.
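As a rough illustration of what an entropy-based metric measures, the sketch below computes the Shannon entropy of the empirical distribution of sampled outputs for a digit task. It is a minimal sketch that assumes outputs are collected as a list of strings. The sample data and the function name `empirical_entropy_bits` are hypothetical, and the paper's own estimator may differ.

```python
import math
from collections import Counter


def empirical_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical outputs for a "pick a random digit" prompt.
liquid_like = list("0123456789") * 10      # spread evenly over ten digits
crystal_like = ["7"] * 70 + ["3"] * 30      # concentrated on two options

print(round(empirical_entropy_bits(liquid_like), 3))   # 3.322, i.e. log2(10)
print(round(empirical_entropy_bits(crystal_like), 3))  # 0.881
```

A high value corresponds to the "liquid" picture of many options in play; a value near zero corresponds to behavior concentrated on a few options.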

Core claim

For tasks like random number generation, alignment breaks into three phases modeled on material crystallization: the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and the settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution.
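The claim that SFT collapses onto a seed "present in the pretrained LLM" suggests a simple check: among the distributions elicited from the base model by different prompts, find the one closest to the post-SFT distribution. The sketch below assumes mean squared error over a shared option set as the distance. MSE appears in the Figure 4 caption, but its exact definition is not reproduced on this page. `find_seed`, the prompt names, and all probabilities are hypothetical.

```python
def mse(p, q, support):
    """Mean squared error between two distributions (dicts option -> prob) over `support`."""
    return sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in support) / len(support)


def find_seed(base_dists, sft_dist):
    """Name of the base-model prompt distribution closest (by MSE) to the SFT distribution."""
    support = set(sft_dist)
    for dist in base_dists.values():
        support |= set(dist)
    return min(base_dists, key=lambda name: mse(base_dists[name], sft_dist, support))


# Hypothetical base-model distributions elicited by three prompt variants.
base_dists = {
    "prompt_a": {"7": 0.5, "3": 0.3, "1": 0.2},
    "prompt_b": {"0": 0.4, "5": 0.4, "9": 0.2},
    "prompt_c": {str(d): 0.1 for d in range(10)},
}
sft_dist = {"7": 0.6, "3": 0.3, "1": 0.1}  # hypothetical post-SFT distribution

print(find_seed(base_dists, sft_dist))  # prompt_a
```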

What carries the argument

The three-phase crystallization model, which maps LLM sampling distributions to a liquid phase, a nucleation collapse during supervised fine-tuning, and a settling redistribution under reinforcement learning.

If this is right

  • The set of behaviors available after complete alignment is largely fixed by the seed distribution chosen during supervised fine-tuning.
  • Entropy and distribution-similarity metrics can track when a model crosses from one phase to the next during post-training.
  • Reinforcement learning mainly tunes probabilities inside the nucleated set and does not create new options outside it for these tasks.
  • The source of final aligned structure lies in pretraining content and the choice of fine-tuning seed rather than in later optimization steps.
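The settling claim can be phrased as a probability-mass test: how much of the RL model's distribution falls on the options the seed already favored. The sketch below follows the idea in the Figure 5 caption of using the average SFT distribution as a proxy seed. Defining the seed's options as its top-k entries, the value of k, and all helper names and numbers are assumptions for illustration, not the paper's definition of ProbMass.

```python
from collections import defaultdict


def average_distribution(dists):
    """Element-wise average of several distributions (dicts option -> prob)."""
    total = defaultdict(float)
    for dist in dists:
        for option, p in dist.items():
            total[option] += p / len(dists)
    return dict(total)


def seed_support(seed, k):
    """The k most probable options of the (proxy) seed distribution."""
    return set(sorted(seed, key=seed.get, reverse=True)[:k])


def prob_mass(dist, support):
    """Probability mass that `dist` places on the options in `support`."""
    return sum(p for option, p in dist.items() if option in support)


# Hypothetical SFT runs averaged into a proxy seed, then an RL distribution.
sft_runs = [{"7": 0.6, "3": 0.3, "1": 0.1}, {"7": 0.5, "3": 0.4, "1": 0.1}]
proxy_seed = average_distribution(sft_runs)
rl_dist = {"7": 0.8, "3": 0.15, "4": 0.05}

support = seed_support(proxy_seed, k=2)
print(sorted(support))                        # ['3', '7']
print(round(prob_mass(rl_dist, support), 3))  # 0.95
```

Under the settling picture, this mass should stay high after RL even as the split between "7" and "3" shifts.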

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the phases hold, alignment methods might improve by deliberately selecting or broadening the seed distribution during supervised fine-tuning instead of relying on later reinforcement learning adjustments.
  • The crystallization view could extend to other tasks where models select from a large pretrained repertoire, suggesting alignment is more selection than invention.
  • A direct test would involve forcing different seed distributions through modified supervised fine-tuning and checking whether reinforcement learning can still overcome the resulting concentration.

Load-bearing premise

That the sampling distributions of an LLM during random generation tasks can be meaningfully mapped onto thermodynamic phases of crystallization in a way that reveals causal structure about alignment dynamics rather than serving as a loose metaphor.

What would settle it

If reinforcement learning after supervised fine-tuning is observed to substantially expand or replace the set of options generated by the model, rather than keeping probability concentrated on the seed distribution's options, the three-phase description would not hold.
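This criterion can be checked directly by listing options that RL produces with non-negligible probability but the seed essentially never does. The sketch is hypothetical: the threshold `eps` and the function `support_expansion` are illustrative choices, not the paper's procedure.

```python
def support_expansion(seed, rl_dist, eps=0.01):
    """Options RL samples with prob >= eps that the seed gives prob < eps, plus their total RL mass."""
    new_options = sorted(x for x, p in rl_dist.items() if p >= eps and seed.get(x, 0.0) < eps)
    return new_options, sum(rl_dist[x] for x in new_options)


# Hypothetical seed and post-RL distributions for a digit task.
seed = {"7": 0.55, "3": 0.35, "1": 0.10}
rl_dist = {"7": 0.80, "3": 0.15, "4": 0.05}

options, mass = support_expansion(seed, rl_dist)
print(options, round(mass, 3))  # ['4'] 0.05
```

A large returned mass, sustained across tasks, would count against the three-phase description; a small one is consistent with settling.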

Figures

Figures reproduced from arXiv: 2606.29933 by Ari Holtzman, Kunal Samanta, Peter West.

Figure 1: Crystallization, illustrated on the digit task. Pretrained base LLMs exhibit a high-entropy liquid phase: varied prompts elicit varied output distributions, much like atoms in a liquid moving in many directions at once (left). Supervised fine-tuning (SFT) acts as a nucleation event: regardless of prompt variation, the model's output distribution collapses rapidly onto a single latent seed distribution alre…
Figure 2: Across models and tasks, we see a huge variety of the base model distributions over prefixes. SFT latches …
Figure 3: The metrics defined in §2.4, averaged for the 15 stochastic tasks defined in …
Figure 4: Per-task view of the two crystallization metrics across alignment stages for a subset of tasks. (a) MSE …
Figure 5: ProbMass using the average SFT distribution as a proxy seed. Settling holds even when the base-model …
Figure 6: Inter-family crystallization. A seed identified from one model family's base predicts nucleation in a completely independent alignment pipeline, suggesting seed selection is a property of the task and data, not the model.
Figure 7: Crystallization metrics for OLMo 2.
Figure 8: Crystallization metrics for Tulu 3.
Figure 9: Task histograms (OLMo 2).
Figure 10: Task histograms (Tulu 3).
Original abstract

The alignment of language models is typically studied through the lens of capability benchmarks, but the dynamics of how models change during post-training remain poorly understood. We argue that the physical sciences, and thermodynamic phase-transition theory in particular, offer a principled and underexplored vocabulary for reasoning about these dynamics. As a case study, we instantiate this position through the lens of material crystallization, which is a well-studied thermodynamic phase transition. For tasks like random number generation, this breaks into 3 phases: (1) the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; (2) the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and (3) a settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution. We propose intuitive metrics to verify the transitions between these phases, and validate the idea across a range of random tasks. Crystallization is one instance of a broader class of physical frameworks we believe alignment research should import to answer questions about where alignment-induced structure comes from, why it converges where it does, and what it fundamentally cannot change.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that thermodynamic phase-transition theory, instantiated via a crystallization analogy, offers a principled vocabulary for alignment dynamics in language models. Using random number generation tasks as a case study, it posits three phases: (1) a high-entropy liquid phase in the pretrained model with many distinct promptable sampling distributions, (2) a nucleation phase induced by supervised finetuning that collapses behavior onto a single seed distribution present in the pretrained model, and (3) a settling phase under reinforcement learning that redistributes probability while preserving concentration on the same options. Intuitive metrics are proposed to verify the transitions, with validation claimed across a range of random tasks. The work positions crystallization as one example of physical frameworks that alignment research should import to address questions about the origins, convergence, and limits of alignment-induced structure.

Significance. If the proposed metrics demonstrate thermodynamic-like signatures such as discontinuous order-parameter changes or critical scaling at the claimed transition points rather than monotonic distributional shifts, the framework could supply new causal structure for understanding where alignment-induced behavior originates and what it cannot alter. The manuscript's explicit attempt to import concepts from the physical sciences is a constructive direction if substantiated with reproducible evidence.
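One crude way to ask whether an order parameter changes abruptly rather than gradually across training checkpoints is to compare the largest step between consecutive checkpoints with the typical step. The heuristic below is an assumption for illustration, not a test proposed in the paper, and it is far weaker than a proper analysis of non-analytic behavior or critical scaling. The function name and series values are hypothetical.

```python
import statistics


def jump_ratio(series):
    """Largest step between consecutive checkpoints divided by the median step.

    Requires at least two values. A ratio near 1 means evenly sized steps;
    a large ratio means one step dominates the change.
    """
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    median_step = statistics.median(steps)
    return max(steps) / median_step if median_step > 0 else float("inf")


# Hypothetical entropy (bits) measured at six successive checkpoints.
gradual = [3.3, 3.0, 2.7, 2.4, 2.1, 1.8]
abrupt = [3.3, 3.25, 3.2, 1.0, 0.95, 0.9]

print(round(jump_ratio(gradual), 2))  # 1.0
print(round(jump_ratio(abrupt), 1))   # 44.0
```

Both series end far below where they start; only the step structure distinguishes them, which is the distinction the referee is asking the metrics to draw.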

major comments (2)
  1. [Abstract] The phases are defined directly in terms of observed changes during SFT (nucleation/collapse) and RL (settling/redistribution), without reference to independent order parameters or external benchmarks; this makes the mapping vulnerable to circularity, where the description re-labels training stages rather than deriving an independent physical analogy.
  2. [Abstract] The claim that intuitive metrics are proposed and validated across random tasks is not accompanied by any description of the metrics, experimental controls, error bars, or quantitative results, rendering it impossible to assess whether the framework captures non-analytic phase-transition behavior or merely tracks gradual entropy reduction and mode concentration.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the specific random tasks used and the form of the proposed metrics to allow readers to evaluate the validation claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important issues of clarity in the abstract regarding phase definitions and the presentation of supporting evidence. We address each point below and indicate where revisions will strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The phases are defined directly in terms of observed changes during SFT (nucleation/collapse) and RL (settling/redistribution), without reference to independent order parameters or external benchmarks; this makes the mapping vulnerable to circularity, where the description re-labels training stages rather than deriving an independent physical analogy.

    Authors: We agree that the abstract's phrasing ties the phase labels closely to the SFT and RL stages, which risks appearing circular. The crystallization analogy is meant to supply an independent conceptual structure (drawing on thermodynamic concepts of nucleation from a seed and subsequent settling), with the training stages serving only as the empirical mechanisms in this case study. The proposed metrics (entropy of the output distribution, concentration on a small set of modes, and the persistence of other promptable distributions) are intended as measurable order parameters that can be tracked independently of the training procedure. We will revise the abstract to foreground these metrics and their independence from the training stages, making the non-circular nature of the analogy explicit. revision: yes

  2. Referee: [Abstract] The claim that intuitive metrics are proposed and validated across random tasks is not accompanied by any description of the metrics, experimental controls, error bars, or quantitative results, rendering it impossible to assess whether the framework captures non-analytic phase-transition behavior or merely tracks gradual entropy reduction and mode concentration.

    Authors: The abstract is intentionally brief and therefore omits the concrete metric definitions, controls, and results that appear in the body of the manuscript. We acknowledge that this omission makes it difficult for a reader to evaluate the strength of the evidence from the abstract alone. In revision we will expand the abstract to include a concise description of the metrics (entropy, mode concentration, and cross-prompt distribution overlap), note the use of multiple random-generation tasks as controls, and summarize the observed transitions with reference to quantitative patterns, while preserving length constraints. revision: yes
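The simulated rebuttal names cross-prompt distribution overlap without giving a formula. One plausible instantiation, assumed here, is the mean pairwise Jensen-Shannon divergence (in bits) between the distributions elicited by different prompt variants. It is near 1 when prompts elicit disjoint behavior, as in the liquid phase, and near 0 after collapse onto one seed. All function names and data are hypothetical.

```python
import math
from itertools import combinations


def kl_bits(p, q, support):
    """KL divergence D(p || q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in support if p.get(x, 0.0) > 0)


def js_bits(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint supports)."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}
    return 0.5 * kl_bits(p, m, support) + 0.5 * kl_bits(q, m, support)


def mean_pairwise_js(dists):
    """Average JS divergence over all pairs of prompt-variant distributions (needs >= 2)."""
    pairs = list(combinations(dists, 2))
    return sum(js_bits(p, q) for p, q in pairs) / len(pairs)


# Hypothetical distributions elicited by three prompt variants.
liquid_like = [{"0": 1.0}, {"9": 1.0}, {"5": 1.0}]  # each prompt picks a different digit
crystal_like = [{"7": 0.7, "3": 0.3}] * 3            # every prompt gives the same distribution

print(mean_pairwise_js(liquid_like))   # 1.0
print(mean_pairwise_js(crystal_like))  # 0.0
```

JS divergence is used rather than raw KL because it is symmetric and stays finite when prompts put mass on different options.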

Circularity Check

1 step flagged

Phases explicitly defined by correspondence to standard training stages (pretrain/SFT/RL) rather than independent thermodynamic signatures

specific steps
  1. renaming known result [Abstract]
    "For tasks like random number generation, this breaks into 3 phases: (1) the high entropy liquid phase in the pretrained model, with many distinct sampling distributions promptable from the model; (2) the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and (3) a settling phase in which reinforcement learning techniques redistribute probability of the collapsed distribution, but largely keep it concentrated on the same options as the seed distribution."

    The phases are introduced by direct identification with the training procedures (pretraining, supervised finetuning, reinforcement learning) and with the distributional changes those procedures are already known to produce. The 'crystallization' framework therefore organizes existing observations under new labels without an independent derivation that would distinguish it from a descriptive metaphor.

full rationale

The paper's central case study defines its three crystallization phases directly in terms of the conventional post-training pipeline and the distributional effects already known to occur at each stage. No equations, order parameters, or critical-phenomena tests are shown in the provided text that would derive the phase structure from physical first principles; the mapping therefore functions as a relabeling of observed SFT-induced mode collapse and RL redistribution. This matches the renaming_known_result pattern and produces partial circularity for the claimed 'principled vocabulary,' though the proposal of intuitive metrics could in principle supply independent content if those metrics were shown to detect non-analytic behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that thermodynamic phase transitions provide a non-metaphorical vocabulary for LLM behavior change; no free parameters or invented physical entities are introduced, but the phases themselves function as new descriptive constructs without independent falsifiable handles shown.

axioms (1)
  • domain assumption: Thermodynamic phase-transition theory offers a principled vocabulary for reasoning about alignment dynamics in language models
    Invoked in the opening argument of the abstract as the basis for the case study.
invented entities (1)
  • Crystallization phases (liquid, nucleation, settling) in alignment · no independent evidence
    purpose: To categorize the progression of model output distributions during post-training
    New descriptive categories introduced to map onto SFT and RL stages; no independent evidence or falsifiable prediction outside the analogy is provided in the abstract.

pith-pipeline@v0.9.1-grok · 5759 in / 1398 out tokens · 25482 ms · 2026-06-30T06:37:01.543736+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 9 canonical work pages · 5 internal anchors

  1. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia. Attention is All you Need.
  2. Base models beat aligned models at randomness and creativity. arXiv preprint arXiv:2505.00047.
  3. 2 OLMo 2 Furious. arXiv preprint arXiv:2501.00656.
  4. Tulu 3: Pushing frontiers in open language model post-training. arXiv preprint arXiv:2411.15124.
  5. Artificial hivemind: The open-ended homogeneity of language models (and beyond). arXiv preprint arXiv:2510.22954.
  6. Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.
  7. LLM probability concentration: How alignment shrinks the generative horizon. arXiv preprint arXiv:2506.17871.
  8. LIMA: Less is more for alignment. Advances in Neural Information Processing Systems.
  9. Hallgren, Jonas. December 2025.
  10. The unlocking spell on base LLMs: Rethinking alignment via in-context learning. arXiv preprint arXiv:2312.01552, 2023.
  11. Attributing mode collapse in the fine-tuning of large language models. ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models.
  12. A mathematical theory of communication. The Bell System Technical Journal, 1948.
  13. Language models as agent models. Findings of the Association for Computational Linguistics: EMNLP 2022.
  14. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  15. Towards understanding grokking: An effective theory of representation learning. Advances in Neural Information Processing Systems.
  16. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems.
  17. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
  18. A tutorial on energy-based learning. Predicting Structured Data.
  19. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
  20. Crystal nucleation in liquids: Open questions and future challenges in molecular dynamics simulations. Chemical Reviews, 2016.
  21. Supercooled liquids and the glass transition. Nature, 2001.
  22. The kinetics of precipitation from supersaturated solid solutions. Journal of Physics and Chemistry of Solids.
  23. Role of additives in crystal nucleation from solutions: A review. Crystal Growth & Design, 2021.