pith. machine review for the scientific record.

Understanding LLM behaviors via compression: Data generation, knowledge acquisition and scaling laws

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.CL 2

years

2026 2

representative citing papers

Truth as a Compression Artifact in Language Model Training

cs.CL · 2026-03-12 · unverdicted · novelty 6.0

Controlled experiments show language models extract correct answers from contradictory data only when errors are structurally incoherent, supporting the hypothesis that gradient descent selects the most compressible answer cluster.
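The claimed mechanism — gradient descent favoring the most compressible answer cluster — can be illustrated with a toy sketch (hypothetical, not code from either paper): compress each cluster of contradictory answers and pick the one with the shortest per-answer description length.

```python
import zlib

def description_length(answers):
    """Compressed size of a cluster of answer strings, a crude proxy for compressibility."""
    return len(zlib.compress("\n".join(answers).encode()))

def most_compressible_cluster(clusters):
    """Return the cluster whose contents compress best per answer."""
    return min(clusters, key=lambda c: description_length(c) / len(c))

# Contradictory training data: one coherent (correct) cluster versus structurally
# incoherent errors, where each wrong answer differs arbitrarily from the others.
coherent = ["Paris is the capital of France."] * 5
incoherent = [
    "Lyon is the capital of France.",
    "France has no capital city.",
    "The capital of France is Berlin.",
    "Marseille, obviously.",
    "Capital: Nice.",
]

winner = most_compressible_cluster([coherent, incoherent])
print(winner[0])  # the repeated, coherent answer wins
```

The repeated correct answer compresses far better than the scattered errors, so the coherent cluster is selected — mirroring the abstract's claim that extraction works only when errors are structurally incoherent.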

citing papers explorer

Showing 2 of 2 citing papers.

  • Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts cs.CL · 2026-04-09 · conditional · none · ref 66

Loss-based pruning of training data to limit the number of distinct facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.

  • Truth as a Compression Artifact in Language Model Training cs.CL · 2026-03-12 · unverdicted · none · ref 7

    Controlled experiments show language models extract correct answers from contradictory data only when errors are structurally incoherent, supporting the hypothesis that gradient descent selects the most compressible answer cluster.
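The pruning recipe summarized above — drop high-loss examples, then flatten the per-fact frequency distribution — might be sketched as follows (function names, thresholds, and data layout are illustrative assumptions, not the paper's implementation):

```python
from collections import Counter

def prune_for_memorization(examples, loss_fn, max_per_fact=3, loss_quantile=0.5):
    """Keep low-loss examples and cap how often each fact may recur,
    flattening the fact-frequency distribution (illustrative sketch)."""
    # 1. Loss-based pruning: keep the easier (lower-loss) fraction of examples.
    scored = sorted(examples, key=loss_fn)
    kept = scored[: max(1, int(len(scored) * loss_quantile))]
    # 2. Frequency flattening: at most max_per_fact occurrences per fact.
    counts, flattened = Counter(), []
    for ex in kept:
        if counts[ex["fact"]] < max_per_fact:
            counts[ex["fact"]] += 1
            flattened.append(ex)
    return flattened

# Toy usage: 40 examples over 4 facts with synthetic losses.
examples = [{"fact": f"fact{i % 4}", "loss": (i % 7) / 7} for i in range(40)]
pruned = prune_for_memorization(examples, loss_fn=lambda ex: ex["loss"])
print(len(pruned), Counter(ex["fact"] for ex in pruned))
```

Capping per-fact frequency frees capacity for more distinct facts, which is the intuition behind the reported 1.3x memorization gain.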