Physics of language models: Part 1, learning hierarchical language structures

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

years

2026: 7 · 2023: 1

polarities

unclear: 2 · use method: 1

representative citing papers

Smooth Scaling Laws Hide Stepwise Token Learning

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Token loss trajectories follow localized sigmoids whose learning-time spectrum quantitatively reconstructs scaling-law derivatives on T, D, and M axes and enables faster training via distribution reshaping.

Invariant Gradient Alignment for Robust Reasoning Distillation

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Invariant Gradient Alignment uses Logical Isomer Sets and a Continuous Gradient Conflict Mask to tighten OOD generalization bounds and boost empirical performance over ERM in reasoning distillation.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Diagnosing CFG Interpretation in LLMs

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.

Textbooks Are All You Need

cs.CL · 2023-06-20 · unverdicted · novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.
