Physics of language models: Part 1, context-free gram- mar

[AZL23] Zeyuan Allen-Zhu, Yuanzhi Li · 2023 · arXiv 2305.13673

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

cs.CL · 2026-04-25 · unverdicted · novelty 7.0

Fine-tuning shows higher proficiency than in-context learning on in-distribution generalization in formal languages, with equal out-of-distribution performance and diverging inductive biases at high proficiency.

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.

Diagnosing CFG Interpretation in LLMs

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.

Textbooks Are All You Need

cs.CL · 2023-06-20 · unverdicted · novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.

citing papers explorer

Showing 4 of 4 citing papers.

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective cs.CL · 2026-04-25 · unverdicted · none · ref 4
Fine-tuning shows higher proficiency than in-context learning on in-distribution generalization in formal languages, with equal out-of-distribution performance and diverging inductive biases at high proficiency.
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning cs.LG · 2026-05-06 · unverdicted · none · ref 1
BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.
Diagnosing CFG Interpretation in LLMs cs.AI · 2026-04-22 · unverdicted · none · ref 2
LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.
Textbooks Are All You Need cs.CL · 2023-06-20 · unverdicted · none · ref 4
A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.

Physics of language models: Part 1, context-free gram- mar

fields

years

verdicts

representative citing papers

citing papers explorer