Memorization, Emergence, and Explaining Reversal Failures: A Controlled Study of Relational Semantics in LLMs
Pith reviewed 2026-05-16 17:57 UTC · model grok-4.3
The pith
Autoregressive LLMs acquire logical relational semantics such as symmetry and inversion once given enough logic-bearing supervision, even in shallow models, but reversal failures arise mainly from left-to-right order bias rather than any欠缺在
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Relational semantics emerge with sufficient logic-bearing supervision, even in shallow (2-3 layer) models, and successful generalization aligns with stable intermediate-layer signals; order-matched forward/reverse tests and a diffusion baseline indicate that reversal failures are primarily driven by autoregressive order bias rather than deficient inversion semantics.
What carries the argument
Controlled Knowledge Graph-based synthetic framework that generates text from symmetric and inverse triples
If this is right
- Relational semantics appear in models as shallow as two or three layers once the training data supplies explicit logical structure.
- Generalization to unseen entities tracks the emergence of stable representations in intermediate layers.
- Reversal errors persist even when inversion semantics are present because the autoregressive objective favors the order seen during training.
- A non-autoregressive baseline trained on the same data exhibits fewer reversal failures, confirming the role of generation order.
Where Pith is reading between the lines
- Prompting strategies that force reverse order during inference may reduce reversal errors without any change to model weights.
- The same synthetic-data recipe could be used to test whether other logical properties such as transitivity emerge under controlled supervision.
- Architectures that are not strictly left-to-right may reach the same relational competence with less data once the order bias is removed.
Load-bearing premise
The synthetic text generated from knowledge-graph triples encodes pure logical relations without accidental surface patterns that would make model success or failure unrepresentative of real relational understanding.
What would settle it
Train the same models on data whose logical structure is deliberately broken while keeping all other statistics identical; if the phase transition and correct reversal performance disappear, the claim that supervision alone suffices is falsified.
read the original abstract
Autoregressive LLMs perform well on relational tasks that require linking entities via relational words (e.g., father/son, friend), but it is unclear whether they learn the logical semantics of such relations (e.g., symmetry and inversion logic) and, if so, whether reversal-type failures arise from missing relational semantics or left-to-right order bias. We propose a controlled Knowledge Graph-based synthetic framework that generates text from symmetric/inverse triples, train GPT-style autoregressive models from scratch, and evaluate memorization, logical inference, and in-context generalization to unseen entities to address these questions. We find a sharp phase transition in which relational semantics emerge with sufficient logic-bearing supervision, even in shallow (2-3 layer) models, and that successful generalization aligns with stable intermediate-layer signals. Finally, order-matched forward/reverse tests and a diffusion baseline indicate that reversal failures are primarily driven by autoregressive order bias rather than deficient inversion semantics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates whether autoregressive LLMs learn logical relational semantics (symmetry, inversion) or if reversal failures stem from order bias. It introduces a controlled synthetic framework generating text from KG triples with symmetric/inverse relations, trains GPT-style models from scratch, and evaluates memorization, logical inference, and in-context generalization to unseen entities. Key results include a sharp phase transition for emergence of relational semantics even in 2-3 layer models under sufficient logic-bearing supervision, alignment of successful generalization with stable intermediate-layer signals, and evidence from order-matched forward/reverse tests plus a diffusion baseline that reversal failures arise primarily from autoregressive order bias rather than deficient inversion semantics.
Significance. If the results hold under full verification, the work offers a valuable controlled demonstration that relational logic can emerge in shallow models via targeted supervision, providing a mechanistic account of reversal failures that separates semantic acquisition from architectural bias. This has direct implications for LLM training, evaluation of reasoning, and understanding phase transitions in capability emergence. The synthetic setup's reproducibility and focus on falsifiable generalization tests strengthen its potential contribution to the field.
major comments (2)
- [Abstract/Methods] Abstract and Methods: The abstract claims support for phase-transition and order-bias findings, but reports no quantitative metrics, error bars, exact supervision thresholds, control conditions, or data-exclusion criteria. This prevents assessment of effect sizes and robustness for the emergence claim in 2-3 layer models.
- [Synthetic Data Generation] Synthetic Data Generation (implied in framework description): The central claim that reversal failures are due to order bias (not missing semantics) and that semantics emerge faithfully rests on the assumption that KG verbalization encodes logical relations without positional, lexical, or template artifacts. If such cues enable unseen-entity success without true inversion logic, the order-matched tests and diffusion baseline cannot isolate the mechanism.
minor comments (1)
- [Figures] Add explicit error bars, statistical tests, and layer-wise activation plots with legends to all figures reporting phase transitions and intermediate signals.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment. We address each major comment below and will revise the manuscript to incorporate additional quantitative details and clarifications where appropriate.
read point-by-point responses
-
Referee: [Abstract/Methods] Abstract and Methods: The abstract claims support for phase-transition and order-bias findings, but reports no quantitative metrics, error bars, exact supervision thresholds, control conditions, or data-exclusion criteria. This prevents assessment of effect sizes and robustness for the emergence claim in 2-3 layer models.
Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version we will add the key supervision threshold at which the phase transition occurs (approximately 40% logic-bearing data for 3-layer models), report mean accuracies with standard deviations across 5 random seeds, and briefly describe the main control conditions (logic-bearing vs. random supervision, fixed vs. randomized templates). These values and error bars already appear in Section 4 and Figures 2–3; we will summarize the most salient ones in the abstract for immediate assessment of effect size and robustness. revision: yes
-
Referee: [Synthetic Data Generation] Synthetic Data Generation (implied in framework description): The central claim that reversal failures are due to order bias (not missing semantics) and that semantics emerge faithfully rests on the assumption that KG verbalization encodes logical relations without positional, lexical, or template artifacts. If such cues enable unseen-entity success without true inversion logic, the order-matched tests and diffusion baseline cannot isolate the mechanism.
Authors: We share the concern that template artifacts could confound the results. The framework employs a small set of minimal, symmetric templates with entity names sampled from a large disjoint pool and relation phrases that are lexically distinct yet logically consistent. We already include an ablation that randomizes word order within templates and finds no degradation in unseen-entity generalization. The diffusion baseline further isolates the autoregressive component because it lacks left-to-right generation yet reproduces the same reversal asymmetry when order is mismatched. In the revision we will expand the Methods section with the complete template list, the exact randomization procedure, and the full results of the order-permutation ablation to make these controls fully transparent. revision: partial
Circularity Check
No significant circularity; derivation relies on new synthetic data and fresh experiments
full rationale
The paper's central claims about phase transitions in relational semantics emergence and reversal failures being driven by autoregressive order bias are supported by a newly introduced controlled KG-based synthetic text generation framework, from-scratch training of GPT-style models, and direct evaluations on memorization, inference, and generalization to unseen entities. These steps do not reduce to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations; the synthetic data generation and experimental results constitute independent content that can be inspected and reproduced externally. No equations or premises in the abstract or described chain collapse by construction to prior fitted quantities or author-overlapping uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Synthetic data from KG triples accurately represents relational logic like symmetry and inversion without introducing unintended statistical cues.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.