Relational reasoning and inductive bias in transformers and large language models

· 2025 · cs.LG · arXiv 2506.04289

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Transformer-based models have demonstrated remarkable reasoning abilities, but the mechanisms underlying relational reasoning remain poorly understood. We investigate how transformers perform \textit{transitive inference}, a classic relational reasoning behavior from psychology which elicits inference about indirectly related items (e.g., if $A > B$ and $B > C$, then $A > C$). We compare in-weights learning (IWL) and in-context learning (ICL) behaviors and mechanisms on these tasks, and find profoundly different patterns of generalization. IWL models learn a linear embedding, which leads to transitive inference as well as other behavioral effects present in humans and animals. ICL models, in contrast, are capable of learning to generalize transitively, but only do so when it is necessitated by the training data, otherwise learning a match-and-copy strategy. Interestingly, pre-training ICL models on in-context linear regression tasks that provide them with a latent linear representation is sufficient to make the ICL behaviors and internal representations qualitatively and quantitatively more like IWL. In order to test whether the same inference patterns are present in large language models, we leverage a congruency paradigm which allows us to differentially probe IWL and ICL generalization patterns without access to their training data. We indeed see that IWL reasoning leads to more transitive generalization than ICL. Moreover, we find that prompting the ICL models to use a linear mental map led to increased transitive inference over different geometric prompts. Together, these results reveal that both the training regime and the geometric structure of induced representations critically determine transformers' capacity for transitive inference.
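
As a rough illustration of why a linear embedding yields transitive inference, the sketch below fits one scalar score per item using only adjacent comparisons and then checks the non-adjacent pairs it never saw. This is not the paper's method: the least-squares fit stands in for transformer training, and the item count, the `fit_linear_embedding` helper, and the unit-margin target are all assumptions made for the example.

```python
import numpy as np

def fit_linear_embedding(n_items, train_pairs):
    """Fit one scalar score per item from (winner, loser) pairs.

    Hypothetical helper: each pair adds the constraint
    score[winner] - score[loser] = 1, solved by least squares.
    """
    design = np.zeros((len(train_pairs), n_items))
    for row, (winner, loser) in enumerate(train_pairs):
        design[row, winner] = 1.0
        design[row, loser] = -1.0
    targets = np.ones(len(train_pairs))
    scores, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return scores

n_items = 7  # assumed hierarchy: item 0 > item 1 > ... > item 6

# Training data: adjacent pairs only, like A > B and B > C.
adjacent_pairs = [(i, i + 1) for i in range(n_items - 1)]
scores = fit_linear_embedding(n_items, adjacent_pairs)

# Held-out data: every non-adjacent pair, like A > C.
held_out_pairs = [(a, b) for a in range(n_items) for b in range(a + 2, n_items)]
accuracy = np.mean([scores[a] > scores[b] for a, b in held_out_pairs])

print("scores:", np.round(scores, 3))
print("held-out transitive accuracy:", accuracy)
```

Because every item is placed on a single line, any pair can be compared by subtracting scores, so the ordering of unseen pairs follows from the adjacent ones without being trained on directly.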

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A mathematical theory of balancing relational generalization and memorization

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Introduces transitive inference with exceptions task and analytically shows kernel ridge regression balances relational generalization and memorization depending on representational geometry, with validation in finetuned language models.

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.

citing papers explorer

Showing 2 of 2 citing papers.

A mathematical theory of balancing relational generalization and memorization

cs.LG · 2026-05-21 · unverdicted · none · ref 22 · internal anchor

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · none · ref 49 · internal anchor
