Proceedings of the 41st International Conference on Machine Learning , pages=

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model , author= · 2024 · arXiv 2402.07757

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Mechanisms of Misgeneralization in Physical Sequence Modeling

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.

citing papers explorer

Showing 3 of 3 citing papers.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation cs.HC · 2026-05-26 · unverdicted · none · ref 77
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
Mechanisms of Misgeneralization in Physical Sequence Modeling cs.LG · 2026-05-19 · unverdicted · none · ref 67
Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a内核
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning cs.LG · 2026-05-06 · unverdicted · none · ref 7
BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.

Proceedings of the 41st International Conference on Machine Learning , pages=

fields

years

verdicts

representative citing papers

citing papers explorer