BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.
Understanding reasoning ability of language models from the perspective of reasoning paths aggregation
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2representative citing papers
RLVR drives a concept network in LLMs through nucleation and freezing into inverse trees that support slow thinking, and intervening with brief SFT at peak frustration outperforms standard RLVR while post-freeze SFT causes forgetting.
citing papers explorer
-
Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning
BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALBERT rescued by mixed-data training.
-
Emergent Slow Thinking in LLMs as Inverse Tree Freezing
RLVR drives a concept network in LLMs through nucleation and freezing into inverse trees that support slow thinking, and intervening with brief SFT at peak frustration outperforms standard RLVR while post-freeze SFT causes forgetting.