Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5
citing papers explorer
- Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
  Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.
- Understanding the Mechanism of Altruism in Large Language Models
  A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
  Chain-of-thought monitorability provides a promising but fragile method for AI safety oversight that developers should actively preserve.
- Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
  Qwen-Scope provides open-source sparse autoencoders for Qwen models that function as practical interfaces for steering, evaluating, data workflows, and optimizing large language models.
- There Will Be a Scientific Theory of Deep Learning
  A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.