In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp

Stolfo, A · 2023 · DOI 10.18653/v1/2023.emnlp-main.435

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

representative citing papers

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transformer represents but does not causally transmit staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 4 of 4 citing papers.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 30
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts cs.AI · 2026-05-01 · unverdicted · none · ref 30
Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.
Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer cs.LG · 2026-05-21 · unverdicted · none · ref 22
Transformer represents but does not causally transmit staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 282
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp

fields

years

verdicts

representative citing papers

citing papers explorer