Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

Koyena Pal, Jiuding Sun, Andrew Yuan, Byron Wallace, David Bau · 2023 · DOI 10.18653/v1/2023.conll-1.37

6 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Predicting Future Behaviors in Reasoning Models Enables Better Steering

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

Probes predicting future behaviors from intermediate steps enable Future Probe Controlled Generation for steering large reasoning models with minimal quality degradation.

PRISM: Recovering Instruction Sets from Language Model Activations

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

PRISM is a new activation-conditioned model that recovers full sets of simultaneous instructions from LLM hidden states via judge-guided GRPO training and outperforms prior activation-to-language methods on security-relevant tasks.

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Query Lens extends Logit Lens to interpret sparse features via key-value analysis and indirect effects, yielding coherent token signatures where Logit Lens fails, and proposes the Subspace Channel Hypothesis.

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.

The State-Prediction Separation Hypothesis

cs.CL · 2026-07-01 · unverdicted · novelty 6.0

A two-stream Transformer variant that separates state storage from next-token prediction improves validation loss and downstream task performance by 2-3 points over standard Transformers.

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

cs.CL · 2026-05-13 · unverdicted · novelty 3.0

AERIC uses a 387-parameter head on LLM hidden states for same-pass anticipatory detection of implicit harm, reporting AUROC gains on DiaSafety and Harmful Advice plus low-latency trigger rates on HarmBench and SocialHarmBench.

citing papers explorer

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · none · ref 182

The State-Prediction Separation Hypothesis

cs.CL · 2026-07-01 · unverdicted · none · ref 21

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

cs.CL · 2026-05-13 · unverdicted · none · ref 19
