Analyzing the structure of attention in a transformer language model
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4
representative citing papers
LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.
G-Long uses graph-enhanced triplet memory and attention-aware scoring from a T5 summarizer to achieve up to 9.8% better response quality on MSC and 40.8% better retrieval recall on LME with lower overhead.
Trains and releases SAEs for Qwen3-1.7B/4B/8B models with layer-wise coverage and demonstrates causal steering of refusal via selected features.
citing papers explorer
- Power-Softmax: Towards Secure LLM Inference over Encrypted Data
  Power-Softmax is a new HE-compatible attention variant that permits training and inference of billion-parameter polynomial LLMs with performance matching standard transformers.
- LMs as Task-Specific Knowledge Bases: An Interpretability Analysis
  LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.
- G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents
  G-Long uses graph-enhanced triplet memory and attention-aware scoring from a T5 summarizer to achieve up to 9.8% better response quality on MSC and 40.8% better retrieval recall on LME with lower overhead.
- Discovering Millions of Interpretable Features with Sparse Autoencoders
  Trains and releases SAEs for Qwen3-1.7B/4B/8B models with layer-wise coverage and demonstrates causal steering of refusal via selected features.