Adapting pretrained transformers for tasks outside their training distribution. arXiv preprint arXiv:2108.05247
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
Four heads (L26.28, L27.28, L27.2, L27.3) in frozen Gemma 4 31B are jointly highly important on both text and non-text tasks; the overlap is significant under a hypergeometric test (P=0.0013) and is causally validated on a cube task.
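
For readers unfamiliar with the hypergeometric significance mentioned in the summary, the following is a minimal sketch of how an overlap between two "top heads" sets can be tested. It is not the paper's code: the function name `hypergeom_overlap_pvalue` and the numbers in the example are hypothetical, and the summary does not state the total head count or the set sizes behind P=0.0013.

```python
from math import comb


def hypergeom_overlap_pvalue(total, top_a, top_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    total:   number of heads in the model (assumed known)
    top_a:   size of the high-importance set on task family A
    top_b:   size of the high-importance set on task family B
    overlap: number of heads observed in both sets
    """
    upper = min(top_a, top_b)          # the overlap cannot exceed either set
    denom = comb(total, top_b)         # ways to choose set B from all heads
    tail = sum(
        comb(top_a, k) * comb(total - top_a, top_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Toy numbers (hypothetical, not from the paper): 10 heads,
# two top-3 sets that coincide exactly. Probability is 1/120.
p = hypergeom_overlap_pvalue(total=10, top_a=3, top_b=3, overlap=3)
```

A small p-value here means that an overlap this large would be unlikely if the two sets were drawn independently at random from the same pool of heads.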