Introduces LIHA ablation to locate first-token broadcaster heads and provides causal evidence that instruction tuning localizes language identity circuits to early layers in transformers.
Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields
cs.CL 5
roles
background 1
polarities
background 1
representative citing papers
Introduces Dango, a 1.8B strictly L1-only LLM using corpus filtering and lesson fine-tuning to simulate Japanese-to-English SLA and produce human-like L2 output patterns.
A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
- Benchmark Data Contamination of Large Language Models: A Survey
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.