Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3). Verdicts: UNVERDICTED (3).
Citing papers
- Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations?
  A new Naturality Violation Score applied to fMRI and vision DNNs reveals that semantic axes align best in higher visual cortex and deeper layers while low-level axes align earlier.
- Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents
  Agents acquire lexical labels for visual concepts following a perceptual coherence gradient where perceptual distance predicts learning accuracy independently of semantic distance in a pre-registered CIFAR-100 experiment.
- The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs
  LLMs exhibit misfired alignment on stereotype questions at 4.7-18.9% rates on the new VETO benchmark of 2,032 contrastive pairs, unlike humans at 0%, due to overgeneralized safety cues after instruction tuning.