Majaj, Rishi Rajalingham, Elias B
3 Pith papers cite this work.
Years: 2026 (3 citing papers)
citing papers explorer
- Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
  Higher alignment of vision-language models with human early visual cortex (V1-V3) predicts lower susceptibility to sycophantic gaslighting attacks.
- Decoding Alignment without Encoding Alignment: A critique of similarity analysis in neuroscience
  Decoding alignment metrics can remain high and unchanged even when encoding manifold topology is causally altered, so they do not imply similar function or computation across neural populations.
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.