CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
URL https://www.biorxiv.org/content/10.1101/ 407007v1
9 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1)
Citation-polarity summary: background (1)
Representative citing papers:
- CanViT: Toward Active-Vision Foundation Models
  CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
- Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
  Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.
- Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction
  CHASMBrain uses dual-stream Mamba in a coarse-to-fine hierarchy to predict fMRI from images, reporting 0.429 Pearson correlation and 0.261 MSE on NSD with causal evidence that patch and CLS streams specialize to early versus higher visual cortex.
- Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot
  Hybrid JEMs at intermediate generative-discriminative balance maximize human alignment on perceptual similarity, gloss, uncertainty, robustness, cue conflict, and feature attribution benchmarks.
- Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2
  Feature visualization on TRIBE v2 brain encoders recovers the known ventral visual hierarchy from V1 to V4 and produces distinctive patterns for MT, FFA, and PPA, with optimized stimuli driving ~4x higher activation than natural images.
- Decoding Alignment without Encoding Alignment: A critique of similarity analysis in neuroscience
  Decoding alignment metrics can remain high and unchanged even when encoding manifold topology is causally altered, so they do not imply similar function or computation across neural populations.
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Beyond Neural Activity Prediction: Probing Latent Representations in Mouse V1 Digital Twins
  V1 digital twins with comparable neural prediction accuracy differ in linear probe performance, unit tuning, and hidden-layer eigenspectra.
- LITcoder: A General-Purpose Library for Building and Comparing Encoding Models
  LITcoder introduces a modular open-source library for constructing, benchmarking, and comparing neural encoding models that map continuous stimuli such as stories to fMRI brain data.