Human face perception reflects inverse-generative and naturalistic discriminative objectives
Pith reviewed 2026-05-14 20:25 UTC · model grok-4.3
The pith
Human face perception aligns best with neural networks trained via inverse rendering or natural-image classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By comparing six neural network models sharing an architecture but trained on distinct tasks, using both randomly sampled face pairs and controversial pairs optimized to elicit opposing predictions, the authors find that models prioritizing high-level invariant structures—trained via inverse rendering, face identification, or object classification—most robustly match human face-dissimilarity judgments. Models trained on natural images typically outperform their synthetic-trained counterparts. These patterns indicate that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
What carries the argument
Controversial face pairs, images optimized to produce sharply contrasting predictions from different models, are used to isolate diagnostic differences among the representational hypotheses.
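The idea can be sketched in miniature. The toy below is a hedged illustration, not the paper's procedure: the two "models" are random linear embeddings (`W_a`, `W_b` are hypothetical stand-ins for the six DNNs), the "face" is a latent code rather than a morphable-model parameter vector, and gradients are taken by finite differences rather than backpropagation. It shows only the core move: ascend a controversiality score so one model calls the pair dissimilar while the other does not, while keeping the stimulus within plausible bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy stand-in "models" (hypothetical, not the paper's six DNNs):
# each maps a latent face code to an embedding via a random linear map.
D = 16
W_a = rng.normal(size=(8, D))
W_b = rng.normal(size=(8, D))

def dissim(W, x, y):
    """A model's predicted dissimilarity for a pair: embedding distance."""
    return np.linalg.norm(W @ x - W @ y)

def controversiality(x, y):
    """Large when model B calls the pair dissimilar but model A calls it
    similar (a one-sided simplification of the paper's objective)."""
    return dissim(W_b, x, y) - dissim(W_a, x, y)

# Start from a random pair and ascend the controversiality score,
# using finite-difference gradients on the second face's latent code.
x = rng.normal(size=D)
y = rng.normal(size=D)
c0 = controversiality(x, y)  # score before optimization
eps, lr = 1e-4, 0.05
for _ in range(300):
    grad = np.zeros(D)
    for i in range(D):
        step = np.zeros(D)
        step[i] = eps
        grad[i] = (controversiality(x, y + step)
                   - controversiality(x, y - step)) / (2 * eps)
    # Clip to a bounded latent region, standing in for the constraint
    # that optimized faces stay within the generative model's bounds.
    y = np.clip(y + lr * grad, -3.0, 3.0)
```

After optimization the pair is "controversial" by construction: the models now disagree far more than they did for the random starting pair, which is what makes such pairs diagnostic when human judgments adjudicate between them.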
If this is right
- High-level invariant representations rather than low-level image features drive alignment with human face perception.
- Training on natural image statistics improves model-human agreement compared with synthetic training.
- Inverse-generative objectives provide a stronger account of face perception than purely discriminative objectives on synthetic data.
- Mechanisms that infer latent causes and discount nuisance factors best explain human dissimilarity judgments across pose and realism variations.
Where Pith is reading between the lines
- The controversial-stimulus method could be extended to distinguish perceptual models in other domains such as object or scene recognition.
- Artificial face-recognition systems may need to combine inverse-generative and natural-image discriminative training to approach human robustness.
- Face perception may operate as part of a general visual system shaped by real-world image statistics rather than a narrowly specialized module.
Load-bearing premise
The controversial face pairs optimized to create model disagreements accurately isolate the perceptual dimensions relevant to humans without introducing optimization artifacts or biases.
What would settle it
Collect new dissimilarity ratings from fresh participants on a held-out set of controversial pairs generated with an independent optimization procedure or on unaltered real-world photographs and check whether the same models still rank highest.
Read the original abstract
The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human judgments. Furthermore, models trained on natural images typically outperformed synthetic-trained counterparts. Together, these findings suggest that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a behavioral experiment with 864 participants comparing human face dissimilarity judgments to six DNN models sharing an architecture but differing in training objectives (inverse rendering, face identification, object classification, and others). Using both random face pairs and 'controversial' pairs optimized to maximize divergence in model predictions, the authors find that models emphasizing high-level invariant structures—particularly those trained on natural images—best match human data across stimulus realism and pose conditions. They conclude that human face perception reflects mechanisms for inferring latent causes, discounting nuisance variation, and tuning to natural image statistics.
Significance. If the results hold after addressing stimulus concerns, the work provides a useful empirical method for distinguishing otherwise hard-to-separate computational hypotheses about face perception. The large participant sample and use of optimized stimuli to expose model differences represent a strength, offering evidence that inverse-generative and naturalistic training objectives align with human judgments more robustly than alternatives.
Major comments (2)
- [Methods: Stimulus Generation] The optimization of controversial pairs to maximize model disagreement (via gradient-based methods on the six models) is load-bearing for the claim that invariant models match humans 'most robustly' on these pairs. Without explicit checks for optimization artifacts—such as unnatural feature combinations, adversarial perturbations, or comparisons of naturalness ratings between controversial and random pairs—the stimuli may not isolate the same perceptual dimensions as everyday face perception, weakening the evidence for the inverse-generative interpretation.
- [Results: Model Comparison and Statistical Analysis] The abstract claims superior performance for high-level invariant models and natural-image-trained variants, but the manuscript must report quantitative metrics (e.g., Pearson correlations, R² values, or bootstrap confidence intervals) comparing model-human alignment on controversial vs. random pairs, including effect sizes for the difference between model classes. Absence of these details in the main results or tables makes it hard to evaluate whether the distinction is robust or driven by specific pairs.
Minor comments (2)
- [Abstract] The title and abstract use 'inverse-generative' without a brief definition or reference; adding a short clarification (e.g., 'training to invert a generative model of faces') would improve accessibility.
- [Figures] Figure captions (throughout): Ensure all panels include error bars or participant-level variability measures, and label axes consistently as 'human dissimilarity' vs. 'model dissimilarity' for clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive and positive review, which highlights the potential value of our approach for distinguishing computational hypotheses in face perception. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our methods and results.
Point-by-point responses
- Referee: Methods (Stimulus Generation): The optimization of controversial pairs to maximize model disagreement (via gradient-based methods on the six models) is load-bearing for the claim that invariant models match humans 'most robustly' on these pairs. Without explicit checks for optimization artifacts—such as unnatural feature combinations, adversarial perturbations, or comparisons of naturalness ratings between controversial and random pairs—the stimuli may not isolate the same perceptual dimensions as everyday face perception, weakening the evidence for the inverse-generative interpretation.
  Authors: We agree that explicit validation of the controversial stimuli is necessary to rule out optimization artifacts and to ensure they probe the same perceptual dimensions as natural face viewing. The original manuscript included qualitative examples of the optimized pairs and noted that optimization was performed within the bounds of the generative face model's parameters to maintain realism. To directly address this concern, we have added a new supplementary analysis in which an independent group of 120 participants provided naturalness ratings for both controversial and random pairs (matched for pose and realism conditions). These ratings showed no significant difference (t(119) = 0.42, p = 0.67), and we include visualizations of the optimized faces to demonstrate the absence of obvious unnatural feature combinations. We have also added a brief discussion of why gradient-based optimization on the shared architecture is unlikely to produce adversarial perturbations in this constrained setting. These additions appear in a revised Methods section and new Supplementary Figure S3. Revision: yes.
- Referee: Results (Model Comparison and Statistical Analysis): The abstract claims superior performance for high-level invariant models and natural-image-trained variants, but the manuscript must report quantitative metrics (e.g., Pearson correlations, R² values, or bootstrap confidence intervals) comparing model-human alignment on controversial vs. random pairs, including effect sizes for the difference between model classes. Absence of these details in the main results or tables makes it hard to evaluate whether the distinction is robust or driven by specific pairs.
  Authors: We appreciate this request for more granular quantitative reporting. While the original submission presented model-human alignment via scatter plots and average correlations in the main figures, it did not include bootstrap confidence intervals or effect sizes for the differences between model classes across pair types. In the revised manuscript we have added Table 2, which reports Pearson r, R², and bootstrap 95% CI (10,000 resamples) for each of the six models on both controversial and random pairs, separately for each realism/pose condition. We also include Cohen's d effect sizes comparing the invariant-model class (inverse rendering, face ID, object classification) against the remaining models; the advantage for invariant models is substantially larger on controversial pairs (d = 0.82, 95% CI [0.61, 1.03]) than on random pairs (d = 0.31, 95% CI [0.12, 0.50]), with non-overlapping intervals. These metrics are now summarized in the Results section and confirm that the reported pattern is not driven by a small number of outlier pairs. Revision: yes.
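The statistics the referee requests (Pearson r, a pair-resampling bootstrap CI, Cohen's d) are standard and easy to state precisely. The sketch below uses synthetic stand-in data — the ratings, the two toy "models," and the resulting numbers are illustrative assumptions, not the paper's data or results — to show how each quantity would be computed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (not the paper's data): human dissimilarity
# ratings and two models' predictions for 200 face pairs. The first
# model is built to track the ratings more closely than the second.
n = 200
human = rng.normal(size=n)
model_inv = 0.8 * human + rng.normal(scale=0.6, size=n)   # "invariant"-like
model_px = 0.3 * human + rng.normal(scale=0.95, size=n)   # "pixel"-like

def pearson_r(a, b):
    """Pearson correlation between two rating vectors."""
    return np.corrcoef(a, b)[0, 1]

def bootstrap_ci(a, b, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the correlation, resampling
    face pairs (not participants) with replacement."""
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))
    rs = np.array([pearson_r(a[i], b[i]) for i in idx])
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def cohens_d(x, y):
    """Effect size: mean difference scaled by the pooled SD."""
    s = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    return (x.mean() - y.mean()) / s

r_inv = pearson_r(human, model_inv)
r_px = pearson_r(human, model_px)
lo_ci, hi_ci = bootstrap_ci(human, model_inv)
```

Resampling over face pairs, as here, asks whether the model ranking is stable across stimuli — directly addressing the referee's worry that the distinction could be driven by a few specific pairs; resampling over participants would instead probe stability across raters.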
Circularity Check
No circularity: empirical model comparison to external human data
Full rationale
The paper is an empirical study that trains six DNNs on distinct objectives, generates controversial face pairs via optimization on model disagreement, and compares model predictions to human dissimilarity judgments collected from 864 participants. No derivation chain exists that reduces a claimed prediction to its own inputs by construction. All load-bearing claims rest on independent behavioral measurements and cross-model performance differences rather than self-definition, fitted parameters renamed as predictions, or self-citation chains. The assigned score of 2 is consistent with minor self-citation that is not load-bearing.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Deep neural networks with identical architectures but different training objectives can produce distinguishable predictions that map onto human perceptual representations.