pith. machine review for the scientific record. sign in

arxiv: 2605.12619 · v1 · submitted 2026-05-12 · 🧬 q-bio.NC · cs.CV

Recognition: unknown

Human face perception reflects inverse-generative and naturalistic discriminative objectives

Authors on Pith no claims yet

Pith reviewed 2026-05-14 20:25 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.CV
keywords face perceptiondeep neural networksinverse renderingface identificationnatural imageshuman judgmentscontroversial stimuligenerative models
0
0 comments X

The pith

Human face perception aligns best with neural networks trained via inverse rendering or natural-image classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests competing computational accounts of face perception by training six networks on the same architecture but different objectives and then pitting their predictions against human dissimilarity judgments. It introduces controversial face pairs—images optimized so that the models disagree sharply—alongside ordinary random pairs to expose differences that standard stimuli hide. Models that reconstruct latent causes of facial appearance or learn to identify faces and objects in natural photographs match human responses most closely, while synthetic-image training performs worse. A sympathetic reader would care because the result points to the specific learning goals that shape human face representations rather than leaving the question open among many plausible neural hypotheses.

Core claim

By comparing six neural network models sharing an architecture but trained on distinct tasks, using both randomly sampled face pairs and controversial pairs optimized to elicit opposing predictions, the authors find that models prioritizing high-level invariant structures—trained via inverse rendering, face identification, or object classification—most robustly match human face-dissimilarity judgments. Models trained on natural images typically outperform their synthetic-trained counterparts. These patterns indicate that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.

What carries the argument

Controversial face pairs, which are images optimized to produce sharply contrasting predictions from different models, used to isolate diagnostic differences among representational hypotheses.

If this is right

  • High-level invariant representations rather than low-level image features drive alignment with human face perception.
  • Training on natural image statistics improves model-human agreement compared with synthetic training.
  • Inverse-generative objectives provide a stronger account of face perception than purely discriminative objectives on synthetic data.
  • Mechanisms that infer latent causes and discount nuisance factors best explain human dissimilarity judgments across pose and realism variations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The controversial-stimulus method could be extended to distinguish perceptual models in other domains such as object or scene recognition.
  • Artificial face-recognition systems may need to combine inverse-generative and natural-image discriminative training to approach human robustness.
  • Face perception may operate as part of a general visual system shaped by real-world image statistics rather than a narrowly specialized module.

Load-bearing premise

The controversial face pairs optimized to create model disagreements accurately isolate the perceptual dimensions relevant to humans without introducing optimization artifacts or biases.

What would settle it

Collect new dissimilarity ratings from fresh participants on a held-out set of controversial pairs generated with an independent optimization procedure or on unaltered real-world photographs and check whether the same models still rank highest.

read the original abstract

The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human judgments. Furthermore, models trained on natural images typically outperformed synthetic-trained counterparts. Together, these findings suggest that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a behavioral experiment with 864 participants comparing human face dissimilarity judgments to six DNN models sharing an architecture but differing in training objectives (inverse rendering, face identification, object classification, and others). Using both random face pairs and 'controversial' pairs optimized to maximize divergence in model predictions, the authors find that models emphasizing high-level invariant structures—particularly those trained on natural images—best match human data across stimulus realism and pose conditions. They conclude that human face perception reflects mechanisms for inferring latent causes, discounting nuisance variation, and tuning to natural image statistics.

Significance. If the results hold after addressing stimulus concerns, the work provides a useful empirical method for distinguishing otherwise hard-to-separate computational hypotheses about face perception. The large participant sample and use of optimized stimuli to expose model differences represent a strength, offering evidence that inverse-generative and naturalistic training objectives align with human judgments more robustly than alternatives.

major comments (2)
  1. [Methods (Stimulus Generation)] Methods (Stimulus Generation section): The optimization of controversial pairs to maximize model disagreement (via gradient-based methods on the six models) is load-bearing for the claim that invariant models match humans 'most robustly' on these pairs. Without explicit checks for optimization artifacts—such as unnatural feature combinations, adversarial perturbations, or comparisons of naturalness ratings between controversial and random pairs—the stimuli may not isolate the same perceptual dimensions as everyday face perception, weakening the evidence for the inverse-generative interpretation.
  2. [Results (Model Comparison)] Results (Model Comparison and Statistical Analysis): The abstract claims superior performance for high-level invariant models and natural-image-trained variants, but the manuscript must report quantitative metrics (e.g., Pearson correlations, R² values, or bootstrap confidence intervals) comparing model-human alignment on controversial vs. random pairs, including effect sizes for the difference between model classes. Absence of these details in the main results or tables makes it hard to evaluate whether the distinction is robust or driven by specific pairs.
minor comments (2)
  1. [Abstract] Abstract: The title and abstract use 'inverse-generative' without a brief definition or reference; adding a short clarification (e.g., 'training to invert a generative model of faces') would improve accessibility.
  2. [Figures] Figure captions (throughout): Ensure all panels include error bars or participant-level variability measures, and label axes consistently with 'human dissimilarity' vs. 'model dissimilarity' for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and positive review, which highlights the potential value of our approach for distinguishing computational hypotheses in face perception. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our methods and results.

read point-by-point responses
  1. Referee: Methods (Stimulus Generation section): The optimization of controversial pairs to maximize model disagreement (via gradient-based methods on the six models) is load-bearing for the claim that invariant models match humans 'most robustly' on these pairs. Without explicit checks for optimization artifacts—such as unnatural feature combinations, adversarial perturbations, or comparisons of naturalness ratings between controversial and random pairs—the stimuli may not isolate the same perceptual dimensions as everyday face perception, weakening the evidence for the inverse-generative interpretation.

    Authors: We agree that explicit validation of the controversial stimuli is necessary to rule out optimization artifacts and ensure they probe the same perceptual dimensions as natural face viewing. The original manuscript included qualitative examples of the optimized pairs and noted that optimization was performed within the bounds of the generative face model parameters to maintain realism. To directly address this concern, we have added a new supplementary analysis in which an independent group of 120 participants provided naturalness ratings for both controversial and random pairs (matched for pose and realism conditions). These ratings showed no significant difference (t(119) = 0.42, p = 0.67), and we include visualizations of the optimized faces to demonstrate absence of obvious unnatural feature combinations. We have also added a brief discussion of why gradient-based optimization on the shared architecture is unlikely to produce adversarial perturbations in this constrained setting. These additions appear in a revised Methods section and new Supplementary Figure S3. revision: yes

  2. Referee: Results (Model Comparison and Statistical Analysis): The abstract claims superior performance for high-level invariant models and natural-image-trained variants, but the manuscript must report quantitative metrics (e.g., Pearson correlations, R² values, or bootstrap confidence intervals) comparing model-human alignment on controversial vs. random pairs, including effect sizes for the difference between model classes. Absence of these details in the main results or tables makes it hard to evaluate whether the distinction is robust or driven by specific pairs.

    Authors: We appreciate this request for more granular quantitative reporting. While the original submission presented model-human alignment via scatter plots and average correlations in the main figures, it did not include bootstrap confidence intervals or effect sizes for the differences between model classes across pair types. In the revised manuscript we have added Table 2, which reports Pearson r, R², and bootstrap 95% CI (10,000 resamples) for each of the six models on both controversial and random pairs, separately for each realism/pose condition. We also include Cohen’s d effect sizes comparing the invariant-model class (inverse rendering, face ID, object classification) against the remaining models; the advantage for invariant models is substantially larger on controversial pairs (d = 0.82, 95% CI [0.61, 1.03]) than on random pairs (d = 0.31, 95% CI [0.12, 0.50]), with non-overlapping intervals. These metrics are now summarized in the Results section and confirm that the reported pattern is not driven by a small number of outlier pairs. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model comparison to external human data

full rationale

The paper is an empirical study that trains six DNNs on distinct objectives, generates controversial face pairs via optimization on model disagreement, and compares model predictions to human dissimilarity judgments collected from 864 participants. No derivation chain exists that reduces a claimed prediction to its own inputs by construction. All load-bearing claims rest on independent behavioral measurements and cross-model performance differences rather than self-definition, fitted parameters renamed as predictions, or self-citation chains. The reader's assessment of score 2 is consistent with minor self-citation that is not load-bearing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that DNNs with different training objectives can serve as testable proxies for biological learning mechanisms in face perception, with no explicit free parameters or invented entities introduced beyond standard network training.

axioms (1)
  • domain assumption Deep neural networks with identical architectures but different training objectives can produce distinguishable predictions that map onto human perceptual representations.
    Invoked when using model predictions to test hypotheses about human face perception.

pith-pipeline@v0.9.0 · 5479 in / 1246 out tokens · 30833 ms · 2026-05-14T20:25:50.303906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

96 extracted references · 83 canonical work pages · 2 internal anchors

  1. [1]

    O’Toole, A. J. & Castillo, C. D. Face Recognition by Humans and Machines: Three Fundamental AdvancesfromDeepLearning.Annualreviewofvisionscience7,543–570(2021). https://doi.org/10.1146/ annurev-vision-093019-111701

  2. [2]

    van Dyck, L. E. & Gruber, W. R. Modeling biological face recognition with deep convolutional neural networks.JournalofCognitiveNeuroscience35,1521–1537(2023). https://doi.org/10.1162/jocn_a_02040

  3. [3]

    & Tenenbaum, J

    Yildirim, I., Belledonne, M., Freiwald, W. & Tenenbaum, J. Efficient inverse graphics in biological face processing.Science Advances6, eaax5979 (2020). https://doi.org/10.1126/sciadv.aax5979

  4. [4]

    https://doi.org/10.1016/ j.patter.2021.100348

    Daube, C.et al.Grounding deep neural network predictions of human categorization behavior in under- standable functional features: The case of face identity.Patterns2, 100348 (2021). https://doi.org/10.1016/ j.patter.2021.100348

  5. [5]

    M., Behrmann, M

    Blauch, N. M., Behrmann, M. & Plaut, D. C. Computational insights into human perceptual expertise for familiarandunfamiliarfacerecognition.Cognition208,104341(2021). https://doi.org/10.1016/j.cognition. 2020.104341

  6. [6]

    C., Uddenberg, S., Griffiths, T

    Peterson, J. C., Uddenberg, S., Griffiths, T. L., Todorov, A. & Suchow, J. W. Deep models of superficial face judgments.Proceedings of the National Academy of Sciences119, e2115228119 (2022). https: //doi.org/10.1073/pnas.2115228119

  7. [7]

    Jozwik, K. M.et al.Face dissimilarity judgments are predicted by representational distance in morphable andimage-computablemodels.ProceedingsoftheNationalAcademyofSciences119,e2115047119(2022). https://doi.org/10.1073/pnas.2115047119

  8. [8]

    & Kanwisher, N

    Dobs, K., Yuan, J., Martinez, J. & Kanwisher, N. Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition.Proceedings of the National Academy of Sciences120, e2220642120 (2023). https://doi.org/10.1073/pnas.2220642120

  9. [9]

    Proceedings of the National Academy of Sciences120(33) (2023) https://doi.org/10.1073/pnas

    Jiahui, G.et al.Modeling naturalistic face processing in humans with deep convolutional neural networks. Proceedings of the National Academy of Sciences120, e2304085120 (2023). https://doi.org/10.1073/pnas. 2304085120

  10. [10]

    https://doi.org/10.1038/s41562-024-01816-9

    Shoham,A.,Grosbard,I.,Patashnik,O.,Cohen-Or,D.&Yovel,G.Usingdeepneuralnetworkstodisentangle visualandsemanticinformationinhumanperceptionandmemory.NatureHumanBehaviour8,1–16(2024). https://doi.org/10.1038/s41562-024-01816-9

  11. [11]

    & Wolf, L

    Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. Deepface: Closing the gap to human-level performance in face verification.Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1701–1708 (2014). https://doi.org/10.1109/CVPR.2014.220. 20

  12. [12]

    M., Vedaldi, A

    Parkhi, O. M., Vedaldi, A. & Zisserman, A. Deep face recognition.Proceedings of the British Machine Vision Conference (BMVC), 41.1–41.12 (2015). https://doi.org/10.5244/C.29.41

  13. [13]

    & Tang, X

    Sun, Y., Wang, X. & Tang, X. Deeply learned face representations are sparse, selective, and robust. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)2892–2900 (2015). https: //doi.org/10.1109/CVPR.2015.7298907

  14. [14]

    & Tang, X

    Lu, C. & Tang, X. Surpassing human-level face verification performance on LFW with gaussian face. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, 3811–3819 (2015). https://doi.org/10.1609/aaai.v29i1.9797

  15. [15]

    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 815–823 (2015)

    Schroff,F.,Kalenichenko,D.&Philbin,J.Facenet:Aunifiedembeddingforfacerecognitionandclustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 815–823 (2015). https: //doi.org/10.1109/cvpr.2015.7298682

  16. [16]

    doi:10.1109/CVPR.2017

    Liu, W.et al.SphereFace: Deep Hypersphere Embedding for Face Recognition.2017 IEEE Conference on ComputerVisionandPatternRecognition(CVPR),6738–6746(2017). https://doi.org/10.1109/CVPR.2017

  17. [17]

    & Zafeiriou, S

    Deng, J., Guo, J., Xue, N. & Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2019). https://doi.org/10.1109/CVPR.2019.00482

  18. [18]

    Frontiers in Computational Neuroscience16(2022)

    Tian,F.,Xie,H.,Song,Y.,Hu,S.&Liu,J.TheFaceInversionEffectinDeepConvolutionalNeuralNetworks. Frontiers in Computational Neuroscience16(2022). https://doi.org/10.3389/fncom.2022.854218

  19. [19]

    & Liu, J

    Xu, S., Zhang, Y., Zhen, Z. & Liu, J. The Face Module Emerged in a Deep Convolutional Neural Network Selectively Deprived of Face Experience.Frontiers in Computational Neuroscience15(2021). https: //doi.org/10.3389/fncom.2021.626259

  20. [20]

    & Abudarham, N

    Yovel, G., Grosbard, I. & Abudarham, N. Deep learning models challenge the prevailing assumption that face-like effects for objects of expertise support domain-general mechanisms.Proceedings of the Royal Society B: Biological Sciences290, 20230093 (2023). https://doi.org/10.1098/rspb.2023.0093

  21. [21]

    Criticalfeaturesforfacerecognition.Cognition182,73–83(2019)

    Abudarham,N.,Shkiller,L.&Yovel,G. Criticalfeaturesforfacerecognition.Cognition182,73–83(2019). https://doi.org/10.1016/j.cognition.2018.09.002

  22. [22]

    T., Katti, H

    Jacob, G., Pramod, R. T., Katti, H. & Arun, S. P. Qualitative similarities and differences in visual object representations between brains and deep networks.Nature Communications12, 1872 (2021). https://doi. org/10.1038/s41467-021-22078-3

  23. [23]

    https://doi.org/10.1167/jov.25.8.2

    Rosemblaum,M.etal.Concurrentemergenceofviewinvariance,sensitivitytocriticalfeatures,andidentity face classification through visual experience: Insights from deep learning algorithms.Journal of Vision25, 2 (2025). https://doi.org/10.1167/jov.25.8.2

  24. [24]

    Canfacerecognitionreallybedissociatedfromobjectrecognition? Journal of Cognitive Neuroscience11, 349–370 (1999)

    Gauthier,I.,Behrmann,M.&Tarr,M.J. Canfacerecognitionreallybedissociatedfromobjectrecognition? Journal of Cognitive Neuroscience11, 349–370 (1999). https://doi.org/10.1162/089892999563472

  25. [25]

    https://doi.org/10.1038/ s41467-019-12623-6

    Grossman, S.et al.Convergent evolution of face spaces across human face-selective neuronal groups and deep convolutional networks.Nature Communications10, 4934 (2019). https://doi.org/10.1038/ s41467-019-12623-6

  26. [26]

    facecells

    Vinken,K.,Prince,J.S.,Konkle,T.&Livingstone,M.S. Theneuralcodefor“facecells”isnotface-specific. Science Advances9, eadg1736 (2023). https://doi.org/10.1126/sciadv.adg1736

  27. [27]

    Current Biology 25, 2804–2814

    Chang, L., Egger, B., Vetter, T. & Tsao, D. Y. Explaining face representation in the primate brain using different computational models.Current Biology31, 2785–2795.e4 (2021). https://doi.org/10.1016/j.cub. 2021.04.014. 21

  28. [28]

    Usingartificialneuralnetworkstoask‘why’questionsofmindsand brains.Trends in Neurosciences46, 240–254 (2023)

    Kanwisher,N.,Khosla,M.&Dobs,K. Usingartificialneuralnetworkstoask‘why’questionsofmindsand brains.Trends in Neurosciences46, 240–254 (2023). https://doi.org/10.1016/j.tins.2022.12.008

  29. [29]

    Carlin, J. D. & Kriegeskorte, N. Adjudicating between face-coding models with individual-face fMRI responses.PLOS Computational Biology13, 1–28 (2017). https://doi.org/10.1371/journal.pcbi.1005604

  30. [30]

    DiverseDeepNeuralNetworksAll PredictHumanInferiorTemporalCortexWell,AfterTrainingandFitting.JournalofCognitiveNeuroscience 33, 2044–2064 (2021)

    Storrs,K.R.,Kietzmann,T.C.,Walther,A.,Mehrer,J.&Kriegeskorte,N. DiverseDeepNeuralNetworksAll PredictHumanInferiorTemporalCortexWell,AfterTrainingandFitting.JournalofCognitiveNeuroscience 33, 2044–2064 (2021). https://doi.org/10.1162/jocn_a_01755

  31. [31]

    S., Kay, K

    Conwell, C., Prince, J. S., Kay, K. N., Alvarez, G. A. & Konkle, T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines.Nature Communications15, 9383 (2024). https://doi.org/10.1038/s41467-024-53147-y

  32. [32]

    Neuron108, 413–423 (2020)

    Schrimpf,M.etal.Integrativebenchmarkingtoadvanceneurallymechanisticmodelsofhumanintelligence. Neuron108, 413–423 (2020). https://doi.org/10.1016/j.neuron.2020.07.040

  33. [33]

    Golan, T., Raju, P. C. & Kriegeskorte, N. Controversial stimuli: Pitting neural networks against each other as models of human cognition.Proceedings of the National Academy of Sciences117, 29330–29337 (2020-11-24). https://doi.org/10.1073/pnas.1912334117

  34. [34]

    Golan, T., Guo, W., Schütt, H. H. & Kriegeskorte, N. Distinguishing representational geometries with controversial stimuli: Bayesian experimental design and its application to face dissimilarity judgments. SVRHM 2022 Workshop @ NeurIPS(2022). URL https://openreview.net/forum?id=a3YPu2-Mf2h

  35. [35]

    & Baldassano, C

    Golan, T., Siegelman, M., Kriegeskorte, N. & Baldassano, C. Testing the limits of natural language models forpredictinghumanlanguagejudgements.NatureMachineIntelligence5,952–964(2023). https://doi.org/ 10.1038/s42256-023-00718-1

  36. [36]

    1999 , isbn =

    Blanz, V. & Vetter, T. A morphable model for the synthesis of 3d faces.Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’99, 187–194 (1999). https: //doi.org/10.1145/311535.311556

  37. [37]

    & Vetter, T

    Paysan, P., Knothe, R., Amberg, B., Romdhani, S. & Vetter, T. A 3d face model for pose and illumination invariant face recognition.Proceedings of the 6th IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS) for Security, Safety and Monitoring in Smart Environments(2009). https://doi.org/10.1109/AVSS.2009.58

  38. [38]

    https://doi.org/10.1109/fg.2018.00021

    Gerig, T.et al.Morphable face models - an open framework.2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)75–82 (2018). https://doi.org/10.1109/fg.2018.00021

  39. [39]

    34, 852–863 (2021)

    Karras, T.et al.Alias-Free Generative Adversarial Networks.Advances in Neural Information Processing Systems, Vol. 34, 852–863 (2021). URL https://proceedings.neurips.cc/paper/2021/hash/ 076ccd93ad68be51f23707988e934906-Abstract.html

  40. [40]

    A unified account of the effects of distinctiveness, inversion, and race in face recognition

    Valentine, T. A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology Section A43, 161–204 (1991). https://doi.org/10.1080/ 14640749108400966

  41. [41]

    Inversemds:Inferringdissimilaritystructurefrommultipleitemarrangements

    Kriegeskorte,N.&Mur,M. Inversemds:Inferringdissimilaritystructurefrommultipleitemarrangements. Frontiers in PsychologyVolume 3 - 2012(2012). https://doi.org/10.3389/fpsyg.2012.00245

  42. [42]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations(2015). https://doi.org/10.48550/arXiv.1409.1556

  43. [43]

    Parkhi, and Andrew Zisserman

    Cao,Q.,Shen,L.,Xie,W.,Parkhi,O.M.&Zisserman,A.VGGFace2:Adatasetforrecognisingfacesacross pose and age.2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 67–74 (2018). https://doi.org/10.1109/FG.2018.00020

  44. [44]

    Kingma, D. P. & Welling, M. Auto-encoding variational Bayes.arXiv Preprint(2014). https://doi.org/10. 48550/arXiv.1312.6114. 22

  45. [45]

    & Levine, S

    Rybkin, O., Daniilidis, K. & Levine, S. Simple and effective VAE training with calibrated decoders.Pro- ceedings of the 38th International Conference on Machine Learning, Vol. 139 ofProceedings of Machine Learning Research, 9179–9189 (2021). URL https://proceedings.mlr.press/v139/rybkin21a.html

  46. [46]

    ImageNet:

    Deng, J.et al.ImageNet: A large-scale hierarchical image database.2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848

  47. [47]

    Troje, N. F. & Bülthoff, H. H. Face recognition under varying poses: The role of texture and shape.Vision Research36, 1761–1771 (1996). https://doi.org/10.1016/0042-6989(95)00230-8

  48. [48]

    https://doi.org/10.1037/1076-898X.5.4.339

    Bruce, V.et al.Verification of face identities from images captured on video.Journal of Experimental Psychology: Applied5, 339–360 (1999). https://doi.org/10.1037/1076-898X.5.4.339

  49. [49]

    Variabilityinphotosofthesameface.Cognition 121, 313–323 (2011)

    Jenkins,R.,White,D.,VanMontfort,X.&MikeBurton,A. Variabilityinphotosofthesameface.Cognition 121, 313–323 (2011). https://doi.org/10.1016/j.cognition.2011.08.001

  50. [50]

    Recognizingfaces.CurrentDirectionsinPsychologicalScience26,212–217 (2017)

    Young,A.W.&Burton,A.M. Recognizingfaces.CurrentDirectionsinPsychologicalScience26,212–217 (2017). https://doi.org/10.1177/0963721416688114

  51. [51]

    J.et al.Twin Identification over Viewpoint Change: A Deep Convolutional Neural Network Surpasses Humans.ACM Trans

    Parde, C. J.et al.Twin Identification over Viewpoint Change: A Deep Convolutional Neural Network Surpasses Humans.ACM Trans. Appl. Percept.20, 10:1–10:15 (2023). https://doi.org/10.1145/3609224

  52. [52]

    Nicolás, The bar derived category of a curved dg algebra, Journal of Pure and Applied Algebra 212 (2008) 2633–2659

    Zhu, X., Watson, D. M., Rogers, D. & Andrews, T. J. View-symmetric representations of faces in human and artificial neural networks.Neuropsychologia207, 109061 (2025). https://doi.org/10.1016/j. neuropsychologia.2024.109061

  53. [53]

    M.et al.Dynamic presentation in 3d modulates face similarity judgments – a human-aligned encoding model approach.Sciety (eLife)(2025)

    Hofmann, S. M.et al.Dynamic presentation in 3d modulates face similarity judgments – a human-aligned encoding model approach.Sciety (eLife)(2025). https://doi.org/10.31234/osf.io/f62pw_v4

  54. [54]

    Alcorn, M. A.et al.Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4840–4849 (2019). https://doi.org/10.1109/CVPR.2019.00498

  55. [55]

    Q.et al.Deep convolutional neural networks in the face of caricature.Nature Machine Intelligence 1, 522–529 (2019)

    Hill, M. Q.et al.Deep convolutional neural networks in the face of caricature.Nature Machine Intelligence 1, 522–529 (2019). https://doi.org/10.1038/s42256-019-0111-7

  56. [56]

    https://doi.org/10.1073/ pnas.1403112111

    Yamins,D.L.K.etal.Performance-optimizedhierarchicalmodelspredictneuralresponsesinhighervisual cortex.Proceedings of the National Academy of Sciences111, 8619–8624 (2014). https://doi.org/10.1073/ pnas.1403112111

  57. [57]

    & Kriegeskorte, N

    Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation.PLOS Computational Biology10, e1003915 (2014). https://doi.org/10.1371/ journal.pcbi.1003915

  58. [58]

    & Van Gerven, M

    Guclu, U. & Van Gerven, M. A. J. Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream.Journal of Neuroscience35, 10005–10014 (2015). https: //doi.org/10.1523/JNEUROSCI.5023-14.2015

  59. [59]

    URLhttps://papers.neurips.cc/paper_files/paper/ 2019/hash/7813d1590d28a7dd372ad54b5d29d033-Abstract.html

    Kubilius,J.etal.Brain-LikeObjectRecognitionwithHigh-PerformingShallowRecurrentANNs.Advances inNeuralInformationProcessingSystems,Vol.32(2019). URLhttps://papers.neurips.cc/paper_files/paper/ 2019/hash/7813d1590d28a7dd372ad54b5d29d033-Abstract.html

  60. [60]

    Lindsay, G. W. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. Journal of Cognitive Neuroscience33, 2017–2031 (2021). https://doi.org/10.1162/jocn_a_01544

  61. [61]

    Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.,

    Spoerer,C.J.,McClure,P.&Kriegeskorte,N. RecurrentConvolutionalNeuralNetworks:ABetterModelof BiologicalObjectRecognition.FrontiersinPsychology8(2017). https://doi.org/10.3389/fpsyg.2017.01551

  62. [62]

    J., Kietzmann, T

    Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I. & Kriegeskorte, N. Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision.PLoS Computational Biology16, e1008215 (2020). https://doi.org/10.1371/journal.pcbi.1008215. 23

  63. [63]

    C.et al.Recurrence is required to capture the representational dynamics of the human visual system.Proceedings of the National Academy of Sciences116, 21854–21863 (2019)

    Kietzmann, T. C.et al.Recurrence is required to capture the representational dynamics of the human visual system.Proceedings of the National Academy of Sciences116, 21854–21863 (2019). https://doi.org/10. 1073/pnas.1905544116

  64. [64]

    Kar, K., Kubilius, J., Schmidt, K., Issa, E. B. & DiCarlo, J. J. Evidence that recurrent circuits are critical to theventralstream’sexecutionofcoreobjectrecognitionbehavior.NatureNeuroscience22,974–983(2019). https://doi.org/10.1038/s41593-019-0392-5

  65. [65]

    & Khaligh-Razavi, S.-M

    Rajaei, K., Mohsenzadeh, Y., Ebrahimpour, R. & Khaligh-Razavi, S.-M. Beyond core object recognition: Recurrent processes account for object recognition under occlusion.PLOS Computational Biology15, e1007001 (2019). https://doi.org/10.1371/journal.pcbi.1007001

  66. [66]

    https://doi.org/10.1038/s41586-026-10267-3

    Shi,Y.etal.Rapidconcertedswitchingoftheneuralcodeintheinferotemporalcortex.Nature1–10(2026). https://doi.org/10.1038/s41586-026-10267-3

  67. [67]

    Muttenthaler, L., Dippel, J., Linhardt, L., Vandermeulen, R. A. & Kornblith, S. Human alignment of neural network representations.The Eleventh International Conference on Learning Representations(2023). URL https://openreview.net/forum?id=ReDQ1OUQR0X

  68. [68]

    & Golan, T

    Avitan, I. & Golan, T. Model–behavior alignment under flexible evaluation: When the best- fitting model isn’t the right one.Advances in Neural Information Processing Systems, Vol. 38, 12081–12120 (2025). URL https://proceedings.neurips.cc/paper_files/paper/2025/file/ 11e1900e680f5fe1893a8e27362dbe2c-Paper-Conference.pdf

  69. [69]

    & Behrmann, M

    Moscovitch, M., Winocur, G. & Behrmann, M. What Is Special about Face Recognition? Nineteen Exper- iments on a Person with Visual Object Agnosia and Dyslexia but Normal Face Recognition.Journal of Cognitive Neuroscience9, 555–604 (1997). https://doi.org/10.1162/jocn.1997.9.5.555

  70. [70]

    Plaut, D. C. & Behrmann, M. Complementary neural representations for faces and words: A computa- tionalexploration.CognitiveNeuropsychology28,251–275(2011). https://doi.org/10.1080/02643294.2011. 609812

  71. [71]

    & Dobs, K

    Kar, K., Kanwisher, N. & Dobs, K. Deep neural networks optimized for both face detection and face discrimination most accurately predict face-selective neurons in macaque inferior temporal cortex.2023 Conference on Cognitive Computational Neuroscience(2023). https://doi.org/10.32470/CCN.2023.1554-0

  72. [72]

    Grosbard, I. D. & Yovel, G. Self-supervision deep learning models are better models of human high- level visual cortex: The roles of multi-modality and dataset training size.bioRxiv Preprint(2025). https: //doi.org/10.1101/2025.01.09.632216

  73. [73]

    https://doi.org/10.64898/2026

    Lee,S.,Ying,J.,Dey,A.,Jeon,Y.-N.&Issa,E.B.Efficienttaskgeneralizationandhumanlikefaceperception in models that learn to discriminate face geometry.bioRxiv Preprint(2026). https://doi.org/10.64898/2026. 01.31.703048

  74. [74]

    & Chun, M

    Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: A module in human extrastriate cortex specialized for face perception.Journal of Neuroscience17, 4302–4311 (1997). https://doi.org/10. 1523/JNEUROSCI.17-11-04302.1997

  75. [75]

    Freiwald, W. A. & Tsao, D. Y. Functional compartmentalization and viewpoint generalization within the macaque face-processing system.Science330, 845–851 (2010). https://doi.org/10.1126/science.1194908

  76. [76]

    & Sun, J

    He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.2015 IEEE International Conference on Computer Vision (ICCV), 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123

  77. [77]

    A framework for few-shot language model evaluation

    Falcon, W. & The PyTorch Lightning team. PyTorch Lightning (2019). https://doi.org/10.5281/zenodo. 3828935

  78. [78]

    Albumentations:fastandflexible image augmentations.Information11, 125 (2020)

    Buslaev,A.,Parinov,A.,Khvedchenya,E.,Iglovikov,V.I.&Kalinin,A.A. Albumentations:fastandflexible image augmentations.Information11, 125 (2020). https://doi.org/10.3390/info11020125. 24

  79. [79]

    Oscar Reutersvärd

    Ravi, N.et al.Accelerating 3D deep learning with PyTorch3D.arXiv Preprint(2020). https://doi.org/10. 48550/arXiv.2007.08501

  80. [80]

    Bayesian Experimental Design: A Review , volume =

    Chaloner, K. & Verdinelli, I. Bayesian experimental design: A review.Statistical Science10, 273–304 (1995). https://doi.org/10.1214/ss/1177009939

Showing first 80 references.