arxiv: 2605.11718 · v1 · submitted 2026-05-12 · 🧬 q-bio.NC · cs.AI · cs.NE

Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization

Chang Liu, Dahui Wang, Jie Su, Molan Li, Tianyi Qian, Zhaotian Gu

Pith reviewed 2026-05-13 04:40 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.NE

keywords MT direction maps · self-supervised learning · topographic organization · pinwheel structures · contrastive optimization · 3D ResNet · visual cortex · dorsal stream

The pith

A 3D ResNet trained on videos with contrastive learning and spatial regularization spontaneously forms direction maps and pinwheels matching primate MT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that direction-selective maps and pinwheel structures characteristic of area MT emerge when a 3D ResNet is trained on naturalistic videos using Momentum Contrast self-supervised learning plus a spatial loss that encourages nearby units to develop similar tuning. The resulting representations reproduce macaque physiological statistics including direction selectivity index, circular variance, and pinwheel density, and they arise specifically from the trade-off between the contrastive objective and the spatial term. A sympathetic reader would care because the result suggests that the dorsal stream follows the same general self-organization rules previously demonstrated for ventral-stream areas, rather than requiring separate biological machinery for motion topography.
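To make the tuning metrics named above concrete, here is a minimal sketch that computes a direction selectivity index (DSI) and a circular variance (CV) from one unit's responses to 16 equally spaced directions. The definitions used (DSI as preferred minus null response over their sum, CV as one minus the normalized resultant length in direction space) are common conventions and are an assumption here; the paper's exact formulas are not available from the abstract. The function name and the toy tuning curve are hypothetical.

```python
import numpy as np

def direction_tuning_metrics(responses, directions_deg):
    """Return (preferred_direction_deg, dsi, circular_variance) for one unit.

    Assumes non-negative responses, one per probed direction.
    """
    r = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    pref_idx = int(np.argmax(r))
    # Null direction: the probe closest to (preferred + 180 degrees).
    offset = np.angle(np.exp(1j * (theta - (theta[pref_idx] + np.pi))))
    null_idx = int(np.argmin(np.abs(offset)))
    r_pref, r_null = r[pref_idx], r[null_idx]
    dsi = (r_pref - r_null) / (r_pref + r_null + 1e-12)
    # Circular variance: 1 - length of the response-weighted mean vector.
    resultant = np.abs(np.sum(r * np.exp(1j * theta))) / (np.sum(r) + 1e-12)
    return float(np.rad2deg(theta[pref_idx])), float(dsi), float(1.0 - resultant)

# Toy tuning curve (illustrative only): a strong bump at 90 degrees plus a
# weaker bump at 270 degrees, loosely mimicking a residual axial component.
directions = np.arange(16) * 22.5
rad = np.deg2rad(directions)
toy = (np.exp(2.0 * (np.cos(rad - np.pi / 2) - 1.0))
       + 0.2 * np.exp(2.0 * (np.cos(rad + np.pi / 2) - 1.0)))
pref, dsi, cv = direction_tuning_metrics(toy, directions)
```

With these definitions a perfectly flat tuning curve gives a DSI of 0 and a CV close to 1, while a unit responding to a single direction gives a DSI close to 1 and a CV close to 0.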

Core claim

By training a 3D ResNet on naturalistic videos via a Momentum Contrast self-supervised paradigm alongside a biologically inspired spatial loss, brain-like direction maps and topological pinwheel structures emerge spontaneously. MT tuning properties with strong direction selectivity paired with a residual axial component arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines including direction selectivity index, circular variance, and pinwheel density, unifying the computational origins of the ventral and dorsal streams under a single general

What carries the argument

Spatiotemporal topographic deep artificial neural network (TDANN) implemented as a 3D ResNet trained with Momentum Contrast contrastive loss plus spatial regularization that penalizes differences between nearby neurons.
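The functional form of the spatial term is not given in the material reviewed here, so the following is only a sketch of one plausible smoothness penalty, not the authors' loss. It assumes units have 2D positions on a simulated cortical sheet, penalizes response dissimilarity (one minus Pearson correlation) between pairs of units with Gaussian distance weights, and assumes that α scales the spatial term in the total objective. The names `spatial_smoothness_loss`, `total_loss`, and `sigma` are hypothetical.

```python
import numpy as np

def spatial_smoothness_loss(responses, positions, sigma=1.0):
    """Hypothetical spatial penalty: nearby units should respond similarly.

    responses: (n_stimuli, n_units) activations.
    positions: (n_units, 2) coordinates on the simulated cortical sheet.
    """
    responses = np.asarray(responses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Pairwise dissimilarity between units: 1 - Pearson correlation.
    dissim = 1.0 - np.corrcoef(responses.T)
    # Gaussian neighbourhood weights based on squared cortical distance.
    diff = positions[:, None, :] - positions[None, :, :]
    weights = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # ignore self-pairs
    return float(np.sum(weights * dissim) / np.sum(weights))

def total_loss(contrastive_loss, responses, positions, alpha=0.5):
    """Assumed trade-off: contrastive term plus alpha times the spatial term."""
    return contrastive_loss + alpha * spatial_smoothness_loss(responses, positions)
```

Under this assumed form, a sheet whose preferred directions vary smoothly with position scores lower than the same units placed at shuffled positions, which is the kind of pressure the description above attributes to the spatial regularization.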

Load-bearing premise

The particular combination of MoCo contrastive loss and the chosen spatial regularization term on the 3D ResNet architecture suffices to produce MT-like direction topography without further biological constraints or post-hoc tuning.

What would settle it

Training the identical 3D ResNet with the contrastive loss but without the spatial regularization term and then finding that the resulting direction selectivity index and pinwheel density fall outside the ranges measured in macaque MT would falsify the claim that this optimization trade-off produces the maps.

Figures

Figures reproduced from arXiv: 2605.11718 by Chang Liu, Dahui Wang, Jie Su, Molan Li, Tianyi Qian, Zhaotian Gu.

**Figure 1.** Spatiotemporal TDANN overview. (A) 3D ResNet-18 backbone with MoCo and spatial losses. (B) Simulated cortical sheet and spatial loss that promotes similar responses among nearby units.

**Figure 2.** Biological validation and parameter sensitivity of emergent tuning properties in the MT-like layer. (A) 16 drifting-grating directions used for probing. (B) Population tuning curve (PTC) at α = 0.5. (C-F) Sensitivity to α: selective-unit fraction (C), median bandwidth (D), median CV (E), and median DSI (F); green dashed lines indicate macaque MT baselines [24, 3, 4]. Gray line: α = 0.5.

**Figure 3.** Mechanistic origins of emergent tuning properties through optimization tradeoffs. (A) FWHM and CV-derived bandwidth diverge in the optimal model. (B) Decomposition of tuning under contrastive-only, spatial-only, and joint objectives. (C) Schematic of competing dynamics between discriminative pressure and spatial smoothness.

**Figure 4.** Spontaneous emergence and quantitative analysis of MT-like pinwheel structures. (A) Direction map at α = 0.5 with positive (white dot) and negative (black dot) pinwheel centers. (B) Pinwheel density versus α; green dashed line marks the macaque MT baseline (approximately 4.9 mm⁻²) estimated from [1, 16]. Gray line: α = 0.5. (C) Histogram of preferred direction differences (∆) between adjacent units at α = 0.5.

**Figure 5.** Population Tuning Curves (PTCs) across spatial constraints (α), with annotated primary-peak FWHM values.

**Figure 6.** Topological evolution of MT direction maps across α in a 2 mm×2 mm MT-like sub-region. White/black dots denote positive/negative pinwheel charges.
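The pinwheel charges and densities mentioned in the captions for Figures 4 and 6 can be illustrated with a small sketch. It assumes a 2D array of preferred directions in radians (period 2π) and assigns each 2×2 block of units a charge equal to the winding number of the direction around that block; this is a standard way to locate phase singularities and is not confirmed to be the authors' detection procedure. The function names and the `pixel_size_mm` parameter are hypothetical, and the sign of a charge depends on the chosen loop orientation.

```python
import numpy as np

def pinwheel_charges(direction_map_rad):
    """Winding number (+1, -1 or 0) of preferred direction around each 2x2 block."""
    m = np.asarray(direction_map_rad, dtype=float)

    def wrap(delta):
        # Wrap angular differences into (-pi, pi].
        return np.angle(np.exp(1j * delta))

    # Corners of every 2x2 block, visited in a fixed loop order:
    # top-left -> top-right -> bottom-right -> bottom-left -> top-left.
    a, b = m[:-1, :-1], m[:-1, 1:]
    c, d = m[1:, 1:], m[1:, :-1]
    total = wrap(b - a) + wrap(c - b) + wrap(d - c) + wrap(a - d)
    return np.rint(total / (2.0 * np.pi)).astype(int)

def pinwheel_density(direction_map_rad, pixel_size_mm):
    """Number of pinwheel centers per square millimetre of analysed sheet."""
    charges = pinwheel_charges(direction_map_rad)
    area_mm2 = charges.size * pixel_size_mm ** 2
    return np.count_nonzero(charges) / area_mm2

# Synthetic map with a single pinwheel centered between the middle units.
coords = np.arange(8) - 3.5
x, y = np.meshgrid(coords, coords)
single_pinwheel = np.arctan2(y, x)
charges = pinwheel_charges(single_pinwheel)
```

On the synthetic map exactly one block carries a nonzero charge, and negating the map flips its sign. A density computed this way would be the quantity compared against a biological baseline, given a physical scale for the simulated sheet.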

Original abstract

The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience. While recent computational frameworks like the Topographic Deep Artificial Neural Network (TDANN) have successfully modeled spatial organization in the ventral stream, the computational origins of the dorsal stream's distinct topographies, such as direction-selective maps in the middle temporal (MT) area, remain largely unresolved. In this work, we present a spatiotemporal TDANN to investigate whether MT topography is governed by the same universal principles. By training a 3D ResNet on naturalistic videos via a Momentum Contrast (MoCo) self-supervised paradigm alongside a biologically inspired spatial loss, we demonstrate the spontaneous emergence of brain-like direction maps and topological pinwheel structures. Crucially, we reveal that MT tuning properties, characterized by strong direction selectivity paired with a residual axial component, arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines, including direction selectivity index, circular variance, and pinwheel density. These findings unify the computational origins of the ventral and dorsal streams, establishing a general mechanism for cortical self-organization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MT direction maps emerge from MoCo plus spatial loss in this TDANN, but the loss term needs checking to confirm it's not pre-shaping the topology.

The one or two things to know: this work trains a 3D ResNet on videos using Momentum Contrast and a spatial regularization term, resulting in direction-selective maps and pinwheels that match macaque MT data on key metrics. It extends topographic modeling to the dorsal stream.

What the paper does well is demonstrate the emergence without direct supervision for motion direction. The matches to biological baselines like direction selectivity index and pinwheel density are concrete and allow direct comparison.

The soft spots center on the spatial loss. The abstract describes it as biologically inspired and part of a strict trade-off, but without the functional form or ablation results, it's possible the regularization is tuned to favor the observed structures. That would make the maps less of a spontaneous outcome from video data and contrastive pressure alone. The concern that the loss might be the dominant driver seems reasonable based on what's provided.

This paper is for researchers modeling cortical self-organization with deep networks. Readers familiar with TDANN or contrastive learning in neuroscience will get the most from it and can evaluate the unification claim.

It deserves serious peer review. The core finding is new enough and the quantitative matches strong enough to justify referee time, though they will likely request more on the loss details and robustness.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a spatiotemporal extension of the Topographic Deep Artificial Neural Network (TDANN) framework. It trains a 3D ResNet architecture on naturalistic videos using the Momentum Contrast (MoCo) self-supervised learning paradigm in conjunction with a biologically inspired spatial loss. The central claim is that direction maps and pinwheel structures emerge spontaneously in the model's representations, quantitatively matching physiological properties of macaque area MT such as direction selectivity index, circular variance, and pinwheel density. The authors conclude that these features result from an optimization trade-off between discriminative and spatial regularization pressures, providing a unified account for self-organization in ventral and dorsal visual streams.

Significance. If the quantitative matches are robust and not due to post-hoc tuning, this would represent a significant advance in computational neuroscience by extending topographic models to the dorsal stream and demonstrating how self-supervised learning on video data can give rise to MT-like topography. It builds on prior TDANN work for V1/V2/V4 and offers a potential general mechanism for cortical map formation. The use of contrastive learning without explicit labels is a strength, as is the attempt to match multiple biological metrics.

major comments (3)

Abstract: The claim that MT tuning properties 'arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization' is load-bearing for the central thesis, yet the abstract (and by extension the methods summary) provides no explicit equation or functional form for the spatial loss term. Without this, it is impossible to evaluate whether the loss implicitly favors pinwheel density or smoothness independently of the MoCo objective on video data.
Methods/Results: The reported quantitative matches to macaque MT baselines (DSI, circular variance, pinwheel density) are presented without details on hyperparameter search procedures, data exclusion criteria, or statistical controls. This omission directly affects verifiability of the claim that the matches are robust rather than sensitive to specific implementation choices.
Results: No ablation experiments are described that compare the full model against a version using only the spatial regularization term (or only MoCo). Such controls are required to establish that the emergence of direction maps and pinwheels is due to the described trade-off rather than the spatial loss alone.

minor comments (2)

Abstract: The phrase 'spatiotemporal TDANN' is used without a concise definition of how the 3D ResNet implementation differs from prior 2D TDANN models in terms of architecture or loss application.
Figures: Legends should explicitly state the numerical biological baseline values (e.g., mean pinwheel density per mm²) alongside model outputs for direct visual comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback and positive assessment of the work's significance. We address each major comment point by point below and will revise the manuscript accordingly to enhance clarity and verifiability.

Referee: Abstract: The claim that MT tuning properties 'arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization' is load-bearing for the central thesis, yet the abstract (and by extension the methods summary) provides no explicit equation or functional form for the spatial loss term. Without this, it is impossible to evaluate whether the loss implicitly favors pinwheel density or smoothness independently of the MoCo objective on video data.

Authors: We agree that the abstract would benefit from an explicit reference to the spatial loss form to support evaluation of the trade-off. In the revised manuscript, we will incorporate a concise description of the spatial loss functional form into the abstract and methods summary, clarifying its role alongside the MoCo objective without implying independent favoritism toward specific topographic features. revision: yes
Referee: Methods/Results: The reported quantitative matches to macaque MT baselines (DSI, circular variance, pinwheel density) are presented without details on hyperparameter search procedures, data exclusion criteria, or statistical controls. This omission directly affects verifiability of the claim that the matches are robust rather than sensitive to specific implementation choices.

Authors: We acknowledge that additional methodological details are necessary for full verifiability. In the revision, we will expand the Methods and Results sections to include comprehensive information on hyperparameter search procedures, data exclusion criteria, and statistical controls used in the quantitative comparisons to macaque MT data. revision: yes
Referee: Results: No ablation experiments are described that compare the full model against a version using only the spatial regularization term (or only MoCo). Such controls are required to establish that the emergence of direction maps and pinwheels is due to the described trade-off rather than the spatial loss alone.

Authors: We agree that ablation controls are essential to substantiate the optimization trade-off. In the revised manuscript, we will add ablation experiments training variants with only the MoCo objective and only the spatial regularization term, to demonstrate that direction maps and pinwheels arise specifically from their combination rather than either component in isolation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard contrastive training plus regularization without reduction to inputs by construction.

The paper's central claim is that direction-selective maps and pinwheels emerge spontaneously when a 3D ResNet is trained on naturalistic videos using Momentum Contrast (MoCo) self-supervision together with a biologically inspired spatial loss. This chain is self-contained: the contrastive objective is a standard, externally defined loss (InfoNCE-style), the spatial term is described as biologically inspired rather than reverse-engineered from the target statistics, and the reported matches to macaque DSI, circular variance, and pinwheel density are presented as post-training measurements rather than fitted parameters renamed as predictions. No equations in the abstract reduce the output topography to the input loss by algebraic identity, and no self-citation chain is invoked to forbid alternatives. The result therefore does not collapse to its inputs by construction.
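For reference, an InfoNCE-style objective of the kind mentioned above can be sketched as follows. This is a generic NumPy illustration for one query, one positive key, and a bank of negative keys (as in a MoCo-style queue), not the paper's training code; the temperature value and the function name are assumptions.

```python
import numpy as np

def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query against 1 positive and K negative keys.

    query, positive_key: shape (d,); negative_keys: shape (K, d).
    The temperature default is an assumed, commonly used value.
    """
    def l2_normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = l2_normalize(np.asarray(query, dtype=float))
    k_pos = l2_normalize(np.asarray(positive_key, dtype=float))
    k_neg = l2_normalize(np.asarray(negative_keys, dtype=float))
    # Cosine similarities; the positive logit is placed at index 0.
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / temperature
    logits = logits - logits.max()  # numerical stability
    # Cross-entropy with the positive as the correct class.
    return float(np.log(np.sum(np.exp(logits))) - logits[0])
```

The loss is near zero when the query matches its positive and is far from all negatives, and equals log(K + 1) when the K negatives are indistinguishable from the positive. Nothing in this term refers to unit positions, which is consistent with the point above that any topography has to come from its interaction with the spatial term.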

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no equations, methods sections, or supplementary details are available to enumerate specific free parameters, axioms, or invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes
By training a 3D ResNet on naturalistic videos via a Momentum Contrast (MoCo) self-supervised paradigm alongside a biologically inspired spatial loss, we demonstrate the spontaneous emergence of brain-like direction maps and topological pinwheel structures.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
MT tuning properties... arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 2 internal anchors

[1] Optical imaging reveals the functional architecture of neurons processing shape and motion in owl monkey area MT. Proceedings of the Royal Society of London. Series B: Biological Sciences 258(1352), 109–119 (1994). https://doi.org/10.1098/rspb.1994.0150

[2] Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2(2), 284 (1985). https://doi.org/10.1364/JOSAA.2.000284

[3] Albright, T.D.: Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology 52(6), 1106–1130 (1984). https://doi.org/10.1152/jn.1984.52.6.1106

[4] Britten, K., Shadlen, M., Newsome, W., Movshon, J.: The analysis of visual motion: A comparison of neuronal and psychophysical performance. The Journal of Neuroscience 12(12), 4745–4765 (1992). https://doi.org/10.1523/JNEUROSCI.12-12-04745.1992

[5] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Contrastive Learning of Visual Representations (2020). https://doi.org/10.48550/ARXIV.2002.05709

[6] Chklovskii, D.B., Schikorski, T., Stevens, C.F.: Wiring Optimization in Cortical Circuits. Neuron 34(3), 341–347 (2002). https://doi.org/10.1016/S0896-6273(02)00679-7

[7] Dacey, D.M., Petersen, M.R.: Dendritic field size and morphology of midget and parasol ganglion cells of the human retina. Proceedings of the National Academy of Sciences 89(20), 9666–9670 (1992). https://doi.org/10.1073/pnas.89.20.9666

[8] Diogo, A.C.M., Soares, J.G.M., Koulakov, A., Albright, T.D., Gattass, R.: Electrophysiological Imaging of Functional Architecture in the Cortical Middle Temporal Visual Area of Cebus apella Monkey. The Journal of Neuroscience 23(9), 3881–3898 (2003). https://doi.org/10.1523/JNEUROSCI.23-09-03881.2003

[9] Durbin, R., Mitchison, G.: A dimension reduction framework for understanding cortical maps. Nature 343(6259), 644–647 (1990). https://doi.org/10.1038/343644a0
[10] Ge, X., Zhang, K., Gribizis, A., Hamodi, A.S., Sabino, A.M., Crair, M.C.: Retinal waves prime visual motion detection by simulating future optic flow. Science 373(6553), eabd0830 (2021). https://doi.org/10.1126/science.abd0830

[11] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Valko, M.: Bootstrap your own latent: A new approach to self-supervised Learning (2020). https://doi.org/10.48550/ARXIV.2006.07733

[12] Hara, K., Kataoka, H., Satoh, Y.: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? (2017). https://doi.org/10.48550/ARXIV.1711.09577

[13] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum Contrast for Unsupervised Visual Representation Learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9726–9735. IEEE, Seattle, WA, USA (2020). https://doi.org/10.1109/CVPR42600.2020.00975

[14] Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology 160(1), 106–154 (1962). https://doi.org/10.1113/jphysiol.1962.sp006837

[15] Jacobs, R.A., Jordan, M.I.: Computational Consequences of a Bias toward Short Connections. Journal of Cognitive Neuroscience 4(4), 323–336 (1992). https://doi.org/10.1162/jocn.1992.4.4.323

[16] Kaschube, M., Schnabel, M., Löwel, S., Coppola, D.M., White, L.E., Wolf, F.: Universality in the Evolution of Orientation Columns in the Visual Cortex. Science 330(6007), 1113–1116 (2010). https://doi.org/10.1126/science.1194869

[17] Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43(1), 59–69 (1982). https://doi.org/10.1007/BF00337288

[18] Koprinkova-Hristova, P.D., Bocheva, N., Nedelcheva, S., Stefanova, M.: Spike Timing Neural Model of Motion Perception and Decision Making. Frontiers in Computational Neuroscience 13, 20 (2019). https://doi.org/10.3389/fncom.2019.00020
[19] Kosterlitz, J.M., Thouless, D.J.: Ordering, metastability and phase transitions in two-dimensional systems. Journal of Physics C: Solid State Physics 6(7), 1181–1203 (1973). https://doi.org/10.1088/0022-3719/6/7/010

[20] LeCun, Y.: 1.1 Deep Learning Hardware: Past, Present, and Future. In: 2019 IEEE International Solid-State Circuits Conference - (ISSCC). pp. 12–19. IEEE, San Francisco, CA, USA (2019). https://doi.org/10.1109/ISSCC.2019.8662396

[21] Linsker, R.: From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences 83(19), 7508–7512 (1986). https://doi.org/10.1073/pnas.83.19.7508

[22] Margalit, E., Lee, H., Finzi, D., DiCarlo, J.J., Grill-Spector, K., Yamins, D.L.: A unifying framework for functional organization in early and higher ventral visual cortex. Neuron p. S0896627324002794 (2024). https://doi.org/10.1016/j.neuron.2024.04.018

[23] Maunsell, J.H., Van Essen, D.C.: Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. Journal of Neurophysiology 49(5), 1127–1147 (1983). https://doi.org/10.1152/jn.1983.49.5.1127

[24] Nakhla, N., Korkian, Y., Krause, M.R., Pack, C.C.: Neural Selectivity for Visual Motion in Macaque Area V3A. eNeuro 8(1), ENEURO.0383–20.2020 (2021). https://doi.org/10.1523/ENEURO.0383-20.2020

[25] Nayebi, A., Bear, D., Kubilius, J., Kar, K., Ganguli, S., Sussillo, D., DiCarlo, J.J., Yamins, D.L.K.: Task-Driven Convolutional Recurrent Models of the Visual System (2018). https://doi.org/10.48550/ARXIV.1807.00053

[26] Obermayer, K., Ritter, H., Schulten, K.: A principle for the formation of the spatial structure of cortical feature maps. Proceedings of the National Academy of Sciences 87(21), 8345–8349 (1990). https://doi.org/10.1073/pnas.87.21.8345

[27] Oja, E.: Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15(3), 267–273 (1982). https://doi.org/10.1007/BF00275687
[28] Pan, T., Song, Y., Yang, T., Jiang, W., Liu, W.: VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples (2021). https://doi.org/10.48550/ARXIV.2103.05905

[29] Qian, R., Meng, T., Gong, B., Yang, M.H., Wang, H., Belongie, S., Cui, Y.: Spatiotemporal Contrastive Video Representation Learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6960–6970. IEEE, Nashville, TN, USA (2021). https://doi.org/10.1109/CVPR46437.2021.00689

[30] Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2(1), 79–87 (1999). https://doi.org/10.1038/4580

[31] Ribot, J., Romagnoni, A., Milleret, C., Bennequin, D., Touboul, J.: Pinwheel-dipole configuration in cat early visual cortex. NeuroImage 128, 63–73 (2016). https://doi.org/10.1016/j.neuroimage.2015.12.022

[32] Rust, N.C., Mante, V., Simoncelli, E.P., Movshon, J.A.: How MT cells analyze the motion of visual patterns. Nature Neuroscience 9(11), 1421–1431 (2006). https://doi.org/10.1038/nn1786

[33] Shaw, G.L.: Donald Hebb: The Organization of Behavior. In: Palm, G., Aertsen, A. (eds.) Brain Theory, pp. 231–233. Springer Berlin Heidelberg, Berlin, Heidelberg (1986). https://doi.org/10.1007/978-3-642-70911-1_15

[34] Soomro, K., Zamir, A.R., Shah, M.: UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild (2012). https://doi.org/10.48550/ARXIV.1212.0402

[35] Swindale, N.V.: A model for the formation of ocular dominance stripes. Proceedings of the Royal Society of London. Series B. Biological Sciences 208(1171), 243–264 (1980). https://doi.org/10.1098/rspb.1980.0051

[36] Vanni, S., Hokkanen, H., Werner, F., Angelucci, A.: Anatomy and Physiology of Macaque Visual Cortical Areas V1, V2, and V5/MT: Bases for Biologically Realistic Models. Cerebral Cortex 30(6), 3483–3517 (2020). https://doi.org/10.1093/cercor/bhz322
[37] Wang, T., Isola, P.: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (2020). https://doi.org/10.48550/ARXIV.2005.10242

[38] Wen, Z., Li, Y.: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning (2021). https://doi.org/10.48550/ARXIV.2105.15134

[39] Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination (2018). https://doi.org/10.48550/ARXIV.1805.01978

[40] Yamins, D.L.K., DiCarlo, J.J.: Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19(3), 356–365 (2016). https://doi.org/10.1038/nn.4244

[41] Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M.C., DiCarlo, J.J., Yamins, D.L.K.: Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences 118(3), e2014196118 (2021). https://doi.org/10.1073/pnas.2014196118