pith. machine review for the scientific record.

arxiv: 2605.11718 · v1 · submitted 2026-05-12 · 🧬 q-bio.NC · cs.AI · cs.NE

Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization

Chang Liu, Dahui Wang, Jie Su, Molan Li, Tianyi Qian, Zhaotian Gu

Pith reviewed 2026-05-13 04:40 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.NE
keywords MT direction maps · self-supervised learning · topographic organization · pinwheel structures · contrastive optimization · 3D ResNet · visual cortex · dorsal stream

The pith

A 3D ResNet trained on videos with contrastive learning and spatial regularization spontaneously forms direction maps and pinwheels matching primate MT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that direction-selective maps and pinwheel structures characteristic of area MT emerge when a 3D ResNet is trained on naturalistic videos using Momentum Contrast self-supervised learning plus a spatial loss that encourages nearby units to develop similar tuning. The resulting representations reproduce macaque physiological statistics including direction selectivity index, circular variance, and pinwheel density, and they arise specifically from the trade-off between the contrastive objective and the spatial term. A sympathetic reader would care because the result suggests that the dorsal stream follows the same general self-organization rules previously demonstrated for ventral-stream areas, rather than requiring separate biological machinery for motion topography.
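This page does not say which variants of the two single-unit tuning statistics the paper uses. The sketch below therefore assumes the common forms. The first is DSI = (R_pref - R_null) / (R_pref + R_null), where R_null is the response 180° from the preferred direction. The second is direction circular variance, CV = 1 - |Σ_k R_k e^{iθ_k}| / Σ_k R_k, computed over non-negative responses. The function names are illustrative and do not come from the paper's code.

```python
import numpy as np


def direction_selectivity_index(responses, directions_deg):
    """DSI = (R_pref - R_null) / (R_pref + R_null), with R_null taken 180 deg from preferred."""
    responses = np.asarray(responses, dtype=float)
    directions_deg = np.asarray(directions_deg, dtype=float)
    i_pref = int(np.argmax(responses))
    null_dir = (directions_deg[i_pref] + 180.0) % 360.0
    # Use the sampled direction closest (circularly) to the null direction.
    dist = np.abs((directions_deg - null_dir + 180.0) % 360.0 - 180.0)
    r_pref, r_null = responses[i_pref], responses[int(np.argmin(dist))]
    return float((r_pref - r_null) / (r_pref + r_null))


def circular_variance(responses, directions_deg):
    """CV = 1 - |sum_k R_k exp(i theta_k)| / sum_k R_k  (0 = one sharp peak, 1 = no net direction)."""
    responses = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    return float(1.0 - np.abs(np.sum(responses * np.exp(1j * theta))) / np.sum(responses))


if __name__ == "__main__":
    dirs = np.arange(16) * 22.5                                   # 16 equally spaced directions
    tuned = np.exp(2.0 * (np.cos(np.deg2rad(dirs - 90.0)) - 1.0))  # von Mises-like, prefers 90 deg
    axial = tuned + np.roll(tuned, 8)                              # equal second peak at 270 deg
    for name, r in [("directional", tuned), ("axial", axial)]:
        print(name, direction_selectivity_index(r, dirs), circular_variance(r, dirs))
```

A purely axial unit, with equal peaks 180° apart, scores DSI = 0 and CV = 1. A smaller residual axial component therefore lowers DSI and raises CV relative to a purely directional profile.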

Core claim

By training a 3D ResNet on naturalistic videos via a Momentum Contrast self-supervised paradigm alongside a biologically inspired spatial loss, brain-like direction maps and topological pinwheel structures emerge spontaneously. MT tuning properties, namely strong direction selectivity paired with a residual axial component, arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines, including direction selectivity index, circular variance, and pinwheel density. This unifies the computational origins of the ventral and dorsal streams under a single general mechanism of cortical self-organization.
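Momentum Contrast [13] trains a query encoder with an InfoNCE loss. Each query is scored against a positive key from a slowly updated momentum encoder and against a queue of negative keys. The NumPy sketch below shows the loss and the momentum update. It assumes MoCo's published defaults (temperature 0.07, momentum 0.999); this page does not report the paper's settings. For video, the query and key would be embeddings of two augmented clips. All names here are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """MoCo-style InfoNCE loss (sketch).

    q:      (N, D) query embeddings, e.g. one augmented view of each clip
    k_pos:  (N, D) positive keys from the momentum encoder (another view of the same clip)
    queue:  (K, D) negative keys kept from earlier batches
    """
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)   # (N, 1) cosine similarity to the positive
    l_neg = q @ queue.T                                 # (N, K) similarity to each negative
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    # Cross-entropy with the positive at column 0, via a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[:, 0].mean())


def momentum_update(key_params, query_params, m=0.999):
    """Key-encoder weights track the query encoder as an exponential moving average."""
    return {name: m * key_params[name] + (1.0 - m) * query_params[name] for name in key_params}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 128))
    queue = rng.normal(size=(4096, 128))
    print("matched keys:", info_nce_loss(q, q + 0.01 * rng.normal(size=q.shape), queue))
    print("random keys: ", info_nce_loss(q, rng.normal(size=q.shape), queue))
```

The matched-key loss comes out well below the random-key loss. That gap is the discriminative pressure that the spatial term has to trade off against.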

What carries the argument

Spatiotemporal topographic deep artificial neural network (TDANN) implemented as a 3D ResNet trained with Momentum Contrast contrastive loss plus spatial regularization that penalizes differences between nearby neurons.
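This page does not give the functional form of the spatial term (see major comment 1 below). As an assumption, the sketch follows the relative spatial loss used in the ventral-stream TDANN [22], as we understand it: one minus the Pearson correlation between pairwise response correlations and inverse cortical distance 1/(d + 1). The loss falls as nearby units respond more alike. For simplicity it uses all unit pairs rather than sampled local neighborhoods. Function and argument names are hypothetical.

```python
import numpy as np


def spatial_loss(responses, positions):
    """TDANN-style relative spatial loss (sketch, assumed form).

    responses: (B, U) activations of U units over a batch of B stimuli
    positions: (U, 2) cortical-sheet coordinates of the units (e.g., in mm)
    Returns 1 - corr(pairwise response correlation, 1 / (distance + 1)).
    """
    r = np.corrcoef(responses.T)                    # (U, U) response correlations between units
    diff = positions[:, None, :] - positions[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))           # (U, U) pairwise cortical distances
    iu = np.triu_indices(len(positions), k=1)       # each unordered pair once
    inv_d = 1.0 / (d[iu] + 1.0)
    return float(1.0 - np.corrcoef(r[iu], inv_d)[0, 1])


if __name__ == "__main__":
    U = 20
    positions = np.stack([np.arange(U) * 0.1, np.zeros(U)], axis=1)  # a line of units, 0.1 mm apart
    theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)         # stimulus directions
    smooth = np.cos(theta[:, None] - np.arange(U)[None, :] * 0.1)     # preference drifts smoothly
    shuffled = positions[np.random.default_rng(0).permutation(U)]
    print(spatial_loss(smooth, positions), spatial_loss(smooth, shuffled))
```

The smooth map scores a lower loss than the same responses assigned to shuffled positions. This is the sense in which the term rewards topography without specifying which features should be mapped.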

Load-bearing premise

The particular combination of MoCo contrastive loss and the chosen spatial regularization term on the 3D ResNet architecture suffices to produce MT-like direction topography without further biological constraints or post-hoc tuning.

What would settle it

Training the identical 3D ResNet with the contrastive loss but without the spatial regularization term, and finding that the resulting direction selectivity index and pinwheel density still fall within the ranges measured in macaque MT, would falsify the claim that this optimization trade-off is what produces the maps.
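The control above changes only the weight on the spatial term. This page gives no explicit formula for the combined objective. Assuming the weighted-sum convention of the ventral-stream TDANN [22], the ablation is one point on the α sweep shown in the appendix figures:

```latex
\mathcal{L}_{\text{total}}(\alpha) = \mathcal{L}_{\text{MoCo}} + \alpha\,\mathcal{L}_{\text{spatial}},
\qquad \alpha \in \{0,\ 0.1,\ 0.25,\ 0.5,\ 1.25,\ 2.5\}
```

Under this form, α = 0 is the contrastive-only control and α = 0.5 is the setting highlighted in Figures 2 and 4. A spatial-only control corresponds to dropping the MoCo term, not to any finite α.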

Figures

Figures reproduced from arXiv: 2605.11718 by Chang Liu, Dahui Wang, Jie Su, Molan Li, Tianyi Qian, Zhaotian Gu.

Figure 1: Spatiotemporal TDANN overview. (A) 3D ResNet-18 backbone with MoCo and spatial losses. (B) Simulated cortical sheet and spatial loss that promotes similar responses among nearby units.
Figure 2: Biological validation and parameter sensitivity of emergent tuning properties in the MT-like layer. (A) The 16 drifting-grating directions used for probing. (B) Population tuning curve (PTC) at α = 0.5. (C-F) Sensitivity to α: selective-unit fraction (C), median bandwidth (D), median CV (E), and median DSI (F); green dashed lines indicate macaque MT baselines [24, 3, 4]. Gray line: α = 0.5.
emerging 180° residual compo…
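A population tuning curve like the one in panel B is usually built in three steps. Each unit's curve is normalized to its peak. It is then circularly shifted so the preferred direction sits at offset 0. Finally, the aligned curves are averaged. The sketch assumes the 16 directions are equally spaced (22.5° apart), which the caption does not state. With that spacing, index 8 is the null (180°) offset, where a residual axial component would appear as a secondary bump. The function name is hypothetical.

```python
import numpy as np


def population_tuning_curve(responses):
    """Average tuning after aligning each unit's preferred direction to index 0.

    responses: (U, K) mean responses of U units to K equally spaced directions.
    Returns a length-K curve; entry j is the offset j * 360 / K deg from preferred.
    """
    responses = np.asarray(responses, dtype=float)
    peaks = responses.max(axis=1, keepdims=True)
    normalized = responses / np.where(peaks > 0, peaks, 1.0)  # peak-normalize each unit
    pref = np.argmax(responses, axis=1)
    aligned = np.stack([np.roll(row, -int(p)) for row, p in zip(normalized, pref)])
    return aligned.mean(axis=0)


if __name__ == "__main__":
    dirs = np.arange(16) * 22.5
    base = np.exp(2.0 * (np.cos(np.deg2rad(dirs)) - 1.0))          # tuning that peaks at 0 deg
    units = np.stack([3.0 * np.roll(base, s) for s in range(16)])   # same shape, all preferences
    print(np.round(population_tuning_curve(units), 3))
```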
Figure 3: Mechanistic origins of emergent tuning properties through optimization trade-offs. (A) FWHM and CV-derived bandwidth diverge in the optimal model. (B) Decomposition of tuning under contrastive-only, spatial-only, and joint objectives. (C) Schematic of competing dynamics between discriminative pressure and spatial smoothness.
topographic hallmark of primate MT is the “pinwheel” structure [8, 1]. As visualized i…
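One way to measure the primary-peak FWHM reported in Figures 3A and 5 takes three steps. First, interpolate the tuning curve circularly onto a fine grid. Then move the peak to the center. Finally, measure the contiguous region that stays at or above half the peak, which ignores a separate 180° bump. This sketch uses a plain half-of-peak convention with no baseline subtraction. The paper's CV-derived bandwidth formula is not given on this page and is not reproduced.

```python
import numpy as np


def fwhm_deg(curve, directions_deg, step=0.1):
    """Full width at half maximum (deg) of the primary peak of a circular tuning curve.

    A secondary bump (e.g., at 180 deg) that is separated from the primary
    peak by a dip below half-max does not widen the result.
    """
    curve = np.asarray(curve, dtype=float)
    n = int(round(360.0 / step))
    grid = np.linspace(0.0, 360.0, n, endpoint=False)
    step = 360.0 / n
    dense = np.interp(grid, directions_deg, curve, period=360.0)  # circular linear interpolation
    center = n // 2
    dense = np.roll(dense, center - int(np.argmax(dense)))        # primary peak -> center
    half = 0.5 * dense[center]
    left = center
    while left > 0 and dense[left - 1] >= half:
        left -= 1
    right = center
    while right < n - 1 and dense[right + 1] >= half:
        right += 1
    return (right - left) * step


if __name__ == "__main__":
    dirs = np.arange(16) * 22.5
    curve = np.zeros(16)
    curve[[15, 0, 1]] = [0.5, 1.0, 0.5]   # half-max reached exactly at +/-22.5 deg
    curve[8] = 0.6                        # separate axial bump at 180 deg
    print(fwhm_deg(curve, dirs))
```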
Figure 4: Spontaneous emergence and quantitative analysis of MT-like pinwheel structures. (A) Direction map at α = 0.5 with positive (white dot) and negative (black dot) pinwheel centers. (B) Pinwheel density versus α; green dashed line marks the macaque MT baseline (~4.9 mm⁻²) estimated from [1, 16]. Gray line: α = 0.5. (C) Histogram of preferred-direction differences (Δ) between adjacent units at α = 0.5. Functional …
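Pinwheel centers in a sampled direction map can be located by summing wrapped direction differences around each 2×2 plaquette of neighboring units. A net rotation of +360° or -360° marks a positive or negative singularity, matching the two charges in panel A. Wrapped differences between adjacent units also give the Δ values histogrammed in panel C. The sketch assumes a regular grid of preferred directions in degrees with no smoothing. Its names are hypothetical, and this page does not describe the paper's own detection procedure.

```python
import numpy as np


def wrap_deg(angle):
    """Wrap angles or angle differences into [-180, 180) degrees."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0


def find_pinwheels(direction_map):
    """Locate pinwheels in a 2D grid of preferred directions (degrees).

    Returns (row, col, charge) tuples; (row, col) is the top-left corner of
    the 2x2 plaquette enclosing the singularity, and charge is +1 or -1.
    """
    m = np.asarray(direction_map, dtype=float)
    tl, tr = m[:-1, :-1], m[:-1, 1:]
    br, bl = m[1:, 1:], m[1:, :-1]
    # Walk tl -> tr -> br -> bl -> tl and accumulate the wrapped rotation.
    winding = wrap_deg(tr - tl) + wrap_deg(br - tr) + wrap_deg(bl - br) + wrap_deg(tl - bl)
    charge = np.rint(winding / 360.0).astype(int)
    rows, cols = np.nonzero(charge)
    return [(int(r), int(c), int(charge[r, c])) for r, c in zip(rows, cols)]


def adjacent_differences(direction_map):
    """Wrapped preferred-direction differences between horizontal and vertical neighbors."""
    m = np.asarray(direction_map, dtype=float)
    return np.concatenate([wrap_deg(np.diff(m, axis=1)).ravel(),
                           wrap_deg(np.diff(m, axis=0)).ravel()])


def pinwheel_density(n_pinwheels, width_mm, height_mm):
    """Pinwheels per mm^2 over a rectangular patch of the simulated cortical sheet."""
    return n_pinwheels / (width_mm * height_mm)


if __name__ == "__main__":
    yy, xx = np.mgrid[0:10, 0:10]
    vortex = np.degrees(np.arctan2(yy - 4.5, xx - 4.5))  # one singularity between grid points
    print(find_pinwheels(vortex), find_pinwheels(-vortex))
```

For scale, the ~4.9 mm⁻² baseline in panel B implies about 4.9 × 4 ≈ 20 pinwheels in a 2 mm × 2 mm sub-region like the one in Figure 6.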
Figure 5: Population tuning curves (PTCs) across spatial constraints (α), with annotated primary-peak FWHM values.
Appendix I: Direction Maps across Varying α
[Figure 6 panels: α = 0, 0.1, 0.25, 0.5, 1.25, 2.5; scale bar 0.5 mm]
Figure 6: Topological evolution of MT direction maps across α in a 2 mm × 2 mm MT-like sub-region. White/black dots denote positive/negative pinwheel charges.
Original abstract

The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience. While recent computational frameworks like the Topographic Deep Artificial Neural Network (TDANN) have successfully modeled spatial organization in the ventral stream, the computational origins of the dorsal stream's distinct topographies, such as direction-selective maps in the middle temporal (MT) area, remain largely unresolved. In this work, we present a spatiotemporal TDANN to investigate whether MT topography is governed by the same universal principles. By training a 3D ResNet on naturalistic videos via a Momentum Contrast (MoCo) self-supervised paradigm alongside a biologically inspired spatial loss, we demonstrate the spontaneous emergence of brain-like direction maps and topological pinwheel structures. Crucially, we reveal that MT tuning properties, characterized by strong direction selectivity paired with a residual axial component, arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines, including direction selectivity index, circular variance, and pinwheel density. These findings unify the computational origins of the ventral and dorsal streams, establishing a general mechanism for cortical self-organization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a spatiotemporal extension of the Topographic Deep Artificial Neural Network (TDANN) framework. It trains a 3D ResNet architecture on naturalistic videos using the Momentum Contrast (MoCo) self-supervised learning paradigm in conjunction with a biologically inspired spatial loss. The central claim is that direction maps and pinwheel structures emerge spontaneously in the model's representations, quantitatively matching physiological properties of macaque area MT such as direction selectivity index, circular variance, and pinwheel density. The authors conclude that these features result from an optimization trade-off between discriminative and spatial regularization pressures, providing a unified account for self-organization in ventral and dorsal visual streams.

Significance. If the quantitative matches are robust and not due to post-hoc tuning, this would represent a significant advance in computational neuroscience by extending topographic models to the dorsal stream and demonstrating how self-supervised learning on video data can give rise to MT-like topography. It builds on prior TDANN work for V1/V2/V4 and offers a potential general mechanism for cortical map formation. The use of contrastive learning without explicit labels is a strength, as is the attempt to match multiple biological metrics.

major comments (3)
  1. Abstract: The claim that MT tuning properties 'arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization' is load-bearing for the central thesis, yet the abstract (and by extension the methods summary) provides no explicit equation or functional form for the spatial loss term. Without this, it is impossible to evaluate whether the loss implicitly favors pinwheel density or smoothness independently of the MoCo objective on video data.
  2. Methods/Results: The reported quantitative matches to macaque MT baselines (DSI, circular variance, pinwheel density) are presented without details on hyperparameter search procedures, data exclusion criteria, or statistical controls. This omission directly affects verifiability of the claim that the matches are robust rather than sensitive to specific implementation choices.
  3. Results: No ablation experiments are described that compare the full model against a version using only the spatial regularization term (or only MoCo). Such controls are required to establish that the emergence of direction maps and pinwheels is due to the described trade-off rather than the spatial loss alone.
minor comments (2)
  1. Abstract: The phrase 'spatiotemporal TDANN' is used without a concise definition of how the 3D ResNet implementation differs from prior 2D TDANN models in terms of architecture or loss application.
  2. Figures: Legends should explicitly state the numerical biological baseline values (e.g., mean pinwheel density per mm²) alongside model outputs for direct visual comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback and positive assessment of the work's significance. We address each major comment point by point below and will revise the manuscript accordingly to enhance clarity and verifiability.

  1. Referee: Abstract: The claim that MT tuning properties 'arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization' is load-bearing for the central thesis, yet the abstract (and by extension the methods summary) provides no explicit equation or functional form for the spatial loss term. Without this, it is impossible to evaluate whether the loss implicitly favors pinwheel density or smoothness independently of the MoCo objective on video data.

    Authors: We agree that the abstract would benefit from an explicit reference to the spatial loss form to support evaluation of the trade-off. In the revised manuscript, we will incorporate a concise description of the spatial loss functional form into the abstract and methods summary, clarifying its role alongside the MoCo objective without implying independent favoritism toward specific topographic features. revision: yes

  2. Referee: Methods/Results: The reported quantitative matches to macaque MT baselines (DSI, circular variance, pinwheel density) are presented without details on hyperparameter search procedures, data exclusion criteria, or statistical controls. This omission directly affects verifiability of the claim that the matches are robust rather than sensitive to specific implementation choices.

    Authors: We acknowledge that additional methodological details are necessary for full verifiability. In the revision, we will expand the Methods and Results sections to include comprehensive information on hyperparameter search procedures, data exclusion criteria, and statistical controls used in the quantitative comparisons to macaque MT data. revision: yes

  3. Referee: Results: No ablation experiments are described that compare the full model against a version using only the spatial regularization term (or only MoCo). Such controls are required to establish that the emergence of direction maps and pinwheels is due to the described trade-off rather than the spatial loss alone.

    Authors: We agree that ablation controls are essential to substantiate the optimization trade-off. In the revised manuscript, we will add ablation experiments training variants with only the MoCo objective and only the spatial regularization term, to demonstrate that direction maps and pinwheels arise specifically from their combination rather than either component in isolation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard contrastive training plus regularization without reduction to inputs by construction.

The paper's central claim is that direction-selective maps and pinwheels emerge spontaneously when a 3D ResNet is trained on naturalistic videos using Momentum Contrast (MoCo) self-supervision together with a biologically inspired spatial loss. This chain is self-contained: the contrastive objective is a standard, externally defined loss (InfoNCE-style), the spatial term is described as biologically inspired rather than reverse-engineered from the target statistics, and the reported matches to macaque DSI, circular variance, and pinwheel density are presented as post-training measurements rather than fitted parameters renamed as predictions. No equations in the abstract reduce the output topography to the input loss by algebraic identity, and no self-citation chain is invoked to forbid alternatives. The result therefore does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no equations, methods sections, or supplementary details are available to enumerate specific free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5517 in / 1302 out tokens · 52758 ms · 2026-05-13T04:40:19.821433+00:00

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 2 internal anchors

  1. [1] Optical imaging reveals the functional architecture of neurons processing shape and motion in owl monkey area MT. Proceedings of the Royal Society of London. Series B: Biological Sciences 258(1352), 109–119 (1994). https://doi.org/10.1098/rspb.1994.0150

  2. [2] Adelson, E.H., Bergen, J.R.: Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2(2), 284 (1985). https://doi.org/10.1364/JOSAA.2.000284

  3. [3] Albright, T.D.: Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology 52(6), 1106–1130 (1984). https://doi.org/10.1152/jn.1984.52.6.1106

  4. [4] Britten, K., Shadlen, M., Newsome, W., Movshon, J.: The analysis of visual motion: A comparison of neuronal and psychophysical performance. The Journal of Neuroscience 12(12), 4745–4765 (1992). https://doi.org/10.1523/JNEUROSCI.12-12-04745.1992

  5. [5] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Contrastive Learning of Visual Representations (2020). https://doi.org/10.48550/ARXIV.2002.05709

  6. [6] Chklovskii, D.B., Schikorski, T., Stevens, C.F.: Wiring Optimization in Cortical Circuits. Neuron 34(3), 341–347 (2002). https://doi.org/10.1016/S0896-6273(02)00679-7

  7. [7] Dacey, D.M., Petersen, M.R.: Dendritic field size and morphology of midget and parasol ganglion cells of the human retina. Proceedings of the National Academy of Sciences 89(20), 9666–9670 (1992). https://doi.org/10.1073/pnas.89.20.9666

  8. [8] Diogo, A.C.M., Soares, J.G.M., Koulakov, A., Albright, T.D., Gattass, R.: Electrophysiological Imaging of Functional Architecture in the Cortical Middle Temporal Visual Area of Cebus apella Monkey. The Journal of Neuroscience 23(9), 3881–3898 (2003). https://doi.org/10.1523/JNEUROSCI.23-09-03881.2003

  9. [9] Durbin, R., Mitchison, G.: A dimension reduction framework for understanding cortical maps. Nature 343(6259), 644–647 (1990). https://doi.org/10.1038/343644a0

  10. [10] Ge, X., Zhang, K., Gribizis, A., Hamodi, A.S., Sabino, A.M., Crair, M.C.: Retinal waves prime visual motion detection by simulating future optic flow. Science 373(6553), eabd0830 (2021). https://doi.org/10.1126/science.abd0830

  11. [11] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Valko, M.: Bootstrap your own latent: A new approach to self-supervised Learning (2020). https://doi.org/10.48550/ARXIV.2006.07733

  12. [12] Hara, K., Kataoka, H., Satoh, Y.: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? (2017). https://doi.org/10.48550/ARXIV.1711.09577

  13. [13] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum Contrast for Unsupervised Visual Representation Learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9726–9735. IEEE, Seattle, WA, USA (2020). https://doi.org/10.1109/CVPR42600.2020.00975

  14. [14] Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology 160(1), 106–154 (1962). https://doi.org/10.1113/jphysiol.1962.sp006837

  15. [15] Jacobs, R.A., Jordan, M.I.: Computational Consequences of a Bias toward Short Connections. Journal of Cognitive Neuroscience 4(4), 323–336 (1992). https://doi.org/10.1162/jocn.1992.4.4.323

  16. [16] Kaschube, M., Schnabel, M., Löwel, S., Coppola, D.M., White, L.E., Wolf, F.: Universality in the Evolution of Orientation Columns in the Visual Cortex. Science 330(6007), 1113–1116 (2010). https://doi.org/10.1126/science.1194869

  17. [17] Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43(1), 59–69 (1982). https://doi.org/10.1007/BF00337288

  18. [18] Koprinkova-Hristova, P.D., Bocheva, N., Nedelcheva, S., Stefanova, M.: Spike Timing Neural Model of Motion Perception and Decision Making. Frontiers in Computational Neuroscience 13, 20 (2019). https://doi.org/10.3389/fncom.2019.00020

  19. [19] Kosterlitz, J.M., Thouless, D.J.: Ordering, metastability and phase transitions in two-dimensional systems. Journal of Physics C: Solid State Physics 6(7), 1181–1203 (1973). https://doi.org/10.1088/0022-3719/6/7/010

  20. [20] LeCun, Y.: 1.1 Deep Learning Hardware: Past, Present, and Future. In: 2019 IEEE International Solid-State Circuits Conference (ISSCC). pp. 12–19. IEEE, San Francisco, CA, USA (2019). https://doi.org/10.1109/ISSCC.2019.8662396

  21. [21] Linsker, R.: From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences 83(19), 7508–7512 (1986). https://doi.org/10.1073/pnas.83.19.7508

  22. [22] Margalit, E., Lee, H., Finzi, D., DiCarlo, J.J., Grill-Spector, K., Yamins, D.L.: A unifying framework for functional organization in early and higher ventral visual cortex. Neuron p. S0896627324002794 (2024). https://doi.org/10.1016/j.neuron.2024.04.018

  23. [23] Maunsell, J.H., Van Essen, D.C.: Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. Journal of Neurophysiology 49(5), 1127–1147 (1983). https://doi.org/10.1152/jn.1983.49.5.1127

  24. [24] Nakhla, N., Korkian, Y., Krause, M.R., Pack, C.C.: Neural Selectivity for Visual Motion in Macaque Area V3A. eNeuro 8(1), ENEURO.0383-20.2020 (2021). https://doi.org/10.1523/ENEURO.0383-20.2020

  25. [25] Nayebi, A., Bear, D., Kubilius, J., Kar, K., Ganguli, S., Sussillo, D., DiCarlo, J.J., Yamins, D.L.K.: Task-Driven Convolutional Recurrent Models of the Visual System (2018). https://doi.org/10.48550/ARXIV.1807.00053

  26. [26] Obermayer, K., Ritter, H., Schulten, K.: A principle for the formation of the spatial structure of cortical feature maps. Proceedings of the National Academy of Sciences 87(21), 8345–8349 (1990). https://doi.org/10.1073/pnas.87.21.8345

  27. [27] Oja, E.: Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15(3), 267–273 (1982). https://doi.org/10.1007/BF00275687

  28. [28] Pan, T., Song, Y., Yang, T., Jiang, W., Liu, W.: VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples (2021). https://doi.org/10.48550/ARXIV.2103.05905

  29. [29] Qian, R., Meng, T., Gong, B., Yang, M.H., Wang, H., Belongie, S., Cui, Y.: Spatiotemporal Contrastive Video Representation Learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6960–6970. IEEE, Nashville, TN, USA (2021). https://doi.org/10.1109/CVPR46437.2021.00689

  30. [30] Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2(1), 79–87 (1999). https://doi.org/10.1038/4580

  31. [31] Ribot, J., Romagnoni, A., Milleret, C., Bennequin, D., Touboul, J.: Pinwheel-dipole configuration in cat early visual cortex. NeuroImage 128, 63–73 (2016). https://doi.org/10.1016/j.neuroimage.2015.12.022

  32. [32] Rust, N.C., Mante, V., Simoncelli, E.P., Movshon, J.A.: How MT cells analyze the motion of visual patterns. Nature Neuroscience 9(11), 1421–1431 (2006). https://doi.org/10.1038/nn1786

  33. [33] Shaw, G.L.: Donald Hebb: The Organization of Behavior. In: Palm, G., Aertsen, A. (eds.) Brain Theory, pp. 231–233. Springer Berlin Heidelberg, Berlin, Heidelberg (1986). https://doi.org/10.1007/978-3-642-70911-1_15

  34. [34] Soomro, K., Zamir, A.R., Shah, M.: UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild (2012). https://doi.org/10.48550/ARXIV.1212.0402

  35. [35] Swindale, N.V.: A model for the formation of ocular dominance stripes. Proceedings of the Royal Society of London. Series B. Biological Sciences 208(1171), 243–264 (1980). https://doi.org/10.1098/rspb.1980.0051

  36. [36] Vanni, S., Hokkanen, H., Werner, F., Angelucci, A.: Anatomy and Physiology of Macaque Visual Cortical Areas V1, V2, and V5/MT: Bases for Biologically Realistic Models. Cerebral Cortex 30(6), 3483–3517 (2020). https://doi.org/10.1093/cercor/bhz322

  37. [37] Wang, T., Isola, P.: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (2020). https://doi.org/10.48550/ARXIV.2005.10242

  38. [38] Wen, Z., Li, Y.: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning (2021). https://doi.org/10.48550/ARXIV.2105.15134

  39. [39] Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination (2018). https://doi.org/10.48550/ARXIV.1805.01978

  40. [40] Yamins, D.L.K., DiCarlo, J.J.: Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19(3), 356–365 (2016). https://doi.org/10.1038/nn.4244

  41. [41] Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M.C., DiCarlo, J.J., Yamins, D.L.K.: Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences 118(3), e2014196118 (2021). https://doi.org/10.1073/pnas.2014196118

Appendix A: Position Initialization

The position initialization algorithm establishes ...