Robustness of Similarity-based Positional Encoding Under Rotations: Theoretical Analysis and Experimental Validation

Andrea Santomauro; Giorgio Leonardi; Luigi Portinale

arxiv: 2606.17961 · v1 · pith:26AYFCN5new · submitted 2026-06-16 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-27 01:15 UTC · model grok-4.3

keywords positional encoding · transformers · rotational robustness · similarity-based encoding · image classification · Lipschitz stability · Frobenius norm

The pith

Similarity-based positional encoding remains stable under rotations given mild Lipschitz conditions on its components.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that simPE, which encodes positions via pairwise similarity relations rather than absolute or sinusoidal values, is not rotation-invariant but becomes stable once its building blocks obey mild Lipschitz conditions. Explicit upper bounds on the change in the encoding matrix are derived in the Frobenius norm. Controlled experiments rotate test images while keeping training images fixed and demonstrate that simPE retains higher accuracy, F1, precision, and recall than standard learned positional encodings, especially for small-to-moderate angles across synthetic and FashionMNIST data. The result is relevant wherever geometric misalignment arises during acquisition, such as medical imaging.

Core claim

Under mild Lipschitz assumptions on the elementary components, simPE is stable under rotational perturbations and explicit perturbation bounds in Frobenius norm are derived. On four datasets with rotated test images, simPE consistently outperforms standard learned positional encoding in accuracy, F1 score, precision, and recall, most markedly in the small-to-moderate angle regime.
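
To make the shape of such a guarantee concrete, here is a generic sketch of how a Lipschitz assumption typically yields a Frobenius-norm bound. It is not the paper's theorem: the symbols $S$, $s$, $x_i$, $L$, $n$, and $\delta(\theta)$ are assumptions introduced for illustration, and the paper's actual hypotheses and constants may differ.

```latex
% Assumed setup (illustrative, not taken from the paper):
%   S_{ij} = s(x_i, x_j) for n input elements x_1, ..., x_n,
%   s is jointly Lipschitz with constant L (first line),
%   a rotation by angle \theta moves every element by at most \delta(\theta).
\begin{align*}
|s(x, y) - s(x', y')| &\le L\,\bigl(\|x - x'\| + \|y - y'\|\bigr), \\
\|x_i^{\theta} - x_i\| &\le \delta(\theta) \quad \text{for all } i, \\
\bigl|S^{\theta}_{ij} - S_{ij}\bigr| &\le 2L\,\delta(\theta), \\
\|S^{\theta} - S\|_F
  = \Bigl(\sum_{i,j=1}^{n} \bigl|S^{\theta}_{ij} - S_{ij}\bigr|^2\Bigr)^{1/2}
  &\le 2\,n\,L\,\delta(\theta).
\end{align*}
```

Under these assumptions the bound vanishes as $\delta(\theta) \to 0$, which is stability without invariance: the encoding may change, but only in proportion to the size of the perturbation.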

What carries the argument

Similarity-based positional encoding (simPE), which injects positional information through pairwise relations among input elements.

If this is right

simPE supplies a quantifiable robustness guarantee for Transformer models facing small rotational shifts in input geometry.
Performance gains appear most reliably in the small-to-moderate rotation range on both synthetic shapes and real image benchmarks.
The encoding is provably not fully invariant, so some degradation must still be expected for large angles.
The same stability mechanism can be checked on other controlled perturbations once the Lipschitz property is verified.
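
One hedged way to probe the Lipschitz property numerically is to sample finite-difference ratios. The sketch below does this for a Gaussian kernel, which is an assumed stand-in and not necessarily the similarity function used in simPE; the function names and sampling parameters are hypothetical. Sampling yields only a lower bound on the true constant, so it can refute a claimed constant but cannot prove one.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian similarity between matching rows of x and y (assumed stand-in)."""
    return np.exp(-np.sum((x - y) ** 2, axis=-1) / (2.0 * sigma**2))

def empirical_lipschitz(f, dim=2, n_pairs=20000, step=1e-3, seed=0):
    """Largest sampled ratio |f(x + h, y) - f(x, y)| / ||h|| with ||h|| = step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_pairs, dim))
    y = rng.normal(size=(n_pairs, dim))
    # Random perturbation directions, rescaled to a fixed small length.
    h = rng.normal(size=(n_pairs, dim))
    h *= step / np.linalg.norm(h, axis=1, keepdims=True)
    return float(np.max(np.abs(f(x + h, y) - f(x, y)) / step))

estimate = empirical_lipschitz(gaussian_kernel)
print(f"sampled lower bound on the Lipschitz constant: {estimate:.4f}")
```

The same helper could be pointed at any other two-argument similarity function, provided it accepts batched inputs in the same way.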

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the Lipschitz property holds for a given architecture, simPE could replace learned encodings in pipelines that preprocess medical or satellite images.
The explicit bounds open the possibility of analytically predicting the angle threshold at which accuracy begins to fall sharply.
Extending the same Lipschitz analysis to translations or affine transforms would test whether the stability result generalizes beyond rotations.
Hybrid encodings that combine simPE with a small learned component might preserve the bound while recovering some invariance.

Load-bearing premise

The elementary components inside simPE satisfy mild Lipschitz conditions.

What would settle it

Deliberately violate the Lipschitz condition on the simPE components, then measure whether the change in the encoding matrix under rotation exceeds the derived Frobenius-norm bounds and whether classification metrics on rotated test images degrade accordingly.
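
A minimal sketch of the measurement side of such a test follows. It assumes a stand-in encoding, a Gaussian similarity matrix over flattened image patches, because the page does not give the exact simPE construction; `patch_matrix`, `gaussian_similarity`, `frobenius_drift`, the patch size, and `sigma` are all hypothetical names and values. The rotation call uses `scipy.ndimage.rotate` with linear interpolation and constant zero fill.

```python
import numpy as np
from scipy.ndimage import rotate

def patch_matrix(img, patch=4):
    """Split an image into non-overlapping patches, one flattened patch per row."""
    h, w = img.shape
    rows = [img[i:i + patch, j:j + patch].ravel()
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]
    return np.stack(rows)

def gaussian_similarity(x, sigma=1.0):
    """Pairwise Gaussian similarity between rows of x (assumed stand-in encoding)."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def frobenius_drift(img, angle_deg, patch=4, sigma=1.0):
    """Frobenius norm of the change in the similarity matrix after rotation."""
    rotated = rotate(img, angle_deg, reshape=False, order=1,
                     mode="constant", cval=0.0)
    s0 = gaussian_similarity(patch_matrix(img, patch), sigma)
    s1 = gaussian_similarity(patch_matrix(rotated, patch), sigma)
    return float(np.linalg.norm(s1 - s0, ord="fro"))

# Synthetic random image, used only to exercise the functions.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
for angle in (0, 5, 15, 45):
    print(angle, frobenius_drift(img, angle))
```

Comparing these measured values against a computed bound at each angle would show whether the bound is informative or vacuous for the chosen components.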

Figures

Figures reproduced from arXiv: 2606.17961 by Andrea Santomauro, Giorgio Leonardi, Luigi Portinale.

**Figure 2.** Performance on the Digits dataset as a function of test rotation angle (degrees). Both …

**Figure 3.** Performance on the FashionMNIST dataset as a function of test rotation angle …

**Figure 4.** Performance on the Shapes dataset as a function of test rotation angle (degrees).

Original abstract

Positional encoding is a fundamental component of Transformer architectures, as it injects information about the spatial or sequential arrangement of inputs. Among recent alternatives to standard absolute and sinusoidal encodings, similarity-based positional encoding (simPE) has emerged as a flexible framework for representing positional structure through pairwise relations. simPE was originally designed for medical imaging applications, where geometric robustness is especially relevant: small rotations naturally arise during image acquisition, induced by imaging instruments, patient positioning, or slight acquisition misalignments. Despite its empirical promise, the theoretical behavior of simPE under geometric perturbations has not been fully characterized. In this paper, we study the robustness of simPE with respect to rotations, combining formal theoretical analysis with experimental validation. We first show that simPE is generally not rotation-invariant. We then prove that, under mild Lipschitz assumptions on the elementary components, simPE is stable under rotational perturbations and derive explicit perturbation bounds in Frobenius norm. We validate these findings experimentally on four controlled datasets--a synthetic Arrow dataset, a synthetic Shapes dataset (four geometric shape categories), a synthetic Digits dataset, and a benchmark image classification dataset (FashionMNIST)--in which training and validation images are kept in a fixed canonical orientation while test images are subjected to increasing rotation angles. Across all datasets, simPE consistently outperforms standard learned positional encoding in terms of accuracy, F1 score, precision, and recall under rotation, particularly in the small-to-moderate angle regime, corroborating the theoretical stability guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives stability bounds for simPE under rotations via Lipschitz assumptions and shows experimental gains, but the constants are never quantified so the theory's practical bite is unclear.

The punchline is that simPE gets a formal stability result under rotations plus experiments on four datasets where it beats standard learned positional encoding, especially at small angles. The new piece is the explicit Frobenius-norm perturbation bounds derived from the Lipschitz conditions on the component functions, together with the controlled protocol that holds training images fixed and rotates only the test set.

The experiments look solid on the surface: synthetic arrows, shapes, digits, and FashionMNIST all show the same pattern of better accuracy, F1, precision, and recall. That consistency across dataset types is useful and directly addresses the medical-imaging motivation.

The soft spot is exactly the one the stress-test flagged. The bounds rest on "mild Lipschitz assumptions" but the paper gives no indication that those constants were computed or even bounded for the actual similarity kernels and embeddings in simPE. Without numbers, it is hard to tell whether the derived bounds are tight enough to explain the observed robustness or whether they are loose enough to be non-informative. The abstract also omits error bars or run-to-run variance, which makes the performance claims harder to weigh.

This is for people working on positional encodings in vision transformers who care about geometric robustness. A reader who wants to plug the bounds into their own analysis would get something concrete, but they would still have to do the Lipschitz homework themselves.

I would send it to peer review. The theory-plus-experiment package is worth referee time even if the assumptions need tightening.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that similarity-based positional encoding (simPE) is not generally rotation-invariant but becomes stable under rotational perturbations when mild Lipschitz assumptions hold on its elementary components; explicit perturbation bounds are derived in the Frobenius norm. Experiments on four controlled datasets (synthetic Arrow, Shapes, Digits, and FashionMNIST) with canonically oriented training/validation images and rotated test images show simPE consistently outperforming standard learned positional encoding on accuracy, F1, precision, and recall, especially in the small-to-moderate angle regime.

Significance. If the Lipschitz constants can be instantiated and shown to be sufficiently small for the concrete similarity functions and kernels, the work supplies useful theoretical grounding for simPE in rotation-sensitive domains such as medical imaging. The controlled experimental protocol across multiple synthetic and benchmark datasets provides concrete evidence of practical robustness gains over learned encodings.

major comments (2)

[Abstract] Abstract and theoretical analysis: the explicit Frobenius-norm perturbation bounds are derived only after invoking unspecified 'mild Lipschitz assumptions on the elementary components.' These assumptions are not instantiated for the specific similarity functions, kernels, or embedding maps used in simPE, nor are the resulting constants computed or shown to produce non-vacuous bounds at the tested rotation angles. If the constants are large, the stability guarantees do not actually support the observed experimental robustness.
[Experimental validation] Experimental section: full dataset construction details (exact rotation application procedure, angle ranges, and number of trials) and error-bar or statistical significance reporting are not visible, preventing verification that the reported outperformance is robust rather than anecdotal.

minor comments (1)

Consider adding a short table or paragraph that either computes or bounds the Lipschitz constants for the concrete simPE components used in the experiments.
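
As an illustration of what one entry of such a table could look like, the Lipschitz constant of a Gaussian kernel can be obtained in closed form. The choice of kernel and the bandwidth symbol $\sigma$ are assumptions made for this sketch; the calculation is standard calculus and is not taken from the paper.

```latex
\begin{align*}
k(x, y) &= \exp\!\Bigl(-\tfrac{\|x - y\|^2}{2\sigma^2}\Bigr), \qquad r = \|x - y\|, \\
\|\nabla_x k(x, y)\| &= \frac{r}{\sigma^2}\,\exp\!\Bigl(-\tfrac{r^2}{2\sigma^2}\Bigr), \\
\frac{d}{dr}\Bigl[\frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}\Bigr]
  &= \frac{1}{\sigma^2}\Bigl(1 - \frac{r^2}{\sigma^2}\Bigr) e^{-r^2/(2\sigma^2)} = 0
  \iff r = \sigma, \\
\sup_{x, y}\,\|\nabla_x k(x, y)\| &= \frac{1}{\sigma\sqrt{e}}, \\
|k(x, y) - k(x', y')| &\le \frac{\|x - x'\| + \|y - y'\|}{\sigma\sqrt{e}}.
\end{align*}
```

The constant grows as $\sigma$ shrinks, so under this assumption a narrow kernel would yield a looser stability bound.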

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the clarity and verifiability of our work on the robustness of simPE under rotations. We address each major comment below and indicate the revisions we will make.

Referee: [Abstract] Abstract and theoretical analysis: the explicit Frobenius-norm perturbation bounds are derived only after invoking unspecified 'mild Lipschitz assumptions on the elementary components.' These assumptions are not instantiated for the specific similarity functions, kernels, or embedding maps used in simPE, nor are the resulting constants computed or shown to produce non-vacuous bounds at the tested rotation angles. If the constants are large, the stability guarantees do not actually support the observed experimental robustness.

Authors: We agree that the manuscript states the Lipschitz assumptions at a general level without providing concrete instantiations or numerical values for the constants associated with the specific similarity functions (e.g., dot-product or RBF) and embedding maps employed. This leaves open the question of bound tightness. In the revised manuscript we will add an appendix subsection that instantiates the constants for the concrete components used in the experiments (cosine similarity and Gaussian kernel) and evaluates the resulting Frobenius-norm bounds at the rotation angles tested (0–45°). If the computed constants render the bounds loose, we will explicitly note this limitation and discuss its implications for the theoretical support of the empirical results. revision: yes
Referee: [Experimental validation] Experimental section: full dataset construction details (exact rotation application procedure, angle ranges, and number of trials) and error-bar or statistical significance reporting are not visible, preventing verification that the reported outperformance is robust rather than anecdotal.

Authors: The referee correctly identifies that the current experimental description omits several implementation specifics required for full reproducibility. In the revision we will expand the experimental section (and add a dedicated appendix) with: (i) the precise rotation procedure (scipy.ndimage.rotate with bilinear interpolation and zero-padding), (ii) the exact angle ranges and increments used on each dataset, (iii) the number of independent trials (five random seeds), and (iv) error bars showing mean ± one standard deviation together with paired t-test p-values comparing simPE against learned positional encoding. These additions will allow readers to assess the statistical robustness of the reported gains. revision: yes
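
The reporting steps named in this response can be sketched as follows. This is an illustration under assumptions rather than the authors' code: `rotate_image` and `compare_runs` are hypothetical helpers, `reshape=False` is an assumed choice not stated in the response, and the per-seed scores would come from the reader's own runs. `order=1` selects linear (bilinear in 2D) interpolation, and `mode="constant"` with `cval=0.0` fills exposed corners with zeros.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.stats import ttest_rel

def rotate_image(img, angle_deg):
    """Rotate a 2D image, keeping its shape and zero-filling the corners."""
    return rotate(img, angle_deg, reshape=False, order=1,
                  mode="constant", cval=0.0)

def compare_runs(scores_a, scores_b):
    """Mean, sample std, and paired t-test p-value for per-seed scores.

    scores_a and scores_b must be aligned: entry k of each comes from seed k.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    result = ttest_rel(a, b)
    return {
        "mean_a": float(a.mean()), "std_a": float(a.std(ddof=1)),
        "mean_b": float(b.mean()), "std_b": float(b.std(ddof=1)),
        "p_value": float(result.pvalue),
    }

# Synthetic bar-shaped image, used only to exercise the rotation helper.
img = np.zeros((28, 28))
img[10:18, 6:22] = 1.0
rotated = rotate_image(img, 15.0)
print(rotated.shape)
```

The paired test is the natural choice when both encodings are trained with the same seeds, since it compares per-seed differences rather than pooled scores.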

Circularity Check

0 steps flagged

No circularity: bounds derived from external assumptions; experiments provide independent validation

The paper's central derivation establishes stability bounds under explicitly stated mild Lipschitz assumptions on the elementary components of simPE, which are invoked as external conditions rather than derived from the paper's own equations or data fits. Experimental results on four separate datasets (synthetic Arrow, Shapes, Digits, and FashionMNIST) with controlled rotations supply independent empirical corroboration. No load-bearing steps reduce by construction to self-citations, fitted parameters renamed as predictions, or self-definitional relations; the theoretical claim and validation remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the Lipschitz assumption for the theoretical bound and on the fixed-orientation training / rotated-test protocol for the empirical part; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Mild Lipschitz assumptions on the elementary components of simPE.
Required to derive the explicit perturbation bounds under rotation.


