A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis

Arnav Gupta

arxiv: 2606.24903 · v1 · pith:RXMHFRGCnew · submitted 2026-06-12 · 💻 cs.LG

A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis

Arnav Gupta This is my paper

Pith reviewed 2026-06-27 04:46 UTC · model grok-4.3

classification 💻 cs.LG

keywords few-shot classificationsaturation indexeffective rankcovariance concentrationlinear discriminant analysisstopping rulespectral phase diagramrepresentational diagnosis

0 comments

The pith

The saturation index S(K) falls below a threshold exactly when the within-class covariance concentrates around its population value and the linear discriminant stabilizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the saturation index S(K) as the effective rank of the pooled within-class sample covariance divided by the shot count K. It establishes that this index dropping below a threshold signals covariance concentration and stabilization of the linear classifier. The index is computed solely from support features in O(d^3) time and requires no test labels. Across 246 observations from binary tasks, it shows positive correlation with marginal accuracy gains and supports a three-phase diagram of learning progress.

Core claim

The central claim is that S(K) = erank(hat Sigma_W^(K)) / K falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. This equivalence enables a spectral phase diagram with distinct marginal gains in exploration, transition, and saturation regimes, plus a stopping rule that achieves AUC 0.752. The index also diagnoses representational inadequacy when paired with low accuracy, and asymptotic effective rank shows no monotone link to peak accuracy.

What carries the argument

The saturation index S(K), the ratio of effective rank of the pooled within-class sample covariance to shot count K; it acts as a spectral detector of when the covariance estimator and linear discriminant have stabilized.

If this is right

Sixteen of seventeen tasks show positive within-task Spearman correlation between S(K) and marginal accuracy gain.
The three phases exhibit mean marginal gains of 3.48 percent, 2.40 percent, and 0.82 percent with all pairwise tests significant.
As a binary stopping rule the index reaches AUC 0.752.
Small S(K) together with low accuracy indicates representational inadequacy.
Asymptotic effective rank and peak accuracy lack a significant monotone relationship across tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The index could serve as a label-budget allocator in new binary tasks without access to held-out data.
Generalizing the effective-rank construction to N-way settings would require redefining the pooled covariance structure.
The absence of a link to peak accuracy implies that task-intrinsic dimensionality alone does not bound final performance.
Testing the index on features from pretrained backbones would check whether saturation behavior persists beyond linear classifiers.

Load-bearing premise

The equivalence between S(K) crossing a threshold and covariance concentration plus LDA stabilization holds as stated.

What would settle it

An observation where S(K) remains below the threshold yet adding shots produces large accuracy gains or the sample covariance has not concentrated around the population value.

read the original abstract

Deciding when to stop collecting labeled examples is a fundamental but undertheorized problem in applied machine learning. The saturation index $S(K) = \operatorname{erank}(\widehat{\Sigma}_W^{(K)}) / K$ measures the ratio of the effective rank of the pooled within-class sample covariance to the shot count; we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. The index is computable in $O(d^3)$ time from support features alone, requiring no test labels or trained classifier. Evaluated across $N = 246$ doubling-pair observations from seventeen binary tasks and six datasets, sixteen of seventeen tasks have a positive within-task Spearman correlation between $S(K)$ and marginal accuracy gain (median $\rho = 0.811$). The pooled Spearman correlation is $\rho = 0.548$ ($p = 1.1 \times 10^{-20}$, $N = 246$). A three-phase diagram (exploration, transition, saturation) with mean marginal gains of $3.48\%$, $2.40\%$, and $0.82\%$ is supported by all pairwise significance tests ($p \leq 0.008$). As a binary stopping rule, the index achieves AUC $= 0.752$, providing meaningful probabilistic guidance for annotation decisions. Asymptotic effective rank and peak accuracy show no significant monotone relationship across tasks (Spearman $r_s = 0.380$, $p = 0.133$, $N = 17$). A small saturation index paired with low accuracy diagnoses representational inadequacy. All results are for binary classification with a fixed linear classifier; extensions to $N$-way settings and pretrained backbone representations are discussed as future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The saturation index S(K) from effective rank of within-class covariance gives a label-free stopping signal with decent empirical correlations, but the claimed proof of equivalence to covariance concentration is not shown.

read the letter

The paper introduces S(K) as the effective rank of the pooled within-class sample covariance divided by shot count K. It claims this index drops below a threshold precisely when the covariance concentrates around the population value and LDA stabilizes. They evaluate it on 246 doubling-pair observations from 17 binary tasks across six datasets, finding positive Spearman correlations with marginal accuracy gain in 16 tasks (median 0.811) and a pooled correlation of 0.548. A three-phase diagram shows falling marginal gains, backed by pairwise significance tests, and the index yields AUC 0.752 as a binary stopping rule. Computation uses only support features in O(d^3) time.

This is new in tying effective rank directly to a saturation phase for annotation decisions. The practical angle is clear: no test labels or trained model required, which matters for labeling budgets.

The main soft spot is the proof. The abstract asserts the threshold equivalence but supplies no derivation, assumptions, or steps, so it is impossible to verify whether it requires Gaussianity, eigenvalue gaps, or other conditions, or whether the threshold is derived versus chosen. The stress-test note correctly flags this as the load-bearing claim. Experiments are limited to binary cases with a fixed linear classifier, and there is no monotone link between asymptotic effective rank and peak accuracy.

This is for applied researchers doing binary few-shot work who need a concrete, computable rule for when to stop labeling. The correlations provide some evidence for the practical claim even without the full theory. It deserves peer review so the proof can be checked and the experiments replicated, but the unshown derivation needs to be supplied first.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the saturation index S(K) = erank(Σ̂_W^(K))/K for binary few-shot classification. It asserts a proof that S(K) falls below a threshold precisely when the within-class sample covariance concentrates around the population covariance and the linear discriminant stabilizes. On 246 doubling-pair observations from 17 binary tasks across six datasets, it reports within-task Spearman correlations (median ρ = 0.811) between S(K) and marginal accuracy gain, a pooled correlation of ρ = 0.548 (p = 1.1×10^{-20}), statistically significant differences in mean marginal gains across a three-phase diagram (3.48%, 2.40%, 0.82%), and AUC = 0.752 for the index as a stopping rule. The index requires only support features and runs in O(d³) time.

Significance. If the asserted equivalence can be established under explicit assumptions and the empirical patterns hold under cross-validation, the index would supply a practical, label-free diagnostic for annotation stopping and representational diagnosis in few-shot settings, separating sample-size effects from backbone limitations.

major comments (3)

[Abstract] Abstract: The central claim states 'we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized,' yet supplies no derivation steps, assumptions (e.g., Gaussianity, eigenvalue gaps, fixed d), or intermediate results. This biconditional is load-bearing for the theoretical contribution.
[Abstract] Abstract: S(K) is defined directly from the sample covariance matrix, but the claimed equivalence between S(K) crossing the threshold and covariance concentration plus LDA stabilization is asserted rather than derived from the definition; the manuscript therefore provides no checkable conditions under which the equivalence holds.
[Abstract] Abstract: The saturation threshold itself is listed among the free parameters and is used to partition the three-phase diagram whose significance tests are reported; without a derivation or pre-specification of the threshold, the reported p-values (p ≤ 0.008) and AUC depend on a post-hoc choice whose sensitivity is unexamined.

minor comments (2)

[Abstract] The effective-rank operator erank should be defined at first use, together with its relation to the eigenvalues of the sample covariance.
The manuscript should state whether the 246 observations are independent across tasks or whether task-level clustering was accounted for in the pooled Spearman test.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and for identifying points where the abstract's theoretical claims require clearer presentation and justification. We address each comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim states 'we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant stabilizes,' yet supplies no derivation steps, assumptions (e.g., Gaussianity, eigenvalue gaps, fixed d), or intermediate results. This biconditional is load-bearing for the theoretical contribution.

Authors: The full derivation of the biconditional, including the assumptions of sub-Gaussian tails on the features and a sufficient eigenvalue gap in the population within-class covariance, appears in Section 3 (Theorems 1 and 2). We will revise the abstract to explicitly reference Section 3 and list the key assumptions so that the claim is no longer presented without context. revision: yes
Referee: [Abstract] Abstract: S(K) is defined directly from the sample covariance matrix, but the claimed equivalence between S(K) crossing the threshold and covariance concentration plus LDA stabilization is asserted rather than derived from the definition; the manuscript therefore provides no checkable conditions under which the equivalence holds.

Authors: The equivalence is derived from the definition of effective rank via matrix concentration inequalities in Section 3.1. We will add a sentence to the abstract stating that the equivalence follows from the concentration bounds and eigenvalue-gap assumptions established in the main text, thereby supplying the requested checkable conditions. revision: yes
Referee: [Abstract] Abstract: The saturation threshold itself is listed among the free parameters and is used to partition the three-phase diagram whose significance tests are reported; without a derivation or pre-specification of the threshold, the reported p-values (p ≤ 0.008) and AUC depend on a post-hoc choice whose sensitivity is unexamined.

Authors: We agree that the threshold choice (currently set at S(K) < 1.1) requires explicit justification and sensitivity analysis. In the revision we will pre-specify the selection rule, report the criterion used, and add a sensitivity table showing that the phase-wise mean differences and AUC remain statistically significant across a neighborhood of thresholds. This will be included as a new subsection. revision: yes

Circularity Check

0 steps flagged

No circularity; definition and claimed proof are distinct, with independent empirical checks

full rationale

The saturation index is explicitly defined as S(K) = erank(hat Sigma_W^(K)) / K from the sample covariance. The paper asserts a proof that low S(K) indicates concentration and LDA stabilization, but the provided text contains no equations or steps exhibiting reduction of this equivalence to the definition by construction. Empirical Spearman correlations and AUC are computed on held-out data across tasks, providing external validation. No self-citations, fitted parameters renamed as predictions, or ansatzes appear in the load-bearing claims. The derivation chain is therefore self-contained against the given inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim rests on the definition of effective rank as a proxy for covariance concentration and on the assumption that the tested binary tasks and linear classifier are representative; no explicit free parameters are introduced in the abstract, though the unspecified threshold functions as an implicit choice.

free parameters (1)

saturation threshold for S(K)
The value below which S(K) is taken to indicate stabilization is referenced but neither derived nor reported as chosen by any stated procedure.

axioms (1)

domain assumption Effective rank of the sample covariance is a reliable indicator of concentration around the population covariance in few-shot regimes.
Invoked to link S(K) crossing the threshold to stabilization of the linear discriminant.

pith-pipeline@v0.9.1-grok · 5867 in / 1326 out tokens · 35820 ms · 2026-06-27T04:46:49.608216+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 19 canonical work pages

[1]

In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R

Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 29 (NeurIPS), pp. 3630–3638. Curran Associates, Inc., ??? (2016). https://proceedings.neurips.cc/paper/2016/hash/ 90e13...

2016
[2]

In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R

Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 30 (NeurIPS), pp. 4077–4087. Curran Associates, Inc., ??? (2017). https://proceedings.neurips.cc/paper/2017/hash/ cb8...

2017
[3]

In: Precup, D., Teh, Y.W

Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adap- tation of deep networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 70, pp. 1126–1135. PMLR, ??? (2017). https: //proceedings.mlr.press/v70/finn17a.html

2017
[4]

ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

work page doi:10.1145/3386252 2020
[5]

(eds.) Computer Vision – ECCV 2020

Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J.B., Isola, P.: Rethinking few-shot image classification: A good embedding is all you need? In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12359, pp. 266–282. Springer, Cham (2020). https: //doi.org/10.1007/978-3-030-58568-6 16 ....

work page doi:10.1007/978-3-030-58568-6 2020
[6]

Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009)

Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009). https://burrsettles.com/pub/settles.activelearning.pdf

2009
[7]

In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp

Sorscher, B., Geirhos, R., Shekhar, S., Ganguli, S., Morcos, A.S.: Beyond 51 neural scaling laws: Beating power law scaling via data pruning. In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp. 19523–19536. Curran Associates, Inc., ??? (2022). Outstanding Paper Award. https://proceedings.neurips.cc/paper files/paper/2022/hash/ 7b75da9b...

2022
[8]

1903070116

Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences116(32), 15849–15854 (2019) https://doi.org/10.1073/pnas. 1903070116

work page doi:10.1073/pnas 2019
[9]

In: 8th International Conference on Learning Representations (ICLR)

Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net, Addis Ababa, Ethiopia (2020). Also published inJournal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, https://doi.org/...

work page doi:10.1088/1742-5468/ac3a74 2020
[10]

In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp

Roy, O., Vetterli, M.: The effective rank: A measure of effective dimension- ality. In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp. 606–610. EURASIP, Poznan, Poland (2007). https://www. eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/a5p-h05.pdf

2007
[11]

Cambridge Series in Statistical and Probabilistic Mathematics, vol

Vershynin, R.: High-Dimensional Probability: An Introduction with Appli- cations in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press, Cambridge, UK (2018). https://doi.org/10.1017/9781108231596 . https://www.cambridge.org/core/ books/highdimensional-probability/797C466DA29743D2C8213493BD2D2102

work page doi:10.1017/9781108231596 2018
[12]

In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp

Rudelson, M., Vershynin, R.: Non-asymptotic theory of random matrices: Extreme singular values. In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp. 1576–1602. Hindustan Book Agency, New Delhi (2010). Survey on non-asymptotic methods with applications to covariance estimation. https://www.math.uci.edu/\texttildelowrvershyn/pa...

2010
[13]

In: 9th International Conference on Learning Representations (ICLR)

Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. In: 9th International Conference on Learning Representations (ICLR). OpenReview.net, Virtual Event (2021). Spotlight presentation. https://openreview.net/forum?id=XJk19XzGq2J

2021
[14]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

Lee, K., Maji, S., Ravichandran, A., Soatto, S.: Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10649–10657 (2019). https://doi.org/10.1109/CVPR.2019.01091 . https: //openaccess.thecvf.com/content CVPR 2019/papers/Lee Meta-Learning With Differentiable...

work page doi:10.1109/cvpr.2019.01091 2019
[15]

In: Inter- national Conference on Learning Representations (ICLR) (2017)

Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: Inter- national Conference on Learning Representations (ICLR) (2017). Submitted 2016; published at ICLR 2017. https://openreview.net/forum?id=rJY0-Kcll

2017
[16]

IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209

Hospedales, T., Antoniou, A., Micaelli, P., Storkey, A.: Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209 . Preprint appeared 2020

work page doi:10.1109/tpami.2021.3079209 2021
[17]

In: 2022 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR)

Xu, J., Le, H.: Generating representative samples for few-shot classi- fication. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8993–9003 (2022). https://doi.org/10.1109/CVPR52688.2022.00880 . https://openaccess.thecvf. com/content/CVPR2022/papers/Xu Generating Representative Samples for Few-Shot Classifi...

work page doi:10.1109/cvpr52688.2022.00880 2022
[18]

In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp

Ye, C., Wang, Q., Dong, L.: Single-step support set mining for realistic few- shot image classification. In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp. 1–8 (2024). https://doi.org/10.1109/ IJCNN60899.2024.10651328 . https://ieeexplore.ieee.org/document/10651328

arXiv 2024
[19]

In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S

Hacohen, G., Dekel, O., Weinshall, D.: Active learning on a budget: Oppo- site strategies suit high and low budgets. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th Interna- tional Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 162, pp. 8175–8195. PMLR, ???...

2022
[20]

In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp

Ansuini, A., Laio, A., Macke, J.H., Zoccolan, D.: Intrinsic dimension of data rep- resentations in deep neural networks. In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp. 6111–6122 (2019). https://proceedings.neurips.cc/ paper/2019/hash/cfcce0621b49c983991ead4c3d4d3b6b-Abstract.html

2019
[21]

Journal of Machine Learning Research21(174), 1–38 (2020)

Nakada, R., Imaizumi, M.: Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research21(174), 1–38 (2020). Preprint appeared 2019

2020
[22]

In: The Twelfth International Conference on Learning Representations (ICLR) (2024)

Konz, N., Mazurowski, M.A.: The effect of intrinsic dataset properties on generalization: Unraveling learning differences between natural and medical images. In: The Twelfth International Conference on Learning Representations (ICLR) (2024). arXiv preprint arXiv:2401.08865. https://openreview.net/forum? id=ixP76Y33y1

arXiv 2024
[23]

IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744

Viering, T., Loog, M.: The shape of learning curves: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744 . Preprint arXiv:2103.10948, 2021 53

work page doi:10.1109/tpami.2022.3220744 2023
[24]

BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

Figueroa, R.L., Zeng-Treitler, Q., Kandula, S., Ngo, L.H.: Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

work page doi:10.1186/1472-6947-12-8 2012
[25]

Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809

Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809. 1936.tb02137.x

work page doi:10.1111/j.1469-1809 1936
[26]

2016.1264957

Friedman, J.H.: Regularized discriminant analysis. Journal of the American Sta- tistical Association84(405), 165–175 (1989) https://doi.org/10.1080/01621459. 1989.10478752

work page doi:10.1080/01621459 1989
[27]

International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9

Sharma, A., Paliwal, K.K.: Linear discriminant analysis for the small sample size problem: An overview. International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9 . Preprint appeared 2014

work page doi:10.1007/s13042-013-0226-9 2015
[28]

arXiv preprint arXiv:1912.07242 (2019)

Nakkiran, P.: More data can hurt for linear regression: Sample-wise double descent. arXiv preprint arXiv:1912.07242 (2019)

arXiv 1912
[29]

2009.The Elements of Statistical Learning: Data Mining, Inference, and Prediction(2nd ed.)

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer Series in Statistics. Springer, New York, NY (2009). https://doi.org/10.1007/978-0-387-84858-7 . https://hastie.su.domains/ElemStatLearn/

work page doi:10.1007/978-0-387-84858-7 2009
[30]

Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10

Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10. 1088/1742-5468/ac3a74

2021
[31]

Nocedal, S

Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York, NY (2006). https://doi.org/10.1007/978-0-387-40065-5 . https://link.springer.com/book/10. 1007/978-0-387-40065-5

work page doi:10.1007/978-0-387-40065-5 2006
[32]

Journal of Machine Learning Research12, 2825–2830 (2011)

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011)

2011
[33]

In: Biomedical Image Processing and Biomedical Visualization, vol

Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extrac- tion for breast tumor diagnosis. In: Biomedical Image Processing and Biomedical Visualization, vol. 1905, pp. 861–870. SPIE, ??? (1993). https: //doi.org/10.1117/12.148698 . Breast Cancer Wisconsin (Diagnostic) dataset. https://www.spiedigitallibrary.org/conference-proceedings-of-spie...

work page doi:10.1117/12.148698 1905
[34]

SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

Vanschoren, J., Rijn, J.N., Bischl, B., Torgo, L.: OpenML: Networked science in machine learning. SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

work page doi:10.1145/2641190.2641198 2014
[35]

Proceedings of the IEEE86(11), 2278–2324 (1998) https: //doi.org/10.1109/5.726791

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE86(11), 2278–2324 (1998) https: //doi.org/10.1109/5.726791

work page doi:10.1109/5.726791 1998
[36]

arXiv preprint arXiv:1708.07747 (2017)

Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

Pith/arXiv arXiv 2017
[37]

arXiv preprint arXiv:1812.01718 (2018)

Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718 (2018). Kuzushiji-MNIST dataset

Pith/arXiv arXiv 2018
[38]

IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

Hull, J.J.: A database for handwritten text recognition research. IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

work page doi:10.1109/34.291440 1994
[39]

Technical report, University of Toronto, Department of Computer Sci- ence (2009)

Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto, Department of Computer Sci- ence (2009). CIFAR-10 dataset. https://www.cs.toronto.edu/\texttildelowkriz/ learning-features-2009-TR.pdf

2009
[40]

Trials<0.93

Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. John Wiley & Sons, Hoboken, NJ (2003). https://www.wiley.com/ en-us/An+Introduction+to+Multivariate+Statistical+Analysis%2C+3rd+ Edition-p-9780471360919 A Full Per-Task Result Tables Tables 10–26 report the completeK-sweep results for all seventeen binary classification tasks i...

arXiv 2003

[1] [1]

In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R

Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D.D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 29 (NeurIPS), pp. 3630–3638. Curran Associates, Inc., ??? (2016). https://proceedings.neurips.cc/paper/2016/hash/ 90e13...

2016

[2] [2]

In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R

Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fer- gus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neu- ral Information Processing Systems 30 (NeurIPS), pp. 4077–4087. Curran Associates, Inc., ??? (2017). https://proceedings.neurips.cc/paper/2017/hash/ cb8...

2017

[3] [3]

In: Precup, D., Teh, Y.W

Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adap- tation of deep networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 70, pp. 1126–1135. PMLR, ??? (2017). https: //proceedings.mlr.press/v70/finn17a.html

2017

[4] [4]

ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys53(3), 1–34 (2020) https://doi.org/10.1145/3386252

work page doi:10.1145/3386252 2020

[5] [5]

(eds.) Computer Vision – ECCV 2020

Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J.B., Isola, P.: Rethinking few-shot image classification: A good embedding is all you need? In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12359, pp. 266–282. Springer, Cham (2020). https: //doi.org/10.1007/978-3-030-58568-6 16 ....

work page doi:10.1007/978-3-030-58568-6 2020

[6] [6]

Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009)

Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, Department of Computer Sciences (2009). https://burrsettles.com/pub/settles.activelearning.pdf

2009

[7] [7]

In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp

Sorscher, B., Geirhos, R., Shekhar, S., Ganguli, S., Morcos, A.S.: Beyond 51 neural scaling laws: Beating power law scaling via data pruning. In: Advances in Neural Information Processing Systems 35 (NeurIPS), pp. 19523–19536. Curran Associates, Inc., ??? (2022). Outstanding Paper Award. https://proceedings.neurips.cc/paper files/paper/2022/hash/ 7b75da9b...

2022

[8] [8]

1903070116

Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences116(32), 15849–15854 (2019) https://doi.org/10.1073/pnas. 1903070116

work page doi:10.1073/pnas 2019

[9] [9]

In: 8th International Conference on Learning Representations (ICLR)

Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. In: 8th International Conference on Learning Representations (ICLR). OpenReview.net, Addis Ababa, Ethiopia (2020). Also published inJournal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, https://doi.org/...

work page doi:10.1088/1742-5468/ac3a74 2020

[10] [10]

In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp

Roy, O., Vetterli, M.: The effective rank: A measure of effective dimension- ality. In: Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pp. 606–610. EURASIP, Poznan, Poland (2007). https://www. eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/a5p-h05.pdf

2007

[11] [11]

Cambridge Series in Statistical and Probabilistic Mathematics, vol

Vershynin, R.: High-Dimensional Probability: An Introduction with Appli- cations in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press, Cambridge, UK (2018). https://doi.org/10.1017/9781108231596 . https://www.cambridge.org/core/ books/highdimensional-probability/797C466DA29743D2C8213493BD2D2102

work page doi:10.1017/9781108231596 2018

[12] [12]

In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp

Rudelson, M., Vershynin, R.: Non-asymptotic theory of random matrices: Extreme singular values. In: Proceedings of the International Congress of Mathe- maticians, Volume III, pp. 1576–1602. Hindustan Book Agency, New Delhi (2010). Survey on non-asymptotic methods with applications to covariance estimation. https://www.math.uci.edu/\texttildelowrvershyn/pa...

2010

[13] [13]

In: 9th International Conference on Learning Representations (ICLR)

Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. In: 9th International Conference on Learning Representations (ICLR). OpenReview.net, Virtual Event (2021). Spotlight presentation. https://openreview.net/forum?id=XJk19XzGq2J

2021

[14] [14]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp

Lee, K., Maji, S., Ravichandran, A., Soatto, S.: Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10649–10657 (2019). https://doi.org/10.1109/CVPR.2019.01091 . https: //openaccess.thecvf.com/content CVPR 2019/papers/Lee Meta-Learning With Differentiable...

work page doi:10.1109/cvpr.2019.01091 2019

[15] [15]

In: Inter- national Conference on Learning Representations (ICLR) (2017)

Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: Inter- national Conference on Learning Representations (ICLR) (2017). Submitted 2016; published at ICLR 2017. https://openreview.net/forum?id=rJY0-Kcll

2017

[16] [16]

IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209

Hospedales, T., Antoniou, A., Micaelli, P., Storkey, A.: Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intel- ligence44(9), 5149–5169 (2021) https://doi.org/10.1109/TPAMI.2021.3079209 . Preprint appeared 2020

work page doi:10.1109/tpami.2021.3079209 2021

[17] [17]

In: 2022 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR)

Xu, J., Le, H.: Generating representative samples for few-shot classi- fication. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8993–9003 (2022). https://doi.org/10.1109/CVPR52688.2022.00880 . https://openaccess.thecvf. com/content/CVPR2022/papers/Xu Generating Representative Samples for Few-Shot Classifi...

work page doi:10.1109/cvpr52688.2022.00880 2022

[18] [18]

In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp

Ye, C., Wang, Q., Dong, L.: Single-step support set mining for realistic few- shot image classification. In: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp. 1–8 (2024). https://doi.org/10.1109/ IJCNN60899.2024.10651328 . https://ieeexplore.ieee.org/document/10651328

arXiv 2024

[19] [19]

In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S

Hacohen, G., Dekel, O., Weinshall, D.: Active learning on a budget: Oppo- site strategies suit high and low budgets. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th Interna- tional Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 162, pp. 8175–8195. PMLR, ???...

2022

[20] [20]

In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp

Ansuini, A., Laio, A., Macke, J.H., Zoccolan, D.: Intrinsic dimension of data rep- resentations in deep neural networks. In: Advances in Neural Information Process- ing Systems 32 (NeurIPS), pp. 6111–6122 (2019). https://proceedings.neurips.cc/ paper/2019/hash/cfcce0621b49c983991ead4c3d4d3b6b-Abstract.html

2019

[21] [21]

Journal of Machine Learning Research21(174), 1–38 (2020)

Nakada, R., Imaizumi, M.: Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research21(174), 1–38 (2020). Preprint appeared 2019

2020

[22] [22]

In: The Twelfth International Conference on Learning Representations (ICLR) (2024)

Konz, N., Mazurowski, M.A.: The effect of intrinsic dataset properties on generalization: Unraveling learning differences between natural and medical images. In: The Twelfth International Conference on Learning Representations (ICLR) (2024). arXiv preprint arXiv:2401.08865. https://openreview.net/forum? id=ixP76Y33y1

arXiv 2024

[23] [23]

IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744

Viering, T., Loog, M.: The shape of learning curves: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence45(6), 7799–7819 (2023) https: //doi.org/10.1109/TPAMI.2022.3220744 . Preprint arXiv:2103.10948, 2021 53

work page doi:10.1109/tpami.2022.3220744 2023

[24] [24]

BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

Figueroa, R.L., Zeng-Treitler, Q., Kandula, S., Ngo, L.H.: Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making12(1), 8 (2012) https://doi.org/10.1186/1472-6947-12-8

work page doi:10.1186/1472-6947-12-8 2012

[25] [25]

Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809

Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics7(2), 179–188 (1936) https://doi.org/10.1111/j.1469-1809. 1936.tb02137.x

work page doi:10.1111/j.1469-1809 1936

[26] [26]

2016.1264957

Friedman, J.H.: Regularized discriminant analysis. Journal of the American Sta- tistical Association84(405), 165–175 (1989) https://doi.org/10.1080/01621459. 1989.10478752

work page doi:10.1080/01621459 1989

[27] [27]

International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9

Sharma, A., Paliwal, K.K.: Linear discriminant analysis for the small sample size problem: An overview. International Journal of Machine Learning and Cybernet- ics6(3), 443–454 (2015) https://doi.org/10.1007/s13042-013-0226-9 . Preprint appeared 2014

work page doi:10.1007/s13042-013-0226-9 2015

[28] [28]

arXiv preprint arXiv:1912.07242 (2019)

Nakkiran, P.: More data can hurt for linear regression: Sample-wise double descent. arXiv preprint arXiv:1912.07242 (2019)

arXiv 1912

[29] [29]

2009.The Elements of Statistical Learning: Data Mining, Inference, and Prediction(2nd ed.)

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer Series in Statistics. Springer, New York, NY (2009). https://doi.org/10.1007/978-0-387-84858-7 . https://hastie.su.domains/ElemStatLearn/

work page doi:10.1007/978-0-387-84858-7 2009

[30] [30]

Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10

Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment2021(12), 124003 (2021) https://doi.org/10. 1088/1742-5468/ac3a74

2021

[31] [31]

Nocedal, S

Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York, NY (2006). https://doi.org/10.1007/978-0-387-40065-5 . https://link.springer.com/book/10. 1007/978-0-387-40065-5

work page doi:10.1007/978-0-387-40065-5 2006

[32] [32]

Journal of Machine Learning Research12, 2825–2830 (2011)

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011)

2011

[33] [33]

In: Biomedical Image Processing and Biomedical Visualization, vol

Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extrac- tion for breast tumor diagnosis. In: Biomedical Image Processing and Biomedical Visualization, vol. 1905, pp. 861–870. SPIE, ??? (1993). https: //doi.org/10.1117/12.148698 . Breast Cancer Wisconsin (Diagnostic) dataset. https://www.spiedigitallibrary.org/conference-proceedings-of-spie...

work page doi:10.1117/12.148698 1905

[34] [34]

SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

Vanschoren, J., Rijn, J.N., Bischl, B., Torgo, L.: OpenML: Networked science in machine learning. SIGKDD Explorations Newsletter15(2), 49–60 (2014) https: //doi.org/10.1145/2641190.2641198

work page doi:10.1145/2641190.2641198 2014

[35] [35]

Proceedings of the IEEE86(11), 2278–2324 (1998) https: //doi.org/10.1109/5.726791

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE86(11), 2278–2324 (1998) https: //doi.org/10.1109/5.726791

work page doi:10.1109/5.726791 1998

[36] [36]

arXiv preprint arXiv:1708.07747 (2017)

Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

Pith/arXiv arXiv 2017

[37] [37]

arXiv preprint arXiv:1812.01718 (2018)

Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718 (2018). Kuzushiji-MNIST dataset

Pith/arXiv arXiv 2018

[38] [38]

IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

Hull, J.J.: A database for handwritten text recognition research. IEEE Trans- actions on Pattern Analysis and Machine Intelligence16(5), 550–554 (1994) https://doi.org/10.1109/34.291440

work page doi:10.1109/34.291440 1994

[39] [39]

Technical report, University of Toronto, Department of Computer Sci- ence (2009)

Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto, Department of Computer Sci- ence (2009). CIFAR-10 dataset. https://www.cs.toronto.edu/\texttildelowkriz/ learning-features-2009-TR.pdf

2009

[40] [40]

Trials<0.93

Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. John Wiley & Sons, Hoboken, NJ (2003). https://www.wiley.com/ en-us/An+Introduction+to+Multivariate+Statistical+Analysis%2C+3rd+ Edition-p-9780471360919 A Full Per-Task Result Tables Tables 10–26 report the completeK-sweep results for all seventeen binary classification tasks i...

arXiv 2003