arxiv: 2604.02468 · v1 · submitted 2026-04-02 · 💻 cs.CV · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Hierarchical, Interpretable, Label-Free Concept Bottleneck Model

Haodong Xie , Yujun Cai , Rahul Singh Maharjan , Yiwei Wang , Federico Tavella , Angelo Cangelosi

Authors on Pith no claims yet

Pith reviewed 2026-05-13 21:07 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords Concept Bottleneck ModelsHierarchical InterpretabilityLabel-Free ConceptsVisual Consistency LossMulti-level ExplanationsDeep Learning Interpretability

0 comments

The pith

A hierarchical extension to concept bottleneck models allows accurate multi-level classification and explanations using only label-free concepts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes extending standard concept bottleneck models, which predict labels via single-level human-understandable concepts, into a hierarchical version. This allows the model to classify and explain at multiple abstraction levels, progressing from general to specific features without needing labels for how concepts relate. It achieves this through a gradient-based visual consistency loss that keeps layers focused on similar image regions and dual classification heads operating at different levels. Sympathetic readers would care because current single-level CBMs cannot mirror how humans use both broad and detailed views for the same object, limiting their use in tasks where explanations must match the right scale of understanding.

Core claim

HIL-CBM extends CBMs into a hierarchical framework that enables classification and explanation across multiple semantic levels without requiring relational concept annotations. This is achieved by introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions and by training dual classification heads each operating on feature concepts at different abstraction levels. The model aligns the abstraction level of concept-based explanations with that of model predictions, progressing from abstract to concrete, and experiments show it outperforms state-of-the-art sparse CBMs in classification accuracy while providing more interpretable, a

What carries the argument

HIL-CBM (Hierarchical Interpretable Label-Free Concept Bottleneck Model) that uses a gradient-based visual consistency loss to align abstraction layers on similar spatial regions together with dual classification heads for multi-level predictions on feature concepts.

If this is right

Outperforms state-of-the-art sparse CBMs in classification accuracy on benchmark datasets.
Provides more interpretable and accurate explanations according to human evaluations.
Maintains a hierarchical and label-free approach to feature concepts across multiple semantic levels.
Aligns the abstraction level of concept explanations with model predictions from abstract to concrete.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual-head structure might transfer to other interpretable models to handle multi-scale inputs without adding supervision costs.
Explanations at matching abstraction levels could reduce user confusion when models are deployed on ambiguous or variably scaled objects.
Testing the consistency loss on non-image data could reveal whether the hierarchy benefit is vision-specific or more general.

Load-bearing premise

The gradient-based visual consistency loss will reliably align abstraction layers to similar spatial regions and dual classification heads will enable effective multi-level predictions without any relational concept annotations.

What would settle it

Experiments on the same benchmark datasets where HIL-CBM fails to exceed the accuracy of sparse CBMs or where human evaluators rate its explanations as no more accurate or interpretable than single-level models would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.02468 by Angelo Cangelosi, Federico Tavella, Haodong Xie, Rahul Singh Maharjan, Yiwei Wang, Yujun Cai.

**Figure 2.** Figure 2: Overview of our proposed model, HIL-CBM. A pre-trained backbone processes input images to produce image embeddings. Two concept layers, each [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Visualization of two-level predictions from HIL-CBM and the [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: visualizes the model debugging process using an example from the ImageNet validation set. In [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Concept Bottleneck Models (CBMs) introduce interpretability to black-box deep learning models by predicting labels through human-understandable concepts. However, unlike humans, who identify objects at different levels of abstraction using both general and specific features, existing CBMs operate at a single semantic level in both concept and label space. We propose HIL-CBM, a Hierarchical Interpretable Label-Free Concept Bottleneck Model that extends CBMs into a hierarchical framework to enhance interpretability by more closely mirroring the human cognitive process. HIL-CBM enables classification and explanation across multiple semantic levels without requiring relational concept annotations. HIL-CBM aligns the abstraction level of concept-based explanations with that of model predictions, progressing from abstract to concrete. This is achieved by (i) introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions, and (ii) training dual classification heads, each operating on feature concepts at different abstraction levels. Experiments on benchmark datasets demonstrate that HIL-CBM outperforms state-of-the-art sparse CBMs in classification accuracy. Human evaluations further show that HIL-CBM provides more interpretable and accurate explanations, while maintaining a hierarchical and label-free approach to feature concepts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HIL-CBM adds a hierarchical label-free structure to CBMs via gradient consistency loss and dual heads, but the abstract gives no experimental details so the claimed gains stay unverified.

read the letter

The main point is that this paper extends concept bottleneck models to multiple abstraction levels without needing concept annotations. It uses a gradient-based visual consistency loss to push layers toward similar image regions and trains two separate classification heads on features from different depths. That setup is the actual novelty within the CBM line of work. It tries to make explanations line up with the prediction level, moving from abstract to concrete, which is a direct response to the single-level limitation in prior models. The architecture itself is cleanly described and avoids extra supervision, which keeps it practical. The dual-head design and the loss are straightforward additions that could plausibly improve alignment. The soft spot is the evidence. The abstract states better accuracy than sparse CBMs and stronger human ratings on explanations, yet supplies zero information on datasets, baselines, splits, or statistics. Without those, it is impossible to judge whether the hierarchy or the new loss actually produces the gains or whether the results are just incremental. The stress-test worry about the gradient loss aligning to unrelated regions rather than semantically related ones is reasonable on the given description, since nothing in the loss forces semantic content. The dual heads could also collapse into redundant predictors without relational constraints. This paper is aimed at people already working on interpretable vision models and CBM extensions. A reader in that niche would pick up usable architectural ideas, but would need the full experiments before treating the performance claims as settled. It should go to peer review so referees can check the implementation details and test whether the mechanisms deliver the intended hierarchy.

Referee Report

3 major / 2 minor

Summary. The paper introduces HIL-CBM, a hierarchical extension of Concept Bottleneck Models (CBMs) that performs multi-level classification and generates explanations at varying abstraction levels without requiring relational concept annotations or label supervision. It achieves this via a gradient-based visual consistency loss to align abstraction layers spatially and dual classification heads operating on features at different levels. Experiments on benchmark datasets are claimed to show higher classification accuracy than state-of-the-art sparse CBMs, with human evaluations indicating more interpretable and accurate explanations.

Significance. If the empirical claims hold under rigorous validation, the work would meaningfully advance interpretable ML by extending CBMs to hierarchical settings that better approximate human multi-scale reasoning, potentially improving both accuracy and trust in domains requiring multi-level explanations such as medical imaging or scene understanding. The label-free design is a notable strength, as is the attempt to enforce progressive abstraction without additional annotations.

major comments (3)

[§3.2] §3.2 (visual consistency loss): the loss is defined solely via gradient similarity (cosine similarity between layer gradients); this construction permits alignment on spatially overlapping but semantically unrelated regions (e.g., background texture), providing no guarantee that abstraction layers correspond to progressively concrete object parts as required for the hierarchical claim.
[§3.3] §3.3 (dual classification heads): both heads map to the identical label space without any relational supervision or explicit hierarchy constraint in the concept space; nothing prevents the heads from learning redundant or averaged predictors, which would violate the asserted progression from abstract to concrete predictions.
[§5] §5 (experiments): the reported accuracy gains over sparse CBM baselines lack error bars, statistical significance tests, and details on data splits or hyperparameter search; without these, the central claim that HIL-CBM outperforms SOTA cannot be assessed for robustness.

minor comments (2)

[§3] Notation for the abstraction levels and concept vectors is introduced without a clear summary table or diagram, making it difficult to track how features flow between the two heads.
[§5.3] The human evaluation section does not report inter-rater agreement metrics or the exact protocol for presenting explanations to participants.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below and indicate where revisions will be made to strengthen the paper.

read point-by-point responses

Referee: [§3.2] §3.2 (visual consistency loss): the loss is defined solely via gradient similarity (cosine similarity between layer gradients); this construction permits alignment on spatially overlapping but semantically unrelated regions (e.g., background texture), providing no guarantee that abstraction layers correspond to progressively concrete object parts as required for the hierarchical claim.

Authors: We acknowledge that relying solely on gradient cosine similarity for the visual consistency loss does not explicitly rule out alignment with non-semantic regions such as background textures. Our empirical evidence from concept visualizations and human evaluations suggests that the learned layers do focus on progressively concrete object parts in practice. In the revised manuscript we will add a dedicated limitations paragraph discussing this aspect of the loss and include additional qualitative examples illustrating semantic rather than texture-based alignment. revision: partial
Referee: [§3.3] §3.3 (dual classification heads): both heads map to the identical label space without any relational supervision or explicit hierarchy constraint in the concept space; nothing prevents the heads from learning redundant or averaged predictors, which would violate the asserted progression from abstract to concrete predictions.

Authors: The two heads receive concept features from layers at different abstraction depths, and the visual consistency loss is intended to enforce spatial differentiation between these layers. While no explicit relational supervision is used, we will strengthen the revision by adding a quantitative analysis comparing the two heads' predictions (e.g., disagreement rates on fine- versus coarse-grained examples) to demonstrate that they capture distinct levels of abstraction rather than redundant mappings. revision: partial
Referee: [§5] §5 (experiments): the reported accuracy gains over sparse CBM baselines lack error bars, statistical significance tests, and details on data splits or hyperparameter search; without these, the central claim that HIL-CBM outperforms SOTA cannot be assessed for robustness.

Authors: We agree that the current experimental reporting is insufficient for assessing robustness. In the revised manuscript we will report mean accuracy and standard deviation over multiple random seeds, include statistical significance tests (paired t-tests) against baselines, and provide complete details on data splits together with the hyperparameter search protocol. revision: yes

Circularity Check

0 steps flagged

No significant circularity; components introduced as independent extensions

full rationale

The paper proposes HIL-CBM by adding a gradient-based visual consistency loss and dual classification heads to standard CBMs. These are presented as novel architectural choices without any derivation that reduces them to fitted parameters or prior self-citations. Performance claims rest on experimental results rather than tautological redefinitions. No equations or load-bearing steps in the provided description equate outputs to inputs by construction, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, axioms, or invented entities; any hyperparameters such as loss weights are not named or quantified.

pith-pipeline@v0.9.0 · 5529 in / 1018 out tokens · 34685 ms · 2026-05-13T21:07:36.450607+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions, and (ii) training dual classification heads, each operating on feature concepts at different abstraction levels
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Tree-path KL Divergence loss ... enforces structural alignment between the predictions of the model and the hierarchical label structure

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 2 internal anchors

[1]

Basic objects in natural categories,

E. Rosch, C. B. Mervis, W. D. Gray, D. M. Johnson, and P. Boyes- Braem, “Basic objects in natural categories,”Cognitive psychology, vol. 8, no. 3, pp. 382–439, 1976

work page 1976
[2]

Categorizing concepts with basic level for vision-to-language,

H. Wang, H. Wang, and K. Xu, “Categorizing concepts with basic level for vision-to-language,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4962–4970

work page 2018
[3]

Pictures and names: Making the connection,

P. Jolicoeur, M. A. Gluck, and S. M. Kosslyn, “Pictures and names: Making the connection,”Cognitive psychology, vol. 16, no. 2, pp. 243– 275, 1984

work page 1984
[4]

Similar and different: The differ- entiation of basic-level categories

A. B. Markman and E. J. Wisniewski, “Similar and different: The differ- entiation of basic-level categories.”Journal of Experimental Psychology: Learning, memory, and cognition, vol. 23, no. 1, p. 54, 1997

work page 1997
[5]

Concept bottleneck models,

P. W. Koh, T. Nguyen, Y . S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang, “Concept bottleneck models,” inInternational conference on machine learning. PMLR, 2020, pp. 5338–5348

work page 2020
[6]

Coarse-to-fine concept bot- tleneck models,

K. Panousis, D. Ienco, and D. Marcos, “Coarse-to-fine concept bot- tleneck models,”Advances in Neural Information Processing Systems, vol. 37, pp. 105 171–105 199, 2024

work page 2024
[7]

Show and tell: Visually explainable deep neural nets via spatially-aware concept bottleneck models,

I. Benou and T. R. Raviv, “Show and tell: Visually explainable deep neural nets via spatially-aware concept bottleneck models,” inProceed- ings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30 063–30 072

work page 2025
[8]

Vinet: A visually interpretable image diagnosis network,

D. Gu, Y . Li, F. Jiang, Z. Wen, S. Liu, W. Shi, G. Lu, and C. Zhou, “Vinet: A visually interpretable image diagnosis network,”IEEE Trans- actions on Multimedia, vol. 22, no. 7, pp. 1720–1729, 2020

work page 2020
[9]

Post-hoc concept bottleneck models

M. Yuksekgonul, M. Wang, and J. Zou, “Post-hoc concept bottleneck models,”arXiv preprint arXiv:2205.15480, 2022

work page arXiv 2022
[10]

Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav),

B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas et al., “Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav),” inInternational conference on machine learning. PMLR, 2018, pp. 2668–2677

work page 2018
[11]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763

work page 2021
[12]

Label-free concept bottleneck models,

T. Oikarinen, S. Das, L. M. Nguyen, and T.-W. Weng, “Label-free concept bottleneck models,” inThe Eleventh International Conference on Learning Representations

work page
[13]

Language mod- els are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

work page 1901
[14]

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks , shorttitle =

T. Oikarinen and T.-W. Weng, “Clip-dissect: Automatic description of neuron representations in deep vision networks,”arXiv preprint arXiv:2204.10965, 2022

work page arXiv 2022
[15]

Hybrid concept bottleneck models,

Y . Liu, T. Zhang, and S. Gu, “Hybrid concept bottleneck models,” in Proceedings of the Computer Vision and Pattern Recognition Confer- ence, 2025, pp. 20 179–20 189

work page 2025
[16]

Mining multilevel image seman- tics via hierarchical classification,

J. Fan, Y . Gao, H. Luo, and R. Jain, “Mining multilevel image seman- tics via hierarchical classification,”IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 167–187, 2008

work page 2008
[17]

Hd-cnn: hierarchical deep convolutional neural networks for large scale visual recognition,

Z. Yan, H. Zhang, R. Piramuthu, V . Jagadeesh, D. DeCoste, W. Di, and Y . Yu, “Hd-cnn: hierarchical deep convolutional neural networks for large scale visual recognition,” inProceedings of the IEEE international conference on computer vision, 2015, pp. 2740–2748

work page 2015
[18]

Do convolutional neural networks learn class hierarchy?

A. Bilal, A. Jourabloo, M. Ye, X. Liu, and L. Ren, “Do convolutional neural networks learn class hierarchy?”IEEE transactions on visualiza- tion and computer graphics, vol. 24, no. 1, pp. 152–162, 2017

work page 2017
[19]

Use all the labels: A hierarchical multi-label contrastive learning framework,

S. Zhang, R. Xu, C. Xiong, and C. Ramaiah, “Use all the labels: A hierarchical multi-label contrastive learning framework,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16 660–16 669

work page 2022
[20]

Chils: Zero- shot image classification with hierarchical label sets,

Z. Novack, J. McAuley, Z. C. Lipton, and S. Garg, “Chils: Zero- shot image classification with hierarchical label sets,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 26 342–26 362. 10

work page 2023
[21]

Bioclip: A vision foundation model for the tree of life,

S. Stevens, J. Wu, M. J. Thompson, E. G. Campolongo, C. H. Song, D. E. Carlyn, L. Dong, W. M. Dahdul, C. Stewart, T. Berger-Wolfet al., “Bioclip: A vision foundation model for the tree of life,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 19 412–19 424

work page 2024
[22]

Interpretable image recognition with hierarchical prototypes,

P. Hase, C. Chen, O. Li, and C. Rudin, “Interpretable image recognition with hierarchical prototypes,” inProceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 7, 2019, pp. 32–40

work page 2019
[23]

Hierarchical skin lesion image classifica- tion with prototypical decision tree,

Z. Yu, T. D. Nguyen, L. Ju, Y . Gal, M. Sashindranath, P. Bonnington, L. Zhang, V . Mar, and Z. Ge, “Hierarchical skin lesion image classifica- tion with prototypical decision tree,”npj Digital Medicine, vol. 8, no. 1, p. 26, 2025

work page 2025
[24]

Hierarchical prototype learning for zero-shot recognition,

X. Zhang, S. Gui, Z. Zhu, Y . Zhao, and J. Liu, “Hierarchical prototype learning for zero-shot recognition,”IEEE Transactions on Multimedia, vol. 22, no. 7, pp. 1692–1703, 2019

work page 2019
[25]

GPT-4 Technical Report

OpenAI, “Gpt-4 technical report,” 2023, openAI Technical Report No. 2023-03. [Online]. Available: https://doi.org/10.48550/arXiv.2303.08774

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.08774 2023
[26]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[27]

Learning reliable visual saliency for model explanations,

Y . Wang, H. Su, B. Zhang, and X. Hu, “Learning reliable visual saliency for model explanations,”IEEE Transactions on Multimedia, vol. 22, no. 7, pp. 1796–1807, 2019

work page 2019
[28]

Hierarchical dynamic masks for visual explanation of neural networks,

Y . Peng, L. He, D. Hu, Y . Liu, L. Yang, and S. Shang, “Hierarchical dynamic masks for visual explanation of neural networks,”IEEE trans- actions on multimedia, vol. 26, pp. 5311–5325, 2023

work page 2023
[29]

Bi-cam: Generating explanations for deep neural networks using bipolar information,

Y . Li, H. Liang, and R. Yu, “Bi-cam: Generating explanations for deep neural networks using bipolar information,”IEEE Transactions on Multimedia, vol. 26, pp. 568–580, 2023

work page 2023
[30]

Improving network interpretability via explanation consistency evaluation,

H. Wu, H. Jiang, K. Wang, Z. Tang, X. He, and L. Lin, “Improving network interpretability via explanation consistency evaluation,”IEEE Transactions on Multimedia, vol. 26, pp. 11 261–11 273, 2024

work page 2024
[31]

Grad-cam: Visual explanations from deep networks via gradient-based localization,

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” inProceedings of the IEEE international conference on computer vision, 2017, pp. 618–626

work page 2017
[32]

Taking a hint: Leveraging explanations to make vision and language models more grounded,

R. R. Selvaraju, S. Lee, Y . Shen, H. Jin, S. Ghosh, L. Heck, D. Batra, and D. Parikh, “Taking a hint: Leveraging explanations to make vision and language models more grounded,” inProceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 2591–2600

work page 2019
[33]

Casting your model: Learning to localize improves self-supervised representations,

R. R. Selvaraju, K. Desai, J. Johnson, and N. Naik, “Casting your model: Learning to localize improves self-supervised representations,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11 058–11 067

work page 2021
[34]

Consistent explanations by contrastive learning,

V . Pillai, S. A. Koohpayegani, A. Ouligian, D. Fong, and H. Pirsiavash, “Consistent explanations by contrastive learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10 213–10 222

work page 2022
[35]

Low-light image enhancement via clustering contrastive learning for visual recognition,

G. Sheng, G. Hu, X. Wang, W. Chen, and J. Jiang, “Low-light image enhancement via clustering contrastive learning for visual recognition,” Pattern Recognition, vol. 164, p. 111554, 2025

work page 2025
[36]

Leveraging sparse linear layers for debuggable deep networks,

E. Wong, S. Santurkar, and A. Madry, “Leveraging sparse linear layers for debuggable deep networks,” inInternational Conference on Machine Learning. PMLR, 2021, pp. 11 205–11 216

work page 2021
[37]

Visually con- sistent hierarchical image classification,

S. Park, Y . Zhang, X. Y . Stella, S. Beery, and J. Huang, “Visually con- sistent hierarchical image classification,” inThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[38]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hintonet al., “Learning multiple layers of features from tiny images,” 2009

work page 2009
[39]

The caltech-ucsd birds-200-2011 dataset,

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The caltech-ucsd birds-200-2011 dataset,” 2011

work page 2011
[40]

Places: A 10 million image database for scene recognition,

B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,”IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 6, pp. 1452–1464, 2017

work page 2017
[41]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255

work page 2009
[42]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016