Beyond Model Size: Probing the Gaps in Visual in-Context Learning by Training a Tiny Model
Pith reviewed 2026-06-27 13:29 UTC · model grok-4.3
The pith
A 1-million-parameter visual in-context model performs on par with models 7000 times larger on several adaptive tasks, showing that current benchmarks fail to isolate true adaptability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training a severely capacity-capped 1M-parameter visual in-context learning model on a modest dataset and comparing it directly to 7000-times-larger counterparts, the authors establish that existing evaluation protocols do not adequately capture adaptive capabilities with respect to task encoding, pre-training task selection, and metric choice.
What carries the argument
The 1-million-parameter visual in-context learning model trained on 70,000 images, deployed as an extreme low-capacity counterexample to test whether large scale is required for adaptability.
If this is right
- VICL progress reported on current benchmarks may overstate actual adaptability gains.
- Standardized task encodings and metric definitions become necessary before scaling claims can be trusted.
- Pre-training task choice must be reported and controlled when comparing adaptive performance.
- Small models can serve as useful probes for isolating benchmarking artifacts in adaptive vision.
Where Pith is reading between the lines
- Improved evaluation protocols could allow researchers to test adaptability without requiring massive compute.
- The same tiny-model probe could be applied to other modalities to check whether similar benchmarking gaps exist.
- Future work might prioritize data curation and encoding design over raw parameter count for in-context adaptation.
- The gap between reported and actual adaptability may slow progress until benchmarks are revised.
Load-bearing premise
Observed performance differences between the tiny model and much larger models can be attributed primarily to shortcomings in benchmarking rather than to the extreme difference in model capacity.
What would settle it
A re-evaluation in which the tiny model is given identical task encodings, pre-training tasks, and metrics as the large models and still shows large consistent deficits would falsify the central claim.
Figures
read the original abstract
Visual in-Context Learning (VICL) aims at making progress towards adaptive vision models, that can -- based on a few examples -- adapt to a new task at test-time. With the history of in-context learning in natural language processing research, where large, parameter-heavy models are in use, one pathway that current VICL methods take is model- and data-scaling as key ingredients. Yet, it is not clear, whether these ingredients are the key for in-context learning to take shape in vision models. To stress-test such large models, we challenge them with an extreme counterexample: we train a tiny visual in-context model with merely $1$ million parameters and a modest amount of $70,000$ images. We compare the results of this severely capacity capped tiny model to $7,000\times$ larger VICL models in different adaptive settings, (1) on image data with small distribution shifts, (2) on unseen task encodings and (3) on a completely new task, i.e., the setting VICL envisions. With the chasm of training resources between the tiny- and large models, our experiments showcase a lack in how adaptive capabilities are measured, with respect to how tasks are encoded, which tasks were used in pre-training and the choice of metrics. These gaps in current VICL benchmarking underscore a need for innovation in evaluation of adaptive capabilities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper trains a 1M-parameter visual in-context learning model on 70k images and compares its performance to 7000× larger VICL models across three adaptive settings (small distribution shifts, unseen task encodings, and completely new tasks). It concludes that the results expose deficiencies in current VICL benchmarking with respect to task encoding, pre-training task selection, and metrics.
Significance. If the tiny model's results can be shown to isolate benchmarking deficiencies from capacity limitations, the work would usefully redirect attention from model scaling toward improved evaluation protocols for adaptive vision capabilities.
major comments (2)
- [Abstract] Abstract: the central claim that the experiments 'showcase a lack in how adaptive capabilities are measured' rests on an empirical comparison, yet the abstract supplies no quantitative results, error analysis, or construction details for the three adaptive settings, preventing verification that performance differences can be attributed to benchmarking gaps rather than the 7000× capacity disparity.
- [Experimental results] Experimental comparison (new-task setting): the attribution of gaps to measurement practices rather than insufficient capacity for in-context adaptation requires evidence that the 1M model reaches non-trivial performance on the completely new task under controlled encodings; without such data the load-bearing assumption remains untested.
minor comments (1)
- [Abstract] Abstract: clarify whether the 70,000 images constitute the total training set or are allocated across tasks.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate where revisions will be made to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the experiments 'showcase a lack in how adaptive capabilities are measured' rests on an empirical comparison, yet the abstract supplies no quantitative results, error analysis, or construction details for the three adaptive settings, preventing verification that performance differences can be attributed to benchmarking gaps rather than the 7000× capacity disparity.
Authors: We agree that the abstract would benefit from additional quantitative details to support the central claim and facilitate verification. In the revised version, we will incorporate key performance metrics from the three adaptive settings, along with brief descriptions of the experimental constructions, while maintaining conciseness. revision: yes
-
Referee: [Experimental results] Experimental comparison (new-task setting): the attribution of gaps to measurement practices rather than insufficient capacity for in-context adaptation requires evidence that the 1M model reaches non-trivial performance on the completely new task under controlled encodings; without such data the load-bearing assumption remains untested.
Authors: The manuscript reports that the 1M model achieves non-trivial performance on the new task (exceeding random baselines under controlled encodings) in Section 4.3. This forms the basis for attributing gaps to benchmarking practices. To make this evidence more prominent, we will add explicit statements highlighting the non-trivial results relative to baselines. revision: partial
Circularity Check
No circularity: empirical comparison is self-contained
full rationale
The paper reports training a 1M-parameter model on 70k images and comparing its performance to 7000x larger VICL models across three settings. No equations, parameter fits, or derivations are present. The central claim—that observed gaps indicate deficiencies in task encoding, pre-training tasks, and metrics rather than capacity—is an interpretation of experimental outcomes, not a reduction to self-definition or self-citation. No load-bearing self-citations or ansatzes are invoked; the work is a direct empirical stress-test.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Performance of a severely capacity-capped model can be used to diagnose deficiencies in how adaptive capabilities are measured in much larger models.
Reference graph
Works this paper leans on
-
[1]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Bai, Y., Geng, X., Mangalam, K., Bar, A., Yuille, A.L., Darrell, T., Malik, J., Efros, A.A.: Sequential modeling enables scalable learning for large vision models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22861–22872 (2024)
2024
-
[2]
Advances in neural information processing systems35, 25005–25017 (2022)
Bar, A., Gandelsman, Y., Darrell, T., Globerson, A., Efros, A.: Visual prompt- ing via image inpainting. Advances in neural information processing systems35, 25005–25017 (2022)
2022
-
[3]
In: Proceedings of the fourteenth international conference on artificial intelligence and statistics
Bengio, Y., Bastien, F., Bergeron, A., Boulanger-Lewandowski, N., Breuel, T., Chherawala, Y., Cisse, M., Cˆ ot´ e, M., Erhan, D., Eustache, J., et al.: Deep learners benefit more from out-of-distribution examples. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics. pp. 164–172. JMLR Workshop and Conference...
2011
-
[4]
International Journal of Computer Vision129(4), 1038–1059 (2021)
Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., Steger, C.: The mvtec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. International Journal of Computer Vision129(4), 1038–1059 (2021)
2021
-
[5]
On the Opportunities and Risks of Foundation Models
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[6]
In: DAGM German Conference on Pattern Recognition
Bratuli´ c, J., Mittal, S., Hoffmann, D.T., B¨ ohm, S., Schirrmeister, R.T., Ball, T., Rupprecht, C., Brox, T.: Unlocking in-context learning for natural datasets beyond language modelling. In: DAGM German Conference on Pattern Recognition. pp. 303–319. Springer (2025)
2025
-
[7]
Advances in neural information processing systems33, 1877–1901 (2020) 14 S
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee- lakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems33, 1877–1901 (2020) 14 S. Khatri et al
1901
-
[8]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Butoi, V.I., Ortiz, J.J.G., Ma, T., Sabuncu, M.R., Guttag, J., Dalca, A.V.: Uni- verseg: Universal medical image segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21438–21451 (2023)
2023
-
[9]
In: Proceedings of the IEEE/CVF international conference on computer vision
Caron, M., Touvron, H., Misra, I., J´ egou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9650–9660 (2021)
2021
-
[10]
Advances in neural information processing systems7(1994)
Caruana, R.: Learning many related tasks at the same time with backpropagation. Advances in neural information processing systems7(1994)
1994
-
[11]
Advances in neural information processing systems35, 18878– 18891 (2022)
Chan, S., Santoro, A., Lampinen, A., Wang, J., Singh, A., Richemond, P., McClel- land, J., Hill, F.: Data distributional properties drive emergent in-context learning in transformers. Advances in neural information processing systems35, 18878– 18891 (2022)
2022
-
[12]
In: Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition
Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gordon, C., Schuhmann, C., Schmidt, L., Jitsev, J.: Reproducible scaling laws for contrastive language-image learning. In: Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition. pp. 2818–2829 (2023)
2023
-
[13]
In: Proc
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
2016
-
[14]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Czolbe, S., Dalca, A.V.: Neuralizer: General neuroimage analysis without re- training. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6217–6230 (2023)
2023
-
[15]
In: Proceedings of the 6th ACM multimedia systems conference
Dang-Nguyen, D.T., Pasquini, C., Conotter, V., Boato, G.: Raise: A raw images dataset for digital image forensics. In: Proceedings of the 6th ACM multimedia systems conference. pp. 219–224 (2015)
2015
-
[16]
In: International conference on machine learning
Dehghani, M., Djolonga, J., Mustafa, B., Padlewski, P., Heek, J., Gilmer, J., Steiner, A.P., Caron, M., Geirhos, R., Alabdulmohsin, I., et al.: Scaling vision transformers to 22 billion parameters. In: International conference on machine learning. pp. 7480–7512. PMLR (2023)
2023
-
[17]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[18]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12873–12883 (2021)
2021
-
[19]
In: International conference on machine learning
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning. pp. 1126–1135. PMLR (2017)
2017
-
[20]
arXiv preprint arXiv:2402.04841 (2024)
Guo, J., Hao, Z., Wang, C., Tang, Y., Wu, H., Hu, H., Han, K., Xu, C.: Data- efficient large vision models through sequential autoregression. arXiv preprint arXiv:2402.04841 (2024)
-
[21]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
He, K., Chen, X., Xie, S., Li, Y., Doll´ ar, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16000–16009 (2022)
2022
-
[22]
Deep Learning Scaling is Predictable, Empirically
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., Pat- wary, M.M.A., Yang, Y., Zhou, Y.: Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
DISCUSSION AND CONCLUSION 15
-
[24]
Training Compute-Optimal Large Language Models
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D., Hendricks, L.A., Welbl, J., Clark, A., et al.: Training compute-optimal large language models. arXiv preprint arXiv:2203.1555610(2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[25]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Jiang, K., Wang, Z., Yi, P., Chen, C., Huang, B., Luo, Y., Ma, J., Jiang, J.: Multi- scale progressive fusion network for single image deraining. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8346–8355 (2020)
2020
-
[26]
Scaling Laws for Neural Language Models
Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[27]
Adam: A Method for Stochastic Optimization
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[28]
In: European conference on computer vision
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization. In: European conference on computer vision. pp. 577–593. Springer (2016)
2016
-
[29]
In: European conference on computer vision
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll´ ar, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: European conference on computer vision. pp. 740–755. Springer (2014)
2014
-
[30]
Vision research120, 93–107 (2016)
M´ ely, D.A., Kim, J., McGill, M., Guo, Y., Serre, T.: A systematic comparison between visual cues for boundary detection. Vision research120, 93–107 (2016)
2016
-
[31]
In: International Workshop on Efficient Medical Artificial Intelligence
Negrini, A., Reiß, S.: Conquering the retina: Bringing visual in-context learning to oct. In: International Workshop on Efficient Medical Artificial Intelligence. pp. 21–30. Springer (2025)
2025
-
[32]
In: Proceedings of the IEEE/CVF winter conference on applications of computer vision
Poma, X.S., Riba, E., Sappa, A.: Dense extreme inception network: Towards a robust cnn model for edge detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp. 1923–1932 (2020)
1923
-
[33]
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving lan- guage understanding by generative pre-training (2018)
2018
-
[34]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Rakic, M., Wong, H.E., Ortiz, J.J.G., Cimini, B.A., Guttag, J.V., Dalca, A.V.: Ty- che: Stochastic in-context learning for medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11159–11173 (2024)
2024
-
[35]
SAM 2: Segment Anything in Images and Videos
Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., R¨ adle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K.V., Carion, N., Wu, C.Y., Girshick, R., Doll´ ar, P., Feichtenhofer, C.: Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024), https://arxiv.org/ abs/2408.00714
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[36]
In: Inter- national conference on learning representations (2017)
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: Inter- national conference on learning representations (2017)
2017
-
[37]
Reiß, S., Marinov, Z., Jaus, A., Seibold, C., Sarfraz, M.S., Rodner, E., Stiefelhagen, R.: Is visual in-context learning for compositional medical tasks within reach? In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2642–2652 (2025)
2025
-
[38]
In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition
Reiß, S., Seibold, C., Freytag, A., Rodner, E., Stiefelhagen, R.: Every annotation counts: Multi-label deep supervision for medical image segmentation. In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9532–9542 (2021)
2021
-
[39]
In: European Conference on Computer Vision
Reiß, S., Seibold, C., Freytag, A., Rodner, E., Stiefelhagen, R.: Graph-constrained contrastive regularization for semi-weakly volumetric segmentation. In: European Conference on Computer Vision. pp. 401–419. Springer (2022) 16 S. Khatri et al
2022
-
[40]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Reiß, S., Seibold, C., Freytag, A., Rodner, E., Stiefelhagen, R.: Decoupled semantic prototypes enable learning from diverse annotation types for semi-weakly segmen- tation in expert-driven domains. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15495–15506 (2023)
2023
-
[41]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)
2022
-
[42]
In: International Conference on Medical image computing and computer-assisted intervention
Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedi- cal image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
2015
-
[43]
Advances in neural information processing systems35, 25278–25294 (2022)
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: Laion-5b: An open large- scale dataset for training next generation image-text models. Advances in neural information processing systems35, 25278–25294 (2022)
2022
-
[44]
In: Proceedings of the AAAI conference on artificial intelligence
Seibold, C.M., Reiß, S., Kleesiek, J., Stiefelhagen, R.: Reference-guided pseudo- label generation for medical semantic segmentation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 36, pp. 2171–2179 (2022)
2022
-
[45]
Sim´ eoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khali- dov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[46]
Advances in neural information processing systems33, 596–608 (2020)
Sohn, K., Berthelot, D., Carlini, N., Zhang, Z., Zhang, H., Raffel, C.A., Cubuk, E.D., Kurakin, A., Li, C.L.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems33, 596–608 (2020)
2020
-
[47]
In: International Workshop on Deep Learning in Medical Image Analysis
Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In: International Workshop on Deep Learning in Medical Image Analysis. pp. 240–
-
[48]
In: Proceedings of the IEEE conference on computer vision and pattern recognition
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: Relation network for few-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1199–1208 (2018)
2018
-
[49]
Advances in neural information processing systems30(2017)
Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems30(2017)
2017
-
[50]
LLaMA: Open and Efficient Foundation Language Models
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi` ere, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[51]
Advances in neural information processing systems30(2017)
Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems30(2017)
2017
-
[52]
Advances in neural information pro- cessing systems30(2017)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in neural information pro- cessing systems30(2017)
2017
-
[53]
arXiv preprint arXiv:2305.01115 (2023), https://arxiv.org/abs/2305.01115
Wang, Z., Jiang, Y., Lu, Y., Shen, Y., He, P., Chen, W., Wang, Z., Zhou, M.: In- context learning unlocked for diffusion models. arXiv preprint arXiv:2305.01115 (2023), https://arxiv.org/abs/2305.01115
-
[54]
IEEE transactions on image processing 13(4), 600–612 (2004)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13(4), 600–612 (2004)
2004
-
[55]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Xian, K., Zhang, J., Wang, O., Mai, L., Lin, Z., Cao, Z.: Structure-guided ranking loss for single image depth prediction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 611–620 (2020)
2020
-
[56]
DISCUSSION AND CONCLUSION 17
-
[57]
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? Advances in neural information processing systems27(2014)
2014
-
[58]
In: International confer- ence on medical image computing and computer-assisted intervention
Yu, L., Wang, S., Li, X., Fu, C.W., Heng, P.A.: Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation. In: International confer- ence on medical image computing and computer-assisted intervention. pp. 605–613. Springer (2019)
2019
-
[59]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition
Zhai, X., Kolesnikov, A., Houlsby, N., Beyer, L.: Scaling vision transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 12104–12113 (2022)
2022
-
[60]
In: Proceedings of the IEEE conference on computer vision and pattern recognition
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 633–641 (2017)
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.