Evaluation of Winning Solutions of 2025 Low Power Computer Vision Challenge
Pith reviewed 2026-05-10 03:08 UTC · model grok-4.3
The pith
Winning solutions in the 2025 challenge demonstrate viable low-power designs for image classification, open-vocabulary segmentation, and monocular depth estimation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the top-performing solutions across the three tracks achieve competitive accuracy while respecting low-power limits, as assessed through a unified evaluation process. It identifies recurring design patterns in the winning entries and concludes by recommending adjustments to the format of similar challenges to better encourage real-world applicability.
What carries the argument
The three competition tracks paired with a standardized evaluation framework that measures accuracy, latency, memory, and energy use of submitted models in a consistent manner.
If this is right
- Specialized optimizations allow image classification to remain reliable under varied lighting and style shifts within power budgets.
- Text-prompt-driven segmentation becomes feasible on constrained hardware without full retraining for every new category.
- Monocular depth estimation can run efficiently enough for real-time use on edge platforms.
- Common techniques from the winners point to reusable methods for trading minimal accuracy for large gains in efficiency.
- Incorporating the paper's suggestions would make future competitions more effective at surfacing deployable solutions.
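One recurring technique of the kind noted above is post-training weight quantization. The sketch below is illustrative only, not code from the paper or any winning entry: it shows symmetric int8 quantization with a single shared scale, which trades a bounded per-weight reconstruction error for a roughly 4x memory reduction versus float32.

```python
# Minimal sketch of symmetric int8 post-training quantization.
# Illustrative only: not taken from the paper or any winning entry.

def quantize_int8(weights):
    """Map float weights to int8 codes using one shared symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.421, -1.273, 0.05, 0.892, -0.337]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)

# Per-weight error is bounded by half a quantization step (scale / 2),
# while storage drops from 32 bits to 8 bits per weight.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

In a real entry the scale would typically be chosen per channel and calibrated on representative data; the single-scale version above only illustrates the basic accuracy-for-efficiency trade.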
Where Pith is reading between the lines
- Combining elements from the winning entries across tracks could yield hybrid models suitable for multi-task mobile applications.
- Repeating the evaluation on a wider set of hardware platforms would clarify which optimizations are portable versus hardware-specific.
- Widespread use of these efficient models in consumer devices would lower overall energy draw for features such as augmented reality and navigation.
Load-bearing premise
The challenge tracks and evaluation metrics capture the essential trade-offs that matter for actual deployment of vision models on low-power hardware.
What would settle it
Direct measurement of the top solutions on multiple real edge devices outside the original evaluation setup, checking whether accuracy and power figures match the reported results.
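Such a check could take the form of a small harness that compares each solution's reported latency against the median of repeated independent on-device runs. Every device name and number below is a hypothetical placeholder, not a result from the paper:

```python
# Hedged sketch of an out-of-setup validation harness.
# Device names and all numbers are hypothetical placeholders.

def relative_deviation(reported, measured):
    """Absolute deviation of a measurement relative to the reported value."""
    return abs(measured - reported) / reported

def validate_reported_latency(reported_ms, device_runs_ms, tolerance=0.15):
    """Return, per device, whether the median of repeated on-device runs
    stays within `tolerance` (15% by default) of the reported figure."""
    verdicts = {}
    for device, runs in device_runs_ms.items():
        median = sorted(runs)[len(runs) // 2]  # odd-length run counts assumed
        verdicts[device] = relative_deviation(reported_ms, median) <= tolerance
    return verdicts

# A solution reported at 12.0 ms, re-measured on two hypothetical boards.
verdicts = validate_reported_latency(
    12.0,
    {"board_a": [11.8, 12.1, 12.4], "board_b": [15.9, 16.2, 16.0]},
)
# board_a reproduces the reported figure; board_b deviates by ~33% and is flagged.
```

The same pattern extends to accuracy and energy figures; the tolerance would need to be set per metric, since energy measurements on physical hardware are typically noisier than latency.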
read the original abstract
The IEEE Low-Power Computer Vision Challenge (LPCVC) aims to promote the development of efficient vision models for edge devices, balancing accuracy with constraints such as latency, memory capacity, and energy use. The 2025 challenge featured three tracks: (1) Image classification under various lighting conditions and styles, (2) Open-Vocabulary Segmentation with Text Prompt, and (3) Monocular Depth Estimation. This paper presents the design of LPCVC 2025, including its competition structure and evaluation framework, which integrates the Qualcomm AI Hub for consistent and reproducible benchmarking. The paper also introduces the top-performing solutions from each track and outlines key trends and observations. The paper concludes with suggestions for future computer vision competitions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the structure and outcomes of the 2025 IEEE Low-Power Computer Vision Challenge (LPCVC), which comprised three tracks: image classification under varying lighting and styles, open-vocabulary segmentation with text prompts, and monocular depth estimation. It details the competition rules and the evaluation framework, which uses the Qualcomm AI Hub to measure latency, memory, and energy consumption in a standardized manner; presents the top-ranked solutions from each track along with their key architectural choices; identifies observed trends in model efficiency; and concludes with recommendations for future low-power CV competitions.
Significance. If the reported measurements hold, the work provides a useful public record of state-of-the-art efficient vision models submitted to a standardized low-power benchmark in 2025. The integration of Qualcomm AI Hub for reproducible metrics across submissions is a clear methodological strength that supports comparability. The documentation of winning approaches and trends can inform subsequent research on accuracy-efficiency trade-offs for edge deployment. However, the absence of any independent validation of the Hub as a faithful proxy for physical-device behavior reduces the strength of claims about real-world low-power performance.
major comments (1)
- Evaluation framework (around the description of Qualcomm AI Hub integration): the manuscript presents the Hub measurements as the basis for ranking solutions and drawing trends about low-power CV performance, yet contains no side-by-side comparison with physical edge-device runs, no sensitivity analysis to quantization paths or thermal conditions, and no cross-check against alternative runtimes. Because the central claim is that the framework delivers consistent and meaningful low-power characterizations, this untested assumption is load-bearing and requires either empirical validation or explicit qualification of the results' scope.
minor comments (2)
- Abstract and introduction: the three tracks are named but their precise task definitions, input resolutions, and accuracy metrics are not summarized in one place; adding a compact table would improve readability.
- Top-solution descriptions: while architectures are outlined, the paper would benefit from explicit reporting of the final accuracy, latency, memory, and energy numbers for each winner (perhaps in a summary table) rather than relying solely on narrative.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript describing the 2025 LPCVC. The comment on the evaluation framework raises a valid point about the scope of the Qualcomm AI Hub results, which we address directly below. We have revised the manuscript to include explicit qualifications that clarify the boundaries of our claims while preserving the value of the standardized competition record.
read point-by-point responses
- Referee: Evaluation framework (around the description of Qualcomm AI Hub integration): the manuscript presents the Hub measurements as the basis for ranking solutions and drawing trends about low-power CV performance, yet contains no side-by-side comparison with physical edge-device runs, no sensitivity analysis to quantization paths or thermal conditions, and no cross-check against alternative runtimes. Because the central claim is that the framework delivers consistent and meaningful low-power characterizations, this untested assumption is load-bearing and requires either empirical validation or explicit qualification of the results' scope.
  Authors: We agree that the absence of direct physical-device comparisons and sensitivity analyses represents a limitation in fully validating the Hub as a proxy. The AI Hub was chosen to enable fair, reproducible benchmarking across all teams using a common Snapdragon emulation environment, avoiding the practical barriers of requiring identical physical hardware for every submission. In the revised manuscript, we add a dedicated paragraph in the evaluation framework section that explicitly qualifies the results: the reported latency, memory, and energy metrics are derived from the Hub's standardized simulations and should be interpreted as such; they do not include exhaustive sensitivity testing for thermal throttling, alternative quantization paths, or other runtimes. We further note that while the Hub is designed to approximate edge-device behavior, independent hardware validation would be a valuable extension for future competitions and lies beyond the scope of this paper's focus on documenting the 2025 challenge outcomes and trends. This revision directly addresses the load-bearing assumption by delimiting the claims. Revision: yes.
Circularity Check
No circularity: paper reports competition results and framework without derivations or predictions
full rationale
The manuscript describes the LPCVC 2025 challenge design, its tracks, the evaluation framework (integrating the Qualcomm AI Hub for benchmarking), the top solutions per track, and observed trends. No equations, first-principles derivations, predictions, or fitted parameters are presented; the paper reports only observed outcomes measured against an external platform. Nothing in the provided abstract or description, such as self-citations, ansatzes, or renamed results, could cause the central claims to reduce to their own inputs by construction. This is a standard competition-report paper whose central claims rest on external measurements and submissions rather than internal self-referential logic.
Reference graph
Works this paper leans on
- [1] Y.-H. Lu, A. M. Kadin, A. C. Berg, T. M. Conte, E. P. DeBenedictis, R. Garg, G. Gingade, B. Hoang, Y. Huang, B. Li, et al., "Rebooting computing and low-power image recognition challenge," in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2015.
- [2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," 2015. [Online]. Available: https://arxiv.org/abs/1409.0575
- [3] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, "Microsoft COCO: Common objects in context," 2015. [Online]. Available: https://arxiv.org/abs/1405.0312
- [4] M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, L. Cehovin Zajc, T. Vojir, G. Bhat, A. Lukezic, A. Eldesokey, et al., "The sixth visual object tracking VOT2018 challenge results," 2018.
- [5] J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool, "The 2017 DAVIS challenge on video object segmentation," 2018. [Online]. Available: https://arxiv.org/abs/1704.00675
- [6] "NeurIPS competition track," https://neurips.cc/Conferences/2024/CompetitionTrack, accessed 2025-07-20.
- [7] K. Gauen, R. Rangan, A. Mohan, Y.-H. Lu, W. Liu, and A. C. Berg, "Low-power image recognition challenge," in 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp. 99-104.
- [8] M. Ardi, A. C. Berg, B. Chen, Y.-K. Chen, Y. Chen, D. Kang, J. Lee, S. Lee, Y. Lu, Y.-H. Lu, and F. Sun, "Special session: 2018 low-power image recognition challenge and beyond," in 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2019, pp. 154-157.
- [9] S. Alyamkin, M. Ardi, A. C. Berg, A. Brighton, B. Chen, Y. Chen, H.-P. Cheng, Z. Fan, C. Feng, B. Fu, et al., "Low-power computer vision: Status, challenges, and opportunities," 2019.
- [10] X. Hu, M.-C. Chang, Y. Chen, R. Sridhar, Z. Hu, Y. Xue, Z. Wu, P. Pi, J. Shen, J. Tan, et al., "The 2020 low-power computer vision challenge," 2020.
- [11] X. Hu, Z. Jiao, A. Kocher, Z. Wu, J. Liu, J. C. Davis, G. K. Thiruvathukal, and Y.-H. Lu, "Evolution of winning solutions in the 2021 low-power computer vision challenge," Computer, vol. 56, no. 8, pp. 28-37, 2023.
- [12] L. Chen, B. Boardley, P. Hu, Y. Wang, Y. Pu, X. Jin, Y. Yao, R. Gong, B. Li, G. Huang, et al., "2023 low-power computer vision challenge (LPCVC) summary."
- [13] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," 2019. [Online]. Available: https://arxiv.org/abs/1801.04381
- [14] X. Zou, Z.-Y. Dou, J. Yang, Z. Gan, L. Li, C. Li, X. Dai, H. Behl, J. Wang, L. Yuan, N. Peng, L. Wang, Y. J. Lee, and J. Gao, "Generalized decoding for pixel, image, and language," 2022. [Online]. Available: https://arxiv.org/abs/2212.11270
- [15] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning transferable visual models from natural language supervision," 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
- [16] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, "Depth Anything V2," 2024. [Online]. Available: https://arxiv.org/abs/2406.09414
- [17] P. K. A. Vasu, H. Pouransari, F. Faghri, R. Vemulapalli, and O. Tuzel, "MobileCLIP: Fast image-text models through multi-modal reinforced training," 2024. [Online]. Available: https://arxiv.org/abs/2311.17049
- [18] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. S. Bernstein, and F.-F. Li, "Visual Genome: Connecting language and vision using crowdsourced dense image annotations," 2016. [Online]. Available: https://arxiv.org/abs/1602.07332
- [19] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
- [20] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim, "Visual prompt tuning," 2022. [Online]. Available: https://arxiv.org/abs/2203.12119