arxiv: 2605.06927 · v1 · submitted 2026-05-07 · 💻 cs.CV · cs.AI

XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling

Tony Tran, Richie R. Suganda, Bin Hu

Pith reviewed 2026-05-11 01:03 UTC · model grok-4.3

keywords: object detection · neural architecture search · energy efficiency · edge computing · model scaling · YOLO · computer vision · heterogeneous hardware

The pith

XiYOLO finds a base object detection architecture via iterative energy-aware search and then scales it to produce models with better energy-accuracy tradeoffs than YOLO baselines on edge hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that object detection can meet strict energy limits on varied edge devices by searching once for an efficient base model and then scaling it, all while using only a handful of real device measurements. It combines a specialized search space, a two-stage energy predictor that adapts quickly, and compound scaling to generate a family of models. If the approach holds, practitioners could deploy accurate detectors on GPUs, NPUs, and similar hardware without exhaustive per-device profiling or large accuracy losses. The reported experiments on PascalVOC and COCO quantify the gains, including 86.15 mAP50 for the medium model alongside 20.6 percent lower GPU energy than YOLOv12m.

Core claim

An energy-adaptive framework pairs an energy-aware XiResOFA search space with iterative search driven by a two-stage energy estimator, then applies compound scaling to the resulting base architecture to produce the XiYOLO family. Under sparse hardware sampling of 2-20 target-device examples, this family delivers stronger energy-accuracy tradeoffs than YOLO baselines. On PascalVOC the medium model reaches 86.15 mAP50 while cutting energy relative to YOLOv12m by 20.6 percent on GPU and 35.9 percent on NPU; on COCO the small-scale model reduces GPU energy by up to 53.7 percent.

What carries the argument

Iterative search over the energy-aware XiResOFA space guided by a two-stage energy estimator, followed by compound scaling of the resulting base architecture.
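The search loop can be pictured with a toy sketch. This is not the authors' implementation: the choice lists in `SPACE`, the `map_proxy` and `energy_estimate` functions, and the narrowing rule are hypothetical stand-ins chosen only to show the shape of an iterative, energy-constrained search.

```python
import random

# Hypothetical elastic choices; the real XiResOFA space is defined in the paper.
SPACE = {
    "compression": [0.25, 0.5, 1.0],
    "kernel": [3, 5, 7],
    "depth": [1, 2, 3],
}

def map_proxy(arch):
    # Toy accuracy proxy (assumption): larger settings score higher.
    return 0.5 * arch["compression"] + 0.03 * arch["kernel"] + 0.1 * arch["depth"]

def energy_estimate(arch):
    # Toy stand-in for the two-stage energy estimator (assumption).
    return arch["compression"] * arch["kernel"] ** 2 * arch["depth"]

def iterative_search(space, budget, rounds=3, samples=50, keep=10, seed=0):
    rng = random.Random(seed)
    space = {k: list(v) for k, v in space.items()}
    best = None
    for _ in range(rounds):
        # Sample candidates from the current (possibly narrowed) space.
        candidates = [{k: rng.choice(v) for k, v in space.items()}
                      for _ in range(samples)]
        # Keep only candidates whose estimated energy fits the budget.
        feasible = [a for a in candidates if energy_estimate(a) <= budget]
        if not feasible:
            break
        feasible.sort(key=map_proxy, reverse=True)
        top = feasible[:keep]
        if best is None or map_proxy(top[0]) > map_proxy(best):
            best = top[0]
        # Refine: restrict each choice list to values seen among the top candidates.
        space = {k: sorted({a[k] for a in top}) for k in space}
    return best

best = iterative_search(SPACE, budget=20.0)
```

Each round narrows the space around the best feasible candidates, so later rounds spend their samples where the accuracy proxy is high and the energy estimate stays within budget.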

If this is right

On PascalVOC the medium XiYOLO model reaches 86.15 mAP50 while using 20.6 percent less GPU energy and 35.9 percent less NPU energy than YOLOv12m.
On COCO the small XiYOLO variant reduces energy by as much as 53.7 percent on GPU and 51.6 percent on NPU relative to YOLOv12.
The two-stage estimator reaches higher sample efficiency than a single joint predictor when only a few target-device measurements are available.
The resulting model family supplies clear accuracy-energy operating points that can be chosen according to deployment budgets.
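As a small illustration of how such operating points might be used, the sketch below picks the most accurate model that fits an energy budget and computes a relative energy reduction. The `OperatingPoint` type, the function names, and every number in `FAMILY` are illustrative placeholders, not measurements from the paper.

```python
from typing import NamedTuple, Optional

class OperatingPoint(NamedTuple):
    name: str
    map50: float      # detection accuracy (mAP50)
    energy_mj: float  # energy per inference on the target device

# Placeholder values for illustration only; not results from the paper.
FAMILY = [
    OperatingPoint("nano", 70.0, 20.0),
    OperatingPoint("small", 80.0, 50.0),
    OperatingPoint("medium", 85.0, 90.0),
]

def pick_model(points, budget_mj) -> Optional[OperatingPoint]:
    """Return the most accurate model whose energy fits the budget, if any."""
    feasible = [p for p in points if p.energy_mj <= budget_mj]
    return max(feasible, key=lambda p: p.map50, default=None)

def relative_reduction(baseline, candidate):
    """Energy saved relative to a baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline
```

With this definition, a reported 20.6 percent reduction means the candidate uses 79.4 percent of the baseline's energy.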

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same base-plus-scaling pattern could be applied to other dense prediction tasks such as semantic segmentation on the same hardware.
Because the estimator needs only a small number of samples, the framework may shorten the time required to port a detector to a new chip generation.
The separation of search from scaling makes it possible to keep one searched backbone and produce multiple accuracy tiers without repeating the full search on each budget.

Load-bearing premise

The two-stage energy estimator, adapted with only 2-20 real target-device samples, correctly predicts energy use for architectures never seen during the search and across different hardware platforms.
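To make the premise concrete, here is a minimal sketch of a two-stage estimator, assuming the structure quoted elsewhere on this page (a generic architecture predictor plus a lightweight device-specific residual model). The single `gflops` feature, the linear models, and the class and method names are assumptions made for illustration; the paper's estimator is likely richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (needs >= 2 distinct xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

class TwoStageEstimator:
    def fit_generic(self, gflops, energy):
        # Stage 1: generic architecture-to-energy predictor, fitted on
        # plentiful reference data (assumed to be device-agnostic).
        self.g_slope, self.g_bias = fit_line(gflops, energy)

    def generic(self, gflops):
        return self.g_slope * gflops + self.g_bias

    def adapt(self, gflops, measured):
        # Stage 2: fit the residual between a few target-device measurements
        # and the generic predictions.
        base = [self.generic(x) for x in gflops]
        resid = [m - b for m, b in zip(measured, base)]
        self.r_slope, self.r_bias = fit_line(base, resid)

    def predict(self, gflops):
        base = self.generic(gflops)
        return base + self.r_slope * base + self.r_bias

# Synthetic example: the target device draws 1.5x the generic energy plus 3.
est = TwoStageEstimator()
est.fit_generic([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
few_shot_x = [2.0, 5.0, 8.0]
est.adapt(few_shot_x, [1.5 * est.generic(x) + 3.0 for x in few_shot_x])
```

The premise is load-bearing because stage 2 sees only a handful of points: if the device's deviation from the generic model is not well captured by such a small correction, predictions for unseen architectures will drift.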

What would settle it

Measuring the searched XiYOLO models on a new device, using architectures outside the 2-20 adaptation samples, and finding either that measured energy differs substantially from the estimator's predictions or that accuracy falls below the YOLO baselines at the same energy budget.

Figures

Figures reproduced from arXiv: 2605.06927 by Bin Hu, Richie R. Suganda, Tony Tran.

**Figure 1.** Energy-aware NAS framework. Candidate detectors from a searchable energy-aware architecture space are ranked by an mAP proxy and a two-stage energy estimator, then refined iteratively to obtain scalable models under hardware-specific energy constraints.

… deployment-dependent costs such as latency or memory into the search objective [6, 30, 37]. In object detection, prior work has searched detector backbones…

**Figure 2.** Comparison between (a) the standard bottleneck block […

**Figure 3.** Elastic architectural choices in XiResOFA: selectable (a) compression ratios, (b) kernel…

**Figure 4.** Overview of the proposed two-stage energy estimator. Sparse energy samples from the…

**Figure 5.** Comparisons with other YOLO-series [18, 32] on PascalVOC dataset in terms of energy and accuracy on the GPU (left) and NPU (right).

**Figure 6.** Comparisons with other YOLO-series [18, 32] on COCO dataset in terms of energy and accuracy on the GPU (left) and NPU (right).

4.2 Accuracy and Energy Tradeoffs

We first evaluate the deployment quality of the searched models by comparing detection accuracy and energy jointly. Figures 5 and 6 compare XiYOLO against YOLOv8 and YOLOv12 on PascalVOC and COCO, respectively, on the Qualcomm Adreno 650 GPU and th…

**Figure 7.** Energy consumption vs. time benchmarks against other YOLO-series […

**Figure 8.** Two-Stage Energy Estimator vs. Joint Model under few-shot adaptation with 2–20 samples…

**Figure 9.** Energy and accuracy characteristics of the medium search space.

**Figure 10.** Full energy consumption vs. time benchmarks against other YOLO-series […

**Figure 11.** Two-Stage Energy Estimator vs. Joint Model on few data points from 2–20 samples on…

**Figure 12.** Latency–energy comparison between XiYOLO and YOLO12 on the Qualcomm Adreno…

Original abstract

Object detection on heterogeneous edge devices must satisfy strict energy, latency, and memory constraints while still providing reliable perception for downstream autonomy. Existing energy-aware NAS methods often target limited deployment settings, while real energy remains difficult to optimize because it is highly device-dependent and costly to measure. We address these challenges with an energy-adaptive framework that combines an energy-aware XiResOFA search space, a two-stage energy estimator, and iterative search to identify a single energy-efficient base architecture. We then apply compound scaling to transform this base design into the XiYOLO family across deployment budgets, enabling interpretable accuracy-energy tradeoffs under sparse hardware measurements. Experiments on PascalVOC, COCO, and real-device deployment show that XiYOLO achieves a stronger energy-accuracy tradeoff than YOLO baselines. On PascalVOC, the medium XiYOLO model reaches 86.15 mAP50 while reducing energy relative to YOLOv12m by 20.6% on GPU and 35.9% on NPU. On COCO, XiYOLO reduces energy relative to YOLOv12 by up to 53.7% on GPU and 51.6% on NPU at the small scale. The proposed two-stage estimator also improves sample efficiency over a joint predictor under few-shot adaptation with only 2-20 target-device samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

XiYOLO gives a practical NAS-plus-scaling recipe for energy-efficient YOLO variants with real-device numbers, but the two-stage estimator's accuracy on unseen search outputs is the unverified hinge.

XiYOLO finds a base architecture through iterative search in an energy-aware XiResOFA space, then uses compound scaling to generate a family of models. The two-stage estimator adapts to a target device with only 2-20 measurements, which is the piece meant to make the whole thing work under sparse hardware data. On PascalVOC the medium model hits 86.15 mAP50 while cutting energy 20.6% on GPU and 35.9% on NPU versus YOLOv12m; COCO shows larger relative savings at the small scale. Those are the concrete claims that matter for edge deployment.

Referee Report

3 major / 2 minor

Summary. The paper proposes XiYOLO, an energy-aware object detection family obtained via iterative NAS in a custom XiResOFA search space. A two-stage energy estimator is adapted with only 2-20 real-device samples to guide search for a base architecture, which is then compound-scaled to produce models at different budgets. Experiments on PascalVOC and COCO report stronger energy-accuracy tradeoffs than YOLOv12 baselines, including 86.15 mAP50 with 20.6% GPU / 35.9% NPU energy reduction for the medium model on PascalVOC and up to 53.7% GPU savings on COCO, while claiming improved sample efficiency for the estimator.

Significance. If the two-stage estimator reliably ranks and predicts energy for search-discovered architectures under few-shot adaptation, the framework offers a practical route to energy-efficient detectors on heterogeneous edge hardware with minimal measurement overhead. The combination of iterative search and interpretable compound scaling could support reproducible energy-accuracy frontiers in real-device NAS.

major comments (3)

[§4 (Experiments), results tables] The headline energy savings (20.6% GPU / 35.9% NPU on PascalVOC medium model; 53.7% GPU on COCO) are reported without error bars, multiple random seeds, or statistical tests, despite the stochastic nature of NAS and hardware power measurements; this weakens confidence that the gains exceed measurement noise.
[§3.2 (two-stage energy estimator)] The central claim that 2-20 target-device samples suffice for accurate prediction on out-of-distribution NAS architectures is load-bearing for all reported savings, yet the manuscript provides no hold-out MAE, correlation, or ranking-error metrics comparing estimator predictions to real-device measurements on architectures discovered after adaptation.
[§4.3 (real-device deployment)] The iterative search optimizes against the adapted estimator, but no ablation is shown on whether final real-device energy measurements match estimator predictions for the selected XiYOLO models versus the YOLO baselines under identical conditions (batch size, input resolution, power sampling protocol).

minor comments (2)

[Figures] Figure captions and axis labels for energy-accuracy Pareto curves should explicitly state the hardware platform, measurement tool, and number of runs averaged.
[§3.1] The definition of the XiResOFA search space and the exact compound scaling coefficients should be moved to a dedicated subsection or appendix for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of experimental rigor and estimator validation. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [§4 (Experiments), results tables] The headline energy savings (20.6% GPU / 35.9% NPU on PascalVOC medium model; 53.7% GPU on COCO) are reported without error bars, multiple random seeds, or statistical tests, despite the stochastic nature of NAS and hardware power measurements; this weakens confidence that the gains exceed measurement noise.

Authors: We agree that the lack of error bars, multiple seeds, and statistical tests reduces confidence in the reported savings given the stochastic elements involved. In the revised version, we will rerun the NAS and hardware measurements across multiple random seeds (at least 3-5), report means with standard deviations in the tables, and include statistical significance tests (e.g., paired t-tests or Wilcoxon tests) comparing XiYOLO to YOLOv12 baselines. These additions will appear in §4 and the results tables.
Revision: yes
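A paired comparison over seeds of the kind promised here could look like the following sketch. The function name and the per-seed energy values are placeholders, not data from the paper. Only the t statistic and degrees of freedom are computed, since a p-value needs a t-distribution table or a library routine such as `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t(baseline, candidate):
    """Paired t statistic for per-seed differences (baseline - candidate)."""
    diffs = [b - c for b, c in zip(baseline, candidate)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)      # sample standard deviation (n - 1)
    t = mean / (sd / math.sqrt(n))    # standard error of the mean difference
    return mean, sd, t, n - 1         # n - 1 degrees of freedom

# Placeholder energies per seed (arbitrary units), for illustration only.
yolo_energy = [10.0, 12.0, 11.0, 13.0, 12.0]
xiyolo_energy = [8.0, 9.0, 9.0, 10.0, 10.0]
mean_saving, sd_saving, t_stat, dof = paired_t(yolo_energy, xiyolo_energy)
```

Pairing by seed removes run-to-run variation that is shared between the two models, which is why it is a natural fit when both detectors are measured under the same conditions.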
Referee: [§3.2 (two-stage energy estimator)] The central claim that 2-20 target-device samples suffice for accurate prediction on out-of-distribution NAS architectures is load-bearing for all reported savings, yet the manuscript provides no hold-out MAE, correlation, or ranking-error metrics comparing estimator predictions to real-device measurements on architectures discovered after adaptation.

Authors: The current manuscript validates the estimator primarily through end-to-end deployment results and sample-efficiency comparisons, but we acknowledge the value of direct hold-out metrics on post-adaptation architectures. We will add a dedicated evaluation in §3.2 (or a new subsection) reporting MAE, Pearson/Spearman correlation, and ranking error (e.g., Kendall tau) on a hold-out set of architectures discovered by the iterative search, measured against real-device energy under the same few-shot adaptation protocol.
Revision: yes
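The hold-out metrics named in this response are standard and can be computed as in the sketch below. It is written in plain Python for clarity, assumes no tied values, and uses the simple tau-a form of Kendall's coefficient; the example numbers are made up.

```python
from itertools import combinations

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ranks(values):
    # Rank 1 = smallest value; assumes no ties.
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    # Spearman correlation is Pearson correlation on the ranks.
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    # Tau-a: (concordant - discordant) pairs over all pairs.
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)

# Made-up predicted vs. measured energies for four hold-out architectures.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.5, 1.0, 3.5, 4.0]
```

For architecture search the ranking metrics matter as much as MAE: an estimator with a constant offset can still pick the right candidates, while one that swaps the order of near-budget architectures cannot.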
Referee: [§4.3 (real-device deployment)] The iterative search optimizes against the adapted estimator, but no ablation is shown on whether final real-device energy measurements match estimator predictions for the selected XiYOLO models versus the YOLO baselines under identical conditions (batch size, input resolution, power sampling protocol).

Authors: We will add an ablation study in §4.3 that directly compares real-device energy measurements (under fixed batch size, input resolution, and power sampling protocol) of the final selected XiYOLO models and YOLO baselines against the predictions from the adapted two-stage estimator. This will include quantitative error metrics (MAE, relative error) and a discussion of any discrepancies to confirm the estimator's reliability for the chosen architectures.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical NAS and real-device validation are self-contained

The paper's core method is an iterative architecture search over an energy-aware search space using a two-stage estimator adapted via 2-20 target-device samples, followed by compound scaling of a discovered base model and direct experimental comparison against YOLO baselines on PascalVOC, COCO, and physical GPU/NPU hardware. No derivation, equation, or claimed prediction reduces by construction to its own fitted inputs, self-citations, or renamed empirical patterns; final energy-accuracy numbers are obtained from real measurements rather than surrogate outputs alone. The approach therefore remains externally falsifiable and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claims rest on standard deep-learning training assumptions plus two custom components whose correctness is not independently verified outside the reported experiments.

free parameters (1)

compound scaling coefficients
Factors that control depth, width, and resolution when expanding the base architecture to different energy budgets; chosen to match deployment targets.
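For readers unfamiliar with compound scaling, the sketch below shows one common form, in which a single exponent `phi` scales depth, width, and resolution together. The coefficient values are the ones popularised by EfficientNet and are used here only as placeholders; XiYOLO's actual coefficients and rounding rules are not given on this page, and the `BaseArch` fields and divisor parameters are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseArch:
    depth: int       # blocks per stage
    width: int       # base channel count
    resolution: int  # input side length in pixels

def compound_scale(base, phi, alpha=1.2, beta=1.1, gamma=1.15,
                   channel_divisor=8, res_divisor=32):
    """Scale depth, width, and resolution jointly with one exponent phi."""
    depth = max(1, math.ceil(base.depth * alpha ** phi))
    # Round channels and resolution to hardware-friendly multiples.
    width = max(channel_divisor,
                round(base.width * beta ** phi / channel_divisor) * channel_divisor)
    resolution = max(res_divisor,
                     round(base.resolution * gamma ** phi / res_divisor) * res_divisor)
    return BaseArch(depth, width, resolution)

base = BaseArch(depth=2, width=32, resolution=320)
family = {phi: compound_scale(base, phi) for phi in (0, 1, 2)}
```

Because the three coefficients are free parameters, the energy-accuracy points of the scaled family depend on how they were chosen, which is why the ledger lists them.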

axioms (1)

domain assumption: A two-stage energy estimator trained on 2-20 device samples can generalize to unseen architectures and hardware
Invoked to justify avoiding exhaustive real-device measurements during search.

invented entities (1)

XiResOFA search space (no independent evidence)
purpose: To embed energy awareness directly into the architecture search for object detection backbones
Custom search space introduced to address limitations of prior energy-agnostic NAS methods.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "two-stage energy estimator that combines a generic architecture predictor with a lightweight device-specific residual model"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction · unclear
Paper passage: "iterative search procedure that progressively refines detector components"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 26 canonical work pages

[1] Daghash K. Alqahtani, Muhammad Aamir Cheema, and Adel N. Toosi. Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices. In Walid Gaaloul, Michael Sheng, Qi Yu, and Sami Yangui, editors, Service-Oriented Computing, pages 142–150, Singapore, 2025. Springer Nature. ISBN 978-981-96-0805-8. doi: 10.1007/978-981-96-0805-8_11
[2] Alberto Ancilotto, Francesco Paissan, and Elisabetta Farella. XiNet: Efficient Neural Networks for tinyML. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 16968–16977, 2023. URL https://openaccess.thecvf.com/content/ICCV2023/html/Ancilotto_XiNet_Efficient_Neural_Networks_for_tinyML_ICCV_2023_paper.html
[3] Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul Whatmough. MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers. Proceedings of Machine Learning and Systems, 3:517–532, March 2021. URL https://proceedings.mlsys.org/paper_files/paper/2021/hash/c4d41d9619462c534b7b61d1f772385e-Abstract.html
[4] Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, and Francesco Conti. DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs. IEEE Transactions on Computers, 70(8):1253–1268, August 2021. ISSN 1557-9956. doi: 10.1109/TC.2021.3066883. URL https://ieeexplore.ieee.org/document/9381618
[5] Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, and Diana Marculescu. NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks. In Proceedings of the Ninth Asian Conference on Machine Learning, pages 622–637. PMLR, November 2017. URL https://proceedings.mlr.press/v77/cai17a.html
[6] Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In 2018 International Conference on Learning Representations (ICLR), September 2018. URL https://openreview.net/forum?id=HylVB3AqYm
[7] Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. Once for All: Train One Network and Specialize it for Efficient Deployment. In 2020 International Conference on Learning Representations (ICLR), April 2020. URL https://iclr.cc/virtual_2020/poster_HylxE1HKwS.html
[8] Oscal Tzyh-Chiang Chen, Yu-Xuan Chang, Chih-Yu Chung, Ya-Yun Cheng, and Manh-Hung HA. Hardware-Aware Iterative One-Shot Neural Architecture Search With Adaptable Knowledge Distillation for Efficient Edge Computing. IEEE Access, 13:54204–54222, 2025. ISSN 2169-3536. doi: 10.1109/ACCESS.2025.3554185. URL https://ieeexplore.ieee.org/document/10938148/
[9] Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, and Jian Sun. DetNAS: Backbone Search for Object Detection. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/hash/228b25587479f2fc7570428e8bcbabdc-Abstract.html
[10] Xuanyi Dong and Yi Yang. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search. In 2019 International Conference on Learning Representations (ICLR), September 2019. URL https://openreview.net/forum?id=HJxyZkBKDr
[11] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vis, 88(2):303–338, June 2010. ISSN 1573-1405. doi: 10.1007/s11263-009-0275-4. URL https://doi.org/10.1007/s11263-009-0275-4
[12] Wanxuan Geng, Junfan Yi, and Liang Cheng. An efficient detector for maritime search and rescue object based on unmanned aerial vehicle images. Displays, 87:102994, April 2025. ISSN 0141-9382. doi: 10.1016/j.displa.2025.102994. URL https://www.sciencedirect.com/science/article/pii/S0141938225000319
[13] Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7029–7038, June 2019. doi: 10.1109/CVPR.2019.00720. URL https://ieeexplore.ieee.org/document/8954436. ISSN: 2575-7075
[14] Chaopeng Guo, Shiyu Wang, Ruolan Xie, and Jie Song. Estimating energy consumption of neural networks with joint Structure–Device encoding. Sustainable Computing: Informatics and Systems, 45:101062, January 2025. ISSN 2210-5379. doi: 10.1016/j.suscom.2024.101062. URL https://www.sciencedirect.com/science/article/pii/S2210537924001070
[15] Diksha Gupta, Rhui Dih Lee, and Laura Wynter. On Efficient Object-Detection NAS for ADAS on Edge devices. In 2024 IEEE Conference on Artificial Intelligence (CAI), pages 1005–1010, June 2024. doi: 10.1109/CAI59869.2024.00183. URL https://ieeexplore.ieee.org/document/10605392
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. doi: 10.1109/CVPR.2016.90. URL https://ieeexplore.ieee.org/document/7780459. ISSN: 1063-6919
[17] Glenn Jocher. YOLOv5 by Ultralytics, May 2020. URL https://github.com/ultralytics/yolov5
[18] Glenn Jocher. YOLO by Ultralytics, May 2026. URL https://github.com/ultralytics/ultralytics
[19] Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, and Yingyan Lin. HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark. In 2021 International Conference on Learning Representations (ICLR), 2021
[20] Chan Yue Liew, Joanne Mun-Yee Lim, Chee Pin Tan, and Raja Mazhar Mohar Bin Tun Mohar. Altitude-informed fusion pyramid network for multi-scale waste detection in unmanned aerial vehicle images. Engineering Applications of Artificial Intelligence, 153:110814, August 2025. ISSN 0952-1976. doi: 10.1016/j.engappai.2025.110814. URL https://www.sciencedirec...
[21] Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. MCUNet: Tiny Deep Learning on IoT Devices. In Advances in Neural Information Processing Systems, volume 33, pages 11711–11722. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html
[22] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 936–944, July 2017. doi: 10.1109/CVPR.2017.106. URL https://ieeexplore.ieee.org/document/8099589. ISSN: 1063-6919
[23] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path Aggregation Network for Instance Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8759–8768, June 2018. doi: 10.1109/CVPR.2018.00913. URL https://ieeexplore.ieee.org/document/8579011. ISSN: 2575-7075
[24] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, pages 21–37, Cham, 2016. Springer International Publishing. ISBN 978-3-319-46448-0. doi: 10.1007/978-3-319-46448-0_2
[25] Julian Moosmann, Marco Giordano, Christian Vogt, and Michele Magno. TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers. In 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), pages 1–5, June 2023. doi: 10.1109/AICAS57966.2023.10168657. URL https://ie...
[26] Arthur Moss, Hyunjong Lee, Lei Xun, Chulhong Min, Fahim Kawsar, and Alessandro Montanari. Ultra-Low Power DNN Accelerators for IoT: Resource Characterization of the MAX78000. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, SenSys ’22, pages 934–940, New York, NY, USA, January 2023. Association for Computing Machinery. ISBN ... (work page doi: 10.1145/3560905.3568300)
[27] Van Quang Nghiem, Huy Hoang Nguyen, and Minh Son Hoang. LEAF-YOLO: Lightweight Edge-Real-Time Small Object Detection on Aerial Imagery. Intelligent Systems with Applications, 25:200484, March 2025. ISSN 2667-3053. doi: 10.1016/j.iswa.2025.200484. URL https://www.sciencedirect.com/science/article/pii/S2667305325000109
[28] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, June 2016. doi: 10.1109/CVPR.2016.91. URL https://ieeexplore.ieee.org/document/7780460. ISSN: 1063-6919
[29] Yuiko Sakuma, Masato Ishii, and Takuya Narihira. DetOFA: Efficient Training of Once-for-All Networks for Object Detection using Path Filter. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 1325–1334, October 2023. doi: 10.1109/ICCVW60793.2023.00143. URL https://ieeexplore.ieee.org/document/10350829. ISSN: 2473-9944
[30] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2820–2828, 2019. URL https://openaccess.thecvf.com/content_CVPR_2019/html/Tan_MnasNet_Platform-Aware_Neural_...
[31] Mingxing Tan, Ruoming Pang, and Quoc V. Le. EfficientDet: Scalable and Efficient Object Detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10778–10787, June 2020. doi: 10.1109/CVPR42600.2020.01079. URL https://ieeexplore.ieee.org/document/9156454. ISSN: 2575-7075
[32] Yunjie Tian, Qixiang Ye, and David Doermann. YOLOv12: Attention-Centric Real-Time Object Detectors. In Advances in Neural Information Processing Systems, October 2025. URL https://openreview.net/forum?id=gCvByDI4FN
[33] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(4):1922–1933, April 2022. ISSN 1939-3539. doi: 10.1109/TPAMI.2020.3032166. URL https://ieeexplore.ieee.org/document/9229517
[34] Tony Tran, Qin Lin, and Bin Hu. ELASTIC: Efficient Once For All Iterative Search for Object Detection on Microcontrollers. IEEE Transactions on Computers, (01):1–8, March 2026. ISSN 0018-9340. doi: 10.1109/TC.2026.3678184. URL https://www.computer.org/csdl/journal/tc/5555/01/11456502/2faQWQg7unK
[35] Peng Tu, Xu Xie, Guo Ai, Yuexiang Li, Yawen Huang, and Yefeng Zheng. FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 13272–13281, October 2023. doi: 10.1109/ICCV51070.2023.01225. URL https://ieeexplore.ieee.org/document/10376762. ISSN: 2380-7504
[36] Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, and Yanning Zhang. NAS-FCOS: Fast Neural Architecture Search for Object Detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11940–11948, June 2020. doi: 10.1109/CVPR42600.2020.01196. URL https://ieeexplore.ieee.org/document/9156326. ISSN: 2575-7075
[37] Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10726–10734, June 2019. doi: 10.1109/CVPR.20... (work page doi: 10.1109/cvpr.2019.01099)
[38] Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, and Bo Chen. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3824–3833, June 2021. doi: 10.1109/CVPR464... (work page doi: 10.1109/cvpr46437.2021.00382)
[39] Jiaru Zhang, Zesong Wang, Hao Wang, Tao Song, Huai-an Su, Rui Chen, Yang Hua, Xiangwei Zhou, Ruhui Ma, Miao Pan, and Haibing Guan. AMPERE: A Generic Energy Estimation Approach for On-Device Training. SIGMETRICS Perform. Eval. Rev., 53(2):27–32, August. ISSN 0163-5999. doi: 10.1145/3764944.3764951. URL https://dl.acm.org/doi/10.1145/3764944.3764951
[40] Liming Zhou, Xiaohan Rao, Yahui Li, Xianyu Zuo, Yang Liu, Yinghao Lin, and Yong Yang. SWDet: Anchor-Based Object Detector for Solid Waste Detection in Aerial Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16:306–320. ISSN 2151-1535. doi: 10.1109/JSTARS.2022.3218958. URL https://ieeexplore.ieee.org/document/9935119

A Full Deployment Energy Results

Figure 10 reports the full cumulative-energy trajectories for all nano, small, and medium models on the Qualcomm Adreno 650 GPU and the 15 TOPS NPU. These results complement the main-text medium-scale comparison by show...