FDDet: Achieving Data-Efficient Food Defect Detection Under Real-World Scenarios

Ruihao Xu; Yansong Tang; Yong Liu

arxiv: 2605.24508 · v1 · pith:BFV3BLPBnew · submitted 2026-05-23 · 💻 cs.CV

FDDet: Achieving Data-Efficient Food Defect Detection Under Real-World Scenarios

Ruihao Xu , Yong Liu , Yansong Tang This is my paper

Pith reviewed 2026-06-30 13:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords food defect detectionsemi-supervised learningdata augmentationpseudo-label calibrationobject detectionFDD-48 datasetBBoxMixUpCGPC

0 comments

The pith

FDDet improves food defect detection accuracy under limited labeled data by mixing same-category defect regions and calibrating pseudo-labels for consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the FDD-48 dataset covering 13 food types and 48 defect categories with fine-grained annotations under diverse real conditions. It proposes FDDet, a semi-supervised detector built around BBoxMixUp for data augmentation and CGPC for pseudo-label refinement. BBoxMixUp mixes defect bounding boxes from the same category to cut down on spurious feature links, while CGPC filters pseudo-labels by checking consistency within each sample. These elements target the data scarcity problem common in food quality control. If the approach holds, automated inspection systems could achieve reliable performance without large amounts of manually labeled examples.

Core claim

The paper claims that FDDet significantly outperforms mainstream detectors on FDD-48 by using BBoxMixUp, a data augmentation technique that mixes same-category defect regions to reduce spurious feature associations, and CGPC, which filters pseudo-labels based on intra-sample consistency, thereby enabling effective detection in data-limited scenarios.

What carries the argument

BBoxMixUp and CGPC: BBoxMixUp mixes defect regions from same-category examples during training to reduce irrelevant feature correlations, while CGPC applies consistency checks inside individual samples to calibrate and filter pseudo-labels in the semi-supervised loop.

If this is right

FDDet delivers higher detection accuracy than standard object detectors when only limited labeled food defect images are available.
The framework supports detection across 13 food types and 48 distinct defect categories collected under varied real-world conditions.
Reduced reliance on large labeled datasets makes automated quality control more practical for food production lines.
The two components together address both data augmentation and reliable pseudo-label usage in a semi-supervised setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The mixing and consistency ideas could transfer to defect detection tasks in other domains such as manufacturing parts or medical imaging.
Combining FDDet with additional unlabeled data sources from production lines might further lower labeling costs.
Testing the method on datasets that include temporal sequences or multi-view images would reveal whether the consistency calibration generalizes.

Load-bearing premise

The performance gains come specifically from the designs of BBoxMixUp and CGPC rather than from other training choices or properties of the FDD-48 dataset itself.

What would settle it

An ablation experiment that disables both BBoxMixUp and CGPC while keeping all other training details identical and still shows FDDet outperforming mainstream detectors on FDD-48 would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.24508 by Ruihao Xu, Yansong Tang, Yong Liu.

**Figure 1.** Figure 1: FDD-48: A food defect detection dataset with fine-grained annotations to closely align with practical application [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 3.** Figure 3: CGPC applies multi-dimensional consistency calibra [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 2.** Figure 2: Illustration of BBoxMixUp. (a) Mixing Candidate Selection. Bounding boxes of the same class are pooled and randomly paired for mixing. (b) Localized Mixing. Regions from different images are blended to create new instance. inputting a vocabulary list of the 13 fruit names as a prompt, YOLO-World generated bounding box annotations for both normal and defective fruits within the images. Subsequently, we cond… view at source ↗

**Figure 4.** Figure 4: More sample images from FDD-48, demonstrating the diversity of food types and defect categories. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Illustrative examples for each defect category in FDD-48, showing the visual characteristics of different food defects. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

read the original abstract

Food defect detection is critical for automated quality control, yet existing studies lack unified benchmarks and suffer from data scarcity. We introduce FDD-48, a comprehensive dataset with fine-grained annotations across 13 food types and 48 defect categories under diverse real-world conditions. To improve detection with limited labeled data, we propose FDDet, a semi-supervised framework featuring two key components: (1) BBoxMixUp, a data augmentation technique that mixes same-category defect regions to reduce spurious feature associations, and (2) CGPC (Consistency-Guided Pseudo-Label Calibration), which filters pseudo-labels based on intra-sample consistency. Experiments show FDDet significantly outperforms mainstream detectors on FDD-48, demonstrating its effectiveness for food defect detection under data-limited scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

New dataset FDD-48 is the clearest contribution here, but the abstract gives no numbers or ablations to back the specific claims for BBoxMixUp and CGPC.

read the letter

The paper's main addition is FDD-48, a dataset covering 13 food types and 48 defect categories with real-world variation. That fills a gap for anyone working on industrial food inspection where labeled data is scarce. The FDDet framework adds two named pieces on top of standard semi-supervised detection: BBoxMixUp mixes same-category defect boxes, and CGPC filters pseudo-labels by intra-sample consistency.

Those components are presented as the reason for better performance under limited labels. The abstract states that FDDet outperforms mainstream detectors, but supplies no mAP numbers, no baseline list, and no ablation tables. The stress-test concern holds: nothing in the provided text shows that the gains survive when BBoxMixUp is swapped for ordinary MixUp or when CGPC is replaced by simple confidence thresholding. Without those controls the attribution stays unproven.

The work sits squarely in applied computer vision for manufacturing. It does not derive new theory or run formal verification, so its value is in the dataset and the empirical results once they are shown in full. A reader already running semi-supervised detectors on niche domains could pull the dataset for comparison; the method itself follows existing patterns closely enough that the incremental gain is hard to judge from the summary alone.

If the full paper contains proper ablations, error analysis, and reproducible splits, it is worth sending to referees. Otherwise the central claim about the two components needs more evidence before review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the FDD-48 dataset comprising 13 food types and 48 defect categories with fine-grained annotations under real-world conditions. It proposes FDDet, a semi-supervised object detection framework whose two highlighted components are BBoxMixUp (mixing same-category defect bounding boxes) and CGPC (intra-sample consistency filtering of pseudo-labels). The central claim is that FDDet significantly outperforms mainstream detectors on FDD-48 in data-limited regimes.

Significance. A new benchmark dataset and a semi-supervised detector tailored to food-defect imagery would be useful for industrial quality-control applications if the performance gains are shown to be robust and specifically attributable to the proposed modules. The work addresses a genuine data-scarcity problem, but the current presentation supplies no quantitative results, baseline tables, or ablation controls, limiting any assessment of practical impact.

major comments (2)

[Abstract] Abstract: the statement that 'FDDet significantly outperforms mainstream detectors' supplies no mAP, AP50, or other numeric deltas, no list of compared detectors, and no indication of the labeled-data fraction used; without these quantities the central empirical claim cannot be evaluated.
[Experiments (implied by abstract description of key components)] The attribution of gains to BBoxMixUp and CGPC is not isolated. No ablation is described that replaces BBoxMixUp with standard MixUp/CutMix or CGPC with conventional confidence-threshold pseudo-labeling while keeping the remainder of the semi-supervised pipeline fixed; therefore the performance delta cannot be confidently credited to the two named components rather than other implementation choices.

minor comments (2)

[Abstract] The abstract mentions 'diverse real-world conditions' but provides no statistics on lighting, occlusion, or defect-size distributions that would allow readers to judge dataset difficulty.
[Method (implied)] Notation for BBoxMixUp and CGPC is introduced without an accompanying equation or algorithmic listing, making the precise mixing and filtering operations difficult to reproduce from the text alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that the abstract and experimental sections require more specific quantitative support and component-isolating ablations to allow proper evaluation of the claims. We will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the statement that 'FDDet significantly outperforms mainstream detectors' supplies no mAP, AP50, or other numeric deltas, no list of compared detectors, and no indication of the labeled-data fraction used; without these quantities the central empirical claim cannot be evaluated.

Authors: We agree that the abstract is insufficiently specific. In the revision we will update the abstract to report concrete mAP and AP50 deltas, name the compared detectors, and state the labeled-data fractions used in the FDD-48 experiments. revision: yes
Referee: [Experiments (implied by abstract description of key components)] The attribution of gains to BBoxMixUp and CGPC is not isolated. No ablation is described that replaces BBoxMixUp with standard MixUp/CutMix or CGPC with conventional confidence-threshold pseudo-labeling while keeping the remainder of the semi-supervised pipeline fixed; therefore the performance delta cannot be confidently credited to the two named components rather than other implementation choices.

Authors: We acknowledge that the current manuscript does not contain the requested ablations. We will add new ablation tables that replace BBoxMixUp with standard MixUp/CutMix and CGPC with conventional confidence-threshold pseudo-labeling while holding all other pipeline elements fixed, thereby isolating the contribution of each proposed module. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on experiments, not self-referential derivations

full rationale

The paper introduces a dataset FDD-48 and a semi-supervised detection framework FDDet with two proposed components (BBoxMixUp and CGPC). All claims are supported by experimental comparisons on the dataset rather than any derivation chain, equations, or parameters fitted to the target quantities and then relabeled as predictions. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided text. Attribution questions about whether gains isolate to the two components are experimental-design issues, not circularity under the defined patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.1-grok · 5656 in / 1093 out tokens · 35775 ms · 2026-06-30T13:45:15.215075+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 8 canonical work pages · 4 internal anchors

[1]

Impurities detection in edible bird’s nest using optical segmentation and image fusion,

Cong Kai Yee, Ying Heng Yeo, Lai Hoong Cheng, and Kin Sam Yen, “Impurities detection in edible bird’s nest using optical segmentation and image fusion,”Machine Vision and Applications, vol. 31, no. 7, pp. 68, 2020

2020
[2]

Machine vision on the positioning accuracy evaluation of poultry viscera in the automatic evisceration robot system,

Yan Chen, Ke Feng, Jianjian Lu, and Zhigang Hu, “Machine vision on the positioning accuracy evaluation of poultry viscera in the automatic evisceration robot system,”International Journal of Food Properties, vol. 24, no. 1, pp. 933–943, 2021

2021
[3]

Monochrome computer vision for detecting common external defects of mango,

Krishna Kumar Patel, A Kar, and MA Khan, “Monochrome computer vision for detecting common external defects of mango,”Journal of Food Science and Technology, pp. 1–8, 2021

2021
[4]

Identification of common skin defects and classification of early decayed citrus using hyperspectral imaging technique,

Hailiang Zhang, Ying Chen, Xuemei Liu, Yifeng Huang, Baishao Zhan, and Wei Luo, “Identification of common skin defects and classification of early decayed citrus using hyperspectral imaging technique,”Food Analytical Methods, vol. 14, no. 6, pp. 1176–1193, 2021

2021
[5]

Automated discrimination of deveined shrimps based on grayscale image parameters,

Nawaporn Thanasarn, Supapan Chaiprapat, Kriangkrai Waiyakan, and Kunlapat Thongkaew, “Automated discrimination of deveined shrimps based on grayscale image parameters,”Journal of Food Process Engineering, vol. 42, no. 4, pp. e13041, 2019

2019
[6]

A color channel based on multiple random forest coupled with a computer vision technique for the detection and prediction of sudan dye-i adulteration in turmeric powder,

Dipankar Mandal, Arpitam Chatterjee, and Bipan Tudu, “A color channel based on multiple random forest coupled with a computer vision technique for the detection and prediction of sudan dye-i adulteration in turmeric powder,”Color Research & Application, vol. 47, no. 2, pp. 388–400, 2022

2022
[7]

Ma- chine learning & computer vision-based optimum black tea fermentation detection,

Anuja Bhargava, Atul Bansal, Vishal Goyal, and Aasheesh Shukla, “Ma- chine learning & computer vision-based optimum black tea fermentation detection,”Multimedia Tools and Applications, vol. 82, no. 28, pp. 43335–43347, 2023

2023
[8]

Fast and accurate detection of kiwifruit in orchard using improved yolov3-tiny model,

Longsheng Fu, Yali Feng, Jingzhu Wu, Zhihao Liu, Fangfang Gao, Yaqoob Majeed, Ahmad Al-Mallahi, Qin Zhang, Rui Li, and Yongjie Cui, “Fast and accurate detection of kiwifruit in orchard using improved yolov3-tiny model,”Precision Agriculture, vol. 22, pp. 754–776, 2021

2021
[9]

A novel fuzzy pooling based modified thinnet architecture for lemon fruit classification,

KD Mohana Sundaram, T Shankar, and N Sudhakar Reddy, “A novel fuzzy pooling based modified thinnet architecture for lemon fruit classification,”Journal of Intelligent & Fuzzy Systems, vol. 43, no. 5, pp. 6877–6891, 2022

2022
[10]

Intelligent grading method for walnut kernels based on deep learning and physiological indicators,

Siwei Chen, Dan Dai, Jian Zheng, Haoyu Kang, Dongdong Wang, Xinyu Zheng, Xiaobo Gu, Jiali Mo, and Zhuohui Luo, “Intelligent grading method for walnut kernels based on deep learning and physiological indicators,”Frontiers in Nutrition, vol. 9, pp. 1075781, 2023

2023
[11]

Lightweight and efficient deep learning models for fruit detection in orchards,

Xiaoyao Yang, Wenyang Zhao, Yong Wang, Wei Qi Yan, and Yanqiang Li, “Lightweight and efficient deep learning models for fruit detection in orchards,”Scientific Reports, vol. 14, no. 1, pp. 26086, 2024

2024
[12]

Fruitnet: Indian fruits image dataset with quality for machine learning applications,

Vishal Meshram and Kailas Patil, “Fruitnet: Indian fruits image dataset with quality for machine learning applications,”Data in Brief, vol. 40, pp. 107686, 2022

2022
[13]

Fruit defect detection using cnn models with real and virtual data,

Renzo Pacheco., Paula Gonz ´alez., Luis Chuquimarca., Boris Vintimilla., and Sergio Velastin., “Fruit defect detection using cnn models with real and virtual data,” inProceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP , (VISIGRAPP 2023). INSTICC, 2023, pp. 272...

2023
[14]

Meat quality assessment based on deep learning,

Oguzhan Ulucan, Diclehan Karakaya, and Mehmet Turkan, “Meat quality assessment based on deep learning,” in2019 Innovations in Intelligent Systems and Applications Conference (ASYU). Ieee, 2019, pp. 1–5

2019
[15]

Potato diseases detection and classification using deep learning methods,

Ali Arshaghi, Mohsen Ashourian, and Leila Ghabeli, “Potato diseases detection and classification using deep learning methods,”Multimedia Tools and Applications, vol. 82, no. 4, pp. 5725–5742, 2023

2023
[16]

fmaaa307,

Qunmasj Vision Studio, “fmaaa307,” GitHub, 2024

2024
[17]

Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,

Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 392–408

2022
[18]

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, “mixup: Beyond empirical risk minimization,”arXiv preprint arXiv:1710.09412, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

Cutmix: Regularization strategy to train strong classifiers with localizable features,

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo, “Cutmix: Regularization strategy to train strong classifiers with localizable features,” inProceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6023– 6032

2019
[20]

Augmix: A simple data pro- cessing method to improve robustness and uncertainty,

Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan, “Augmix: A simple data pro- cessing method to improve robustness and uncertainty,”arXiv preprint arXiv:1912.02781, 2019

work page arXiv 1912
[21]

Pixmix: Dreamlike pictures com- prehensively improve safety measures,

Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt, “Pixmix: Dreamlike pictures com- prehensively improve safety measures,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16783–16792

2022
[22]

Ipmix: Label-preserving data augmentation method for training robust classifiers,

Zhenglin Huang, Xiaoan Bao, Na Zhang, Qingqi Zhang, Xiao Tu, Biao Wu, and Xi Yang, “Ipmix: Label-preserving data augmentation method for training robust classifiers,”Advances in Neural Information Processing Systems, vol. 36, pp. 63660–63673, 2023

2023
[23]

Unbi- ased teacher for semi-supervised object detection,

Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, and Peter Vajda, “Unbi- ased teacher for semi-supervised object detection,”arXiv preprint arXiv:2102.09480, 2021

work page arXiv 2021
[24]

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, et al., “Minicpm-v: A gpt-4v level mllm on your phone,”arXiv preprint arXiv:2408.01800, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al., “Dinov2: Learning robust visual features without supervision,”arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[26]

Yolo-world: Real-time open-vocabulary object detection,

Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, and Ying Shan, “Yolo-world: Real-time open-vocabulary object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16901–16911

2024
[27]

YOLOX: Exceeding YOLO Series in 2021

Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun, “Yolox: Exceeding yolo series in 2021,”arXiv preprint arXiv:2107.08430, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[28]

Rtmdet: An empiri- cal study of designing real-time object detectors,

Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, and Kai Chen, “Rtmdet: An empiri- cal study of designing real-time object detectors,”arXiv preprint arXiv:2212.07784, 2022

work page arXiv 2022
[29]

Dab-detr: Dynamic anchor boxes are better queries for detr,

Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang, “Dab-detr: Dynamic anchor boxes are better queries for detr,” inInternational Conference on Learning Representations
[30]

Dino: Detr with improved denoising anchor boxes for end-to-end object detection,

Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel Ni, and Heung-Yeung Shum, “Dino: Detr with improved denoising anchor boxes for end-to-end object detection,” inThe Eleventh International Conference on Learning Representations
[31]

Grounded language-image pre-training,

Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq- Neng Hwang, et al., “Grounded language-image pre-training,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10965–10975

2022
[32]

Dense distinct query for end-to-end object detection,

Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, and Kai Chen, “Dense distinct query for end-to-end object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7329– 7338

2023
[33]

Efficientdet: Scalable and efficient object detection,

Mingxing Tan, Ruoming Pang, and Quoc V Le, “Efficientdet: Scalable and efficient object detection,” inProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, 2020, pp. 10781– 10790

2020
[34]

Exploring plain vision transformer backbones for object detection,

Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He, “Exploring plain vision transformer backbones for object detection,” inEuropean conference on computer vision. Springer, 2022, pp. 280–296

2022
[35]

Detrs with collaborative hybrid assign- ments training. arxiv 2022,

Z Zong, G Song, and Y Liu, “Detrs with collaborative hybrid assign- ments training. arxiv 2022,”arXiv preprint arXiv:2211.12860, 2022

work page arXiv 2022
[36]

Yolov10: Real-time end-to-end object detection,

Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, et al., “Yolov10: Real-time end-to-end object detection,”Advances in Neural Information Processing Systems, vol. 37, pp. 107984–108011, 2024

2024
[37]

Designing network design spaces,

Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Doll ´ar, “Designing network design spaces,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10428–10436. Fig. 4: More sample images from FDD-48, demonstrating the diversity of food types and defect categories. APPENDIX We provide a...

2020

[1] [1]

Impurities detection in edible bird’s nest using optical segmentation and image fusion,

Cong Kai Yee, Ying Heng Yeo, Lai Hoong Cheng, and Kin Sam Yen, “Impurities detection in edible bird’s nest using optical segmentation and image fusion,”Machine Vision and Applications, vol. 31, no. 7, pp. 68, 2020

2020

[2] [2]

Machine vision on the positioning accuracy evaluation of poultry viscera in the automatic evisceration robot system,

Yan Chen, Ke Feng, Jianjian Lu, and Zhigang Hu, “Machine vision on the positioning accuracy evaluation of poultry viscera in the automatic evisceration robot system,”International Journal of Food Properties, vol. 24, no. 1, pp. 933–943, 2021

2021

[3] [3]

Monochrome computer vision for detecting common external defects of mango,

Krishna Kumar Patel, A Kar, and MA Khan, “Monochrome computer vision for detecting common external defects of mango,”Journal of Food Science and Technology, pp. 1–8, 2021

2021

[4] [4]

Identification of common skin defects and classification of early decayed citrus using hyperspectral imaging technique,

Hailiang Zhang, Ying Chen, Xuemei Liu, Yifeng Huang, Baishao Zhan, and Wei Luo, “Identification of common skin defects and classification of early decayed citrus using hyperspectral imaging technique,”Food Analytical Methods, vol. 14, no. 6, pp. 1176–1193, 2021

2021

[5] [5]

Automated discrimination of deveined shrimps based on grayscale image parameters,

Nawaporn Thanasarn, Supapan Chaiprapat, Kriangkrai Waiyakan, and Kunlapat Thongkaew, “Automated discrimination of deveined shrimps based on grayscale image parameters,”Journal of Food Process Engineering, vol. 42, no. 4, pp. e13041, 2019

2019

[6] [6]

A color channel based on multiple random forest coupled with a computer vision technique for the detection and prediction of sudan dye-i adulteration in turmeric powder,

Dipankar Mandal, Arpitam Chatterjee, and Bipan Tudu, “A color channel based on multiple random forest coupled with a computer vision technique for the detection and prediction of sudan dye-i adulteration in turmeric powder,”Color Research & Application, vol. 47, no. 2, pp. 388–400, 2022

2022

[7] [7]

Ma- chine learning & computer vision-based optimum black tea fermentation detection,

Anuja Bhargava, Atul Bansal, Vishal Goyal, and Aasheesh Shukla, “Ma- chine learning & computer vision-based optimum black tea fermentation detection,”Multimedia Tools and Applications, vol. 82, no. 28, pp. 43335–43347, 2023

2023

[8] [8]

Fast and accurate detection of kiwifruit in orchard using improved yolov3-tiny model,

Longsheng Fu, Yali Feng, Jingzhu Wu, Zhihao Liu, Fangfang Gao, Yaqoob Majeed, Ahmad Al-Mallahi, Qin Zhang, Rui Li, and Yongjie Cui, “Fast and accurate detection of kiwifruit in orchard using improved yolov3-tiny model,”Precision Agriculture, vol. 22, pp. 754–776, 2021

2021

[9] [9]

A novel fuzzy pooling based modified thinnet architecture for lemon fruit classification,

KD Mohana Sundaram, T Shankar, and N Sudhakar Reddy, “A novel fuzzy pooling based modified thinnet architecture for lemon fruit classification,”Journal of Intelligent & Fuzzy Systems, vol. 43, no. 5, pp. 6877–6891, 2022

2022

[10] [10]

Intelligent grading method for walnut kernels based on deep learning and physiological indicators,

Siwei Chen, Dan Dai, Jian Zheng, Haoyu Kang, Dongdong Wang, Xinyu Zheng, Xiaobo Gu, Jiali Mo, and Zhuohui Luo, “Intelligent grading method for walnut kernels based on deep learning and physiological indicators,”Frontiers in Nutrition, vol. 9, pp. 1075781, 2023

2023

[11] [11]

Lightweight and efficient deep learning models for fruit detection in orchards,

Xiaoyao Yang, Wenyang Zhao, Yong Wang, Wei Qi Yan, and Yanqiang Li, “Lightweight and efficient deep learning models for fruit detection in orchards,”Scientific Reports, vol. 14, no. 1, pp. 26086, 2024

2024

[12] [12]

Fruitnet: Indian fruits image dataset with quality for machine learning applications,

Vishal Meshram and Kailas Patil, “Fruitnet: Indian fruits image dataset with quality for machine learning applications,”Data in Brief, vol. 40, pp. 107686, 2022

2022

[13] [13]

Fruit defect detection using cnn models with real and virtual data,

Renzo Pacheco., Paula Gonz ´alez., Luis Chuquimarca., Boris Vintimilla., and Sergio Velastin., “Fruit defect detection using cnn models with real and virtual data,” inProceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP , (VISIGRAPP 2023). INSTICC, 2023, pp. 272...

2023

[14] [14]

Meat quality assessment based on deep learning,

Oguzhan Ulucan, Diclehan Karakaya, and Mehmet Turkan, “Meat quality assessment based on deep learning,” in2019 Innovations in Intelligent Systems and Applications Conference (ASYU). Ieee, 2019, pp. 1–5

2019

[15] [15]

Potato diseases detection and classification using deep learning methods,

Ali Arshaghi, Mohsen Ashourian, and Leila Ghabeli, “Potato diseases detection and classification using deep learning methods,”Multimedia Tools and Applications, vol. 82, no. 4, pp. 5725–5742, 2023

2023

[16] [16]

fmaaa307,

Qunmasj Vision Studio, “fmaaa307,” GitHub, 2024

2024

[17] [17]

Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,

Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 392–408

2022

[18] [18]

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, “mixup: Beyond empirical risk minimization,”arXiv preprint arXiv:1710.09412, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Cutmix: Regularization strategy to train strong classifiers with localizable features,

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo, “Cutmix: Regularization strategy to train strong classifiers with localizable features,” inProceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6023– 6032

2019

[20] [20]

Augmix: A simple data pro- cessing method to improve robustness and uncertainty,

Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan, “Augmix: A simple data pro- cessing method to improve robustness and uncertainty,”arXiv preprint arXiv:1912.02781, 2019

work page arXiv 1912

[21] [21]

Pixmix: Dreamlike pictures com- prehensively improve safety measures,

Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt, “Pixmix: Dreamlike pictures com- prehensively improve safety measures,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16783–16792

2022

[22] [22]

Ipmix: Label-preserving data augmentation method for training robust classifiers,

Zhenglin Huang, Xiaoan Bao, Na Zhang, Qingqi Zhang, Xiao Tu, Biao Wu, and Xi Yang, “Ipmix: Label-preserving data augmentation method for training robust classifiers,”Advances in Neural Information Processing Systems, vol. 36, pp. 63660–63673, 2023

2023

[23] [23]

Unbi- ased teacher for semi-supervised object detection,

Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, and Peter Vajda, “Unbi- ased teacher for semi-supervised object detection,”arXiv preprint arXiv:2102.09480, 2021

work page arXiv 2021

[24] [24]

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, et al., “Minicpm-v: A gpt-4v level mllm on your phone,”arXiv preprint arXiv:2408.01800, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[25] [25]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al., “Dinov2: Learning robust visual features without supervision,”arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[26] [26]

Yolo-world: Real-time open-vocabulary object detection,

Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, and Ying Shan, “Yolo-world: Real-time open-vocabulary object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16901–16911

2024

[27] [27]

YOLOX: Exceeding YOLO Series in 2021

Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun, “Yolox: Exceeding yolo series in 2021,”arXiv preprint arXiv:2107.08430, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[28] [28]

Rtmdet: An empiri- cal study of designing real-time object detectors,

Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, and Kai Chen, “Rtmdet: An empiri- cal study of designing real-time object detectors,”arXiv preprint arXiv:2212.07784, 2022

work page arXiv 2022

[29] [29]

Dab-detr: Dynamic anchor boxes are better queries for detr,

Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang, “Dab-detr: Dynamic anchor boxes are better queries for detr,” inInternational Conference on Learning Representations

[30] [30]

Dino: Detr with improved denoising anchor boxes for end-to-end object detection,

Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel Ni, and Heung-Yeung Shum, “Dino: Detr with improved denoising anchor boxes for end-to-end object detection,” inThe Eleventh International Conference on Learning Representations

[31] [31]

Grounded language-image pre-training,

Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq- Neng Hwang, et al., “Grounded language-image pre-training,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10965–10975

2022

[32] [32]

Dense distinct query for end-to-end object detection,

Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, and Kai Chen, “Dense distinct query for end-to-end object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7329– 7338

2023

[33] [33]

Efficientdet: Scalable and efficient object detection,

Mingxing Tan, Ruoming Pang, and Quoc V Le, “Efficientdet: Scalable and efficient object detection,” inProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, 2020, pp. 10781– 10790

2020

[34] [34]

Exploring plain vision transformer backbones for object detection,

Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He, “Exploring plain vision transformer backbones for object detection,” inEuropean conference on computer vision. Springer, 2022, pp. 280–296

2022

[35] [35]

Detrs with collaborative hybrid assign- ments training. arxiv 2022,

Z Zong, G Song, and Y Liu, “Detrs with collaborative hybrid assign- ments training. arxiv 2022,”arXiv preprint arXiv:2211.12860, 2022

work page arXiv 2022

[36] [36]

Yolov10: Real-time end-to-end object detection,

Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, et al., “Yolov10: Real-time end-to-end object detection,”Advances in Neural Information Processing Systems, vol. 37, pp. 107984–108011, 2024

2024

[37] [37]

Designing network design spaces,

Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Doll ´ar, “Designing network design spaces,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10428–10436. Fig. 4: More sample images from FDD-48, demonstrating the diversity of food types and defect categories. APPENDIX We provide a...

2020