Recognition: no theorem link
MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection
Pith reviewed 2026-05-15 18:02 UTC · model grok-4.3
The pith
A new smartphone microscopy dataset of 4,941 images enables deep learning models to detect food mold with 99.5 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MobileMold supplies 4,941 real-world smartphone microscopy images across 11 food categories, four phone models, and three clip-on microscopes. When standard pretrained convolutional networks are fine-tuned on these images, with and without common augmentations, the best configurations reach 0.9954 accuracy, 0.9954 F1, and 0.9907 Matthews correlation coefficient on mold detection; the same models simultaneously classify food type at similar performance. Saliency maps confirm that predictions align with visible mold structures.
What carries the argument
The MobileMold dataset of smartphone clip-on microscope images paired with binary mold labels and food-type labels, used to train and evaluate pretrained deep classifiers in single-task and multi-task settings.
If this is right
- Pretrained image models can be adapted to reach near-perfect mold detection without custom architecture design.
- A single network can output both mold status and food identity in one forward pass.
- Saliency maps provide human-interpretable confirmation that the model attends to mold hyphae rather than spurious cues.
- The dataset supports development of portable apps that combine phone cameras with cheap microscope attachments for spoilage screening.
Where Pith is reading between the lines
- Consumer-grade attachments could turn routine phone photos into early-warning tools for household food waste reduction.
- The same imaging setup may extend to other microscopic food contaminants such as bacterial colonies if additional labels are collected.
- Multi-task training on this data might improve robustness when the model is later deployed on videos rather than single frames.
Load-bearing premise
The collected images under diverse real-world conditions are representative enough for models to generalize to new food samples, phones, and environments not seen during training.
What would settle it
Train the reported models on the released MobileMold split and then measure accuracy on a fresh test set of images captured with an unseen phone model or an unseen food type; a drop below 0.90 accuracy would falsify the claim of near-ceiling utility.
Figures
read the original abstract
Smartphone clip-on microscopes turn everyday devices into low-cost, portable imaging systems that can even reveal fungal structures at the microscopic level, enabling mold inspection beyond unaided visual checks. In this paper, we introduce MobileMold, an open smartphone-based microscopy dataset for food mold detection and food classification. MobileMold contains 4,941 handheld microscopy images spanning 11 food types, 4 smartphones, 3 microscopes, and diverse real-world conditions. Beyond the dataset release, we establish baselines for (i) mold detection and (ii) food-type classification, including a multi-task setting that predicts both attributes. Across multiple pretrained deep learning architectures and augmentation strategies, we obtain near-ceiling performance (accuracy = 0.9954, F1 = 0.9954, MCC = 0.9907), validating the utility of our dataset for detecting food spoilage. To increase transparency, we complement our evaluation with saliency-based visual explanations highlighting mold regions associated with the model's predictions. MobileMold aims to contribute to research on accessible food-safety sensing, mobile imaging, and exploring the potential of smartphones enhanced with attachments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MobileMold, an open dataset of 4941 smartphone microscopy images spanning 11 food types, 4 smartphones, and 3 microscopes under diverse real-world conditions. It provides baselines for mold detection, food-type classification, and multi-task prediction using pretrained deep networks with augmentations, reporting near-ceiling metrics (accuracy = 0.9954, F1 = 0.9954, MCC = 0.9907), along with saliency maps for visual explanations. The central claim is that these results validate the dataset's utility for accessible food spoilage detection.
Significance. If the evaluation protocol ensures no leakage across repeated physical samples or devices, the high performance across multiple architectures would indicate that the dataset captures transferable mold morphology and food features, supporting development of low-cost mobile food-safety tools. The inclusion of saliency explanations and multi-task results adds transparency and practical value for the computer vision community.
major comments (2)
- [Evaluation Protocol] Evaluation protocol: the manuscript does not specify the train/test splitting strategy (e.g., by unique physical food sample, by device, or random per-image). With only 4 phones and 11 food types, any image-level split risks leakage of sample-specific or device-specific artifacts, directly undermining the generalization claim that performance validates utility for unseen samples, phones, and environments.
- [Dataset Description] Dataset composition: more granular statistics are needed on the number of distinct physical samples per food type and per device (beyond the aggregate 4941 images). Without this, it is difficult to assess whether the near-ceiling metrics reflect learning of general mold features or memorization of limited instances.
minor comments (2)
- [Abstract] Abstract: the phrase 'near-ceiling performance' is used without defining a reference ceiling (e.g., human expert accuracy or theoretical upper bound); consider adding a brief comparison.
- [Results] Notation: confirm that MCC refers to Matthews Correlation Coefficient and provide the exact computation or reference used for the reported value of 0.9907.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify key aspects of our evaluation and dataset presentation. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.
read point-by-point responses
-
Referee: Evaluation protocol: the manuscript does not specify the train/test splitting strategy (e.g., by unique physical food sample, by device, or random per-image). With only 4 phones and 11 food types, any image-level split risks leakage of sample-specific or device-specific artifacts, directly undermining the generalization claim that performance validates utility for unseen samples, phones, and environments.
Authors: We agree that an explicit description of the splitting strategy is necessary to support generalization claims. Our baselines used a random per-image 80/20 train/test split stratified by label. We will revise the manuscript to state this protocol clearly, including the random seed, and will add a new experiment reporting performance under a device-wise split (holding out one smartphone entirely) to directly address leakage concerns. This addition will demonstrate that the high metrics are not solely due to device-specific artifacts. revision: yes
-
Referee: Dataset composition: more granular statistics are needed on the number of distinct physical samples per food type and per device (beyond the aggregate 4941 images). Without this, it is difficult to assess whether the near-ceiling metrics reflect learning of general mold features or memorization of limited instances.
Authors: We acknowledge that more granular statistics on unique physical samples would improve transparency. Collection logs allow us to report approximate counts of distinct food items per type and device; we will add a dedicated table in the revised dataset section with these figures. While exact per-image sample tracking was not performed for all captures, the added breakdown will help readers evaluate diversity and reduce concerns about memorization. revision: partial
Circularity Check
No circularity: purely empirical dataset release and baseline evaluation
full rationale
The paper introduces a microscopy image dataset and reports direct empirical results from training standard pretrained models on it. No derivations, equations, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatzes appear in the provided text. Performance numbers (accuracy 0.9954 etc.) are straightforward outputs of supervised training and evaluation rather than any self-referential reduction. Generalization concerns (e.g., possible train/test leakage) are separate from circularity and do not trigger any of the enumerated patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard assumptions in supervised deep learning for image classification hold, including that training and test images are drawn from similar distributions.
Reference graph
Works this paper leans on
-
[1]
www.alibaba.com [n. d.].30x Led-licht Lupe Mini Pocket Tragbares Handy- mikroskop - Buy 30x Led Light Magnifier mini Pocket Portable mobile Phone Mi- croscope led Lighted Handheld portable Phone Inspection Product on Alibaba.com. www.alibaba.com. https://german.alibaba.com/product-detail/30X-LED-Light- Magnifier-Mini-Pocket-1601456339685.html
-
[2]
Jiusion [n. d.].Jiusion 30X Zoom Bright LED Microscope Magnifier Clip-On Cell Phone Mobile Phone Camera Lens for Apple iPhone Samsung iPad. Jiu- sion. https://jiusion.com/products/jiusion-30x-zoom-bright-led-microscope- magnifier-clip-on-cell-phone-mobile-phone-camera-lens-for-apple-iphone- samsung-ipad-p0145
-
[3]
d.].MS001 100X LED Cell Phone Microscope Lens
APEXEL Official [n. d.].MS001 100X LED Cell Phone Microscope Lens. APEXEL Official. https://www.shopapexel.com/products/100x-microscope-lens
-
[4]
Alberto Altafini, Paola Roncada, Alessandro Guerrini, Gaetan Minkoumba Son- fack, Giorgio Fedrizzi, and Elisabetta Caprai. 2021. Occurrence of Ochratoxin A in different types of cheese offered for sale in Italy.Toxins (Basel)13, 8 (Aug. 2021), 540
work page 2021
-
[5]
Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.BMC Genomics21, 1 (Jan. 2020), 6
work page 2020
-
[6]
Davide Chicco and Giuseppe Jurman. 2023. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification.BioData Min.16, 1 (Feb. 2023), 4
work page 2023
-
[7]
Cubuk, Barret Zoph, Dandelion Mané, Vijay Vasudevan, and Quoc V
Ekin D. Cubuk, Barret Zoph, Dandelion Mané, Vijay Vasudevan, and Quoc V. Le. 2019. AutoAugment: Learning Augmentation Strategies From Data. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 113–123. doi:10.1109/CVPR.2019.00020
-
[8]
Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V
Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. 2020. Randaugment: Practical automated data augmentation with a reduced search space. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 3008–3017. doi:10.1109/CVPRW50498.2020.00359
-
[9]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Ima- geNet: A large-scale hierarchical image database. In2009 IEEE Conference on Com- puter Vision and Pattern Recognition. 248–255. doi:10.1109/CVPR.2009.5206848
-
[10]
Vakkas Doğan, Melodi Evliya, Leyla Nesrin Kahyaoglu, and Volkan Kılıç. 2024. On- site colorimetric food spoilage monitoring with smartphone embedded machine learning.Talanta266, Pt 1 (Jan. 2024), 125021
work page 2024
-
[11]
useprefix=true family=grower.eu, prefix=led. [n. d.].Phonescope 30x Zoom. led- grower.eu. https://www.led-grower.eu/phonescope-30x-zoom-mikroskop-pro- mobilni-telefon/
-
[12]
Manuel Gómez, Andrea Casado, and Irma Caro. 2023. Assessing the effect of flour (white or whole-grain) and process (direct or par-baked) on the mycotoxin content of bread in Spain.Foods12, 23 (Nov. 2023), 4240
work page 2023
-
[13]
Zhang, Shaoqing Ren, and Jian Sun
Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition.2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2015), 770–778. https://api.semanticscholar.org/CorpusID: 206594692
work page 2015
-
[14]
Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan
Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. 2020. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty.Proceedings of the International Conference on Learning Representations (ICLR)(2020)
work page 2020
-
[15]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.arXiv preprint arXiv:1704.04861(2017). arXiv:1704.04861
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [16]
-
[17]
Densely Connected Convolutional Networks. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2261–2269. doi:10.1109/CVPR. 2017.243
-
[18]
Mehdi Imani, Majid Joudaki, Ayoub Bagheri, and Hamid R. Arabnia. 2026. Why ROC-AUC Is Misleading for Highly Imbalanced Data: In-Depth Evaluation of MCC, F2-Score, H-Measure, and AUC-Based Metrics Across Diverse Classifiers. Technologies14, 1 (2026). doi:10.3390/technologies14010054
-
[19]
Fahad Jubayer, Janibul Alam Soeb, Abu Naser Mojumder, Mitun Kanti Paul, Pranta Barua, Shahidullah Kayshar, Syeda Sabrina Akter, Mizanur Rahman, and Amirul Islam. 2021. Detection of mold on the food surface using YOLOv5.Curr. Res. Food Sci.4 (Oct. 2021), 724–728
work page 2021
-
[20]
Narine Kokhlikyan, Vivek Miglani, Miguel Martin, Edward Wang, Bilal Alsallakh, Jonathan Reynolds, Alexander Melnikov, Natalia Kliushkina, Carlos Araya, Siqi Yan, and Orion Reblitz-Richardson. 2020. Captum: A unified and generic model interpretability library for PyTorch.arXiv preprint arXiv:2009.07896(2020)
-
[21]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In2021 IEEE/CVF International Conference on Computer Vision (ICCV). 9992–10002. doi:10.1109/ICCV48922.2021.00986
-
[22]
Samuel G. Müller and Frank Hutter. 2021. TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation.2021 IEEE/CVF International Conference on Computer Vision (ICCV)(2021), 754–762. https://api.semanticscholar.org/ CorpusID:232269620
work page 2021
-
[23]
S Oueslati, H Berrada, A Juan-García, J Mañes, and Cristina Juan. 2020. Multiple mycotoxin determination on Tunisian cereals-based food and evaluation of the population exposure.Food Anal. Methods13, 6 (June 2020), 1271–1281
work page 2020
-
[24]
2019.PyTorch: an imperative style, high-performance deep learning library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019.PyTorch: an imperative style, high-per...
work page 2019
-
[25]
María Agustina Pavicich, Stefano Compagnoni, Celine Meerpoel, Katleen Raes, and Sarah De Saeger. 2024. Ochratoxin A and AFM1 in cheese and cheese sub- stitutes: LC-MS/MS method validation, natural occurrence, and risk assessment. Toxins (Basel)16, 12 (Dec. 2024), 547
work page 2024
-
[26]
Gagan Raju, Aashrayi Ranjan, Soumyabrata Banik, Ashmini Poddar, Vishwanath Managuli, and Nirmal Mazumder. 2024. A commentary on the development and use of smartphone imaging devices.Biophys. Rev.16, 2 (April 2024), 151–163
work page 2024
-
[27]
Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. InWorkshop at International Conference on Learning Representations
work page 2014
-
[28]
Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Net- works for Large-Scale Image Recognition. In3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Confer- ence Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.)
work page 2015
-
[29]
SmoothGrad: removing noise by adding noise
Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda B. Viégas, and Martin Wat- tenberg. 2017. SmoothGrad: removing noise by adding noise.arXiv eprint arXiv:1706.03825(2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[30]
Aswathi Soni, Yash Dixit, Marlon M Reis, and Gale Brightwell. 2022. Hyper- spectral imaging and machine learning in food microbiology: Developments and challenges in detection of bacterial, fungal, and viral contaminants.Compr. Rev. Food Sci. Food Saf.21, 4 (July 2022), 3717–3745
work page 2022
-
[31]
Panisa Treepong and Nawanol Theera-Ampornpunt. 2023. Early Bread Mold Detection through Microscopic Images Using Convolutional Neural Network. 7 (2023), 100574. doi:10.1016/j.crfs.2023.100574
-
[32]
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. 2022. MaxViT: Multi-axis Vision Transformer. InComputer Vision – ECCV 2022: 17th European Conference, Tel A viv, Israel, October 23–27, 2022, Proceedings, Part XXIV(Tel Aviv, Israel). Springer-Verlag, Berlin, Heidelberg, 459–479. doi:10.1007/978-3-031-20053-3_27
-
[33]
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(2017), 6848–
work page 2017
-
[34]
https://api.semanticscholar.org/CorpusID:24982157 Preprint — Author Accepted Version
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.