pith. machine review for the scientific record.

arxiv: 2603.01944 · v1 · submitted 2026-03-02 · 💻 cs.CV

Recognition: no theorem link

MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection


Pith reviewed 2026-05-15 18:02 UTC · model grok-4.3

classification 💻 cs.CV
keywords smartphone microscopy · food mold detection · image dataset · deep learning classification · food safety · clip-on microscope

The pith

A new smartphone microscopy dataset of 4,941 images enables deep learning models to detect food mold with 99.5 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases MobileMold, a collection of handheld microscope images taken with everyday phones that show fungal structures on eleven different foods under varied lighting and backgrounds. It then trains several standard image classifiers on the data and reports accuracy, F1, and Matthews correlation all above 0.99 for the binary task of spotting mold versus clean food. The same models also handle joint prediction of both mold presence and food type at comparable levels. Visual saliency maps are provided to show that the networks focus on the actual mold regions rather than background cues. The work positions the dataset as a practical resource for low-cost food-safety checks that do not require lab equipment.

Core claim

MobileMold supplies 4,941 real-world smartphone microscopy images across 11 food categories, four phone models, and three clip-on microscopes. When standard pretrained convolutional networks are fine-tuned on these images, with and without common augmentations, the best configurations reach 0.9954 accuracy, 0.9954 F1, and 0.9907 Matthews correlation coefficient on mold detection; the same models simultaneously classify food type at similar performance. Saliency maps confirm that predictions align with visible mold structures.

What carries the argument

The MobileMold dataset of smartphone clip-on microscope images paired with binary mold labels and food-type labels, used to train and evaluate pretrained deep classifiers in single-task and multi-task settings.
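The single-task/multi-task setup described here can be sketched as one shared trunk with two classification heads. This is a hypothetical illustration, not the paper's architecture: the trunk stands in for a pretrained backbone, and all layer sizes and labels are made up.

```python
import torch
import torch.nn as nn

class MoldFoodNet(nn.Module):
    """Schematic multi-task classifier: one shared trunk, two output heads."""

    def __init__(self, feat_dim: int = 64, n_foods: int = 11):
        super().__init__()
        # Stand-in for a pretrained CNN backbone; sizes are illustrative.
        self.trunk = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 32 * 32, feat_dim),
            nn.ReLU(),
        )
        self.mold_head = nn.Linear(feat_dim, 2)        # moldy vs. clean
        self.food_head = nn.Linear(feat_dim, n_foods)  # 11 food types

    def forward(self, x: torch.Tensor):
        z = self.trunk(x)
        return self.mold_head(z), self.food_head(z)

model = MoldFoodNet()
images = torch.randn(4, 3, 32, 32)  # toy batch, not real dataset images
mold_logits, food_logits = model(images)

# Joint loss as a plain sum of the two cross-entropies, a standard
# multi-task recipe; the paper's exact weighting is not specified here.
mold_y = torch.tensor([0, 1, 0, 1])
food_y = torch.tensor([3, 7, 0, 10])
loss = nn.functional.cross_entropy(mold_logits, mold_y) \
     + nn.functional.cross_entropy(food_logits, food_y)
```

Both predictions come from one forward pass through the shared trunk, which is what makes the joint mold-plus-food-type setting cheap at inference time.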

If this is right

  • Pretrained image models can be adapted to reach near-perfect mold detection without custom architecture design.
  • A single network can output both mold status and food identity in one forward pass.
  • Saliency maps provide human-interpretable confirmation that the model attends to mold hyphae rather than spurious cues.
  • The dataset supports development of portable apps that combine phone cameras with cheap microscope attachments for spoilage screening.
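The saliency bullet above can be made concrete with a minimal input-gradient sketch in the style of Simonyan et al.; the tiny linear "model" here is a hypothetical stand-in, not the paper's network, and the image is random noise.

```python
import torch

# Stand-in classifier: flatten a 3x8x8 "image" into 2 class logits.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 2))

x = torch.randn(1, 3, 8, 8, requires_grad=True)  # one toy "microscopy image"
score = model(x)[0, 1]                           # logit for the "moldy" class
score.backward()                                 # gradient of that logit w.r.t. pixels

# Per-pixel saliency: absolute gradient, max over color channels.
saliency = x.grad.abs().max(dim=1).values
```

High-saliency pixels are those whose perturbation most changes the "moldy" logit; overlaying this map on the input is what lets a reader check that the network attends to hyphae rather than background.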

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Consumer-grade attachments could turn routine phone photos into early-warning tools for household food waste reduction.
  • The same imaging setup may extend to other microscopic food contaminants such as bacterial colonies if additional labels are collected.
  • Multi-task training on this data might improve robustness when the model is later deployed on videos rather than single frames.

Load-bearing premise

The collected images under diverse real-world conditions are representative enough for models to generalize to new food samples, phones, and environments not seen during training.

What would settle it

Train the reported models on the released MobileMold split and then measure accuracy on a fresh test set of images captured with an unseen phone model or an unseen food type; a drop below 0.90 accuracy would falsify the claim of near-ceiling utility.
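The proposed falsification test reduces to a simple accuracy gate on the held-out set. A minimal sketch, where the 0.90 threshold is the one proposed above and the labels are placeholders:

```python
def passes_generalization_bar(y_true, y_pred, threshold=0.90):
    """Return True if accuracy on an unseen-phone or unseen-food
    test set stays at or above the threshold."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) >= threshold
```

A result of False on a genuinely unseen device or food type would falsify the near-ceiling-utility claim as stated.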

Figures

Figures reproduced from arXiv: 2603.01944 by Bennet Meyer, Dinh Nam Pham, Jonas Thumbs, Leonard Prokisch.

Figure 1. Effective field of view comparison using millimeter …
Figure 2. Typical data acquisition setup mimicking consumer …
Figure 4. Samples of MobileMold.
Figure 5. Saliency maps of MobileNet for 3 random samples.
original abstract

Smartphone clip-on microscopes turn everyday devices into low-cost, portable imaging systems that can even reveal fungal structures at the microscopic level, enabling mold inspection beyond unaided visual checks. In this paper, we introduce MobileMold, an open smartphone-based microscopy dataset for food mold detection and food classification. MobileMold contains 4,941 handheld microscopy images spanning 11 food types, 4 smartphones, 3 microscopes, and diverse real-world conditions. Beyond the dataset release, we establish baselines for (i) mold detection and (ii) food-type classification, including a multi-task setting that predicts both attributes. Across multiple pretrained deep learning architectures and augmentation strategies, we obtain near-ceiling performance (accuracy = 0.9954, F1 = 0.9954, MCC = 0.9907), validating the utility of our dataset for detecting food spoilage. To increase transparency, we complement our evaluation with saliency-based visual explanations highlighting mold regions associated with the model's predictions. MobileMold aims to contribute to research on accessible food-safety sensing, mobile imaging, and exploring the potential of smartphones enhanced with attachments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MobileMold, an open dataset of 4,941 smartphone microscopy images spanning 11 food types, 4 smartphones, and 3 microscopes under diverse real-world conditions. It provides baselines for mold detection, food-type classification, and multi-task prediction using pretrained deep networks with augmentations, reporting near-ceiling metrics (accuracy = 0.9954, F1 = 0.9954, MCC = 0.9907), along with saliency maps for visual explanations. The central claim is that these results validate the dataset's utility for accessible food spoilage detection.

Significance. If the evaluation protocol ensures no leakage across repeated physical samples or devices, the high performance across multiple architectures would indicate that the dataset captures transferable mold morphology and food features, supporting development of low-cost mobile food-safety tools. The inclusion of saliency explanations and multi-task results adds transparency and practical value for the computer vision community.

major comments (2)
  1. [Evaluation Protocol] The manuscript does not specify the train/test splitting strategy (e.g., by unique physical food sample, by device, or random per-image). With only 4 phones and 11 food types, any image-level split risks leakage of sample-specific or device-specific artifacts, directly undermining the generalization claim that performance validates utility for unseen samples, phones, and environments.
  2. [Dataset Description] More granular statistics are needed on the number of distinct physical samples per food type and per device (beyond the aggregate 4,941 images). Without this, it is difficult to assess whether the near-ceiling metrics reflect learning of general mold features or memorization of limited instances.
minor comments (2)
  1. [Abstract] The phrase 'near-ceiling performance' is used without defining a reference ceiling (e.g., human expert accuracy or theoretical upper bound); consider adding a brief comparison.
  2. [Results] Notation: confirm that MCC refers to Matthews Correlation Coefficient and provide the exact computation or reference used for the reported value of 0.9907.
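The MCC question in the second minor comment can be answered directly. A toy computation, assuming the standard scikit-learn definitions; the labels below are illustrative, not the paper's data:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Toy binary labels: 1 = moldy, 0 = clean.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc = matthews_corrcoef(y_true, y_pred)
```

Here TP = 3, TN = 4, FP = 0, FN = 1, so accuracy = 0.875, F1 = 6/7, and MCC = 12/sqrt(240) ≈ 0.7746; MCC stays informative even when the two classes are imbalanced, which is why the paper reports it alongside accuracy and F1.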

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify key aspects of our evaluation and dataset presentation. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

point-by-point responses
  1. Referee: Evaluation protocol: the manuscript does not specify the train/test splitting strategy (e.g., by unique physical food sample, by device, or random per-image). With only 4 phones and 11 food types, any image-level split risks leakage of sample-specific or device-specific artifacts, directly undermining the generalization claim that performance validates utility for unseen samples, phones, and environments.

    Authors: We agree that an explicit description of the splitting strategy is necessary to support generalization claims. Our baselines used a random per-image 80/20 train/test split stratified by label. We will revise the manuscript to state this protocol clearly, including the random seed, and will add a new experiment reporting performance under a device-wise split (holding out one smartphone entirely) to directly address leakage concerns. This addition will demonstrate that the high metrics are not solely due to device-specific artifacts. revision: yes

  2. Referee: Dataset composition: more granular statistics are needed on the number of distinct physical samples per food type and per device (beyond the aggregate 4941 images). Without this, it is difficult to assess whether the near-ceiling metrics reflect learning of general mold features or memorization of limited instances.

    Authors: We acknowledge that more granular statistics on unique physical samples would improve transparency. Collection logs allow us to report approximate counts of distinct food items per type and device; we will add a dedicated table in the revised dataset section with these figures. While exact per-image sample tracking was not performed for all captures, the added breakdown will help readers evaluate diversity and reduce concerns about memorization. revision: partial
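The device-wise split committed to in response 1 can be sketched with scikit-learn's `GroupShuffleSplit`, using the capturing phone as the group key so that no device appears in both train and test. The phone names and sample counts below are hypothetical:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical image table: each row has a label and the phone it was shot on.
rng = np.random.default_rng(0)
phones = rng.choice(["phone_A", "phone_B", "phone_C", "phone_D"], size=200)
labels = rng.integers(0, 2, size=200)

# Hold out entire devices: groups, not individual images, are split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(np.zeros(200), labels, groups=phones))

# No phone leaks across the boundary.
assert set(phones[train_idx]).isdisjoint(set(phones[test_idx]))
```

A random per-image split, by contrast, would let images of the same physical sample or device land on both sides, which is exactly the leakage the referee flags.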

Circularity Check

0 steps flagged

No circularity: purely empirical dataset release and baseline evaluation

full rationale

The paper introduces a microscopy image dataset and reports direct empirical results from training standard pretrained models on it. No derivations, equations, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatzes appear in the provided text. Performance numbers (accuracy 0.9954 etc.) are straightforward outputs of supervised training and evaluation rather than any self-referential reduction. Generalization concerns (e.g., possible train/test leakage) are separate from circularity and do not trigger any of the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters or invented entities; the work rests on standard deep-learning assumptions for image classification.

axioms (1)
  • domain assumption Standard assumptions in supervised deep learning for image classification hold, including that training and test images are drawn from similar distributions.
    Implicit in the training of pretrained architectures and the reported performance metrics.

pith-pipeline@v0.9.0 · 5506 in / 1106 out tokens · 68361 ms · 2026-05-15T18:02:22.096277+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 2 internal anchors
