pith. machine review for the scientific record. sign in

arxiv: 2604.21321 · v1 · submitted 2026-04-23 · 💻 cs.CV

Recognition: unknown

FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

Authors on Pith no claims yet

Pith reviewed 2026-05-09 22:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords frying oil oxidationnon-destructive assessmentdual-stream networkadversarial domain adaptationthermal imagingRGB-MAE encoderfood safety monitoringsegmentation and regression
0
0 comments X

The pith

FryNet fuses RGB and thermal streams with adversarial regularization to assess frying oil oxidation non-destructively through joint segmentation, classification, and chemical regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current practice relies on destructive wet-chemistry tests that destroy samples and provide no spatial information. The paper demonstrates a dual-stream network that processes paired RGB and thermal video frames to segment oil regions, classify serviceability, and regress four oxidation indices in one pass. It counters the camera-fingerprint shortcut by using masked autoencoding in the RGB stream, attention in the thermal stream, and gradient-reversal adversarial training that removes video identity signals. This setup lets the model focus on oxidation chemistry instead of sensor artifacts, enabling real-time monitoring without lab assays.

Core claim

FryNet is a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices using a ThermalMiT-B2 backbone with attention for thermal features, an RGB-MAE encoder for chemically aligned representations, Dual-Encoder DANN with gradient reversal to adversarially remove video identity, and FiLM fusion to combine the streams, achieving 98.97 percent mIoU, 100 percent classification accuracy, and 2.32 mean regression MAE on 7,226 frames from 28 videos.

What carries the argument

Dual-Encoder DANN with Gradient Reversal Layers that adversarially regularizes both the RGB-MAE chemical encoder and ThermalMiT-B2 thermal backbone against video identity, bridged by FiLM fusion to integrate structure and chemistry.

If this is right

  • Enables simultaneous segmentation, classification, and regression of oxidation indices without separate models or destructive sampling.
  • Suppresses video-specific biases so the same model works across different frying sessions and cameras.
  • Delivers spatial maps of oil condition together with quantitative chemical predictions in a single forward pass.
  • Outperforms seven separate baseline architectures on the collected paired-frame dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-stream adversarial pattern could be applied to other thermal food processes such as baking or deep-fat frying of different products.
  • Real-time deployment on kitchen cameras could trigger automatic oil-change alerts, cutting both waste and health risks from degraded oil.
  • Adding more sensor modalities or longer temporal context might further improve robustness when camera or lighting conditions vary.

Load-bearing premise

The adversarial training successfully forces the model to learn oxidation chemistry rather than camera-specific noise or other spurious correlations present in the thermal and RGB streams.

What would settle it

Performance drop on a held-out test set of frying videos recorded with new cameras or under changed lighting and frying conditions that were never seen during training.

Figures

Figures reproduced from arXiv: 2604.21321 by Amer AbuGhazaleh, Khaled R Ahmed, Tamany M Alanezi, Taminul Islam, Toqi Tahamid Sarker.

Figure 1
Figure 1. Figure 1: mIoU vs. computational cost (GFLOPs). Marker size [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Dataset samples. Top: fresh oil (good). Bottom: de￾graded oil (replace). Each frame carries paired thermal/RGB im￾ages with four regression targets. that addresses this shortcut while performing segmentation, classification, and regression in a single forward pass ( [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: FryNet architecture. The thermal stream (top) processes FLIR images through ThermalMiT-B2 with TCA/TSA attention at each stage, producing multi-scale features F1–F4 that are merged via multi-scale fusion. The RGB-MAE encoder (bottom) concatenates thermal patches with RGB inputs and learns context features via masked autoencoding with chemical alignment. FiLM fusion combines both streams before feeding into… view at source ↗
Figure 4
Figure 4. Figure 4: Distribution of chemical oxidation indices across 28 oil samples. Colors indicate segmentation class (green = good, Totox < 25; red = replace, Totox ≥ 25); markers distinguish oil type (◦ = corn, △ = canola). Dashed line marks the Totox = 25 classification threshold. tox ranges from 5.8 (fresh canola) to 76.6 (degraded canola), spanning an order-of-magnitude variation in oxidation load [PITH_FULL_IMAGE:fi… view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative segmentation comparison on representative test frames from two [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: t-SNE of backbone features on the test set, colored by [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
read the original abstract

Monitoring frying oil degradation is critical for food safety, yet current practice relies on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use. We identify a fundamental obstacle in thermal-image-based inspection, the camera-fingerprint shortcut, whereby models memorize sensor-specific noise and thermal bias instead of learning oxidation chemistry, collapsing under video-disjoint evaluation. We propose FryNet, a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices (PV, p-AV, Totox, temperature) in a single forward pass. A ThermalMiT-B2 backbone with channel and spatial attention extracts thermal features, while an RGB-MAE Encoder learns chemically grounded representations via masked autoencoding and chemical alignment. Dual-Encoder DANN adversarially regularizes both streams against video identity via Gradient Reversal Layers, and FiLM fusion bridges thermal structure with RGB chemical context. On 7,226 paired frames across 28 frying videos, FryNet achieves 98.97% mIoU, 100% classification accuracy, and 2.32 mean regression MAE, outperforming all seven baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper identifies the 'camera-fingerprint shortcut' as a failure mode in thermal-image models for frying-oil monitoring, where networks exploit sensor noise rather than oxidation chemistry and collapse under video-disjoint splits. It proposes FryNet, a dual-stream RGB-thermal architecture that performs joint oil-region segmentation, serviceability classification, and regression of four chemical indices (PV, p-AV, Totox, temperature) using a ThermalMiT-B2 backbone with attention, an RGB-MAE encoder with chemical alignment, Dual-Encoder DANN with Gradient Reversal Layers on both streams, and FiLM fusion. On 7,226 paired frames from 28 videos the model reports 98.97% mIoU, 100% classification accuracy and 2.32 mean regression MAE, outperforming seven baselines.

Significance. If the adversarial regularization demonstrably eliminates video-identity shortcuts and the reported metrics are obtained under properly video-disjoint evaluation with ablations, the work would offer a practical non-destructive, spatially resolved alternative to wet-chemistry assays for real-time frying-oil quality control in food safety applications.

major comments (3)
  1. [Abstract] Abstract: the headline metrics (98.97% mIoU, 100% accuracy, 2.32 MAE) are presented without error bars, statistical significance tests, or any description of the video-disjoint train/test partitioning across the 28 videos; given the small number of source videos, residual thermal bias or sensor correlations could still exist between partitions, directly undermining the claim that performance reflects oxidation chemistry rather than camera fingerprints.
  2. [Method (Dual-Encoder DANN and RGB-MAE sections)] The description of the Dual-Encoder DANN with Gradient Reversal Layers and RGB-MAE chemical alignment asserts that these components force the model to ignore video identity, yet no ablation isolating the adversarial loss term, no t-SNE or mutual-information analysis showing feature independence from video ID, and no comparison of performance with/without GRL are supplied; these omissions leave the central regularization claim unverified.
  3. [Experiments] The experimental section reports results against seven baselines but supplies no implementation details, hyper-parameter settings, or confirmation that the baselines were also evaluated under identical video-disjoint splits; without this information the claim of consistent outperformance cannot be assessed.
minor comments (2)
  1. [Abstract] Notation for the four chemical targets (PV, p-AV, Totox, temperature) and the precise definition of 'mean regression MAE' should be stated explicitly in the abstract or early methods.
  2. [Figures] Figure captions and axis labels for any qualitative segmentation or attention maps should include the video ID or frame index to allow readers to verify that train and test videos are visually distinct.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify how to better substantiate our claims about eliminating camera-fingerprint shortcuts. We address each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline metrics (98.97% mIoU, 100% accuracy, 2.32 MAE) are presented without error bars, statistical significance tests, or any description of the video-disjoint train/test partitioning across the 28 videos; given the small number of source videos, residual thermal bias or sensor correlations could still exist between partitions, directly undermining the claim that performance reflects oxidation chemistry rather than camera fingerprints.

    Authors: We agree that error bars, significance tests, and explicit partitioning details would strengthen the abstract. In the revision we will report standard deviations across repeated runs, include statistical significance tests for key comparisons, and describe the video-disjoint split (e.g., number of videos per partition and the exact protocol ensuring no video overlap). revision: yes

  2. Referee: [Method (Dual-Encoder DANN and RGB-MAE sections)] The description of the Dual-Encoder DANN with Gradient Reversal Layers and RGB-MAE chemical alignment asserts that these components force the model to ignore video identity, yet no ablation isolating the adversarial loss term, no t-SNE or mutual-information analysis showing feature independence from video ID, and no comparison of performance with/without GRL are supplied; these omissions leave the central regularization claim unverified.

    Authors: The video-disjoint evaluation already demonstrates that unregularized models collapse while FryNet does not, supporting the design. To directly verify the contribution of the adversarial term we will add, in the revision, an ablation isolating the adversarial loss, performance numbers with and without GRL, and t-SNE plots with mutual-information quantification showing reduced dependence on video ID. revision: yes

  3. Referee: [Experiments] The experimental section reports results against seven baselines but supplies no implementation details, hyper-parameter settings, or confirmation that the baselines were also evaluated under identical video-disjoint splits; without this information the claim of consistent outperformance cannot be assessed.

    Authors: We will expand the experimental section to include full implementation details, hyper-parameter tables for FryNet and all baselines, and an explicit statement confirming that every baseline was trained and tested under the identical video-disjoint protocol used for FryNet. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical ML pipeline with standard adversarial components

full rationale

The paper presents a dual-encoder architecture (ThermalMiT-B2 + RGB-MAE) trained with DANN, gradient reversal layers, FiLM fusion, and multi-task losses for segmentation/classification/regression. Reported metrics (98.97% mIoU, 100% accuracy, 2.32 MAE) are obtained from supervised training and evaluation on 7,226 frames from 28 videos under video-disjoint splits. No equations, predictions, or first-principles derivations are shown that reduce by construction to fitted parameters or self-citations; the central claims rest on experimental outcomes rather than tautological redefinitions or load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Ledger is necessarily incomplete because only the abstract is available; full paper would list exact hyperparameters and any additional domain assumptions.

free parameters (1)
  • Model hyperparameters and attention weights
    Standard deep-learning parameters not enumerated in abstract but required for the reported performance.
axioms (1)
  • domain assumption Thermal-image models primarily learn camera-specific noise and bias rather than oxidation chemistry unless explicitly regularized against video identity.
    Presented as the fundamental obstacle identified by the authors.

pith-pipeline@v0.9.0 · 5529 in / 1390 out tokens · 42973 ms · 2026-05-09T22:00:37.343103+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 3 canonical work pages · 3 internal anchors

  1. [1]

    Improving prediction of peroxide value of edible oils using regularized regression models.Foods,

    Amira Al-Alawi et al. Improving prediction of peroxide value of edible oils using regularized regression models.Foods,

  2. [2]

    American Oil Chemists’ Society, 7th edition,

    AOCS.Official Methods and Recommended Practices of the American Oil Chemists’ Society, Method Cd 18-90: p- Anisidine Value. American Oil Chemists’ Society, 7th edition,

  3. [3]

    Muhammad Aqeel, Hifza Munawar, Ahmed Sohaib, Khan Ba- hadar Khan, and Yiming Deng. Spectral band selection for nondestructive detection of edible oil adulteration using hy- perspectral imaging and chemometric analysis.Journal of Food Measurement and Characterization, 20(2):1482–1503,

  4. [4]

    Semanur Aydin, Umut Sayin, Mehmet ¨Ozge Sezer, and Seher Sayar. Antioxidant efficiency of citrus peels on oxidative stability during repetitive deep-fat frying: Evaluation with EPR and conventional methods.Journal of Food Processing and Preservation, 45(7), 2021. 5

  5. [5]

    Siddhartha Bhattacharya, Aarham Wasit, J Mason Earles, Nitin Nitin, and Jiyoon Yi. Enhancing AI microscopy for foodborne bacterial classification using adversarial domain adaptation to address optical and biological variability.Fron- tiers in Artificial Intelligence, 8, 2025. 2

  6. [6]

    Emerg- ing properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Herv´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 9650–9660, 2021. 2

  7. [7]

    The quality predic- tion of olive and sunflower oils using NIR spectroscopy and chemometrics: A sustainable approach.Sensors, 2025

    Jos´e Antonio Cayuela and Nieves Caliani. The quality predic- tion of olive and sunflower oils using NIR spectroscopy and chemometrics: A sustainable approach.Sensors, 2025. 1, 2

  8. [8]

    Rethinking Atrous Convolution for Semantic Image Segmentation

    Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartmut Adam. Rethinking atrous convolution for semantic image segmentation. InarXiv preprint arXiv:1706.05587,

  9. [9]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Ge- offrey Hinton. A simple framework for contrastive learning of visual representations. InInternational Conference on Machine Learning (ICML), pages 1597–1607, 2020. 2

  10. [10]

    GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

    Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InInternational Conference on Machine Learning (ICML), pages 794–803,

  11. [11]

    Digital detection of olive oil rancidity levels and aroma profiles using near-infrared spectroscopy, a low-cost electronic nose and machine learning modelling

    Daniel Cozzolino et al. Digital detection of olive oil rancidity levels and aroma profiles using near-infrared spectroscopy, a low-cost electronic nose and machine learning modelling. Chemosensors, 10(5):159, 2022. 2

  12. [12]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representa- tions (ICLR), 2021. 2

  13. [13]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representa- tions (ICLR), 2021. 3

  14. [14]

    Domain-adversarial training of neural networks.Journal of Machine Learning Research, 17 (1):2096–2030, 2016

    Yaroslav Ganin, Evgeniya Ustunova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Franc ¸ois Laviolette, Mario Marc- hand, and Victor Lempitsky. Domain-adversarial training of neural networks.Journal of Machine Learning Research, 17 (1):2096–2030, 2016. 2, 3, 4

  15. [15]

    Adversarial training based domain adaptation of skin cancer images.Life, 14(8):1009,

    Syed Qasim Gilani, Muhammad Umair, Maryam Naqvi, Oge Marques, and Hee-Cheol Kim. Adversarial training based domain adaptation of skin cancer images.Life, 14(8):1009,

  16. [16]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16000–16009, 2022. 2

  17. [17]

    Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

    Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7482–7491, 2018. 3

  18. [18]

    Application of infrared thermography in identifying plant oils.Foods, 13(24):4090, 2024

    Olga Kopelevich et al. Application of infrared thermography in identifying plant oils.Foods, 13(24):4090, 2024. 2

  19. [19]

    Recent advances and applications of nondestructive testing in agricultural products: A review.Processes, 13(9):2674, 2025

    Mian Li, Hongliang Yin, Fei Gu, Yanjun Duan, Wenxu Zhuang, Kang Han, and Xiaojun Jin. Recent advances and applications of nondestructive testing in agricultural products: A review.Processes, 13(9):2674, 2025. 2

  20. [20]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, 2021. 6

  21. [21]

    A ConvNet for the 2020s

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 6

  22. [22]

    Frying oil evaluation by a portable sensor based on dielectric constant measurement.Sensors, 19(24): 5375, 2019

    Huang Lizhi et al. Frying oil evaluation by a portable sensor based on dielectric constant measurement.Sensors, 19(24): 5375, 2019. 1, 2

  23. [23]

    Decoupled weight de- cay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learning Representations (ICLR), 2019. 6

  24. [24]

    Fast olive quality assessment through RGB images and advanced convolutional neural network mod- eling.European Food Research and Technology, 2022

    Marco Mancini et al. Fast olive quality assessment through RGB images and advanced convolutional neural network mod- eling.European Food Research and Technology, 2022. 2

  25. [25]

    Gonz´alez-S´aiz, and Consuelo Pizarro

    Taha Mehany, Jos´e M. Gonz´alez-S´aiz, and Consuelo Pizarro. Rapid monitoring and quantification of primary and secondary oxidative markers in edible oils during deep frying using near- infrared spectroscopy and chemometrics.Foods, 15(3):557,

  26. [26]

    Prediction of significant oil properties using image processing based on RGB pixel intensity.Fuel,

    Mohammad Naser et al. Prediction of significant oil properties using image processing based on RGB pixel intensity.Fuel,

  27. [27]

    MMSegmentation: Openmmlab semantic seg- mentation toolbox and benchmark

    OpenMMLab. MMSegmentation: Openmmlab semantic seg- mentation toolbox and benchmark. https://github. com/open-mmlab/mmsegmentation, 2020. 6

  28. [28]

    DI- NOv2: Learning robust visual features without supervision

    Maxime Oquab, Timoth´ee Darcet, Th´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DI- NOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), 2024. 2

  29. [29]

    DI- NOv2: Learning robust visual features without supervision

    Maxime Oquab, Timoth´ee Darcet, Th´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DI- NOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), 2024. 6

  30. [30]

    Compari- son of spectroscopic techniques for determining the peroxide value of 19 classes of naturally aged, plant-based edible oils

    Joshua M Ottaway, J Chance Carter, Kristl L Adams, Joseph Camancho, Barry K Lavine, and Karl S Booksh. Compari- son of spectroscopic techniques for determining the peroxide value of 19 classes of naturally aged, plant-based edible oils. Applied Spectroscopy, 75(6):733–744, 2021. 2

  31. [31]

    FiLM: Visual reasoning with a general conditioning layer

    Ethan Perez, Florian Strub, Harm De Vries, Vincent Du- moulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. InAAAI Conference on Artificial Intelligence, 2018. 3, 4

  32. [32]

    Infrared thermographic signal analysis of bioactive edible oils using CNNs for quality assessment

    Chiara Pirola et al. Infrared thermographic signal analysis of bioactive edible oils using CNNs for quality assessment. Signals, 6(3):38, 2024. 1, 2

  33. [33]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolber, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Doll´ar, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos.arXiv preprint arXiv:2408.00...

  34. [34]

    An Overview of Multi-Task Learning in Deep Neural Networks

    Sebastian Ruder. An overview of multi-task learning in deep neural networks.arXiv preprint arXiv:1706.05098, 2017. 3

  35. [35]

    Kinnunen

    Md Sahidullah, Hye-jin Shim, Rosa Gonzalez Hautam¨aki, and Tomi H. Kinnunen. Shortcut learning in binary classifier black boxes: Applications to voice anti-spoofing and biometrics. IEEE Journal of Selected Topics in Signal Processing, 2025. 2

  36. [36]

    Review on food quality assess- ment using machine learning and electronic nose system.Re- sults in Engineering, 2023

    Veronica Sberveglieri et al. Review on food quality assess- ment using machine learning and electronic nose system.Re- sults in Engineering, 2023. 2

  37. [37]

    Wanasundara

    Fereidoon Shahidi and Udaya N. Wanasundara. Methods for measuring oxidative rancidity in fats and oils. InFood Lipids: Chemistry, Nutrition, and Biotechnology, pages 465–

  38. [38]

    Video- MAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

    Zhan Tong, Yibing Song, Jue Wang, and Limin Wang. Video- MAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. InAdvances in Neural Information Processing Systems (NeurIPS), 2022. 2, 4

  39. [39]

    Can the image processing technique be potentially used to evaluate quality of frying oil? Journal of Food Quality, 2019

    Patchimaporn Udomkun et al. Can the image processing technique be potentially used to evaluate quality of frying oil? Journal of Food Quality, 2019. 1, 2, 3

  40. [40]

    Applications of thermal imaging in food quality and safety assessment.Food and Bioprocess Technology, 4(2):169–185, 2011

    Rajagopal Vadivambal and Digvir S Jayas. Applications of thermal imaging in food quality and safety assessment.Food and Bioprocess Technology, 4(2):169–185, 2011. 2

  41. [41]

    Convolutional neural networks in the realm of food quality and safety evaluation: Current achievements and future prospects.Trends in Food Science & Technology,

    Xinzhi Wang et al. Convolutional neural networks in the realm of food quality and safety evaluation: Current achievements and future prospects.Trends in Food Science & Technology,

  42. [42]

    CBAM: Convolutional block attention module

    Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. CBAM: Convolutional block attention module. In European Conference on Computer Vision (ECCV), pages 3–19, 2018. 3

  43. [43]

    SegFormer: Simple and efficient design for semantic segmentation with transform- ers

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. SegFormer: Simple and efficient design for semantic segmentation with transform- ers. InAdvances in Neural Information Processing Systems (NeurIPS), 2021. 3, 4, 6

  44. [44]

    Gradient surgery for multi- task learning

    Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi- task learning. InAdvances in Neural Information Processing Systems (NeurIPS), pages 5824–5836, 2020. 3

  45. [45]

    Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 2023

    Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruip- ing Liu, and Rainer Stiefelhagen. Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 2023. 3, 6

  46. [46]

    Delivering arbitrary-modal semantic segmenta- tion

    Jiaming Zhang, Ruiping Liu, Hao Shi, Kailun Yang, Simon Reiß, Kunyu Peng, Haodong Fu, Kaiwei Wang, and Rainer Stiefelhagen. Delivering arbitrary-modal semantic segmenta- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 1136–1147, 2023. 3, 6

  47. [47]

    Application of deep learning in food: A review.Com- prehensive Reviews in Food Science and Food Safety, 18(6): 1793–1811, 2019

    Lei Zhou, Chao Zhang, Fei Liu, Zhengjun Qiu, and Yong He. Application of deep learning in food: A review.Com- prehensive Reviews in Food Science and Food Safety, 18(6): 1793–1811, 2019. 2 10