pith. machine review for the scientific record.

arxiv: 2604.05960 · v1 · submitted 2026-04-07 · 💻 cs.LG

Recognition: no theorem link

A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis

Sk Miraj Ahmed, Yuewei Lin, Chuntian Cao, Shinjae Yoo, Xinpei Wu, Won-Il Lee, Nikhil Tiwale, Dan N. Le, Thi Thu Huong Chu, Jiyoung Kim, Kevin G. Yager, Chang-Yong Nam

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:28 UTC · model grok-4.3

classification 💻 cs.LG
keywords SEM · foundation model · mixture of experts · self-supervised learning · image restoration · defocus correction · materials science · transformer

The pith

A mixture-of-experts transformer pretrained on diverse SEM images serves as a foundation model that generalizes across materials and restores focus from defocused inputs without paired supervision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish a single pretrained model for scanning electron microscopy that works across many different materials and imaging setups. By training a self-supervised transformer with mixture of experts on a large collection of micrographs, the authors aim to create transferable representations that avoid the need for task-specific training from scratch. They demonstrate this on the problem of recovering sharp images from blurry, defocused ones, where the model succeeds without any matched before-and-after pairs. This would matter if true because SEM imaging is central to materials research yet currently hampered by slow, specialized acquisition and analysis steps that do not scale easily.

Core claim

The authors introduce the first foundation model for SEM images, created by pretraining a mixture of experts transformer architecture on a large corpus of multi-instrument and multi-condition scientific micrographs using self-supervision. This produces representations that generalize to diverse material systems and imaging conditions. As a key application, the model performs defocus-to-focus image translation without paired supervision and outperforms state-of-the-art methods on multiple metrics, laying groundwork for adaptable models that accelerate materials discovery.
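The abstract does not name the self-supervision objective; masked-patch reconstruction, as in masked autoencoders, is the usual choice for this kind of pretraining. A toy sketch of that loss, with a trivial mean-of-visible-patches predictor standing in for the transformer (the predictor and patch sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(patches, mask_ratio=0.75):
    """MSE on masked patches, the typical masked-autoencoder signal.
    A real model predicts the masked patches from the visible ones;
    here a placeholder guesses the mean of the visible patches."""
    n = patches.shape[0]
    n_masked = int(mask_ratio * n)
    idx = rng.permutation(n)
    masked, visible = idx[:n_masked], idx[n_masked:]
    pred = np.broadcast_to(patches[visible].mean(axis=0),
                           patches[masked].shape)
    return float(np.mean((pred - patches[masked]) ** 2))

# Toy "micrograph": 16 flattened 8x8 patches.
img = rng.normal(size=(16, 64))
loss = masked_reconstruction_loss(img)
```

With a real encoder, minimizing this loss over a large unlabeled corpus is what yields the transferable representations the claim rests on.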

What carries the argument

A mixture-of-experts transformer pretrained with self-supervision on a multi-instrument SEM corpus, yielding transferable representations for downstream tasks such as unsupervised defocus restoration.
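The abstract gives no architectural detail, but a sparsely gated top-k mixture-of-experts layer in the style of Shazeer et al. (2017) typically works as below. Expert count, top-k value, and the linear experts are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoE:
    """Top-k sparsely gated mixture of experts.
    Only k of n_experts run per token, so capacity grows
    with n_experts while per-token compute stays ~fixed."""
    def __init__(self, d, n_experts=4, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_gate = rng.normal(scale=0.02, size=(d, n_experts))
        self.experts = [rng.normal(scale=0.02, size=(d, d))
                        for _ in range(n_experts)]
        self.k = k

    def __call__(self, x):
        logits = x @ self.W_gate                     # (tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            sel = topk[t]
            w = softmax(logits[t, sel])              # renormalize over chosen experts
            for weight, e in zip(w, sel):
                out[t] += weight * (x[t] @ self.experts[e])
        return out

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 8))
y = SparseMoE(d=8)(tokens)
```

The gating is what lets one model specialize per imaging condition without every expert firing on every token.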

If this is right

  • The model generalizes across diverse material systems and imaging conditions.
  • It restores focused detail from defocused inputs without paired supervision.
  • It outperforms state-of-the-art techniques across multiple evaluation metrics.
  • It can be fine-tuned or adapted to a wide range of downstream SEM tasks.
  • Such models accelerate materials discovery by bridging representation learning with imaging needs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar pretraining strategies might apply to other high-resolution imaging techniques in materials science, such as atomic force microscopy.
  • The unsupervised restoration capability could integrate into automated SEM workflows to reduce manual focusing time.
  • Mixture-of-experts design may enable efficient handling of varied imaging conditions without increasing inference cost proportionally.
  • Future work could test zero-shot performance on entirely novel material classes not seen during pretraining.

Load-bearing premise

That the representations learned from pretraining on the multi-instrument SEM corpus will generalize to unseen material systems and transfer effectively to the defocus-to-focus task without any paired supervision.

What would settle it

A controlled test on SEM images from a new material or instrument absent from the pretraining set, measuring whether defocus restoration quality drops below supervised baselines or fails to match the reported metrics.
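That control can be made concrete: partition the corpus by material class (or instrument) so the test set contains only classes absent from pretraining, then compare restoration metrics across the partitions. The `material` field and sample structure below are hypothetical, sketched only to show the held-out discipline the referee asks for:

```python
def ood_split(samples, holdout_materials):
    """Split so no held-out material class leaks into pretraining.
    Each sample is a dict with a 'material' label (hypothetical schema)."""
    train_set = [s for s in samples if s["material"] not in holdout_materials]
    test_set = [s for s in samples if s["material"] in holdout_materials]
    # Sanity check: the partitions share no material class.
    overlap = ({s["material"] for s in train_set}
               & {s["material"] for s in test_set})
    assert not overlap, f"leakage: {overlap}"
    return train_set, test_set

samples = [
    {"material": "ZnO"},
    {"material": "resist"},
    {"material": "ZnO"},
    {"material": "graphene"},
]
train_set, test_set = ood_split(samples, {"graphene"})
```

Reporting defocus-restoration metrics on `test_set` alongside the in-distribution numbers would directly test the generalization claim.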

read the original abstract

Scanning Electron Microscopy (SEM) is indispensable in modern materials science, enabling high-resolution imaging across a wide range of structural, chemical, and functional investigations. However, SEM imaging remains constrained by task-specific models and labor-intensive acquisition processes that limit its scalability across diverse applications. Here, we introduce the first foundation model for SEM images, pretrained on a large corpus of multi-instrument, multi-condition scientific micrographs, enabling generalization across diverse material systems and imaging conditions. Leveraging a self-supervised transformer architecture, our model learns rich and transferable representations that can be fine-tuned or adapted to a wide range of downstream tasks. As a compelling demonstration, we focus on defocus-to-focus image translation, an essential yet underexplored challenge in automated microscopy pipelines. Our method not only restores focused detail from defocused inputs without paired supervision but also outperforms state-of-the-art techniques across multiple evaluation metrics. This work lays the groundwork for a new class of adaptable SEM models, accelerating materials discovery by bridging foundational representation learning with real-world imaging needs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the first foundation model for SEM images: a Mixture of Experts self-supervised transformer pretrained on a large corpus of multi-instrument, multi-condition scientific micrographs. It claims this yields rich, transferable representations that generalize across diverse material systems and imaging conditions. The central demonstration is defocus-to-focus image translation performed without paired supervision, with the model asserted to outperform state-of-the-art techniques on multiple evaluation metrics.

Significance. If the generalization and transfer claims are substantiated with appropriate validation, the work would be significant as the first dedicated foundation model for SEM, offering a pathway to reduce reliance on task-specific models and labor-intensive acquisition in materials science.

major comments (1)
  1. [Abstract] Abstract: the central claim that pretraining 'enables generalization across diverse material systems and imaging conditions' is load-bearing yet unsupported by any described cross-domain validation. The defocus-to-focus demonstration (without paired supervision) does not establish out-of-distribution performance unless the evaluation sets are shown to contain entirely unseen material classes, instrument types, or imaging parameters distinct from the pretraining corpus; no such separation is reported.
minor comments (2)
  1. [Abstract] Abstract: the statement that the method 'outperforms state-of-the-art techniques across multiple evaluation metrics' supplies neither the metric names nor any quantitative values, preventing assessment of effect size or comparison to baselines.
  2. [Abstract] Abstract: no information is provided on corpus size, number of instruments/conditions, model scale (e.g., number of experts, total parameters), or training details, all of which are required for reproducibility of a claimed foundation model.
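On the unnamed metrics: PSNR and SSIM are the standard full-reference choices for restoration quality, so they are the natural candidates the revision should name with values. A minimal PSNR, assuming images scaled to [0, 1] (the noise levels below are illustrative):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range ** 2 / mse))

rng = np.random.default_rng(2)
clean = rng.random((32, 32))
mild = np.clip(clean + rng.normal(scale=0.01, size=clean.shape), 0, 1)
severe = np.clip(clean + rng.normal(scale=0.1, size=clean.shape), 0, 1)
```

A well-restored image should score closer to `psnr(clean, mild)` than to `psnr(clean, severe)`; reporting such numbers against the baselines would fix the effect-size gap the referee flags.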

Simulated Authors' Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting the need for stronger substantiation of the generalization claims. We address the major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that pretraining 'enables generalization across diverse material systems and imaging conditions' is load-bearing yet unsupported by any described cross-domain validation. The defocus-to-focus demonstration (without paired supervision) does not establish out-of-distribution performance unless the evaluation sets are shown to contain entirely unseen material classes, instrument types, or imaging parameters distinct from the pretraining corpus; no such separation is reported.

    Authors: We agree that the abstract's generalization claim requires explicit support via documented cross-domain validation. The manuscript describes pretraining on a large multi-instrument, multi-condition corpus and evaluates the defocus-to-focus task on diverse SEM images, but does not report a clear partition showing that evaluation materials, instruments, or parameters are entirely held out from pretraining. To address this, we will add a dedicated subsection detailing the dataset splits, explicitly identifying any unseen material classes and imaging conditions in the test sets, and include additional quantitative results on these out-of-distribution cases. This revision will directly substantiate the claim without altering the core contributions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

full rationale

The paper's central claims rest on empirical pretraining of a self-supervised transformer on a multi-instrument SEM corpus followed by fine-tuning or adaptation for downstream tasks, with a demonstration on unpaired defocus-to-focus translation. No equations, derivations, fitted parameters, or first-principles results are presented that reduce by construction to the inputs. Generalization assertions are framed as experimental outcomes rather than self-definitional or self-citation-dependent necessities. Standard self-supervised practices are used without any load-bearing reduction to prior author work or renaming of known patterns. The work is self-contained against external benchmarks via reported metrics and comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no technical details on free parameters, axioms, or invented entities are provided.

pith-pipeline@v0.9.0 · 5521 in / 1143 out tokens · 66676 ms · 2026-05-10T18:28:16.210343+00:00 · methodology

discussion (0)

