arxiv: 2604.18957 · v1 · submitted 2026-04-21 · 💻 cs.CV

Bridging Foundation Models and ASTM Metallurgical Standards for Automated Grain Size Estimation from Microscopy Images

Abdul Mueez, Shruti Vyas

Pith reviewed 2026-05-10 03:40 UTC · model grok-4.3

classification 💻 cs.CV

keywords grain size estimation · ASTM E112 · microscopy images · instance segmentation · few-shot learning · foundation models · materials characterization · Cellpose-SAM

The pith

Adapted Cellpose-SAM estimates ASTM grain size with 1.5% error from just two training images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an automated pipeline that segments grains in microscopy images of metal microstructures and computes the standardized ASTM grain size number. It starts from the Cellpose-SAM foundation model, adds topology-aware gradient tracking to keep grains separate, and directly links the output to the ASTM E112 Jeffries planimetric method for calculating the grain size number G. The system reaches a mean absolute percentage error of 1.50 percent when trained on only two labeled samples, outperforming a standard U-Net, a domain-specific MatSAM, and a vision-language model. This matters for materials characterization because grain size strongly influences mechanical properties, yet manual counting is slow and subjective while most deep-learning approaches demand large training sets. The work also tests the pipeline across different target grain counts to confirm the ASTM recommendation of at least 50 grains for reliable sampling.

Core claim

Adapting Cellpose-SAM with topology-aware gradient tracking and integrating an ASTM E112 Jeffries planimetric module produces dense instance segmentation that preserves grain boundaries. With only two training samples, the pipeline predicts the ASTM grain size number G at a mean absolute percentage error as low as 1.50 percent, and robustness checks across varying grain counts empirically support the ASTM 50-grain minimum.

What carries the argument

The adapted Cellpose-SAM model using topology-aware gradient tracking, combined with an ASTM E112 Jeffries planimetric module for grain size calculation.
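To make the planimetric step concrete, here is a minimal sketch of how a Jeffries count can be turned into G. It uses the standard ASTM E112 relation G = 3.321928 log10(N_A) - 2.954, with N_A in grains per mm² at 1X. The function names, the pixel-calibration arguments, and the example numbers are illustrative assumptions, not taken from the paper or its repository, and refinements such as corner-grain handling for rectangular fields are left out.

```python
import math


def grains_per_mm2(n_inside, n_intercepted, area_px, um_per_px):
    """Jeffries planimetric count per unit area.

    Grains fully inside the test region count as 1; grains cut by the
    region boundary count as 1/2. The area is converted from pixels to
    mm^2 using the image calibration (micrometres per pixel).
    """
    if area_px <= 0 or um_per_px <= 0:
        raise ValueError("area and calibration must be positive")
    area_mm2 = area_px * (um_per_px / 1000.0) ** 2
    return (n_inside + 0.5 * n_intercepted) / area_mm2


def astm_grain_size_number(n_a):
    """ASTM E112 grain size number G from N_A (grains per mm^2 at 1X)."""
    if n_a <= 0:
        raise ValueError("N_A must be positive")
    return 3.321928 * math.log10(n_a) - 2.954


# Hypothetical field: 1000 x 500 px at 1 um/px (0.5 mm^2),
# 40 whole grains and 20 boundary-cut grains.
n_a = grains_per_mm2(n_inside=40, n_intercepted=20,
                     area_px=1000 * 500, um_per_px=1.0)
print(round(n_a, 3), round(astm_grain_size_number(n_a), 3))
```

In a segmentation pipeline, `n_inside` and `n_intercepted` would come from checking which predicted instance masks touch the border of the test region.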

If this is right

Grain size analysis can shift from manual counting or large-dataset training to reliable automation with minimal labeled examples.
The pipeline outperforms both classical networks like U-Net and other foundation-model baselines on this microstructure task.
Empirical checks across grain counts directly support the long-standing ASTM guideline of sampling at least 50 grains.
Foundation models become practical for standardized industrial measurements once domain-specific topology rules and calculation modules are added.
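One way to see why low grain counts are fragile is a simple sensitivity calculation. Because 3.321928 is approximately 1/log10(2), G grows by one unit each time N_A doubles, so a fixed miscount shifts G far more when few grains are in the field. The sketch below is plain arithmetic on the ASTM E112 formula, not a reproduction of the paper's robustness experiment; the function name and the miscount of two grains are assumptions chosen for illustration.

```python
import math


def g_shift_from_miscount(true_count, miscount):
    """Change in G when the grain count is off by `miscount` at fixed area.

    The -2.954 offset and the area cancel, leaving only the count ratio.
    """
    if true_count <= 0 or true_count + miscount <= 0:
        raise ValueError("counts must stay positive")
    return 3.321928 * math.log10((true_count + miscount) / true_count)


# Same absolute error (two extra grains) at different field populations.
for count in (10, 25, 50, 100):
    print(count, round(g_shift_from_miscount(count, 2), 3))
```

The shift shrinks steadily as the count rises, which is consistent with a minimum-count guideline, though it does not by itself single out 50 as the threshold.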

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptation pattern could extend to other quantitative microscopy measurements such as phase fraction or inclusion counting.
Few-shot performance lowers the barrier for small materials labs that cannot assemble hundreds of annotated images.
Further tests on images from varied alloys or different microscopes would clarify how far the topological separation holds.
The success of connectivity-preserving tracking suggests similar mechanisms may help dense object segmentation in other scientific imaging domains.

Load-bearing premise

That the specific additions of topology-aware gradient tracking and ASTM integration to Cellpose-SAM are what produce the few-shot accuracy and clean boundary separation, rather than quirks of the image dataset or unstated training choices.

What would settle it

Running the same two-sample training experiments with the unmodified original Cellpose-SAM and obtaining a MAPE at or below 1.50 percent would show that the reported adaptations are not required for the performance.
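The comparison described here comes down to computing MAPE on G for both models over the same test images. A minimal sketch follows; the helper names and the sample values are hypothetical, and only the 1.50 percent threshold comes from the paper's reported result.

```python
def mape(g_true, g_pred):
    """Mean absolute percentage error, in percent.

    Undefined when a reference value is zero, which can happen for G
    (ASTM grain size number 0), so that case is rejected explicitly.
    """
    if len(g_true) != len(g_pred) or len(g_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    if any(t == 0 for t in g_true):
        raise ValueError("MAPE is undefined for a reference value of zero")
    total = sum(abs((t - p) / t) for t, p in zip(g_true, g_pred))
    return 100.0 * total / len(g_true)


def baseline_matches_reported(baseline_mape, reported_mape=1.50):
    """True if the unmodified model does at least as well as the reported figure."""
    return baseline_mape <= reported_mape


# Made-up values purely to exercise the functions.
reference_g = [8.0, 10.0]
unmodified_pred = [8.4, 9.5]
score = mape(reference_g, unmodified_pred)
print(score, baseline_matches_reported(score))
```

A single MAPE value per model is only a point estimate, so a fair version of this check would repeat it over several choices of the two training images.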

Figures

Figures reproduced from arXiv: 2604.18957 by Abdul Mueez, Shruti Vyas.

**Figure 1.** Proposed automated metallographic analysis workflow. A microscopy image with known magnification is processed by a …

**Figure 3.** Application of the Jeffries planimetric method. Whole …

**Figure 4.** Qualitative comparison of instance segmentation across architectures. Zero-shot Cellpose-SAM under-segments significantly, …

**Figure 6.** Example of the visual input to the Vision-Language …

**Figure 5.** Qualitative zero-shot performance comparison on four …

Original abstract

Extracting standardized metallurgical metrics from microscopy images remains challenging due to complex grain morphology and the data demands of supervised segmentation. To bridge foundational computer vision with practical metallurgical evaluation, we propose an automated pipeline for dense instance segmentation and grain size estimation that adapts Cellpose-SAM to microstructures and integrates its topology-aware gradient tracking with an ASTM E112 Jeffries planimetric module. We systematically benchmark this pipeline against a classical convolutional network (U-Net), an adaptive-prompting vision foundation model (MatSAM) and a contemporary vision-language model (Qwen2.5-VL-7B). Our evaluations reveal that while the out-of-the-box vision-language model struggles with the localized spatial reasoning required for dense microscopic counting and MatSAM suffers from over-segmentation despite its domain-specific prompt generation, our adapted pipeline successfully maintains topological separation. Furthermore, experiments across progressively reduced training splits demonstrate exceptional few-shot scalability; utilizing only two training samples, the proposed system predicts the ASTM grain size number (G) with a mean absolute percentage error (MAPE) as low as 1.50%, while robustness testing across varying target grain counts empirically validates the ASTM 50-grain sampling minimum. These results highlight the efficacy of application-level foundation model integration for highly accurate, automated materials characterization. Our project repository is available at https://github.com/mueez-overflow/ASTM-Grain-Size-Estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts Cellpose-SAM with ASTM Jeffries integration for grain sizing and claims 1.5% MAPE on G from two training images, but the result needs ablations to separate pipeline from data effects.

The core thing to know is that this work takes Cellpose-SAM, adds topology-aware gradient tracking and a direct ASTM E112 Jeffries planimetric module, then shows it can estimate grain size number G with low error even when trained on only two images. They benchmark against U-Net, MatSAM, and Qwen2.5-VL and report that their version avoids the over-segmentation seen in MatSAM while beating the others on the few-shot splits. The repo link is a plus for anyone who wants to try the pipeline on their own micrographs.

What stands out is the explicit tie to the metallurgical standard rather than just another segmentation metric; that makes the output directly usable in a lab setting. The few-shot scalability claim is the strongest part of the abstract, and the 50-grain robustness check lines up with existing ASTM guidance.

On the soft side, the headline 1.50% MAPE rests on the assumption that the specific adaptations drive the performance. The abstract gives no ablation that removes the topology tracking or the ASTM module one at a time, and it does not report total dataset size, how the two training images were chosen, or variance across different pairs. If the microstructures in the collection have limited grain-size spread, the low error could be partly an artifact of the data rather than the method. Statistical tests on the MAPE differences are also not mentioned.

This paper is aimed at materials characterization groups that already run ASTM grain-size measurements and want to cut down on manual counting, or at applied CV researchers looking for a concrete domain-adaptation example. A reader who needs a working pipeline with standard-compliant output will find the numbers and the code useful even if they later add their own controls.

It is worth sending to peer review because the application is practical, the comparisons are head-to-head, and the few-shot angle is worth checking with more data detail. Ask the authors for the missing ablations and split information before a final decision.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an automated pipeline adapting Cellpose-SAM for dense instance segmentation of grains in microscopy images, integrating its topology-aware gradient tracking with an ASTM E112 Jeffries planimetric module to estimate the ASTM grain size number G. It benchmarks the pipeline against U-Net, MatSAM, and Qwen2.5-VL-7B, claiming better topological separation and, in few-shot experiments, a MAPE as low as 1.50% for G prediction using only two training samples. Robustness tests across varying target grain counts are said to empirically validate the ASTM 50-grain sampling minimum. The code repository is made public.

Significance. If the few-shot performance and attribution to the specific adaptations hold, the work would be significant for materials characterization by showing how foundation models can be tailored to produce standardized ASTM metrics with minimal supervision. The explicit integration with the Jeffries planimetric method and the public repository are strengths that support reproducibility and practical adoption in metallurgy.

major comments (2)

[Abstract and Experiments section] The central claim of 1.50% MAPE for ASTM G using only two training samples lacks any reported details on total dataset size, train/test split, how the two samples were chosen, variance across different sample selections, or statistical tests. This information is load-bearing for the few-shot scalability assertion and the conclusion that the pipeline (rather than data characteristics) drives the result.
[Methods (§3) and Results (§4)] No ablation studies isolate the contribution of the topology-aware gradient tracking or the ASTM Jeffries integration from the base Cellpose-SAM, hyperparameter choices, or dataset properties. Without these, the superiority over MatSAM (over-segmentation) and the attribution of topological separation cannot be rigorously supported.

minor comments (1)

[Abstract] The phrase 'systematically benchmark' would be clearer if the full set of metrics (beyond MAPE for G), such as segmentation IoU or boundary accuracy, were listed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for improving the clarity and rigor of our few-shot claims and component attributions. We address each major comment below and will revise the manuscript to incorporate additional details and experiments as outlined.

Referee: [Abstract and Experiments section] The central claim of 1.50% MAPE for ASTM G using only two training samples lacks any reported details on total dataset size, train/test split, how the two samples were chosen, variance across different sample selections, or statistical tests. This information is load-bearing for the few-shot scalability assertion and the conclusion that the pipeline (rather than data characteristics) drives the result.

Authors: We agree that these experimental details are critical for substantiating the few-shot scalability claims. In the revised manuscript, we will expand the Experiments section (and update the abstract if space permits) to report: the total dataset size and composition, the exact train/test split methodology, the selection criteria for the two training samples (e.g., ensuring representation across grain size distributions), variance and standard deviations computed across multiple independent selections of the two-sample subsets, and statistical measures such as confidence intervals or paired t-tests where appropriate. This will clarify that performance derives from the pipeline rather than idiosyncratic data properties.

revision: yes

Referee: [Methods (§3) and Results (§4)] No ablation studies isolate the contribution of the topology-aware gradient tracking or the ASTM Jeffries integration from the base Cellpose-SAM, hyperparameter choices, or dataset properties. Without these, the superiority over MatSAM (over-segmentation) and the attribution of topological separation cannot be rigorously supported.

Authors: We acknowledge that explicit ablation studies would strengthen the attribution of improvements to the topology-aware gradient tracking and Jeffries planimetric integration. While the existing benchmarks against MatSAM (which uses different prompting) and other baselines provide comparative evidence of better topological separation and reduced over-segmentation, we will add targeted ablation experiments in the revised Results section. These will include controlled variants of the pipeline with and without the topology-aware components, with and without the ASTM integration module, while holding hyperparameters and dataset splits fixed. This will enable more rigorous isolation of each contribution.

revision: yes
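The variance reporting promised in the first response could be organized along these lines: score every possible two-image training subset and summarize the spread. This is a hedged sketch rather than the authors' protocol. `few_shot_spread` and `placeholder_evaluate` are hypothetical names, and the placeholder returns arbitrary numbers where a real run would fine-tune on the subset and return the test MAPE.

```python
from itertools import combinations
from statistics import mean, stdev


def few_shot_spread(image_ids, evaluate_mape, k=2):
    """Score every k-image training subset and summarize the spread.

    `evaluate_mape` is a caller-supplied function that takes a tuple of
    training image ids and returns the resulting test MAPE in percent.
    """
    subsets = list(combinations(image_ids, k))
    if len(subsets) < 2:
        raise ValueError("need at least two subsets to estimate a spread")
    scores = [evaluate_mape(subset) for subset in subsets]
    return {
        "n_subsets": len(scores),
        "mean": mean(scores),
        "std": stdev(scores),  # sample standard deviation
        "worst": max(scores),
    }


def placeholder_evaluate(subset):
    # Stand-in only: the values are arbitrary and carry no meaning.
    return float(sum(subset))


print(few_shot_spread([0, 1, 2, 3], placeholder_evaluate))
```

Exhaustive enumeration is feasible only for small pools, since the number of pairs grows quadratically; for larger datasets a fixed number of randomly drawn pairs would be the usual substitute.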

Circularity Check

0 steps flagged

No circularity; empirical pipeline results independent of inputs

The paper describes an empirical adaptation of Cellpose-SAM with topology-aware tracking and ASTM E112 Jeffries integration for grain segmentation and G-number prediction. All reported outcomes, including the 1.50% MAPE on two-sample few-shot splits and validation of the 50-grain rule, are presented as measured performance on held-out microstructures rather than any closed-form derivation, parameter fit renamed as prediction, or self-referential definition. No equations appear, no self-citations are invoked to justify uniqueness or ansatzes, and the central claims rest on benchmark comparisons and ablation-style split experiments that remain falsifiable against external data. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical success of the adapted segmentation model on microscopy images of metallic microstructures; no explicit free parameters, axioms, or invented entities are stated in the abstract.
