Miller-Index-Based Latent Crystallographic Fracture Plane Reasoning with Vision-Language Models

Qinwu Xu; Yifan Jiang

arxiv: 2605.20416 · v1 · pith:QC2WNFZRnew · submitted 2026-05-19 · 💻 cs.LG · physics.comp-ph


Pith reviewed 2026-05-21 07:42 UTC · model grok-4.3


keywords: Miller indices · fracture mechanics · vision-language models · latent reasoning · crystallographic planes · multimodal AI · physics-aware reasoning


The pith

Multimodal large language models can map fracture images to Miller index plane hypotheses and reject the representation when physics does not support it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether vision-language models can treat Miller indices as a hidden variable that describes flat fracture surfaces in crystals. Experiments cover synthetic images, paired 2D-3D shapes, and real fractures from ceramics, glass, metals, and concrete. Models succeed at guessing the correct indices when the fracture is idealized and planar, yet they also correctly identify cases where no single crystallographic plane fits the image. This capability matters because it demonstrates models can incorporate specific physical structure rather than relying on surface pattern matching alone.

Core claim

Multimodal large language models can perform latent inference by mapping visual observations of fractures to Miller index hypotheses (h,k,l) under physically valid conditions, and they can conduct latent applicability assessment by determining whether a crystallographic plane representation is meaningful for a given fracture image across synthetic, geometric, and real-world data sets.

What carries the argument

Miller indices z = (h,k,l) used as a latent variable that governs the geometry of idealized planar fracture.
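
As a reminder of what this latent encodes geometrically, the standard textbook relations for a cubic lattice can be written out as below. This is a sketch under assumptions not stated on this page: a cubic lattice with parameter $a$, and spatial coordinates written $(x_1, x_2, x_3)$ to avoid clashing with the latent symbol $z$. Non-cubic lattices need the full metric tensor and are not covered.

```latex
% Assumption: cubic lattice with lattice parameter a; (x_1, x_2, x_3) are Cartesian coordinates.
\begin{align}
  % Plane nearest the origin, with axis intercepts a/h, a/k, a/l
  h\,x_1 + k\,x_2 + l\,x_3 &= a, \\
  % Plane normal (cubic case only)
  \mathbf{n}_{hkl} &\propto (h,\, k,\, l), \\
  % Spacing between adjacent (hkl) planes
  d_{hkl} &= \frac{a}{\sqrt{h^2 + k^2 + l^2}}, \\
  % Angle between planes (h_1 k_1 l_1) and (h_2 k_2 l_2)
  \cos\varphi &= \frac{h_1 h_2 + k_1 k_2 + l_1 l_2}
                     {\sqrt{h_1^2 + k_1^2 + l_1^2}\,\sqrt{h_2^2 + k_2^2 + l_2^2}}.
\end{align}
```

For example, the angle between (100) and (111) follows from $\cos\varphi = 1/\sqrt{3}$, which gives roughly 54.7 degrees.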

Load-bearing premise

Miller indices constitute a physically valid latent variable for planar fracture in the tested material classes and the chosen images distinguish genuine reasoning from pattern matching.

What would settle it

A clear failure would be the models assigning Miller indices to fractures that are visibly non-planar or non-crystallographic, such as irregular shattering or ductile tearing, while still claiming the representation applies.

Figures

Figures reproduced from arXiv: 2605.20416 by Qinwu Xu, Yifan Jiang.

**Figure 2.** Miller indices planes. Planes in the {100} family are aligned with the cube faces and therefore produce square or rectangular cross-sections. Planes in the {110} family intersect two axes, resulting in skewed quadrilateral shapes. In contrast, planes in the {111} family intersect all three axes equally, producing triangular cross-sections. More generally, as the Miller indices increase or become more asymme…
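
The cross-section shapes described in this caption can be checked numerically. The following is a minimal sketch, not the authors' rendering code: the function name `cube_cross_section` and the `offset` parameter are hypothetical, the cell is taken to be the unit cube, indices are assumed non-negative, and the plane is the one with axis intercepts 1/h, 1/k, 1/l (that is, h·x + k·y + l·z = 1).

```python
import itertools

def cube_cross_section(h, k, l, offset=1.0, tol=1e-9):
    """Distinct points where the plane h*x + k*y + l*z = offset
    meets the edges of the unit cube [0, 1]^3 (hypothetical helper)."""
    corners = list(itertools.product((0.0, 1.0), repeat=3))

    def signed(p):
        # Signed (unnormalized) distance of point p from the plane.
        return h * p[0] + k * p[1] + l * p[2] - offset

    points = set()
    for a, b in itertools.combinations(corners, 2):
        # Cube edges join corners that differ in exactly one coordinate.
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue
        da, db = signed(a), signed(b)
        if abs(da) < tol:
            points.add(a)  # plane passes through corner a
        if abs(db) < tol:
            points.add(b)  # plane passes through corner b
        if abs(da) >= tol and abs(db) >= tol and da * db < 0:
            # Plane crosses the interior of this edge: interpolate linearly.
            t = da / (da - db)
            points.add(tuple(round(x + t * (y - x), 9) for x, y in zip(a, b)))
    return sorted(points)

for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(hkl, len(cube_cross_section(*hkl)), "vertices")
```

With these conventions the loop reports 4, 4, and 3 vertices, matching the square, quadrilateral, and triangular sections in the caption. The vertex count depends on where the plane sits: `cube_cross_section(1, 1, 1, offset=1.5)` cuts six edges and yields a hexagon, so a shape cue alone does not fix the index family.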

**Figure 3.** Latency variations of index planes within cubic unit

**Figure 4.** Two fracture planes and that with higher index

**Figure 5.** Fractures of: a) glass and b) ceramic. Section 3.3, Consistency Reasoning and Negative Examples: To evaluate whether the model uses the latent variable as a structured hypothesis rather than a classification label, we construct explicit consistency and inconsistency cases. These include both positive pairings, where fragment geometry matches the plane orientation, and negative pairings, where the two are incompatible…

**Figure 6.** Fracture of concrete objects of variable scale lengths

**Figure 7.** Metal ductile fracture. The absence of planar cleavage surfaces is correctly recognized as a key indicator that Miller indices are not applicable. Section 3.8, Unified Interpretation Across Regimes: The experimental results reveal a consistent pattern in model behavior across different fracture scenarios, which can be understood in terms of three distinct regimes. These regimes correspond to whether the underlying fr…

**Figure 8.** Representative examples of different regime

Original abstract

We study whether multimodal large language models (MLLMs) can leverage crystallographic plane indices (Miller indices) as a structured latent representation for reasoning about fracture geometry. We formulate Miller indices $z = (h,k,l)$ as a latent variable governing idealized planar fracture and evaluate two complementary capabilities: (i) latent inference, where the model maps visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, where the model determines whether such a representation is meaningful for a given fracture image. Through extensive experiments spanning synthetic data, controlled 2D–3D geometric pairs, and real-world fracture images across multiple material classes (ceramics, glass, metals, and concrete), we show that MLLMs can reliably perform latent inference in idealized settings and, critically, can reject the latent representation when the underlying physics does not support it. These results suggest that MLLMs can act as physics-aware reasoning systems conditioned on structured latent priors, provided that the domain of validity is explicitly modeled.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper investigates whether multimodal large language models (MLLMs) can treat Miller indices z = (h,k,l) as a structured latent variable for reasoning about idealized planar fracture geometry. It evaluates two capabilities: (i) latent inference, mapping visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, determining when the crystallographic representation is meaningful for a given fracture image. Experiments span synthetic data, controlled 2D–3D geometric pairs, and real-world images across ceramics, glass, metals, and concrete, with the central claim that MLLMs can perform reliable inference in idealized settings and correctly reject the latent when physics does not support it.

Significance. If the empirical results hold under rigorous controls, the work would demonstrate that MLLMs can condition reasoning on explicit physical structure (crystallographic latents) rather than purely statistical correlations, with the rejection capability providing evidence of domain-aware behavior. This could support broader use of vision-language models as physics-informed reasoning engines in materials science and fracture analysis, particularly where structured priors like Miller indices offer falsifiable predictions.

major comments (3)

[Abstract] Abstract and Experiments section: the claim of 'reliable' latent inference and 'critical' rejection capability across synthetic, controlled, and real-world data is presented without any quantitative metrics, accuracy rates, error bars, confusion matrices, or statistical significance tests. This absence directly undermines evaluation of the central empirical demonstration.
[Experiments] Experimental evaluation of latent applicability assessment: the setup does not describe controls or ablations that isolate conditioning on the structured latent z = (h,k,l) from exploitation of low-level visual statistics (edge orientation histograms, texture periodicity, or material appearance cues) that may correlate with expected indices. Real-world glass and concrete fractures, which lack crystallographic indexing, make it especially important to rule out prior material knowledge as the driver of rejection behavior.
[Method] Latent inference task description: it is unclear whether the model is prompted or fine-tuned to explicitly use the Miller-index representation during inference or whether success is measured only by final output alignment with ground-truth planes. Without intermediate reasoning traces or counterfactual tests (e.g., mismatched indices), the claim that the latent is actively leveraged remains unverified.
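
To make the first major comment concrete, the kind of reporting it asks for could be computed along the following lines. This is only an illustrative sketch: the label strings, the `REJECT` marker, the function names, and the four toy examples are all made up here and are not results or conventions from the paper.

```python
from collections import Counter

# Hypothetical label for "no crystallographic plane applies" (not from the paper).
REJECT = "not-applicable"

def confusion_counts(truth, predicted):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(truth, predicted))

def accuracy(truth, predicted):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def rejection_recall(truth, predicted):
    """Fraction of truly non-applicable cases that the model rejected."""
    negatives = [p for t, p in zip(truth, predicted) if t == REJECT]
    if not negatives:
        raise ValueError("no non-applicable examples in the evaluation set")
    return sum(p == REJECT for p in negatives) / len(negatives)

# Toy placeholder data, invented purely to exercise the functions.
toy_truth = ["(100)", "(111)", REJECT, REJECT]
toy_pred = ["(100)", "(110)", REJECT, "(100)"]

print("accuracy:", accuracy(toy_truth, toy_pred))
print("rejection recall:", rejection_recall(toy_truth, toy_pred))
print("confusions:", dict(confusion_counts(toy_truth, toy_pred)))
```

Reporting rejection recall separately from overall accuracy keeps the two capabilities named in the summary (inference and applicability assessment) from being averaged into one number.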

minor comments (2)

[Figures] Figure captions and axis labels in the geometric pair experiments could be clarified to indicate whether the 2D–3D correspondence is provided as input or must be inferred.
[Implementation] The manuscript would benefit from an explicit statement of model versions, prompting strategies, and any fine-tuning details to support reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which identify key opportunities to strengthen the empirical rigor and methodological clarity of the manuscript. We address each major point below and commit to revisions that directly respond to the concerns raised.


Referee: [Abstract] Abstract and Experiments section: the claim of 'reliable' latent inference and 'critical' rejection capability across synthetic, controlled, and real-world data is presented without any quantitative metrics, accuracy rates, error bars, confusion matrices, or statistical significance tests. This absence directly undermines evaluation of the central empirical demonstration.

Authors: We agree that explicit quantitative support is necessary to substantiate the claims. The current version presents results primarily through qualitative description and example outputs to emphasize the conceptual demonstration. In the revised manuscript we will add accuracy rates with standard deviations for latent inference on synthetic and controlled 2D–3D pairs, rejection rates and confusion matrices for the applicability assessment task across material classes, and appropriate statistical significance tests. revision: yes
Referee: [Experiments] Experimental evaluation of latent applicability assessment: the setup does not describe controls or ablations that isolate conditioning on the structured latent z = (h,k,l) from exploitation of low-level visual statistics (edge orientation histograms, texture periodicity, or material appearance cues) that may correlate with expected indices. Real-world glass and concrete fractures, which lack crystallographic indexing, make it especially important to rule out prior material knowledge as the driver of rejection behavior.

Authors: This concern about potential confounds is well taken. We will add ablation experiments that compare performance under prompts that explicitly reference the Miller-index latent versus prompts that omit it, as well as tests on images with disrupted low-level cues (e.g., edge-preserved but texture-scrambled versions). For the real-world glass and concrete cases we will include control prompts that withhold material identity and will report that rejection decisions align with geometric inconsistency rather than material priors. revision: yes
Referee: [Method] Latent inference task description: it is unclear whether the model is prompted or fine-tuned to explicitly use the Miller-index representation during inference or whether success is measured only by final output alignment with ground-truth planes. Without intermediate reasoning traces or counterfactual tests (e.g., mismatched indices), the claim that the latent is actively leveraged remains unverified.

Authors: The experiments employ zero-shot prompting in which the model is explicitly instructed to treat z = (h,k,l) as a latent variable governing planar fracture geometry. Success is assessed both by final alignment and by coherence of the generated reasoning. In the revision we will include the exact prompt templates, representative reasoning traces, and new counterfactual experiments that supply mismatched indices to verify that the model’s geometric interpretation adjusts accordingly. revision: yes
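
The counterfactual test mentioned in the last response needs some ground-truth notion of when a supplied index is "mismatched". One possible scoring rule is sketched below; it is not taken from the paper. It assumes a cubic crystal whose orientation is known, so that the observed facet normal can be expressed in lattice coordinates, and the function names and the 5-degree tolerance are hypothetical choices.

```python
import math

def plane_angle_deg(hkl_a, hkl_b):
    """Angle in degrees between two planes of a cubic crystal,
    computed from their normals (h, k, l)."""
    dot = sum(a * b for a, b in zip(hkl_a, hkl_b))
    norm_a = math.sqrt(sum(a * a for a in hkl_a))
    norm_b = math.sqrt(sum(b * b for b in hkl_b))
    # Clamp against floating-point overshoot before calling acos.
    cos_phi = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    # A plane is unoriented: (h, k, l) and (-h, -k, -l) describe the same plane.
    return math.degrees(math.acos(abs(cos_phi)))

def is_consistent(observed_normal, hypothesis_hkl, tol_deg=5.0):
    """True if the hypothesized plane lies within tol_deg of the observed facet."""
    return plane_angle_deg(observed_normal, hypothesis_hkl) <= tol_deg

print(plane_angle_deg((1, 0, 0), (1, 1, 0)))  # about 45 degrees
print(plane_angle_deg((1, 0, 0), (1, 1, 1)))  # about 54.7 degrees
print(is_consistent((1, 1, 1), (1, 1, 1)))    # matched pairing
print(is_consistent((1, 0, 0), (1, 1, 1)))    # mismatched pairing
```

A model that actually conditions on the supplied index would be expected to flag the pairings this rule marks as inconsistent. Symmetry-equivalent members of a family (for example (100) and (010)) are treated as different planes here, which is appropriate only when the crystal frame is fixed.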

Circularity Check

0 steps flagged

No circularity: empirical demonstration on held-out data with independent evaluation criteria


The paper conducts an empirical evaluation of MLLM capabilities on latent inference and applicability assessment tasks across synthetic, geometric-pair, and real fracture image datasets. No derivation chain, equations, or first-principles claims are present that could reduce to fitted parameters or self-referential definitions. The reported outcomes rest on held-out image performance and rejection behavior rather than any construction that equates predictions to inputs by design. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing elements in the provided text. This is a standard self-contained experimental study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that Miller indices provide a meaningful latent representation for planar fractures under idealized conditions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Miller indices z = (h,k,l) can be formulated as a latent variable governing idealized planar fracture.
Explicitly stated in the abstract as the starting formulation for both latent inference and applicability assessment.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear


We formulate Miller indices z = (h, k, l) as a latent variable governing idealized planar fracture... three coordinate axes of the lattice.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · 2 internal anchors

[1] B. D. Cullity and S. R. Stock (2001). Elements of X-Ray Diffraction (3rd ed.). Prentice Hall.

[2] T. L. Anderson (2017). Fracture Mechanics: Fundamentals and Applications (4th ed.). CRC Press.

[3] W. D. Callister and D. G. Rethwisch (2018). Materials Science and Engineering: An Introduction (10th ed.). Wiley.

[4] A. Radford, J. W. Kim, C. Hallacy, et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. International Conference on Machine Learning (ICML).

[5] H. Liu, C. Li, Q. Wu, and Y. J. Lee (2023). Visual Instruction Tuning. Advances in Neural Information Processing Systems (NeurIPS).

[6] J. Yang, L. Gao, K. Li, et al. (2023). MM-ReAct: Prompting ChatGPT for Multimodal Reasoning and Action. International Conference on Machine Learning (ICML).

[7] OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774.

[8] Google DeepMind (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805.

[9] D. P. Kingma and M. Welling (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations (ICLR).

[10] I. Higgins, L. Matthey, A. Pal, et al. (2017). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. International Conference on Learning Representations (ICLR).
