
arXiv: 2607.02329 · v1 · pith:F3ROECFLnew · submitted 2026-07-02 · 💻 cs.AI · cond-mat.mtrl-sci · physics.comp-ph

Grounded autonomous research: a fault-tolerant LLM pipeline from corpus to manuscript in frontier computational physics

Pith reviewed 2026-07-03 13:38 UTC · model grok-4.3

classification 💻 cs.AI · cond-mat.mtrl-sci · physics.comp-ph
keywords autonomous research · LLM pipeline · computational physics · altermagnetic piezomagnetism · fault tolerance · literature grounding · first-principles computations · condensed matter

The pith

An LLM pipeline produces a publication-grade physics manuscript from 11,083 arXiv papers by mapping the corpus, reproducing references for calibration, and running new computations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a pipeline that lets an LLM agent carry out complete autonomous research in computational physics. It starts with a large corpus of recent condensed-matter papers, identifies a direction, reproduces earlier published results to anchor its methods, performs original first-principles calculations, and assembles the findings into a manuscript. The work is kept grounded by repeated literature consultation across many separate sessions that share only on-disk files. Redundancy from fresh-context isolation and adversarial review supplies fault tolerance, so that errors in any one session are caught by others. Human input is restricted to fixing reproduction failures rather than directing the science.

Core claim

The pipeline runs end-to-end from a corpus of 11,083 recent condensed-matter physics arXiv papers to a publication-grade manuscript with three substantive physics findings on altermagnetic piezomagnetism. The agent autonomously conceives a research direction by mapping the corpus, calibrates methodology by reproducing published references, conducts novel first-principles computations, and writes the manuscript, grounded in literature throughout across 47 fresh-context sessions in six phases sharing only on-disk state, with 2,162 literature-consultation events. Fault tolerance emerges from redundancy: fresh-context isolation, distributed grounding, and adversarial review catch what any single session misses.

What carries the argument

The fault-tolerant pipeline that isolates each session in fresh context and requires the agent to reproduce published references before attempting novel first-principles computations.
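
A minimal sketch of what "fresh context, shared only through on-disk state" could look like in practice. The file name `state.json`, the session names, and the `task` callback (a stand-in for an LLM call) are all hypothetical; the paper's actual on-disk protocol is not described on this page.

```python
import json
import tempfile
from pathlib import Path


def run_session(name, workdir, task):
    """Run one fresh-context session: read disk, do work, write disk.

    Nothing is kept in memory between calls; the only carry-over is
    whatever earlier sessions wrote to the (hypothetical) state file.
    """
    state_file = Path(workdir) / "state.json"  # assumed file name
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"log": []}
    note = task(state)  # stand-in for the LLM session's work
    state["log"].append({"session": name, "note": note})
    state_file.write_text(json.dumps(state, indent=2))
    return state


def demo():
    """Two sessions that communicate only through the state file."""
    with tempfile.TemporaryDirectory() as workdir:
        run_session("breadth-1", workdir, lambda s: "mapped corpus themes")
        final = run_session(
            "pilot-gate-1",
            workdir,
            lambda s: f"saw {len(s['log'])} prior entries",
        )
        return [entry["note"] for entry in final["log"]]
```

In this sketch the second session learns about the first only by reading the file, which is the isolation property the summary attributes to the pipeline.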

If this is right

  • Autonomous research becomes possible in domains that require physical reasoning and underdocumented toolchains.
  • Calibration by numerical reproduction of references, rather than internal priors alone, supplies the operative grounding.
  • Pre- and post-pilot stages can run fully autonomously while pilot stages need human help only for operational reproduction failures.
  • The same redundancy pattern can be applied to other high-stakes scientific domains beyond computational physics.
  • Characterized failure modes show that numerical confrontation at calibration checkpoints is what prevents hallucination.
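
The calibration idea in the bullets above can be illustrated with a small numerical gate. This is a sketch under stated assumptions, not the paper's rule: the 10% tolerance and the function names are hypothetical. The anchor value (0.176 µB/cell) and the 2.44× ratio are taken from the Figure 3 caption, and the PASS criterion (every stage at HIGH) follows the gate prompt quoted in the extracted references.

```python
import math


def anchor_ratio(computed, published):
    """Ratio of a pipeline value to its published literature anchor."""
    if published == 0:
        raise ValueError("published anchor must be non-zero")
    return computed / published


def confidence(ratio, tolerance=0.10):
    """Label a stage HIGH if it lands within an assumed fractional
    tolerance of the anchor, otherwise LOW (tolerance is hypothetical)."""
    return "HIGH" if abs(ratio - 1.0) <= tolerance else "LOW"


def verdict(stage_ratios, tolerance=0.10):
    """PASS only if every stage is HIGH; otherwise NOT PASS."""
    levels = {s: confidence(r, tolerance) for s, r in stage_ratios.items()}
    passed = all(level == "HIGH" for level in levels.values())
    return ("PASS" if passed else "NOT PASS"), levels


# Published anchor and first-gate ratio quoted in the Figure 3 caption.
published = 0.176           # µB/cell
computed = 2.44 * published  # first gate's value, 2.44x the anchor
ratio = anchor_ratio(computed, published)
log2_ratio = math.log2(ratio)  # Figure 4 plots ratios on a log2 axis
```

Under these assumptions a 2.44× deviation is labeled LOW as soon as the number is compared with the anchor, which is the kind of explicit numerical confrontation the summary credits with preventing hallucinated results.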

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar reproduction checkpoints could be added to agents working in experimental design or materials discovery where literature anchors exist.
  • The on-disk state sharing across sessions suggests a lightweight way to scale multi-agent scientific workflows without persistent memory.
  • Quantifying the exact number of literature consultations needed may help set budgets for future autonomous systems in other fields.
  • If the reproduction step can be automated further, the human intervention pattern could shrink to near zero.

Load-bearing premise

Reproducing published references in separate fresh-context sessions is enough to keep the agent from generating plausible but unverifiable results during new first-principles calculations.

What would settle it

Independent first-principles calculations or experiments that contradict any of the three new findings reported in the generated manuscript on altermagnetic piezomagnetism.

Figures

Figures reproduced from arXiv: 2607.02329 by Haonan Huang.

Figure 1. Pipeline architecture and literature footprint. (A) Six phases run as 47 fresh-context LLM sessions sharing only on-disk state. The pilot lane iterates a computational-gate plus adversarial-review unit; an iteration-cap hyperparameter upgrades the next review to a transition-planning role rather than continuing iteration (details and trade-off in §3). The bottom band depicts scaffolding access: the curated…
Figure 2. Information flow from corpus to depth programs. Each dot is one arXiv ID; large filled markers are IDs cited in a breadth report or by a depth program, small faint markers are IDs the agent actively retrieved but did not cite. Position: x = regex-classified theme (12 categories cover 99.1% of breadth-cited and 100% of depth-cited IDs); y = continuous arXiv submission month. Bands top-to-bottom: external (d…
Figure 3. Pilot anchor enforcement and fault tolerance. (A) MnTe orbital-magnetization M_z^orb trajectory shown as ratio to the published value of Ye et al. (2026) (0.176 µB/cell). The first MnTe orbital-magnetization gate’s HIGH at 2.44× is retracted to LOW by the first adversarial review session, whose sub-agent gap-hunt surfaced the published anchor; subsequent gates close the gap step-by-step at the canonic…
Figure 4. Pilot-stage anchor comparisons in raw units. Each row tracks one published reference (Ye orbital magnetization (Ye et al., 2026), Lopez Fe orbital magnetization (Lopez et al., 2012), V2Te2O magnetization, Mn3NiN multipole, KV2Se2O level splitting, CrSb anomalous Hall, Khodas–Mu–Mazin spin piezomagnetic coefficient (Khodas et al., 2026)); points are pipeline-vs-literature ratios on a log2 axis at the sess…
Figure 5. Per-phase prompt-mandated workflow of the canonical run pipeline. Pipeline phases run left to right across seven columns; within each column, task cards stack vertically; within each card, prompt-mandated workflow steps stack top-to-bottom with adjacent step boxes touching. Title-bar “×N” denotes the number of canonical sessions of that task type. Iteration arrows: cycle 1 (orange) is the Pilot gate↔Pilot …
FIG. 1. Materials, Néel-vector configurations, and symmetry-allowed piezomagnetic-response channels for the three altermag…
FIG. 2. First-principles realization of Bell–Venderbos topological orbital piezomagnetism in CsV…
FIG. 3. sin(3…
FIG. 4. Brillouin-zone-corner Wannier-gauge instability sets a factor 2–5 systematic on Λ…
FIG. 5. Reproduction of the Smolenski multipolar Berry…
FIG. 6. CrSb canted-…
Original abstract

Autonomous-research agents have demonstrated end-to-end LLM automation in machine-learning sandboxes where execution provides calibration. Frontier physical science differs categorically: physical reasoning underlies every methodology choice, toolchains are often underdocumented, and calibration must come from external literature anchors - which unscaffolded agents cite but do not confront, hallucinating plausible, unverifiable results from internal priors. We present a pipeline that runs end-to-end from a corpus of 11,083 recent condensed-matter physics arXiv papers to a publication-grade manuscript with three substantive physics findings (here on altermagnetic piezomagnetism): the agent autonomously conceives a research direction by mapping the corpus, calibrates methodology by reproducing published references, conducts novel first-principles computations, and writes the manuscript - grounded in literature throughout, across 47 fresh-context sessions in six phases sharing only on-disk state, with 2,162 literature-consultation events. Fault tolerance emerges from redundancy: fresh-context isolation, distributed grounding, and adversarial review catch what any single session misses; pre- and post-pilot stages are fully autonomous, and pilot requires bounded human intervention only at reproduction failures - operational knowledge curation, not scientific direction. Two paired failure modes - a pre-architecture baseline and a no-pilot ablation - isolate structurally enforced numerical confrontation at calibration checkpoints as the operative grounding mechanism. The primitives, characterized failure modes, and quantified intervention pattern lay a foundation for autonomous research in high-stakes scientific domains beyond computational physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents a fault-tolerant LLM pipeline that processes a corpus of 11,083 recent condensed-matter physics arXiv papers to autonomously conceive a research direction, calibrate via reference reproductions, perform novel first-principles DFT computations, and produce a publication-grade manuscript containing three substantive findings on altermagnetic piezomagnetism. The system uses 47 fresh-context sessions across six phases (sharing only on-disk state), 2,162 literature-consultation events, and redundancy mechanisms including adversarial review; fault tolerance is demonstrated via paired failure-mode ablations, with human intervention limited to operational curation at reproduction failures.

Significance. If the pipeline's novel computations are independently verifiable and the grounding mechanism demonstrably prevents hallucination of first-principles results, the work would establish a concrete, quantified foundation for autonomous research agents in domains where physical reasoning and external literature anchors are required, moving beyond sandbox ML settings.

major comments (3)
  1. [Abstract] Abstract and § on novel computations: the three claimed substantive physics findings on altermagnetic piezomagnetism are presented as publication-grade outputs of novel DFT calculations, yet no equations, numerical values (e.g., piezomagnetic tensor components, magnetic ordering energies), convergence criteria, or direct comparison to independent codes/experimental literature are supplied; without these, the central claim that the pipeline produces correct novel physics cannot be evaluated.
  2. [Calibration checkpoints] § on calibration checkpoints and fresh-context sessions: reproduction of published references in isolated sessions is asserted to enforce numerical confrontation and grounding, but this provides no external numerical anchor for the subsequent novel first-principles results; the agent could therefore generate internally consistent but physically incorrect values drawn from priors, and the paired ablation studies do not test this specific failure mode for the novel computations.
  3. [Fault tolerance] § on fault tolerance and 2,162 literature-consultation events: the manuscript claims the pipeline is grounded throughout via literature confrontation, yet the description supplies no independent external benchmark or reproduction of the three novel findings themselves; success therefore risks reducing to the agent's internal priors once reference reproduction is complete.
minor comments (2)
  1. [Abstract] The abstract and methods description would benefit from explicit listing of the six phases and the precise on-disk state-sharing protocol to allow replication.
  2. [Corpus processing] Clarify whether the 11,083-paper corpus is used only for direction mapping or also for ongoing grounding during novel computations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and substantive review. We respond to each major comment below, focusing on the manuscript's scope as a demonstration of the pipeline architecture and its grounding mechanisms.

Point-by-point responses
  1. Referee: [Abstract] Abstract and § on novel computations: the three claimed substantive physics findings on altermagnetic piezomagnetism are presented as publication-grade outputs of novel DFT calculations, yet no equations, numerical values (e.g., piezomagnetic tensor components, magnetic ordering energies), convergence criteria, or direct comparison to independent codes/experimental literature are supplied; without these, the central claim that the pipeline produces correct novel physics cannot be evaluated.

    Authors: The manuscript's primary contribution is the end-to-end pipeline and its quantified fault-tolerance properties rather than a standalone physics report. The three findings serve as an existence demonstration that the pipeline can generate novel, publication-grade content. We agree that the absence of specific numerical values, equations, and comparisons prevents direct evaluation of the physics outputs from this text. In the revised manuscript we will add an appendix containing the key DFT results (tensor components, energies, convergence parameters) together with literature comparisons.

    Revision: yes

  2. Referee: [Calibration checkpoints] § on calibration checkpoints and fresh-context sessions: reproduction of published references in isolated sessions is asserted to enforce numerical confrontation and grounding, but this provides no external numerical anchor for the subsequent novel first-principles results; the agent could therefore generate internally consistent but physically incorrect values drawn from priors, and the paired ablation studies do not test this specific failure mode for the novel computations.

    Authors: Reference reproductions in fresh-context sessions establish that the agent can perform accurate numerical confrontation when external anchors are available. The same literature-consultation protocol (2,162 events) continues through the novel-computation phase, supplying methodology and expected-behavior anchors for the new calculations. The paired ablations isolate the effect of removing calibration checkpoints on overall output consistency. We acknowledge that the ablations do not directly probe hallucination on the specific novel results, because ground-truth values for those results are not known a priori; we will add explicit discussion of this scope limitation.

    Revision: partial

  3. Referee: [Fault tolerance] § on fault tolerance and 2,162 literature-consultation events: the manuscript claims the pipeline is grounded throughout via literature confrontation, yet the description supplies no independent external benchmark or reproduction of the three novel findings themselves; success therefore risks reducing to the agent's internal priors once reference reproduction is complete.

    Authors: Literature confrontation is applied continuously, including during novel computations and manuscript drafting, with adversarial review and fresh-context isolation providing additional safeguards. The ablations demonstrate that outputs diverge when these mechanisms are removed. We do not supply an independent external reproduction of the novel findings, as that would constitute a separate verification study outside the pipeline demonstration. We will expand the text to clarify how the distributed grounding and redundancy mechanisms extend beyond the calibration phase.

    Revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper describes an autonomous pipeline that maps a corpus, reproduces published references for calibration in fresh-context sessions, then performs novel first-principles computations on altermagnetic piezomagnetism before writing a manuscript. No quoted step reduces a claimed physics finding or pipeline success metric to its inputs by construction (e.g., no fitted parameter renamed as prediction, no self-definitional loop where the result is defined in terms of the reproduction outputs, and no load-bearing self-citation chain). The grounding mechanism is presented as external literature confrontation at checkpoints, with the novel results positioned as independent first-principles output rather than a statistical or definitional consequence of the calibration phase. The central claim therefore retains independent content outside any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that literature-anchored calibration checkpoints can enforce numerical confrontation in novel computations; no free parameters or invented physical entities are introduced, but the pipeline itself is a new constructed system whose effectiveness is asserted without external verification in the abstract.

axioms (1)
  • Domain assumption: Reproducing published references in isolated sessions is sufficient to calibrate the agent for subsequent novel first-principles computations without hallucination.
    Stated in the abstract as the mechanism that distinguishes the pipeline from unscaffolded agents that cite but do not confront literature.
invented entities (1)
  • Fault-tolerant LLM pipeline with 47 fresh-context sessions across six phases and 2,162 literature-consultation events (no independent evidence)
    Purpose: To enable autonomous corpus-to-manuscript research while maintaining grounding
    New system architecture introduced by the paper; no independent evidence outside the described runs is provided in the abstract.

pith-pipeline@v0.9.1-grok · 5802 in / 1541 out tokens · 57863 ms · 2026-07-03T13:38:49.336580+00:00 · methodology


Reference graph

Works this paper leans on

76 extracted references · 34 canonical work pages · 8 internal anchors

  1. [1]

    Evaluating Large Language Models Trained on Code

    arXiv:2107.03374. Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. ChemCrow: Augmenting large-language models with chemistry tools. Nature Machine Intelligence, 2024. arXiv:2304.05376. Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards fully automa...

  2. [2]

    From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven Computational Physics

    doi: 10.1088/1361-648X/aa8f79. Giovanni Pizzi, Valerio Vitale, Ryotaro Arita, Stefan Blügel, Frank Freimuth, Guillaume Géranton, et al. Wannier90 as a community code: new features and applications. J. Phys.: Condens. Matter, 32:165902, 2020. doi: 10.1088/1361-648X/ab51ff. Stepan S. Tsirkin. High performance Wannier interpolation of Berry curvature and r...

  3. [3]

    grounded throughout

    doi: 10.1103/PhysRevX.14.011019. S. Hayami et al. Multipole framework for altermagnetic order parameters. arXiv preprint, 2025. arXiv:2512.17587. I. I. Mazin. Altermagnetism in MnTe: origin, predicted manifestations, and routes to detwinning. Phys. Rev. B, 107:L100418, 2023. doi: 10.1103/PhysRevB.107.L100418. arXiv:2301.08573. H. Radhakrishnan, B. Bell...

  4. [4]

    **Themes** -- dominant directions, with representative arXiv IDs and a sense of activity level

  5. [5]

    **Emerging patterns** -- new bursts, methodology shifts, anomalies

  6. [6]

    **Underrepresented directions** -- scattered but substantive, low competition

  7. [7]

    **Gaps** -- where a first-principles study could contribute

  8. [8]

    I read N papers in this area

    **Candidate directions for deep-dive** -- a ranked shortlist (5--10) of directions worth dedicated investigation. Rank by scientific potential, not by heat. Heat and novelty both matter; explicitly note which drives each ranking. Use arXiv IDs inline as evidence. Do not fabricate citations. For each claim, distinguish between "I read N papers in this are...

  9. [9]

    **Candidates considered** -- the 2--3 directions you screened, with pipeline sketch and feasibility outcome for each. Include dropped candidates and why they were dropped

  10. [10]

    **Direction and rationale** -- which candidate you committed to, why, how it differs from `chosen_topics.md` entries

  11. [11]

    **Literature foundation** -- full-text papers read, 1--2 lines each on what you took from them

  12. [12]

    **Main research question** -- specific, first-principles-addressable

  13. [13]

    **Sub-questions** (2--4) -- concrete enough for a pilot agent to estimate cost

  14. [14]

    **Fallback path** -- if the main question fails, what salvageable story remains?

  15. [15]

    **Prior work + novelty** -- per question: relevant prior work (URL, title, year, 1-line takeaway) + differentiation

  16. [16]

    **Computational feasibility** -- per pipeline stage: named tool, has it been demonstrated on a similar system, hardware estimate

  17. [17]

    **Expected deliverable** -- what the paper would argue, one paragraph

  18. [18]

    might work

    **Computational footprint** -- classes of calculation, system sizes, 24h/48GB plausibility. ## Begin Read `chosen_topics.md`, then breadth reports. Start with candidate generation and feasibility screening -- do NOT deep-read literature until a candidate has passed both gates. **No time limit, no token limit.** The program you conceive here determines t...

  19. [19]

    **Identify the gap precisely.** What quantitative target was missed, and by how much?

  20. [20]

    **Review prior diagnoses.** What did the reproduce/pilot_gate agents say caused the deviation? Do you agree with their reasoning? What alternative explanations exist?

  21. [21]

    **List concrete possibilities** for closing the gap -- with your assessment of how likely each is to work and why

  22. [22]

    **REQUIRED: Search online.** For each gap, search documentation, forums, mailing lists, tutorials, examples, and GitHub issues for the relevant tools and physics. Many problems in computational physics are well-known community issues with documented solutions. **Do this search before deciding what to work on -- it may completely change your assessment o...

  23. [23]

    **Assess tractability.** Is there a clearly promising path, or has this been extensively attempted across multiple sessions without convergence? Do not go down rabbit holes -- if prior sessions have made extensive, well-reasoned attempts and the gap remains, it may be genuinely hard. Prioritize gaps where you see a concrete, evidence-based path to T4 ove...

  24. [24]

    A confidence table (one row per production pipeline stage)

  25. [25]

    For each non-HIGH stage: the full analysis above (gap, prior diagnoses, your assessment, online research findings, possibilities, tractability)

  26. [26]

    continue

    Overall verdict: **PASS** (all production-critical stages at HIGH) or **NOT PASS** (list gaps ranked by combined impact x tractability). **If PASS, stop here.** Write `pilot_gate_report.md` summarizing why and finalize. **If NOT PASS, proceed to Phase 2.** ### Phase 2: Pick ONE targeted project From your NOT PASS gaps, pick **the single gap that best c...

  27. [27]

    Your custom code (conventions, signs, units, prefactors)

  28. [28]

    Your input parameters (compare against paper parameter by parameter)

  29. [29]

    Tool version or configuration mismatch

  30. [30]

    before" snapshot; the report is the

    The paper itself (last resort -- only after exhausting 1-3). Use the internet for physics conventions and tool documentation (house rules §10). ### Custom script discipline Custom scripts require ≥3 independent validations on published reference values before their outputs enter the verdict. Document validations in worklog. ### Phase 4: Updated assessment...

  31. [31]

    **Big gaps that block production.** If production requires a pipeline step that has never been validated, or a quantity that was computed but is physically unsupported, this must be addressed. New calculations are justified here

  32. [33]

    active access

    The three-way intersection (consensus core) is 27 IDs. The intersection-over-union ratio is 27/877 = 3.1%. By disjointness, 80% of the union (701 of 877) appears in exactly one report, confirming that the three breadth agents are 80% complementary rather than redundant. Concrete lineage examples. Three cases illustrate how single-channel breadth surfaci...

  33. [34]

    Pipeline sanity check rule. Verify basic physical quantities at each calculation step before proceeding (e.g., does NSCF carry the SCF Hubbard card; do magnetic moments survive the SCF→NSCF re-initialization). Codifies the recipe-replication anti-pattern observed in early pilot reproductions

  34. [35]

    Wannier validation protocol. Mandates an explicit fatband (projwfc) diagnostic before basis design (Step C, identified as “Mandatory” in house rules) and a dis_froz_max plateau scan before any Berry-observable production. Codifies the orbital-magnetization calibration trajectory’s discovered protocol

  35. [36]

    a single dfroz value is NEVER sufficient justification

    Hubbard U non-transferability clause. Requires re-derivation when copying U values across DFT codes or projector conventions; the same nominal U produces different effective correlation depending on the projector, so a value calibrated in one code does not transfer cleanly to another. Reproducibility. The full INDEX.md and PSEUDOPOTENTIALS.md are availab...

  36. [37]

    L. Šmejkal, J. Sinova, and T. Jungwirth, Phys. Rev. X 12, 031042 (2022), arXiv:2105.05820

  37. [38]

    L. Šmejkal, J. Sinova, and T. Jungwirth, Phys. Rev. X 12, 040501 (2022), arXiv:2204.10844

  38. [39]

    L. Šmejkal, A. H. MacDonald, J. Sinova, S. Nakatsuji, and T. Jungwirth, Nat. Rev. Mater. 7, 482 (2022), arXiv:2107.03321

  39. [40]

    T. Jungwirth, J. Sinova, R. M. Fernandes, Q. Liu, H. Watanabe, S. Murakami, S. Nakatsuji, and L. Šmejkal, Nature 10.1038/s41586-025-09883-2 (2026), arXiv:2506.22860

  40. [41]

    P. G. Radaelli, Phys. Rev. B 110, 214428 (2024), arXiv:2407.13548

  41. [42]

    S. Sheoran and P. Dev, Phys. Rev. B 111, 184407 (2025), arXiv:2502.21095

  42. [43]

    M. Khodas, S. Mu, I. I. Mazin, and K. D. Belashchenko, Phys. Rev. B 113, 104422 (2026), arXiv:2506.06257

  43. [44]

    K. Takahashi, C. R. W. Steward, M. Ogata, R. M. Fernandes, and J. Schmalian, Phys. Rev. B 111, 184408 (2025), arXiv:2502.03517

  44. [45]

    Topological piezomagnetic effect in two-dimensional Dirac quadrupole altermagnets

    H. Radhakrishnan, B. Bell, C. Ortix, and J. W. F. Venderbos, arXiv preprint (2026), arXiv:2602.05894

  45. [46]

    B. Bell and J. W. F. Venderbos, arXiv preprint (2026), arXiv:2602.10076

  46. [47]

    S. Smolenski, N. Mao, D. Zhang, Y. Guo, A. K. M. A. Shawon, M. Xu, E. Downey, T. Musall, M. Yi, W. Xie, C. Jozwiak, A. Bostwick, N. Tamura, E. Rotenberg, L. Li, K. Sun, Y. Zhang, and N. H. Jo, arXiv preprint (2025), arXiv:2509.21481

  47. [48]

    Strain continuously rotates the Néel vector in altermagnetic MnTe

    A. Liebman-Peláez, J. Kruppe, R. B. Regmi, N. J. Ghimire, Y. Sun, I. I. Mazin, H. M. L. Noad, J. Analytis, V. Sunko, and J. Orenstein, arXiv preprint (2026), arXiv:2604.07653

  48. [49]

    S.-i. Kimura, H. Suwa, K. Yuan, H. Watanabe, T. Nakamura, H. K. Yun, and M.-H. Jung, arXiv preprint (2026), arXiv:2603.21455

  49. [50]

    W. Yang, C. Won, C. Cress, M. Z. Franklin, X. Fang, S. Fields, N. Combs, S. Han, W. Lu, S. P. Bennett, S.-W. Cheong, and J. Xia, arXiv preprint (2026), arXiv:2604.21021

  50. [51]

    S. Bey, S. S. Fields, N. G. Combs, B. G. Márkus, J. Wang, L. Schmidt, L. Curtis, A. Dodd-Noble, A. Poulin, S. M. Shahed, R. Regmi, M. Holub, P. Ohresser, A. Bansil, H. Ambaye, V. Lauter, L. Forró, C. D. Cress, J. C. Prestigiacomo, N. Ghimire, A. de la Torre, S. P. Bennett, X. Liu, and B. A. Assaf, arXiv preprint (2026), arXiv:2603.00242

  51. [52]

    R. D. Gonzalez Betancourt, J. Zubáč, R. Gonzalez-Hernandez, K. Geishendorf, Z. Šobáň, G. Springholz, K. Olejník, L. Šmejkal, J. Sinova, T. Jungwirth, S. T. B. Goennenwein, A. Thomas, H. Reichlová, J. Železný, and D. Kriegner, Phys. Rev. Lett. 130, 036702 (2023), arXiv:2112.06805

  52. [53]

    O. J. Amin, A. Dal Din, E. Golias, Y. Niu, A. Zakharov, S. C. Fromage, C. J. B. Fields, S. L. Heywood, R. B. Cousins, J. Krempasky, J. H. Dil, D. Kriegner, B. Kiraly, R. P. Campion, A. W. Rushforth, K. W. Edmonds, S. S. Dhesi, L. Šmejkal, T. Jungwirth, and P. Wadley, Nature 636, 348 (2024), arXiv:2405.02409

  53. [54]

    Y. Zhao, S. Mandal, C.-X. Liu, and B. Yan, arXiv preprint (2026), arXiv:2603.12259

  54. [55]

    W. Chen, Z. Zhou, J. Meng, W. Wang, Y. Yang, and Z. Li, arXiv preprint (2026), arXiv:2601.02913

  55. [56]

    Z. Zhou, X. Cheng, M. Hu, R. Chu, H. Bai, L. Han, J. Liu, F. Pan, and C. Song, Nature 638, 645 (2025)

  56. [57]

    T. Yu, I. Shahid, P. Liu, D.-F. Shao, X.-Q. Chen, and Y. Sun, npj Quantum Mater. 10, 47 (2025), arXiv:2412.12882

  57. [58]

    B. Jiang, M. Hu, J. Bai, Z. Song, C. Mu, G. Qu, W. Li, W. Zhu, H. Pi, Z. Wei, Y. Sun, Y. Huang, X. Zheng, Y. Peng, L. He, S. Li, J. Luo, Z. Li, G. Chen, H. Li, H. Weng, and T. Qian, Nat. Phys. 21, 754 (2025), arXiv:2408.00320

  58. [59]

    B. Thapa, P.-H. Chang, K. Belashchenko, and I. I. Mazin, arXiv preprint (2026), arXiv:2602.18672

  59. [60]

    Y. Sun et al., Phys. Rev. B 112, 184416 (2025)

  60. [61]

    S.-D. Guo and Y. Liu, arXiv preprint (2026), arXiv:2603.25136

  61. [62]

    C. C. Ye, K. Tenzin, J. Sławińska, and C. Autieri, Phys. Rev. B 113, 014413 (2026), arXiv:2505.08675

  62. [63]

    I. I. Mazin, Phys. Rev. B 107, L100418 (2023), arXiv:2301.08573

  63. [64]

    Magnetic anisotropy in antiferromagnetic hexagonal MnTe

    D. Kriegner, H. Reichlova, J. Grenzer, W. Schmidt, E. Ressouche, J. Godinho, T. Wagner, S. Y. Martin, A. B. Shick, V. V. Volobuev, G. Springholz, V. Holý, J. Wunderlich, T. Jungwirth, and K. Výborný, Phys. Rev. B 96, 214418 (2017), arXiv:1710.08523

  64. [65]

    Supplemental Material, Supplemental material (2026)

  65. [66]

    S. S. Tsirkin, npj Comput. Mater. 7, 33 (2021), arXiv:2008.07992

  66. [67]

    P. Giannozzi, O. Andreussi, T. Brumme, O. Bunau, M. Buongiorno Nardelli, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, M. Cococcioni, N. Colonna, I. Carnimeo, A. Dal Corso, S. de Gironcoli, P. Delugas, R. A. DiStasio, A. Ferretti, A. Floris, G. Fratesi, G. Fugallo, R. Gebauer, U. Gerstmann, F. Giustino, T. Gorni, J. Jia, M. Kawamura, H.-Y. Ko, A. Ko...

  67. [68]

    G. Pizzi, V. Vitale, R. Arita, S. Blügel, F. Freimuth, G. Géranton, M. Gibertini, D. Gresch, C. Johnson, T. Koretsune, J. Ibañez Azpiroz, H. Lee, J.-M. Lihm, D. Marchand, A. Marrazzo, Y. Mokrousov, J. I. Mustafa, Y. Nohara, Y. Nomura, L. Paulatto, S. Poncé, T. Ponweiser, J. Qiao, F. Thöle, S. S. Tsirkin, M. Wierzbowska, N. Marzari, D. Vanderbi...

  68. [69]

    M. G. Lopez, D. Vanderbilt, T. Thonhauser, and I. Souza, Phys. Rev. B 85, 014435 (2012), arXiv:1112.1938

  69. [70]

    T. Thonhauser, D. Ceresoli, D. Vanderbilt, and R. Resta, Phys. Rev. Lett. 95, 137205 (2005)

  70. [71]

    D. Ceresoli, T. Thonhauser, D. Vanderbilt, and R. Resta, Phys. Rev. B 74, 024408 (2006)

  71. [72]

    D. Xiao, J. Shi, and Q. Niu, Phys. Rev. Lett. 95, 137204 (2005), arXiv:cond-mat/0502340

  72. [73]

    M. J. van Setten, M. Giantomassi, E. Bousquet, M. J. Verstraete, D. R. Hamann, X. Gonze, and G.-M. Rignanese, Comput. Phys. Commun. 226, 39 (2018)

  73. [74]

    J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996)

  74. [75]

    A. I. Liechtenstein, V. I. Anisimov, and J. Zaanen, Phys. Rev. B 52, R5467 (1995)

  75. [76]

    P. Lukashev, R. F. Sabirianov, and K. Belashchenko, Phys. Rev. B 78, 184414 (2008)

  76. [77]

    H. Huang, Grounded autonomous research: a fault-tolerant LLM pipeline from corpus to manuscript in frontier computational physics (2026), ICML 2026 AI for Science Workshop.