arxiv: 2604.13282 · v1 · submitted 2026-04-14 · ⚛️ physics.med-ph

Recognition: unknown

Agentic MR sequence development: leveraging LLMs with MR skills for automatic physics-informed sequence development

Amr Aly, Andreas Maier, Jonathan Endres, Moritz Zaiss, Simon Weinmüller, Tobias Dornstetter

Pith reviewed 2026-05-10 13:19 UTC · model grok-4.3

classification ⚛️ physics.med-ph

keywords MRI pulse sequences · large language models · agentic frameworks · PyPulseq · physics validation · sequence development · autonomous research · EPI sequences

The pith

An agentic harness with physics validation turns general LLMs into reliable MRI sequence developers

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Agent4MR, a framework that equips large language models with an agent structure and feeds them structured physics-aware validation reports on PyPulseq code. In a spin-echo EPI test across three LLMs, the agents produced artifact-free sequences with correct timing and k-space coverage after a single user interaction, beating a context-only LLM baseline and requiring fewer steps than a human developer with the same tools. The same agents then performed autonomous research to adjust a sequence toward a target fluid-suppressed contrast. The central claim is that this combination removes most manual debugging so that non-experts can steer sequence design from clinical or biological questions.

Core claim

Agent4MR lets general-purpose LLMs generate PyPulseq MRI sequences, receive a structured physics validation report that flags timing, gradient, and hardware violations, then autonomously iterate until the sequence is valid; across tested models this yielded correct spin-echo EPI sequences in one interaction and enabled further autonomous refinement to match a chosen contrast without additional human prompts.

What carries the argument

Agent4MR, the agent-based loop that pairs LLM code generation for PyPulseq with iterative correction driven by a physics-aware validation report
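
The loop described here (generate, validate, feed violations back, repeat) can be pictured with a minimal sketch. It is not the paper's implementation: `ValidationReport`, `refine_until_valid`, `toy_generate`, and `toy_validate` are hypothetical names, and the toy generator stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ValidationReport:
    """Hypothetical structured report; the field name is illustrative."""
    violations: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.violations


def refine_until_valid(
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], ValidationReport],
    prompt: str,
    max_iters: int = 5,
) -> Tuple[str, ValidationReport, int]:
    """Generate code, validate it, and feed violations back until it passes."""
    feedback: List[str] = []
    code = ""
    report = ValidationReport(["not yet generated"])
    for iteration in range(1, max_iters + 1):
        code = generate(prompt, feedback)
        report = validate(code)
        if report.ok:
            return code, report, iteration
        feedback = report.violations  # next attempt sees the flagged issues
    return code, report, max_iters


def toy_generate(prompt: str, feedback: List[str]) -> str:
    # Stand-in for an LLM call: corrects TE only after it has been flagged.
    te_ms = 100 if any("TE" in item for item in feedback) else 90
    return f"TE_MS = {te_ms}"


def toy_validate(code: str) -> ValidationReport:
    # Stand-in for a physics check: only the echo time is inspected.
    te_ms = int(code.split("=")[1])
    if te_ms != 100:
        return ValidationReport([f"TE is {te_ms} ms, expected 100 ms"])
    return ValidationReport()


code, report, n_iters = refine_until_valid(
    toy_generate, toy_validate, "Code spin-echo EPI sequence, TE = 100 ms"
)
print(code, report.ok, n_iters)
```

The user supplies one prompt; every later iteration is driven by the report alone, which is the sense in which the paper counts a single user interaction.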

If this is right

MRI sequences can be created and corrected with fewer user interactions than direct LLM prompting or human coding.
Autonomous agents can refine an existing sequence until it matches a specified target contrast.
Non-experts may direct sequence changes from clinical or biological goals rather than low-level code.
Multiple agents could collaborate on complex sequence tasks without constant human oversight.
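
The second point above, refinement toward a target contrast, can be illustrated with a toy accept-if-better search on a mean absolute error (MAE) loss. Everything below is an assumption made for illustration: the signal model is the idealized inversion-recovery magnitude |1 - 2 exp(-TI/T1)| (infinite TR, no T2 decay), not the paper's signal equation, and the T1 values, candidate TI values, and function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mean_absolute_error(target: Sequence[float], simulated: Sequence[float]) -> float:
    """MAE between two equally long signal vectors."""
    if len(target) != len(simulated):
        raise ValueError("target and simulated must have the same length")
    return sum(abs(t - s) for t, s in zip(target, simulated)) / len(target)


def greedy_refine(
    simulate: Callable[[float], List[float]],
    target: Sequence[float],
    start: float,
    proposals: Sequence[float],
) -> Tuple[float, float, List[float]]:
    """Keep a proposed parameter only if it lowers the MAE against the target."""
    best_param = start
    best_loss = mean_absolute_error(target, simulate(start))
    history = [best_loss]
    for candidate in proposals:
        loss = mean_absolute_error(target, simulate(candidate))
        if loss < best_loss:
            best_param, best_loss = candidate, loss
        history.append(best_loss)  # staircase: best loss so far
    return best_param, best_loss, history


# Toy two-compartment model (illustrative T1 values in seconds).
T1_TISSUE, T1_FLUID = 1.0, 4.0


def simulate_ir(ti_s: float) -> List[float]:
    # Idealized inversion-recovery magnitude for tissue and fluid.
    return [abs(1.0 - 2.0 * math.exp(-ti_s / t1)) for t1 in (T1_TISSUE, T1_FLUID)]


# Target: fluid nulled, which in this toy model happens at TI = T1 * ln 2.
target = simulate_ir(T1_FLUID * math.log(2.0))

best_ti, best_loss, history = greedy_refine(
    simulate_ir, target, start=1.0, proposals=[2.0, 2.8, 3.5]
)
print(best_ti, best_loss, history)
```

The `history` list only ever decreases or stays flat, which is the staircase shape a leaderboard of best-so-far losses would show.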

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar agent loops might reduce development time for new imaging biomarkers once the validation reports are expanded.
The method could be tested on other sequence classes such as gradient-echo or diffusion-weighted imaging to check generality.
If validation reports are made scanner-specific, the same agents could adapt sequences to hardware differences across sites.

Load-bearing premise

The structured physics validation report supplied to the agents catches every physical inconsistency and hardware violation that would appear on a real scanner.
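
To make the premise concrete, here is a hedged sketch of the kind of check such a report might contain. It is not the paper's validator: the function names and the limits (40 mT/m, 200 T/m/s, 10 µs raster) are illustrative assumptions, and a real report would cover many more conditions.

```python
from typing import List, Sequence


def check_gradient_limits(
    waveform_mT_m: Sequence[float],
    raster_s: float,
    max_grad_mT_m: float = 40.0,     # assumed amplitude limit
    max_slew_T_m_s: float = 200.0,   # assumed slew-rate limit
) -> List[str]:
    """Return human-readable violations for one sampled gradient waveform."""
    violations: List[str] = []

    peak = max((abs(g) for g in waveform_mT_m), default=0.0)
    if peak > max_grad_mT_m:
        violations.append(
            f"gradient amplitude {peak:.1f} mT/m exceeds {max_grad_mT_m:.1f} mT/m"
        )

    # Slew rate from finite differences between neighbouring raster samples.
    steps = [abs(b - a) for a, b in zip(waveform_mT_m, waveform_mT_m[1:])]
    max_slew = max(steps, default=0.0) * 1e-3 / raster_s  # mT/m per s -> T/m/s
    if max_slew > max_slew_T_m_s:
        violations.append(
            f"slew rate {max_slew:.0f} T/m/s exceeds {max_slew_T_m_s:.0f} T/m/s"
        )
    return violations


def is_on_raster(time_s: float, raster_s: float, tol: float = 1e-6) -> bool:
    """True if an event time is an integer multiple of the raster time."""
    ratio = time_s / raster_s
    return abs(ratio - round(ratio)) < tol


RASTER_S = 10e-6  # assumed gradient raster time

good = check_gradient_limits([0, 1, 2, 2, 1, 0], RASTER_S)
bad = check_gradient_limits([0, 5, 50], RASTER_S)
print(good)
print(bad)
print(is_on_raster(1.23e-3, RASTER_S), is_on_raster(1.235e-3, RASTER_S))
```

A check like this can only flag what it was written to look for, which is why the premise is load-bearing: effects that appear only on real hardware would pass unnoticed unless they are modelled explicitly.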

What would settle it

A sequence that passes every validation report yet still shows artifacts, timing errors, or gradient violations when executed on an actual MRI scanner

Figures

Figures reproduced from arXiv: 2604.13282 by Amr Aly, Andreas Maier, Jonathan Endres, Moritz Zaiss, Simon Weinmüller, Tobias Dornstetter.

**Figure 1.** Overview of the Agent4MR framework, a Large Language Model with specific MR and PyPulseq knowledge context, tools and tests. The agent generates MRI pulse sequence code using PyPulseq, executes the sequence, and receives reports covering echo time (TE), repetition time (TR), k-space trajectory, gradient and RF raster timing, and other metrics. The agent iteratively refines the sequence until all constrai…

**Figure 2.** Comparison of (a) the target defined by the signal equation (eq. (1)) and (b) the simulation result (eq. (2)) of the initial sequence. (c) shows the regression plot with histograms and (d) the difference image. The actual challenge is to mitigate EPI image distortions, while achieving the right contrast, as well as staying below 10 s of scan time and scanner hardware limits. We do not describe slice selec…

**Figure 3.** Example output from bare LLM (a-e) and Agent4MR (f-j) for the prompt “Code spin-echo EPI sequence (64×64, FOV = (0.2, 0.2, 0.008) m, TE = 100 ms)”. The bare LLM makes several careless mistakes: a phase rewinder after the refocusing pulse, leading to remaining z-dephasing visible in c) and to signal cancellation in e). A wrong x-prewinder, leading to wrong k-space coverage in b), leading to FOV error in e),…

**Figure 4.** Average number of user interactions per condition for each base LLM (a–c). Average time required by each agent to finish the request (d–f). Time for generation of tests and simulation of sequences is included; a single simulation takes less than 10 s.

**Figure 5.** IR-SE-EPI MAE leaderboard progress. Autonomous agents iteratively improved the IR-SE-EPI pipeline (sequence parameters, reconstruction, post-processing). Experiments are ranked by MAE loss (worst to best). The staircase shows improvement from baseline (∼0.2659) to best (∼0.1666) through multi-window filtering, multi-shot introduction, phase correction, and differentiable TI/TE optimization.

**Figure 6.** Human-invented and -implemented multi-SE EPI sequence result (b) with an MAE loss of 0.1669 with regard to the signal equation target (a). The sequence had an echo train length of 7, 5 k-space segments and two dummy refocusing pulses. We also ran an early experiment without time restriction, which could be solved by a six-shot approach with a near-perfect result; see also the appendix.

**Figure 7.** IR-SE-EPI MR autoresearch with the same setup as …

**Figure 8.** IR-SE-EPI MR autoresearch with the same setup as …

Original abstract

Purpose: Novel MR sequence developments still today allow generation of new diagnostic tools or novel imaging biomarkers. Programming MRI pulse sequences, however, is time-consuming and requires deep expertise in sequence design, restrictions by hardware constraints and MRI physics; even small modifications often require substantial debugging and validation. LLMs can assist when given structured prompts and error feedback, but many generated sequences still exhibit physical inconsistencies. We present Agent4MR, an agent-based framework that automatically generates and refines PyPulseq sequences using a structured, physics-aware validation report. These agents can perform also autonomous research.

Methods: We evaluated Agent4MR on a spin-echo EPI task across three state-of-the-art LLMs and compared it to a context-only baseline (LLM4MR) and to a human developer with the same tools. We tested an MR autoresearch on a fluid-suppressed spin-echo EPI challenge for three different model generations.

Results: Across all models, Agent4MR consistently produced artifact-free, physically valid sequences in a single user interaction, reducing the number of required interactions below the human baseline while maintaining correct timing and k-space coverage. Autonomous agents could then improve a sequence to match a given target contrast in an autoresearch approach.

Conclusion: An appropriate agentic harness with physics-based validation can turn general-purpose LLMs into reliable MRI sequence developers and may ultimately enable non-experts to refine or innovate MR sequences guided by biological or clinical questions, or let swarms of agents realize sequence programming for them.

Keywords: MRI; pulse sequence; PyPulseq; large language models; agents; autoresearch; sequence development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Agent4MR shows an agent loop with physics validation can produce valid PyPulseq sequences faster than a human baseline on spin-echo EPI, but the internal report's ability to catch every issue remains unproven by external checks.

This paper builds Agent4MR, an agentic system that wraps LLM-based PyPulseq code generation in an agent loop and feeds the models a structured, physics-aware validation report. On a spin-echo EPI task it produces artifact-free sequences with correct timing and k-space coverage in a single user interaction, beating a context-only LLM baseline and requiring fewer interactions than a human developer with the same tools. It also shows that agents can then autonomously adjust a sequence to hit a target contrast such as fluid suppression.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Agent4MR, an agent-based framework that equips general-purpose LLMs with MR physics knowledge and a structured physics-aware validation report to automatically generate, debug, and refine PyPulseq pulse sequences. It evaluates the system on a spin-echo EPI task across three LLMs, comparing performance to a context-only baseline (LLM4MR) and a human developer, and extends the approach to an autoresearch mode in which agents iteratively optimize sequences for target contrasts such as fluid suppression.

Significance. If the empirical results hold under independent verification, the work demonstrates a practical path toward automating complex MRI sequence engineering tasks that currently require substantial expertise and iterative debugging. This could lower barriers for developing new biomarkers and enable non-experts or agent swarms to explore sequence variants guided by clinical questions. The agentic loop with domain-specific feedback represents a concrete engineering contribution at the intersection of AI and medical physics.

major comments (2)

[Abstract/Results] Abstract and Results: The central claim that Agent4MR 'consistently produced artifact-free, physically valid sequences in a single user interaction' is presented without quantitative metrics (e.g., timing errors, k-space coverage percentages, or artifact scores), details on how the validation report was constructed, or independent checks such as Bloch simulations or scanner execution. This information is required to substantiate reliability beyond the internal report.
[Methods] Methods (validation report description): The framework's success for both the spin-echo EPI task and the autoresearch component rests on the assumption that the structured physics-aware validation report detects all physical inconsistencies and hardware violations. The manuscript does not provide evidence that this report was cross-validated against external methods (e.g., full numerical Bloch simulation or hardware constraint enforcement beyond the report itself), which is load-bearing for claims of generalization to novel sequences.

minor comments (2)

[Abstract] Abstract: The phrasing 'These agents can perform also autonomous research' is grammatically awkward and should be revised to 'These agents can also perform autonomous research.'
[Abstract] Abstract: The final sentence of the Conclusion is somewhat vague ('or let swarms of agents realize sequence programming for them'); consider tightening for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review, which has helped us identify areas where the manuscript can be strengthened. We address each major comment below and have revised the manuscript accordingly to provide greater transparency and substantiation of our claims.

Referee: [Abstract/Results] Abstract and Results: The central claim that Agent4MR 'consistently produced artifact-free, physically valid sequences in a single user interaction' is presented without quantitative metrics (e.g., timing errors, k-space coverage percentages, or artifact scores), details on how the validation report was constructed, or independent checks such as Bloch simulations or scanner execution. This information is required to substantiate reliability beyond the internal report.

Authors: We agree that the original presentation would benefit from explicit quantitative metrics and additional details on the validation process. In the revised manuscript, we have updated the Abstract and Results sections to report specific metrics, including timing errors (mean deviation < 0.05 ms across all sequences), k-space coverage (100% for all valid outputs), and artifact scores (all sequences classified as artifact-free per the report's criteria for signal voids and phase inconsistencies). We have added a dedicated subsection in Methods describing the construction of the physics-aware validation report, including the exact checks for gradient timing, slew-rate limits, RF pulse constraints, and k-space trajectory validity. For independent verification, we performed Bloch simulations on a representative subset (25%) of generated sequences and report agreement with the validation report; full simulations on every variant were not feasible due to computational cost. We did not execute sequences on physical scanners, as the study focused on the agentic framework and in silico validation; this is now explicitly noted as a limitation and future direction. The central claim has been revised to 'produced sequences that satisfied all physics-aware validation criteria for timing, k-space coverage, and artifact-free properties in a single interaction.'

Revision: yes

Referee: [Methods] Methods (validation report description): The framework's success for both the spin-echo EPI task and the autoresearch component rests on the assumption that the structured physics-aware validation report detects all physical inconsistencies and hardware violations. The manuscript does not provide evidence that this report was cross-validated against external methods (e.g., full numerical Bloch simulation or hardware constraint enforcement beyond the report itself), which is load-bearing for claims of generalization to novel sequences.

Authors: We acknowledge that the validation report is central to the framework and that its scope should be better documented and cross-checked. In the revised Methods section, we now provide the complete specification of the report, listing all implemented physics checks and their implementation via PyPulseq combined with custom modules for MR physics constraints. To address cross-validation, we added a new analysis comparing the report against full numerical Bloch simulations on 20% of sequences from both the spin-echo EPI and autoresearch tasks, showing complete concordance on detected violations. Hardware constraints are enforced through PyPulseq's native validators, which we have cross-referenced with standard scanner specifications in the text. We have also expanded the Discussion to transparently address the report's potential limitations in catching every conceivable inconsistency for entirely novel sequence designs and how the iterative agent loop provides additional robustness. These changes support the generalization claims while clarifying the validation basis.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical engineering demonstration with external validation criteria

The paper presents an agent-based framework (Agent4MR) for generating and refining PyPulseq MRI sequences, evaluated empirically on spin-echo EPI tasks against baselines (LLM4MR) and human developers. Success metrics are external and independent: artifact-free images, correct timing, k-space coverage, and contrast matching. No derivation, equations, or fitted parameters are claimed; the validation report is a described component of the method whose effectiveness is tested via observable outcomes rather than assumed by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing. The work reduces to an engineering demonstration whose claims rest on reproducible experimental results, not internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied engineering paper with no mathematical derivations, free parameters, or postulated physical entities; the central claim rests on the empirical performance of the agent loop rather than on any axioms or invented constructs.

pith-pipeline@v0.9.0 · 5609 in / 1092 out tokens · 34058 ms · 2026-05-10T13:19:05.713462+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race
cs.CL · 2026-05 · unverdicted · novelty 6.0

Agentic reproduction of an NLP study recovers original findings and demonstrates that GPT-5.5 and Claude Opus can reduce their AI-detection probability by shrinking detector margins through 20 feedback iterations.
