arxiv: 2605.02575 · v1 · submitted 2026-05-04 · 💻 cs.CV · eess.SP

Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI

Yinzhe Wu, Hongyu Rui, Fanwen Wang, Jiahao Huang, Zi Wang, Guang Yang

Pith reviewed 2026-05-08 18:51 UTC · model grok-4.3

keywords: diffusion MRI · super-resolution · implicit neural representations · self-supervised learning · angular super-resolution · q-space · DTI · FiLM conditioning

The pith

A self-supervised implicit neural representation reconstructs high-resolution dMRI from single rotating views while synthesizing unseen angular samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Spatial-Angular Implicit Neural Representation (SA-INR) that learns a continuous function over both spatial coordinates and q-space from highly undersampled, anisotropic single-view diffusion MRI data. An MLP is conditioned on a b=0 structural image and the diffusion direction through FiLM layers and trained end-to-end without external supervision. This yields accurate spatial super-resolution on the acquired directions and zero-shot angular super-resolution on previously unseen directions. The synthesized data also produces more accurate downstream diffusion tensor imaging fits than the original sparse inputs. If the approach holds, it would remove the need for dense rotational sampling and thereby shorten scan times for mesoscale quantitative dMRI.

Core claim

The SA-INR framework, consisting of an MLP conditioned on a b=0 prior and the b-direction via FiLM, is trained end-to-end on single-view anisotropic inputs. It reconstructs high-fidelity volumes at the trained directions (34.82 dB) and, by learning a continuous q-space representation, synthesizes unseen directions at comparable fidelity (33.08 dB). The angular synthesis in turn improves the quantitative accuracy of DTI model fitting compared with the original single-view data.
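The dB figures quoted above are peak signal-to-noise ratios. As a rough illustration of how such a number is obtained, the sketch below computes PSNR on flat lists of intensities. The function name `psnr` and the default of using the reference maximum as the peak are assumptions made here; this summary does not state how the paper normalizes intensities.

```python
import math

def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB between two equal-length intensity lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Assumption: use the reference maximum as the peak when none is given.
    if peak is None:
        peak = max(reference)
    # Mean squared error over all samples.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)
```

Under this definition, and assuming the same peak value for both evaluations, the 1.74 dB gap between 34.82 dB and 33.08 dB corresponds to a mean squared error about 1.5 times higher on the unseen directions (10^0.174 ≈ 1.49).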

What carries the argument

Spatial-Angular Implicit Neural Representation (SA-INR): a FiLM-conditioned MLP that maps 3-D spatial coordinates plus a diffusion direction to signal intensity, trained self-supervised on single-view rotating acquisitions.
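To make the architecture concrete, here is a minimal, untrained sketch of a FiLM-conditioned coordinate MLP in plain Python. It is not the authors' implementation. The class name `TinyFiLMMLP`, the layer sizes, the number of Fourier frequencies, and the random initialization are all hypothetical, and the b=0 structural prior described in the paper is left out for brevity. The number of spatial dimensions is kept as a parameter.

```python
import math
import random

def fourier_features(x, num_freqs=4):
    """Map each input value to sin/cos pairs at frequencies 2^k * pi, k = 0..num_freqs-1."""
    feats = []
    for value in x:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * value
            feats.extend([math.sin(angle), math.cos(angle)])
    return feats

def init_linear(rng, n_in, n_out):
    """Random weights (n_out rows of n_in) and zero biases; untrained, illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(params, x):
    """Affine map y = W x + b on plain lists."""
    weights, biases = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def film(hidden, scale, shift):
    """Feature-wise linear modulation: scale and shift each hidden channel."""
    return [s * h + t for h, s, t in zip(hidden, scale, shift)]

class TinyFiLMMLP:
    """Hypothetical one-hidden-layer coordinate MLP modulated by the diffusion direction."""

    def __init__(self, coord_dims=3, hidden=16, num_freqs=4, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        coord_feats = coord_dims * 2 * num_freqs  # embedded spatial coordinate
        dir_feats = 3 * 2 * num_freqs             # embedded diffusion direction
        self.layer1 = init_linear(rng, coord_feats, hidden)
        self.to_scale = init_linear(rng, dir_feats, hidden)
        self.to_shift = init_linear(rng, dir_feats, hidden)
        self.head = init_linear(rng, hidden, 1)

    def __call__(self, coord, direction):
        h = linear(self.layer1, fourier_features(coord, self.num_freqs))
        g = fourier_features(direction, self.num_freqs)
        # The direction embedding predicts per-channel scale and shift values.
        h = film(h, linear(self.to_scale, g), linear(self.to_shift, g))
        h = [max(0.0, v) for v in h]  # ReLU
        return linear(self.head, h)[0]

model = TinyFiLMMLP()
intensity = model([0.1, 0.2, 0.3], [1.0, 0.0, 0.0])  # any coordinate, any direction
```

Because the direction enters only through the scale and shift values, the same network can be queried at a direction that was never part of training, which is the mechanism the zero-shot angular claim relies on.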

Load-bearing premise

The continuous q-space function learned from single-view anisotropic data generalizes faithfully to arbitrary unseen b-directions without introducing artifacts that degrade quantitative measures such as DTI fitting.
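One reason this premise is plausible, at least for DTI-level analysis, is that the textbook single-tensor model makes the signal a smooth function of direction: S(g) = S0 · exp(−b · gᵀ D g). The sketch below evaluates that model at arbitrary unit directions. It illustrates the standard model only and says nothing about what the network actually learns. The tensor entries and the b-value are illustrative assumptions, not values from the paper.

```python
import math

def dti_signal(s0, b_value, tensor, direction):
    """Single-tensor signal S = S0 * exp(-b * g^T D g) for a (normalized) direction g."""
    norm = math.sqrt(sum(c * c for c in direction))
    g = [c / norm for c in direction]
    # Quadratic form g^T D g: apparent diffusivity along g.
    quad = sum(g[i] * tensor[i][j] * g[j] for i in range(3) for j in range(3))
    return s0 * math.exp(-b_value * quad)

# Illustrative (assumed) prolate tensor aligned with x, in mm^2/s; b in s/mm^2.
D = [[1.7e-3, 0.0, 0.0],
     [0.0, 0.3e-3, 0.0],
     [0.0, 0.0, 0.3e-3]]
along = dti_signal(1.0, 1000.0, D, [1, 0, 0])   # strongest attenuation
across = dti_signal(1.0, 1000.0, D, [0, 1, 0])  # weakest attenuation
```

The quadratic form also makes the signal antipodally symmetric, S(g) = S(−g), which is a cheap sanity check to run on any synthesized direction set.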

What would settle it

If fractional anisotropy or mean diffusivity maps computed from DTI fits on the zero-shot synthesized full angular set differ significantly from those obtained on a fully sampled reference acquisition of the same subject, the generalization claim would be refuted.
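For reference, both maps named in this test are simple functions of the three eigenvalues of the fitted diffusion tensor. The sketch below uses the standard definitions; the function names are chosen here for illustration and the tensor fit itself is not shown.

```python
import math

def mean_diffusivity(eigenvalues):
    """MD: average of the three tensor eigenvalues."""
    return sum(eigenvalues) / 3.0

def fractional_anisotropy(eigenvalues):
    """FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||, between 0 and 1."""
    md = mean_diffusivity(eigenvalues)
    num = sum((lam - md) ** 2 for lam in eigenvalues)
    den = sum(lam ** 2 for lam in eigenvalues)
    if den == 0:
        return 0.0  # degenerate voxel with no diffusion signal
    return math.sqrt(1.5 * num / den)
```

An isotropic voxel gives FA = 0 and a voxel with diffusion along a single axis gives FA = 1, so voxel-wise differences between the synthesized and reference maps are directly comparable across subjects.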

Original abstract

Rotating-view thick-slice acquisition is highly SNR-efficient for mesoscale diffusion MRI (dMRI) but requires numerous rotating views to satisfy Nyquist sampling, resulting in long scan time. We propose a self-supervised Spatial-Angular Implicit Neural Representation (SA-INR) that reconstructs high-resolution dMRI from a single view per diffusion direction, representing a massive acceleration. Our model, an MLP conditioned on a b=0 structural prior and the b-direction via FiLM, is trained end-to-end on the anisotropic input. The framework not only accurately reconstructs the trained b-directions (spatial SR) but also learns a continuous q-space representation, enabling high-fidelity "zero-shot" synthesis of unseen b-directions (angular SR). On simulated data, our method achieved high fidelity for both trained (34.82 dB) and unseen (33.08 dB) directions. Most importantly, the synthesized angular data also improved the quantitative accuracy of downstream DTI model fitting. Our SA-INR framework breaks the classical sampling limits, paving the way for fast, quantitative high-resolution dMRI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SA-INR gives a self-supervised INR route to joint spatial and zero-shot angular super-resolution from single-view rotating dMRI, with concrete simulated PSNR and DTI gains, but all validation stays on simulation.

This paper trains an MLP with FiLM conditioning on the b=0 image and the diffusion direction to reconstruct high-resolution volumes from one rotating view per direction and to synthesize new angles it never saw. The training is fully self-supervised on the anisotropic input itself, which avoids the need for hard-to-get ground-truth multi-view data. On the simulated test cases it reports 34.82 dB PSNR on trained directions and 33.08 dB on unseen ones, plus measurable improvement in downstream DTI parameter accuracy. That combination of spatial and angular synthesis in one end-to-end model is the concrete advance over the prior INR and dMRI super-resolution work the paper cites. The self-supervised setup is a reasonable match for the acquisition constraints in mesoscale dMRI.

The soft spot is that every number and every DTI comparison comes from simulated data only. Real rotating-view scans add motion, eddy currents, Rician noise, and susceptibility effects that standard simulations usually omit. If the learned continuous q-space representation fits the smooth angular behavior in simulation but introduces bias on scanner data, the claim that the method breaks classical sampling limits for quantitative use would not hold. No real acquisition results are shown, so the practical acceleration story remains untested.

The architecture itself is straightforward and the reported metrics are given separately for trained and unseen directions, which helps clarity. Citation coverage of INR and dMRI literature looks standard. This paper is for researchers working on dMRI acquisition speed or on neural representations for medical imaging. A reader who wants to see how self-supervision can be applied to angular interpolation in diffusion would get direct value from the method description and the simulated numbers.

It deserves a serious referee because the problem is practical, the proposed architecture is coherent, and the simulated evidence is presented with specific numbers rather than vague claims. I would send it to peer review with the clear expectation that real-data experiments will be required in revision.

Referee Report

1 major / 0 minor

Summary. The paper proposes a self-supervised Spatial-Angular Implicit Neural Representation (SA-INR) framework for reconstructing high-resolution diffusion MRI from rotating-view thick-slice acquisitions using only a single view per diffusion direction. An MLP conditioned via FiLM on a b=0 structural prior and b-directions is trained end-to-end on anisotropic inputs to perform spatial super-resolution on trained directions and zero-shot angular super-resolution for unseen directions. On simulated data, the method reports PSNR of 34.82 dB for trained directions and 33.08 dB for unseen directions, with the synthesized angular data improving downstream DTI model fitting accuracy. The authors claim this breaks classical Nyquist sampling limits to enable fast, quantitative high-resolution dMRI.

Significance. If the zero-shot angular generalization and quantitative improvements hold beyond simulation, the SA-INR approach could substantially reduce scan times for mesoscale dMRI by allowing faithful reconstruction from far fewer rotating views while supporting accurate downstream analyses such as DTI. The self-supervised end-to-end training on single-view inputs and the explicit evaluation of both image fidelity metrics and task-specific DTI accuracy are notable strengths that provide some grounding independent of external labels.

major comments (1)

[Abstract] All reported quantitative results (34.82 dB PSNR on trained directions, 33.08 dB on unseen directions, and improved DTI fitting) are obtained exclusively from simulated data. The manuscript provides no experiments on real rotating-view dMRI acquisitions, which include unmodeled effects (eddy currents, motion, Rician noise, susceptibility) that could bias the learned continuous q-space representation and degrade the fidelity of zero-shot angular synthesis or downstream quantitative measures.
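Of the unmodeled effects listed in this comment, Rician noise is the easiest to add to a simulation. The sketch below shows the usual construction: take the magnitude of a complex signal with independent Gaussian noise on the real and imaginary channels. The function name and the noise level are assumptions made here; this summary does not say whether the paper's simulation includes such a term.

```python
import math
import random

def add_rician_noise(signal, sigma, rng):
    """Return magnitudes of signal + complex Gaussian noise (std sigma per channel)."""
    noisy = []
    for s in signal:
        real = s + rng.gauss(0.0, sigma)  # noise on the real channel
        imag = rng.gauss(0.0, sigma)      # noise on the imaginary channel
        noisy.append(math.hypot(real, imag))
    return noisy

rng = random.Random(0)
# With zero true signal the magnitudes do not average to zero: a noise floor remains.
floor = add_rician_noise([0.0] * 2000, 0.1, rng)
mean_floor = sum(floor) / len(floor)
```

The non-zero floor is the practical concern: at strong diffusion attenuation it biases low signals upward, which is the kind of effect that could distort a learned q-space representation and the diffusivities fitted from it.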

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their thoughtful review and positive comments on the potential impact of our SA-INR framework. We address the major comment below.

Point-by-point responses

Referee: [Abstract] All reported quantitative results (34.82 dB PSNR on trained directions, 33.08 dB on unseen directions, and improved DTI fitting) are obtained exclusively from simulated data. The manuscript provides no experiments on real rotating-view dMRI acquisitions, which include unmodeled effects (eddy currents, motion, Rician noise, susceptibility) that could bias the learned continuous q-space representation and degrade the fidelity of zero-shot angular synthesis or downstream quantitative measures.

Authors: We agree that all quantitative results are derived exclusively from simulated data, as stated in the abstract and methods. Simulations enable precise ground-truth evaluation of spatial SR, zero-shot angular SR, and downstream DTI accuracy under controlled conditions. We acknowledge that real acquisitions introduce unmodeled effects (eddy currents, motion, Rician noise, susceptibility) that could affect the learned q-space representation. In the revised manuscript we have added a dedicated paragraph in the Discussion section explicitly addressing these limitations, noting that the self-supervised end-to-end training and b=0 conditioning provide a basis for potential generalization, while clarifying that real-data validation remains future work.

revision: partial

standing simulated objections not resolved

Validation on real rotating-view dMRI acquisitions

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The SA-INR is an MLP+FiLM implicit representation trained end-to-end self-supervised on the single-view anisotropic inputs to fit a continuous q-space function. Reconstruction of the trained b-directions is the direct training objective and therefore expected, but the zero-shot synthesis of unseen directions and the reported DTI fitting gains are presented as empirical generalization results on simulated data rather than quantities forced by construction from the fitted inputs. No self-citations, uniqueness theorems, or prior-author ansatzes are invoked as load-bearing steps; the method's central claim rests on the INR architecture and its measured performance, which remain independent of the target outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the expressivity of MLPs for continuous spatial-angular signals and the ability of FiLM conditioning to encode diffusion directions. Free parameters are the trained network weights. No new physical entities are postulated.

free parameters (1)

MLP weights and FiLM parameters
Learned during end-to-end self-supervised training on the anisotropic input data.

axioms (1)

domain assumption: MLPs with sufficient capacity can represent continuous functions of spatial and angular coordinates
Invoked implicitly when the model is trained to output high-resolution values from coordinate inputs.

invented entities (1)

SA-INR (no independent evidence)
purpose: Continuous representation of spatial-angular dMRI data
New model introduced in the paper; no independent evidence outside this work.

Reference graph

Works this paper leans on

14 extracted references · 1 canonical work page

[1]

Recently, high-resolution (HR) dMRI techniques have advanced the field by enabling the investigation of fine-scale features such as intracortical layers and short-range U-fibers

INTRODUCTION: The evolution of in-vivo diffusion MRI (dMRI) has progressively provided valuable insights into the structural connectivity and tissue microstructure of the human brain. Recently, high-resolution (HR) dMRI techniques have advanced the field by enabling the investigation of fine-scale features such as intracortical layers and short-range U-f...
[2]

METHOD: Our proposed method reconstructs a HR isotropic dMRI volume from a set of highly undersampled, anisotropic, rotating-view thick-slice acquisitions. We achieve this by formulating the reconstruction as a self-supervised, multi-contrast super-resolution problem, solved using a spatial-angular implicit neural representation (SA-INR) (Fig. 1). 2....
[3]

To provide a high-fidelity anatomical scaffold, we exploit the high-SNR b=0 image I_{b=0}

Spatial Encoding and Structural Prior: The 2D spatial coordinates c are first mapped to a high-dimensional space using a Fourier Feature embedding, γ(c), to enable the network to learn high-frequency details. To provide a high-fidelity anatomical scaffold, we exploit the high-SNR b=0 image I_{b=0}. We pass I_{b=0} through a Residual Dense Network (...
[4]

zero-shot

Angular Conditioning via FiLM: To make the network's output specific to the diffusion contrast, we condition it on the b-direction vector g. The vector g is passed through its own Fourier Feature embedding, γ(g). This embedding is then used to predict scaling (β) and shifting (α) parameters via small MLPs. These parameters modulate the intermediate...
[5]

trained

RESULT AND DISCUSSION: We evaluated our SA-INR framework by first training it on the 40 "trained" b-directions (self-supervised spatial SR) and then testing its ability to synthesize the 10 "unseen" held-out directions (zero-shot angular SR). 3.1. Self-Supervised Spatial Super-Resolution: Our model successfully reconstructed high-resolution, isotropic DW...
[6]

two-for-one

DISCUSSION: We have proposed a self-supervised, spatial-angular INR framework that successfully reconstructs high-resolution
Table I. Quantitative Results of Self-Supervised Spatial Super-Resolution. Evaluation metrics comparing the bilinear baseline (LR) against our SA-INR method (SR) on the 40 trained b-directions. *: p<0.01 from all other rows Mean(S...
[7]

Our framework achieves acceleration by reconstructing high-resolution DWIs from a single thick-slice rotating view per diffusion direction, breaking the classical Nyquist limit

CONCLUSION: We introduced a self-supervised, spatial-angular implicit neural representation (SA-INR) for rotating-view dMRI. Our framework achieves acceleration by reconstructing high-resolution DWIs from a single thick-slice rotating view per diffusion direction, breaking the classical Nyquist limit. It further provides a novel zero-shot angular su...
[8]

This study was supported in part by Imperial College London President’s PhD Scholarship and in part by Imperial College London I-X

ACKNOWLEDGMENTS This research has been conducted using the UK Biobank Resource under Application Number 100203. This study was supported in part by Imperial College London President’s PhD Scholarship and in part by Imperial College London I-X. G. Yang was supported by UKRI Future Leaders Fellowship (MR/V023799/1, UKRI2738)
[9]

Submillimeter diffusion MRI using an in-plane segmented 3D multi-slab acquisition and denoiser-regularized reconstruction,

Z. Li, S. Zhu, K. L. Miller, and W. Wu, “Submillimeter diffusion MRI using an in-plane segmented 3D multi-slab acquisition and denoiser-regularized reconstruction,” Med. Image Anal., vol. 107, p. 103834, Jan. 2026

2026
[10]

3D MERMAID: 3D Multi-shot enhanced recovery motion artifact insensitive diffusion for submillimeter, multi-shell, and SNR-efficient diffusion imaging,

S. Feizollah and C. L. Tardif, “3D MERMAID: 3D Multi-shot enhanced recovery motion artifact insensitive diffusion for submillimeter, multi-shell, and SNR-efficient diffusion imaging,” Magn. Reson. Med., vol. 93, no. 6, pp. 2311–2330, 2025

2025
[11]

Romer-EPTI: Rotating-view motion-robust super-resolution EPTI for SNR-efficient distortion-free in-vivo mesoscale diffusion MRI and microstructure imaging,

Z. Dong et al., “Romer-EPTI: Rotating-view motion-robust super-resolution EPTI for SNR-efficient distortion-free in-vivo mesoscale diffusion MRI and microstructure imaging,” Magn. Reson. Med., vol. 93, no. 4, pp. 1535–1555, 2025

2025
[12]

Super-resolution methods in MRI: Can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time?,

E. Plenge et al., “Super-resolution methods in MRI: Can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time?,” Magn. Reson. Med., vol. 68, no. 6, pp. 1983–1993, 2012

2012
[13]

CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-Assisted Implicit Neural Representation Learning,

R. Wu et al., “CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-Assisted Implicit Neural Representation Learning,” in Machine Learning in Medical Imaging, X. Xu, Z. Cui, I. Rekik, X. Ouyang, and K. Sun, Eds., Cham: Springer Nature Switzerland, 2025, pp. 114–123

2025
[14]

FiLM: Visual Reasoning with a General Conditioning Layer

E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville, “FiLM: Visual Reasoning with a General Conditioning Layer,” Dec. 20, 2017, arXiv:1709.07871. doi: 10.48550/arXiv.1709.07871.

Fig. 3. Downstream quantitative map fitting: Comparison of fitted Mean Diffusivity (MD), Fractional Anisotropy (FA) and FA modulated principal eigenvector (EV1) m...

doi:10.48550/arxiv.1709.07871 · 2017