arxiv: 2605.07955 · v1 · submitted 2026-05-08 · 💻 cs.CV · cs.AI


TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model

Vicent Caselles-Ballester, Eloy Martínez-Heras, Giuseppe Pontillo, Zoe Mendelsohn, Elena M. Marrón, Juan Luis García Fernández, Laia Subirats, Jon Stutters, Jeremy Chataway, Frederik Barkhof, Sara Llufriu, Ferran Prados


Pith reviewed 2026-05-11 03:19 UTC · model grok-4.3


keywords MS lesion segmentation · longitudinal segmentation · contrast-agnostic · stochastic generative model · multiple sclerosis · deep learning · cross-sectional · lesion load dynamics


The pith

TimeLesSeg uses one convolutional network to segment MS lesions from either single scans or longitudinal series while remaining robust to scanner contrast changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a single CNN can handle both cross-sectional and longitudinal MS lesion segmentation by treating lesion masks as priors and filling missing priors with empty masks. It generates synthetic prior timepoints by stochastically deforming each lesion individually with morphological operations, addressing the scarcity of real longitudinal data. Gaussian mixture model domain randomization exposes the network to varied intensity profiles, producing contrast-agnostic behavior. This unified approach yields higher overlap and distance metrics than prior contrast-agnostic methods on single-modality inputs and more accurate lesion-load tracking than SAMSEG or LST-AI on time-series data across five datasets.
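The empty-mask idea can be illustrated with a minimal sketch. It assumes the scan and the prior lesion mask are stacked as two input channels; the function name `build_input`, the channel order, and the array shapes are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def build_input(scan, prior_mask=None):
    """Stack the current scan with a prior lesion mask as a 2-channel array.

    When no prior timepoint exists (cross-sectional case), an all-zero mask
    stands in for the prior. The channel layout is an assumption of this sketch.
    """
    scan = np.asarray(scan, dtype=np.float32)
    if prior_mask is None:
        prior = np.zeros_like(scan)  # empty mask: no prior information
    else:
        prior = np.asarray(prior_mask, dtype=np.float32)
        if prior.shape != scan.shape:
            raise ValueError("prior mask must match the scan shape")
    return np.stack([scan, prior], axis=0)

# Toy volume and toy prior mask, for illustration only.
scan = np.random.default_rng(0).random((8, 8, 8))
mask = np.zeros((8, 8, 8))
mask[2:4, 2:4, 2:4] = 1

cross_sectional = build_input(scan)        # prior channel is all zeros
longitudinal = build_input(scan, mask)     # prior channel carries the mask
```

Because both cases produce the same input shape, one network can consume either without architectural changes, which is the property the paper relies on.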

Core claim

TimeLesSeg models pathological priors through lesion masks processed together with the current scan, enables cross-sectional use via empty masks, and trains on realistic longitudinal patterns by stochastically deforming individual lesions with morphological operations; combined with GMM-based domain randomization, the single network outperforms contrast-agnostic state-of-the-art methods on single-modality inputs and SAMSEG on longitudinal inputs while capturing lesion load dynamics more accurately than both SAMSEG and LST-AI.

What carries the argument

The stochastic generative pipeline that deforms each lesion separately via morphological operations to synthesize prior timepoints, paired with empty-mask handling for cross-sectional cases.
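A rough sketch of what per-lesion stochastic deformation might look like, using connected-component labelling and binary morphology from SciPy. The choice of operations, the probability `p_absent`, and the iteration range are assumptions made for illustration; the abstract does not specify them, and this is not the authors' pipeline.

```python
import numpy as np
from scipy import ndimage

def simulate_prior_mask(current_mask, rng, p_absent=0.2, max_iters=2):
    """Deform each lesion on its own to mimic an earlier timepoint.

    Hypothetical recipe: every connected lesion is either dropped (as if it
    were new at the current timepoint), eroded, dilated, or kept unchanged.
    """
    current_mask = np.asarray(current_mask, dtype=bool)
    labels, n_lesions = ndimage.label(current_mask)
    prior = np.zeros_like(current_mask)
    for lesion_id in range(1, n_lesions + 1):
        lesion = labels == lesion_id
        if rng.random() < p_absent:
            continue  # lesion is absent from the simulated prior
        op = rng.choice(["erode", "dilate", "keep"])
        iters = int(rng.integers(1, max_iters + 1))
        if op == "erode":
            lesion = ndimage.binary_erosion(lesion, iterations=iters)
        elif op == "dilate":
            lesion = ndimage.binary_dilation(lesion, iterations=iters)
        prior |= lesion
    return prior

# Toy mask with two separate lesions, for illustration only.
rng = np.random.default_rng(0)
current = np.zeros((16, 16, 16), dtype=bool)
current[2:6, 2:6, 2:6] = True
current[10:13, 10:13, 10:13] = True
prior = simulate_prior_mask(current, rng)
```

Handling lesions one at a time lets a single simulated prior contain stable, enlarged, shrunken, and absent lesions together, which a global deformation of the whole mask could not produce.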

If this is right

The same network outperforms contrast-agnostic state-of-the-art methods on single-modality inputs using overlap and distance metrics.
Longitudinal processing exceeds SAMSEG accuracy and tracks lesion load changes more precisely than both SAMSEG and LST-AI.
Cross-sectional and longitudinal inputs are handled seamlessly by the identical model without retraining or architecture changes.
Domain randomization via Gaussian mixture models removes dependence on specific scanner intensity profiles.
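The domain randomization in the last point can be sketched as sampling a random Gaussian per tissue label and drawing voxel intensities from it. This follows the general idea of GMM-based intensity synthesis from label maps; the parameter ranges and the function name `synthesize_contrast` are assumptions, and the paper's actual generator may differ.

```python
import numpy as np

def synthesize_contrast(label_map, rng, mean_range=(0.0, 1.0), std_range=(0.01, 0.1)):
    """Draw a random-contrast image from a label map.

    Each label gets its own randomly sampled Gaussian (mean, std), so every
    call yields a different intensity profile for the same anatomy.
    """
    label_map = np.asarray(label_map)
    image = np.zeros(label_map.shape, dtype=np.float32)
    for label in np.unique(label_map):
        mean = rng.uniform(*mean_range)
        std = rng.uniform(*std_range)
        voxels = label_map == label
        image[voxels] = rng.normal(mean, std, size=int(voxels.sum()))
    # Min-max normalize so the network always sees intensities in [0, 1].
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else image

# Toy label map with three labels, for illustration only.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(6, 6, 6))
view_a = synthesize_contrast(labels, rng)
view_b = synthesize_contrast(labels, rng)  # same anatomy, different contrast
```

Training on many such views of the same label map is what is meant to push the network toward relying on shape and context rather than on any one scanner's intensity profile.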

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Clinics could replace separate single-timepoint and follow-up tools with one deployed model, reducing workflow complexity.
The same lesion-deformation generator could augment scarce longitudinal datasets for other progressive brain conditions.
Extending the empty-mask mechanism to other missing-data scenarios, such as partial modality dropout, appears straightforward.

Load-bearing premise

Stochastic morphological deformations of individual lesions generate prior timepoints whose evolution patterns are realistic enough for the trained model to generalize to real patient lesion dynamics.

What would settle it

An evaluation on a held-out real longitudinal MS dataset with expert-tracked lesion load changes would settle it: if the synthetic priors fail to match actual evolution statistics, TimeLesSeg should fall below SAMSEG or LST-AI there.
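One way such a comparison could be scored is the absolute error in lesion load change between two timepoints. The sketch below assumes binary masks and a known voxel volume; the helper names and the toy voxel counts are hypothetical and carry no results from the paper.

```python
import numpy as np

def lesion_load_ml(mask, voxel_volume_mm3=1.0):
    """Total lesion volume in millilitres for a binary mask."""
    return float(np.count_nonzero(mask)) * voxel_volume_mm3 / 1000.0

def load_change_error(pred_prev, pred_curr, ref_prev, ref_curr, voxel_volume_mm3=1.0):
    """Absolute difference between predicted and reference lesion load change."""
    pred_delta = (lesion_load_ml(pred_curr, voxel_volume_mm3)
                  - lesion_load_ml(pred_prev, voxel_volume_mm3))
    ref_delta = (lesion_load_ml(ref_curr, voxel_volume_mm3)
                 - lesion_load_ml(ref_prev, voxel_volume_mm3))
    return abs(pred_delta - ref_delta)

def make_mask(n_lesion_voxels, size=1000):
    """Toy flat mask with a given number of lesion voxels (illustration only)."""
    mask = np.zeros(size, dtype=bool)
    mask[:n_lesion_voxels] = True
    return mask

# Reference grows by 50 voxels, prediction by 30: the error is 20 voxels (0.02 mL).
error_ml = load_change_error(make_mask(100), make_mask(130),
                             make_mask(100), make_mask(150))
```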

Original abstract

Multiple sclerosis (MS) expresses substantial clinical and radiological heterogeneity, which poses significant challenges for automatic lesion segmentation. The current deep learning-based SOTA is highly susceptible to changes in both distribution, e.g., changes in scanner; as well as the structure of inputs, evident in the current divide between cross-sectional and longitudinal approaches. We introduce TimeLesSeg, a unified contrast-agnostic framework designed to segment MS lesions regardless of the presence of a temporal dimension in its inputs, with a single convolutional neural network. Our approach models pathological priors through lesion masks, which are processed together with the current scan. Cross-sectional processing is enabled by exposing the model to training cases where no prior information is available, which are modeled with an empty mask, allowing it to operate seamlessly in both scenarios. To overcome the scarcity and inconsistency of longitudinal datasets, we propose a novel generative pipeline in which patterns of lesion evolution are simulated by stochastically deforming each individual lesion with morphological operations, producing realistic prior timepoints. In parallel, we achieve contrast agnosticism through Gaussian mixture model-based domain randomization, enabling the network to experience a wide spectrum of intensity profiles. Results on three publicly available and two in-house datasets show that TimeLesSeg outperforms the contrast-agnostic state of the art on single-modality inputs across overlap- and distance-based metrics. In longitudinal processing, our method outperforms SAMSEG, and captures lesion load dynamics more accurately than both the former and LST-AI. All source code related to the development of TimeLesSeg is available at https://github.com/NeuroADaS-Lab/TimeLesSeg.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TimeLesSeg unifies cross-sectional and longitudinal MS lesion segmentation via empty masks and stochastic morphological deformations, but the longitudinal claims rest on unvalidated synthetic priors.


The paper puts forward a single CNN that handles both single-timepoint and multi-timepoint MS lesion segmentation. It feeds an empty mask when no prior scan exists and generates synthetic prior masks by stochastically deforming individual lesions with morphological operations; GMM randomization is added to make the model contrast-agnostic. This concrete training recipe is the main new piece beyond standard contrast-agnostic segmentation networks, and releasing the code on GitHub is useful for anyone who wants to test it on their own data. The approach directly tackles the practical split between cross-sectional and longitudinal pipelines that exists in current MS tools. The experiments cover three public and two in-house datasets and report outperformance on overlap and distance metrics for single-modality inputs plus better lesion-load tracking than SAMSEG and LST-AI. That said, the abstract contains no numerical values, confidence intervals, or statistical tests, and the description of the generative pipeline gives no quantitative check that the deformed lesions match real patient evolution patterns in volume change, shape, or spatial distribution. If those synthetic priors diverge from actual dynamics, the longitudinal gains become hard to trust even if the cross-sectional results hold. The method itself is not circular; the augmentation is defined independently of the test data. This work is aimed at MS neuroimaging groups that need one model for mixed scan types. It shows honest engagement with the data-scarcity problem and deserves a serious referee to examine the full experimental details and the realism validation that is missing from the abstract.

Referee Report

2 major / 1 minor

Summary. The paper introduces TimeLesSeg, a single CNN for MS lesion segmentation that operates in a contrast-agnostic manner on both cross-sectional inputs (modeled with empty prior masks) and longitudinal inputs. It uses lesion masks as pathological priors and addresses longitudinal data scarcity via a stochastic generative pipeline that deforms individual lesions with morphological operations to synthesize prior timepoints; contrast invariance is achieved through GMM-based domain randomization. The central claims are that the method outperforms contrast-agnostic SOTA on single-modality inputs across overlap- and distance-based metrics on three public and two in-house datasets, and that in longitudinal mode it outperforms SAMSEG while capturing lesion load dynamics more accurately than SAMSEG and LST-AI. All source code is released.

Significance. If the synthetic priors are shown to be realistic and the performance gains are supported by quantitative metrics and statistical tests, the work would offer a practical unification of cross-sectional and longitudinal MS lesion segmentation, directly addressing data scarcity and the current methodological divide. The public release of the code is a clear strength that supports reproducibility and further development.

major comments (2)

[§3] §3 (stochastic generative pipeline): The longitudinal outperformance claims versus SAMSEG and LST-AI rest on training with synthetic prior timepoints generated by stochastically deforming lesion masks via morphological operations. No quantitative validation is reported (e.g., Kolmogorov-Smirnov tests or Wasserstein distances on lesion volume deltas, Dice overlap between synthetic and real follow-up pairs, or shape descriptors) demonstrating that the simulated evolution patterns statistically match real patient dynamics in the target datasets. This is load-bearing for the generalization argument.
[Results] Results section (and abstract): The manuscript states superior performance on multiple datasets across overlap- and distance-based metrics but supplies no numerical values, confidence intervals, or statistical tests (e.g., paired t-tests or Wilcoxon tests with p-values) in the provided description. Without these, the cross-sectional and longitudinal superiority claims cannot be evaluated for effect size or reliability.
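The realism check requested in the first major comment could be run along these lines. The sketch assumes two one-dimensional samples of per-lesion volume changes, one from the synthetic pipeline and one from real follow-up pairs; the function name and the toy numbers are placeholders, not data from the paper.

```python
import numpy as np
from scipy import stats

def compare_volume_deltas(synthetic_deltas, real_deltas):
    """Compare two samples of lesion volume changes.

    Returns the two-sample Kolmogorov-Smirnov statistic and p-value plus the
    1D Wasserstein distance between the empirical distributions.
    """
    ks = stats.ks_2samp(synthetic_deltas, real_deltas)
    w = stats.wasserstein_distance(synthetic_deltas, real_deltas)
    return {
        "ks_statistic": float(ks.statistic),
        "ks_pvalue": float(ks.pvalue),
        "wasserstein": float(w),
    }

# Toy inputs, invented for illustration only.
identical = compare_volume_deltas([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = compare_volume_deltas([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

A small KS statistic and Wasserstein distance would indicate that synthetic and real volume changes are distributed similarly; they would not by themselves show that lesion shapes or locations evolve realistically.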

minor comments (1)

[Abstract] The abstract refers to 'realistic prior timepoints' without specifying the quantitative criteria or metrics used to judge realism of the morphological deformations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.


Referee: [§3] §3 (stochastic generative pipeline): The longitudinal outperformance claims versus SAMSEG and LST-AI rest on training with synthetic prior timepoints generated by stochastically deforming lesion masks via morphological operations. No quantitative validation is reported (e.g., Kolmogorov-Smirnov tests or Wasserstein distances on lesion volume deltas, Dice overlap between synthetic and real follow-up pairs, or shape descriptors) demonstrating that the simulated evolution patterns statistically match real patient dynamics in the target datasets. This is load-bearing for the generalization argument.

Authors: We agree that explicit quantitative validation of the synthetic priors would strengthen the claims regarding their realism and the method's generalization. While the current manuscript validates the approach primarily through downstream segmentation performance on real longitudinal data, we will add a new subsection to §3 in the revised manuscript. This will include Kolmogorov-Smirnov tests on lesion volume deltas, Wasserstein distances, and Dice overlaps between synthetic and available real follow-up pairs from the in-house datasets, along with shape descriptor comparisons. These additions will directly address the statistical matching to real patient dynamics. revision: yes
Referee: [Results] Results section (and abstract): The manuscript states superior performance on multiple datasets across overlap- and distance-based metrics but supplies no numerical values, confidence intervals, or statistical tests (e.g., paired t-tests or Wilcoxon tests with p-values) in the provided description. Without these, the cross-sectional and longitudinal superiority claims cannot be evaluated for effect size or reliability.

Authors: The full manuscript includes detailed results tables with all numerical metric values (Dice, HD95, etc.), standard deviations, and statistical tests (paired t-tests and Wilcoxon signed-rank tests with exact p-values) comparing TimeLesSeg against the baselines on each dataset. To improve readability and address the concern, we will revise the abstract and the opening paragraphs of the Results section to explicitly include key numerical values, confidence intervals, and p-values in the text, while retaining the full tables for completeness. revision: yes
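For readers who want to reproduce the kind of paired comparison described in this response, a minimal sketch of a Dice score and a Wilcoxon signed-rank test follows. The per-subject scores are invented placeholders, not values from the manuscript, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def dice(pred, ref):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom

def paired_wilcoxon(scores_a, scores_b):
    """Two-sided Wilcoxon signed-rank test on per-subject paired scores."""
    result = stats.wilcoxon(scores_a, scores_b)
    return float(result.statistic), float(result.pvalue)

# Placeholder per-subject Dice scores for two methods (illustration only).
method_a = np.array([0.61, 0.70, 0.55, 0.66, 0.73, 0.59, 0.68, 0.64])
method_b = method_a - np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
statistic, p_value = paired_wilcoxon(method_a, method_b)
```

The signed-rank test is paired by subject, so both score arrays must list the same subjects in the same order.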

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained supervised learning with independent augmentation.


The paper defines a standard CNN segmentation model trained on real scans paired with either empty masks (cross-sectional) or synthetically generated prior masks. The generative pipeline uses stochastic morphological operations on individual lesion masks as an explicit data-augmentation step to address longitudinal data scarcity; this step is not derived from or fitted to the target evaluation metrics or test-set distributions. Contrast agnosticism is achieved via separate GMM-based intensity randomization. All performance claims (outperformance vs. baselines on public and in-house datasets) are external comparisons on held-out real data and do not reduce by construction to quantities fitted from those same data. No self-citations are used as load-bearing uniqueness theorems, and no equations or claims equate the final outputs to the inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that morphological deformations of lesion masks produce training examples whose temporal statistics match real MS lesion evolution and on the assumption that Gaussian-mixture intensity randomization covers the range of real scanner contrasts.

axioms (1)

domain assumption: Stochastic morphological operations on lesion masks generate sufficiently realistic patterns of lesion evolution for training purposes.
Invoked to overcome scarcity of longitudinal datasets; appears in the description of the generative pipeline.

invented entities (1)

Stochastic generative pipeline for lesion deformation (no independent evidence)
purpose: To synthesize prior timepoint lesion masks when real longitudinal data are unavailable.
The pipeline is introduced as a novel component; no independent evidence of realism beyond downstream segmentation performance is provided in the abstract.


