pith. machine review for the scientific record. sign in

arxiv: 2604.04127 · v1 · submitted 2026-04-05 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

Fenghao Song, Shaojing Yang, Xi Zhou

Pith reviewed 2026-05-13 17:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords SAR ship detectionmixture of expertsDETRspeckle noisesmall target detectionobject detectionSAR imagerywavelet experts
0
0 comments X

The pith

A DETR-based detector with sparse expert routing filters SAR speckle noise while preserving small-ship details for higher accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

SAR ship detection struggles with speckle noise, coastal clutter, and tiny targets that get lost in downsampling. The paper builds SARES-DEIM on the DETR framework and adds SARESMoE, a mixture-of-experts block whose sparse gate sends features only to frequency and wavelet specialists that suppress noise without erasing ship signatures. A separate Space-to-Depth Enhancement Pyramid neck keeps high-resolution cues from early layers to help localize small objects. On the HRSID benchmark the model reports 76.4 percent mAP50:95 and 93.8 percent mAP50, beating both general YOLO detectors and earlier SAR-specific methods.

Core claim

SARES-DEIM grounds ship detection in the DETR paradigm and introduces SARESMoE, a sparsely-activated module that routes input features via a gating network to frequency-domain and wavelet-domain expert networks. These experts filter coherent speckle and coastal clutter. The SDEP neck augments the feature pyramid by space-to-depth operations on shallow features to retain spatial cues lost in downsampling, enabling accurate localization of small ships.

What carries the argument

SARESMoE module that uses a sparse gating network to activate only frequency and wavelet expert sub-networks, paired with the SDEP neck that applies space-to-depth transformations to shallow features for high-resolution preservation.

If this is right

  • Real-world SAR surveillance can run at higher accuracy without paying the full cost of dense expert activation because only a few experts fire per token.
  • Small-target performance in cluttered radar scenes improves when shallow high-resolution maps are explicitly protected from downsampling losses.
  • The same sparse routing pattern could be inserted into other transformer detectors that face coherent noise or scale imbalance.
  • Training cost stays modest because the gating network learns to ignore irrelevant experts rather than training every expert on every sample.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The expert specialization might transfer to other coherent-noise domains such as ultrasound or synthetic aperture sonar if the frequency and wavelet banks are kept fixed.
  • Edge deployment becomes more practical because sparsity keeps FLOPs low even when the total expert pool grows.
  • Multi-temporal SAR stacks could be handled by adding a temporal expert that the gate learns to activate only on change-rich frames.
  • If the learned experts turn out to be largely domain-specific, swapping in a new SAR sensor would require only retraining the gate rather than the entire backbone.

Load-bearing premise

The gating network can learn to send ship features to the right experts so that speckle and clutter are removed without discarding the ship signatures themselves, and the space-to-depth step in SDEP adds localization value beyond ordinary feature pyramids.

What would settle it

An ablation on the HRSID dataset in which the MoE gating is replaced by ordinary convolutions or the SDEP neck is removed, yet the mAP50:95 remains above 76.4 percent with all other training details fixed.

Figures

Figures reproduced from arXiv: 2604.04127 by Fenghao Song, Shaojing Yang, Xi Zhou.

Figure 1
Figure 1. Figure 1: SARES-DEIM overview. The architecture focuses on domain-specific feature enhancement and high-resolution spatial [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative detection comparisons on the HRSID dataset. The rows from top to bottom represent the Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Module-level ablation visualizations on HRSID, cor [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: Expert-level CAM visualizations on HRSID (Pure MoE [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
read the original abstract

Ship detection in Synthetic Aperture Radar (SAR) imagery is fundamentally challenged by inherent coherent speckle noise, complex coastal clutter, and the prevalence of small-scale targets. Conventional detectors, primarily designed for optical imagery, often exhibit limited robustness against SAR-specific degradation and suffer from the loss of fine-grained ship signatures during spatial downsampling. To address these limitations, we propose SARES-DEIM, a domain-aware detection framework grounded in the DEtection TRansformer (DETR) paradigm. Central to our approach is SARESMoE (SAR-aware Expert Selection Mixture-of-Experts), a module leveraging a sparse gating mechanism to selectively route features toward specialized frequency and wavelet experts. This sparsely-activated architecture effectively filters speckle noise and semantic clutter while maintaining high computational efficiency. Furthermore, we introduce the Space-to-Depth Enhancement Pyramid (SDEP) neck to preserve high-resolution spatial cues from shallow stages, significantly improving the localization of small targets. Extensive experiments on two benchmark datasets demonstrate the superiority of SARES-DEIM. Notably, on the challenging HRSID dataset, our model achieves a mAP50:95 of 76.4% and a mAP50 of 93.8%, outperforming state-of-the-art YOLO-series and specialized SAR detectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SARES-DEIM, a DETR-based detection framework for SAR ship detection. It introduces SARESMoE, a sparse mixture-of-experts module with a gating mechanism that routes features to specialized frequency and wavelet experts to suppress speckle noise and coastal clutter, and the SDEP neck, a space-to-depth enhancement pyramid that preserves high-resolution cues from shallow layers to improve small-target localization. On the HRSID dataset the model reports mAP50:95 of 76.4% and mAP50 of 93.8%, outperforming YOLO-series and other specialized SAR detectors.

Significance. If the empirical results are reproducible, the combination of sparse MoE routing for SAR-specific noise handling and resolution-preserving neck design could meaningfully advance transformer-based detectors in challenging remote-sensing domains. The approach is internally consistent with the stated architecture and loss formulation, and the reported HRSID numbers align with the described components; however, the absence of training protocols, ablation tables, and statistical measures limits immediate assessment of robustness and contribution.

major comments (2)
  1. [Results] Results section: the headline performance numbers (76.4% mAP50:95, 93.8% mAP50 on HRSID) are presented without ablation studies isolating the contribution of the sparse gating in SARESMoE versus standard DETR attention, without error bars across multiple runs, and without explicit baseline re-implementations, which are required to substantiate the claim of outperforming SOTA YOLO and SAR detectors.
  2. [Methods] Methods section (SARESMoE description): the sparse gating mechanism is described at a high level but lacks the precise mathematical definition of the gating function, the frequency/wavelet expert architectures, or the routing loss term, making it impossible to verify that the experts reliably preserve ship signatures while suppressing speckle without additional assumptions.
minor comments (2)
  1. [Figures] Figure captions and architecture diagram: the SDEP neck diagram would benefit from explicit annotation of the space-to-depth operations and how they feed into the DETR decoder to clarify the resolution-preservation path.
  2. [Introduction] Notation: the acronym SARES-DEIM is introduced in the title and abstract but the expansion is not restated in the introduction, which could confuse readers unfamiliar with the component names.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the empirical validation and technical clarity of SARES-DEIM. We address each major point below and will incorporate revisions to improve reproducibility and substantiation of the claims.

read point-by-point responses
  1. Referee: [Results] Results section: the headline performance numbers (76.4% mAP50:95, 93.8% mAP50 on HRSID) are presented without ablation studies isolating the contribution of the sparse gating in SARESMoE versus standard DETR attention, without error bars across multiple runs, and without explicit baseline re-implementations, which are required to substantiate the claim of outperforming SOTA YOLO and SAR detectors.

    Authors: We agree that the results section would be strengthened by additional analyses. In the revised manuscript we will add ablation tables that isolate the contribution of the sparse gating mechanism in SARESMoE relative to standard DETR attention, report standard deviations or error bars from multiple independent training runs, and provide explicit details on baseline re-implementations (including training protocols and hyper-parameters used for fair comparison). revision: yes

  2. Referee: [Methods] Methods section (SARESMoE description): the sparse gating mechanism is described at a high level but lacks the precise mathematical definition of the gating function, the frequency/wavelet expert architectures, or the routing loss term, making it impossible to verify that the experts reliably preserve ship signatures while suppressing speckle without additional assumptions.

    Authors: We acknowledge the need for greater mathematical precision. The revised methods section will include the exact equations for the sparse gating function, the detailed architectures of the frequency and wavelet experts, and the formulation of the routing loss term, enabling direct verification of how ship signatures are preserved while speckle and clutter are suppressed. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes an empirical architecture (SARES-DEIM) extending DETR with a sparse MoE gating module (SARESMoE) and a Space-to-Depth Enhancement Pyramid (SDEP) neck. No derivation chain, equations, or fitted parameters are presented that reduce to inputs by construction. Central claims rest on reported benchmark metrics (mAP50:95 = 76.4% on HRSID) and standard comparisons to YOLO baselines; these are externally falsifiable and do not rely on self-definitional steps, self-citation load-bearing, or ansatz smuggling. The architecture description and loss formulation are internally consistent without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The new modules SARESMoE and SDEP are architectural inventions whose independent evidence would require full paper validation.

pith-pipeline@v0.9.0 · 5527 in / 1088 out tokens · 36500 ms · 2026-05-13T17:00:13.892336+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 5 internal anchors

  1. [1]

    Lsdformer: Lightweight sar ship detection enhanced with efficient multiattention and structural reparameterization,

    R. Jiang, H. Shi, J. Ni, J. Li, Y . Feng, X. Chen, and Y . Li, “Lsdformer: Lightweight sar ship detection enhanced with efficient multiattention and structural reparameterization,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 1359– 1377, 2026

  2. [2]

    Msf- sar: A multiscale fusion method for small ship detection in sar images,

    H. Cui, T. Li, N. Su, Y . Yan, S. Feng, C. Zhao, J. He, and F. Gu, “Msf- sar: A multiscale fusion method for small ship detection in sar images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2200–2212, 2026

  3. [3]

    Dbw-yolo: A high-precision sar ship detection method for complex environments,

    X. Tang, J. Zhang, Y . Xia, and H. Xiao, “Dbw-yolo: A high-precision sar ship detection method for complex environments,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 7029–7039, 2024. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERV ATIONS AND REMOTE SENSING, EARLY ACCESS, 2026 10

  4. [4]

    Deformable feature fusion and accurate anchors prediction for lightweight sar ship detector based on dynamic hierarchical model pruning,

    Y . Guo, S. Chen, R. Zhan, W. Wang, and J. Zhang, “Deformable feature fusion and accurate anchors prediction for lightweight sar ship detector based on dynamic hierarchical model pruning,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 15 019–15 036, 2025

  5. [5]

    Fevt-sar: Multicategory oriented sar ship detection based on feature enhancement vision transformer,

    M. Fang, Y . Gu, and D. Peng, “Fevt-sar: Multicategory oriented sar ship detection based on feature enhancement vision transformer,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 2704–2717, 2025

  6. [6]

    What is YOLOv8: An in-depth exploration of the in- ternal features of the next-generation object detector,

    M. Yaseen, “What is YOLOv8: An in-depth exploration of the in- ternal features of the next-generation object detector,”arXiv preprint arXiv:2408.15857, 2024

  7. [7]

    YOLOv11: An Overview of the Key Architectural Enhancements

    R. Khanam and M. Hussain, “YOLOv11: An overview of the key architectural enhancements,”arXiv preprint arXiv:2410.17725, 2024

  8. [8]

    YOLOv12: Attention-Centric Real-Time Object Detectors

    Y . Tian, Q. Ye, and D. Doermann, “YOLO12: Attention-centric real-time object detectors,”arXiv preprint arXiv:2502.12524, 2025

  9. [9]

    Xinlei Chen, Hao Fang, Tsung-yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, and C Lawrence Zitnick

    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,”arXiv preprint arXiv:2005.12872, 2020

  10. [10]

    DETRs beat YOLOs on real-time object detection,

    W. Lv, S. Xu, Y . Zhao, G. Wang, J. Wei, C. Cui, Y . Du, Q. Dang, and Y . Liu, “DETRs beat YOLOs on real-time object detection,”arXiv preprint arXiv:2304.08069, 2023

  11. [11]

    DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

    H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.- Y . Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,”arXiv preprint arXiv:2203.03605, 2022

  12. [12]

    DEIM: DETR with improved matching for fast convergence,

    S. Huang, Z. Lu, X. Cun, Y . Yu, X. Zhou, and X. Shen, “DEIM: DETR with improved matching for fast convergence,”arXiv preprint arXiv:2412.04234, 2025

  13. [13]

    D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement,

    Y . Peng, H. Li, P. Wu, Y . Zhang, X. Sun, and F. Wu, “D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement,”arXiv preprint arXiv:2410.13842, 2024

  14. [14]

    A novel multi-frequency coordinated module for SAR ship detection,

    C. Qiao, F. Shen, X. Wang, R. Wang, F. Cao, S. Zhao, and C. Li, “A novel multi-frequency coordinated module for SAR ship detection,” in Proceedings of the IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022, pp. 804–811

  15. [15]

    A novel cross frequency- domain interaction learning for aerial oriented object detection,

    W. Weng, W. Lin, F. Lin, J. Ren, and F. Shen, “A novel cross frequency- domain interaction learning for aerial oriented object detection,” inChi- nese Conference on Pattern Recognition and Computer Vision (PRCV), 2023, pp. 292–305

  16. [16]

    Enhancing aerial object detec- tion with selective frequency interaction network,

    W. Weng, M. Wei, J. Ren, and F. Shen, “Enhancing aerial object detec- tion with selective frequency interaction network,”IEEE Transactions on Artificial Intelligence, vol. 1, no. 01, pp. 1–12, 2024

  17. [17]

    Adaptive mixtures of local experts,

    R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,”Neural Computation, vol. 3, no. 1, pp. 79–87, 1991

  18. [18]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,”arXiv preprint arXiv:1701.06538, 2017

  19. [19]

    YOLO-Master: MoE- accelerated with specialized transformers for enhanced real-time detec- tion,

    X. Lin, J. Peng, Z. Gan, J. Zhu, and J. Liu, “YOLO-Master: MoE- accelerated with specialized transformers for enhanced real-time detec- tion,”arXiv preprint arXiv:2512.23273, 2025

  20. [20]

    Lr-fpn: Enhancing remote sensing object detection with location refined feature pyramid network,

    H. Li, R. Zhang, Y . Pan, J. Ren, and F. Shen, “Lr-fpn: Enhancing remote sensing object detection with location refined feature pyramid network,” inProceedings of the International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8

  21. [21]

    No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects,

    R. A. Sahi, H. Goyal, Y . Akhtar, and S. Kumar, “No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects,”arXiv preprint arXiv:2208.03641, 2022

  22. [22]

    Feature pyramid networks for object detection,

    T.-Y . Lin, P. Doll´ar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944

  23. [23]

    ASTRA: Let Arbitrary Subjects Transform in Video Editing

    F. Shen, W. Xu, R. Yan, D. Zhang, X. Shu, and J. Tang, “IMAGEdit: Let any subject transform,”arXiv preprint arXiv:2510.01186, 2025

  24. [24]

    IMAGHar- mony: Controllable image editing with consistent object quantity and layout,

    F. Shen, X. Du, Y . Gao, J. Yu, Y . Cao, X. Lei, and J. Tang, “IMAGHar- mony: Controllable image editing with consistent object quantity and layout,”arXiv preprint arXiv:2506.01949, 2025

  25. [25]

    IMAGGarment-1: Fine-grained garment generation for controllable fashion design,

    F. Shen, J. Yu, C. Wang, X. Jiang, X. Du, and J. Tang, “IMAGGarment-1: Fine-grained garment generation for controllable fashion design,”arXiv preprint arXiv:2504.13176, 2025

  26. [26]

    Advancing pose-guided image synthesis with progressive conditional diffusion models,

    F. Shen, H. Ye, J. Zhang, C. Wang, X. Han, and Y . Wei, “Advancing pose-guided image synthesis with progressive conditional diffusion models,” inProceedings of the International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://openreview.net/forum?id=rHzapPnCgT

  27. [27]

    Imagpose: A unified conditional framework for pose-guided person generation,

    F. Shen and J. Tang, “Imagpose: A unified conditional framework for pose-guided person generation,”Advances in Neural Information Processing Systems (NeurIPS), vol. 37, pp. 6246–6266, 2024

  28. [28]

    Imagdressing-v1: Customizable virtual dressing,

    F. Shen, X. Jiang, X. He, H. H. Ye, C. Wang, X. Du, Z. Li, and J. Tang, “Imagdressing-v1: Customizable virtual dressing,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 6795–6804

  29. [29]

    Long-term TalkingFace generation via motion-prior conditional diffusion model,

    F. Shen, C. Wang, J. Gao, Q. Guo, J. Dang, J. Tang, and T.-S. Chua, “Long-term TalkingFace generation via motion-prior conditional diffusion model,” inProceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  30. [30]

    Spatial-frequency dual progressive attention network for medical image segmentation,

    Z. Zhou, A. He, Y . Wu, R. Yao, X. Xie, and T. Li, “Spatial-frequency dual progressive attention network for medical image segmentation,” arXiv preprint arXiv:2406.07952, 2024

  31. [31]

    Wavelet con- volutions for large receptive fields,

    S. E. Finder, R. Amoyal, E. Treister, and O. Freifeld, “Wavelet con- volutions for large receptive fields,”arXiv preprint arXiv:2407.05848, 2024

  32. [32]

    GhostNet: More features from cheap operations,

    K. Han, Y . Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “GhostNet: More features from cheap operations,”arXiv preprint arXiv:1911.11907, 2020

  33. [33]

    Frequency- adaptive dilated convolution for semantic segmentation,

    Z. Li, Y . Chen, Q. Xu, Y . Liu, and H. Zhao, “Frequency- adaptive dilated convolution for semantic segmentation,”arXiv preprint arXiv:2403.05369, 2024

  34. [34]

    HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation,

    S. Wei, X. Zeng, Q. Qu, M. Wang, H. Su, and J. Shi, “HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation,”IEEE Access, vol. 8, pp. 120 234–120 254, 2020

  35. [35]

    A SAR dataset of ship detection for deep learning under complex backgrounds,

    Y . Wang, C. Wang, H. Zhang, Y . Dong, and S. Wei, “A SAR dataset of ship detection for deep learning under complex backgrounds,”Remote Sensing, vol. 11, no. 7, p. 765, 2019

  36. [36]

    Squeeze and excitation rank Faster R-CNN for ship detection in SAR images,

    Z. Lin, K. Ji, X. Leng, and G. Kuang, “Squeeze and excitation rank Faster R-CNN for ship detection in SAR images,”IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 751–755, 2019

  37. [37]

    SSD: Single shot multibox detector,

    W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y . Fu, and A. C. Berg, “SSD: Single shot multibox detector,” inProceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 21–37

  38. [38]

    FCOS: Fully convolutional one- stage object detection,

    Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one- stage object detection,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9626–9635

  39. [39]

    Cross-scale context-aware ship detection in SAR images using CSCF-Net,

    L. Qi, C. Huang, and Q. Guo, “Cross-scale context-aware ship detection in SAR images using CSCF-Net,”IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1–5, 2026

  40. [40]

    SAR-D-FINE: A context-aware detector for small and densely packed ship detection in SAR imagery,

    X. Fan, B. Xing, X. Wang, H. Liu, C. Yan, and P. Zhi, “SAR-D-FINE: A context-aware detector for small and densely packed ship detection in SAR imagery,”IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1–5, 2026

  41. [41]

    Ultralytics YOLOv8,

    G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,”GitHub repository, 2023, https://github.com/ultralytics/ultralytics, v8.0.0

  42. [42]

    Ultralytics YOLO11,

    G. Jocher and J. Qiu, “Ultralytics YOLO11,”GitHub repository, 2024, https://github.com/ultralytics/ultralytics, v11.0.0

  43. [43]

    Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection,

    X. Yuan, Z. Zheng, Y . Li, X. Liu, L. Liu, X. Li, Q. Hou, and M.-M. Cheng, “Strip r-cnn: Large strip convolution for remote sensing object detection,” 2025. [Online]. Available: https://arxiv.org/abs/2501.03775