
arxiv: 2512.09806 · v2 · pith:E2IARPAMnew · submitted 2025-12-10 · 💻 cs.CV · cs.AI

CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing

Pith reviewed 2026-05-21 17:34 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords hallucination · deep learning · image reconstruction · wavelet · shearlet · conformal prediction · U-Net · astronomical imaging

The pith

CHEM identifies hallucination-prone regions in deep learning image reconstructions using wavelet and shearlet features plus conformal quantile regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework to quantify and locate hallucinations, or unrealistic artifacts, that deep learning models produce during image reconstruction. Such artifacts matter in safety-critical uses like astronomy because they can distort analysis of real data. The approach projects predictions into wavelet and shearlet bases to isolate feature-level regions likely to contain hallucinations. Conformalized quantile regression then supplies distribution-free estimates of hallucination severity. The authors further show through approximation theory why U-shaped networks commonly used for these tasks tend to generate hallucinated outputs.
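The localization idea can be illustrated with a toy example. The sketch below is not the paper's method: it swaps in a one-level Haar transform in place of the db8 wavelets and shearlets the paper uses, and it scores the residual against a known ground truth, which is an assumption of this toy setup. The names `haar2d_level1` and `detail_energy` are hypothetical. It only shows that a spatially isolated artifact produces detail coefficients concentrated at the artifact's location.

```python
def haar2d_level1(img):
    """One-level orthonormal 2D Haar transform of an even-sized image (list of rows).

    Returns (approx, horiz, vert, diag), each half the input size per axis.
    """
    rows, cols = len(img), len(img[0])
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((p + q + s + t) / 2.0)  # coarse-scale average
            h_row.append((p + q - s - t) / 2.0)  # top row minus bottom row
            v_row.append((p - q + s - t) / 2.0)  # left column minus right column
            d_row.append((p - q - s + t) / 2.0)  # diagonal difference
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def detail_energy(img):
    """Sum of squared detail coefficients for each 2x2 block of the image."""
    _, h, v, d = haar2d_level1(img)
    return [
        [h[i][j] ** 2 + v[i][j] ** 2 + d[i][j] ** 2 for j in range(len(h[0]))]
        for i in range(len(h))
    ]


# Toy data: a flat ground truth and a prediction with one spurious bright pixel.
truth = [[0.0] * 8 for _ in range(8)]
pred = [row[:] for row in truth]
pred[2][5] = 1.0  # toy "hallucinated" pixel (illustrative only)

residual = [[pred[r][c] - truth[r][c] for c in range(8)] for r in range(8)]
energy = detail_energy(residual)

# Locate the 2x2 block with the largest detail energy.
peak = max((energy[i][j], (i, j)) for i in range(4) for j in range(4))[1]
print(peak)  # (1, 2): the block covering pixel (2, 5)
```

A single level of Haar coefficients only captures the finest scale; a multi-level or directional transform, as in the paper, would be needed to pick up larger or oriented artifacts.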

Core claim

The paper establishes that the Conformal Hallucination Estimation Metric (CHEM) localizes hallucination-prone regions at the level of image features by means of wavelet and shearlet representations and assesses hallucination levels via conformalized quantile regression in a distribution-free manner. A theoretical analysis characterizes CHEM's sensitivity to hallucinated artifacts and its connection to mean squared error. Adopting an approximation-theory viewpoint, the work explains why U-shaped networks tend to produce hallucination-prone predictions.

What carries the argument

The Conformal Hallucination Estimation Metric (CHEM), which combines wavelet and shearlet representations for feature localization with conformalized quantile regression for distribution-free hallucination assessment.
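For readers unfamiliar with the conformal step, here is a minimal sketch of split conformalized quantile regression in the standard form introduced by Romano, Patterson, and Candès (2019). How CHEM applies it to wavelet and shearlet coefficients is not specified on this page, so the scalar inputs, the function names `cqr_correction` and `conformalize`, and the toy numbers are all assumptions made for illustration.

```python
import math


def cqr_correction(lo_cal, hi_cal, y_cal, alpha):
    """Conformal correction Q computed on a held-out calibration set.

    lo_cal, hi_cal: lower and upper quantile predictions of a pre-trained model.
    y_cal: the corresponding true values.
    alpha: target miscoverage level (e.g. 0.1 for 90% coverage).
    """
    n = len(y_cal)
    # Conformity score: positive when y falls outside [lo, hi], negative inside.
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lo_cal, hi_cal, y_cal)
    )
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the corrected quantile
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[k - 1]


def conformalize(lo, hi, q):
    """Widen (or shrink, if q < 0) a predicted interval by the correction q."""
    return lo - q, hi + q


# Toy calibration set (illustrative numbers, not from the paper).
lo_cal = [0.0] * 9
hi_cal = [1.0] * 9
y_cal = [0.5, 0.25, 0.75, 1.25, -0.5, 0.5, 2.0, 0.0, 1.0]

q = cqr_correction(lo_cal, hi_cal, y_cal, alpha=0.25)
interval = conformalize(0.0, 1.0, q)
print(q, interval)  # 0.5 (-0.5, 1.5)
```

The correction depends only on ranks of the calibration scores, which is why no parametric model of the errors is needed.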

If this is right

  • CHEM can highlight specific regions within a model's output image that are most likely to contain hallucinations.
  • The method supplies a distribution-free score for comparing hallucination tendencies across different reconstruction architectures.
  • Approximation theory analysis indicates that U-shaped networks inherently favor predictions containing hallucinations in reconstruction settings.
  • The framework applies to both astronomical image deconvolution on datasets such as CANDELS and natural-image super-resolution on datasets such as DIV2K.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • CHEM could be inserted into training loops to penalize hallucination-prone regions and encourage more reliable architectures.
  • The same wavelet-shearlet plus conformal pipeline might transfer to other inverse imaging problems such as denoising or inpainting.
  • Linking CHEM scores to downstream task performance could yield practical uncertainty maps for scientific image analysis pipelines.

Load-bearing premise

Hallucinated artifacts remain distinguishable from true signal features once projected into wavelet and shearlet bases, allowing conformal quantile regression to isolate them without being confounded by model biases or dataset artifacts.

What would settle it

An experiment on images with synthetically inserted, known hallucinations. The premise fails if CHEM does not assign high scores to the modified regions, or if its scores do not correlate with the size of the introduced artifacts.

Figures

Figures reproduced from arXiv: 2512.09806 by Gitta Kutyniok, Ines Rosellon-Inclan, Jean-Luc Starck, Jianfei Li.

Figure 1: An example of hallucinations in astronomical image …
Figure 2: U-shaped network architectures. The foundational com …
Figure 3: Quantifying hallucinations of a U-Net trained with …
Figure 4: Quantifying hallucinations of U-shaped networks trained with different loss functions using db8. The predicted images are …
Figure 5: MSE/CHEM-FWHM curves under different dictionaries. This figure illustrates the effect of the chosen representation.
Figure 6: Image deconvolution task on the CANDELS dataset.
Figure 7: MSE/CHEM-FWHM curves. We analyze the changes in …
Figure 8: Evolution of the db8-based CHEM and training loss over …
Figure 9: Point spread functions (PSFs) with varying FWHM val …
Figure 10: Example: Tikhonet with a U-shaped denoising module. For SUNet, blue filled rectangles indicate Swin-Transformer blocks, …
Figure 11: Reproduction of the analysis in Figure 4, now including coarse-scale coefficients. Incorporating all coefficients produces broader, …
Original abstract

Deep learning-based methods have recently achieved significant success in image reconstruction problems. However, challenges have emerged, as these methods may generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a framework for quantifying and characterizing hallucinated artifacts in image reconstruction models. The proposed method, termed the Conformal Hallucination Estimation Metric (CHEM), enables the identification of hallucination-prone regions in model predictions. It leverages wavelet and shearlet representations to localize such regions at the level of image features, and uses conformalized quantile regression to assess hallucination levels in a distribution-free manner. A theoretical analysis is provided, characterizing the sensitivity of CHEM to hallucinated artifacts and its relationship to the mean squared error. Building on these insights and adopting a viewpoint grounded in approximation theory, we investigate why U-shaped networks, widely used architectures for image reconstruction, tend to produce hallucination-prone predictions. We assess the effectiveness of the proposed approach on astronomical image deconvolution using the CANDELS dataset with architectures such as U-Net, SwinUNet, and Learnlets, and on natural image super-resolution using the DIV2K dataset with models such as DRUNet, Unfolded DRS, RAM, and DPS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Conformal Hallucination Estimation Metric (CHEM) for quantifying and localizing hallucinations in deep learning models for image reconstruction. CHEM combines wavelet and shearlet representations to identify hallucination-prone regions at the feature level with conformalized quantile regression to provide distribution-free hallucination level assessment. It includes a theoretical analysis characterizing CHEM's sensitivity to hallucinations and its relation to mean squared error, plus an approximation-theory explanation for why U-shaped networks tend to produce hallucination-prone outputs. The approach is evaluated on astronomical image deconvolution using the CANDELS dataset with U-Net, SwinUNet, and Learnlets, and on natural image super-resolution using DIV2K with DRUNet, Unfolded DRS, RAM, and DPS.

Significance. If the central claims hold, CHEM offers a principled, distribution-free tool for detecting and understanding hallucinations in image processing models, which is valuable for safety-critical domains such as astronomy. The combination of standard multiscale bases with conformal quantile regression provides a concrete way to localize issues without strong parametric assumptions, and the theoretical links to MSE and approximation theory supply useful insight into architectural tendencies. The dual-domain evaluation (astronomical and natural images) strengthens the empirical grounding.

major comments (3)
  1. [Theoretical analysis] The abstract states a theoretical analysis relating CHEM to mean squared error and an approximation-theory explanation for U-Net hallucinations, but without the full derivations or explicit error bounds it is impossible to verify whether the central claims hold.
  2. [Method and sensitivity analysis] The central claim requires that hallucinated artifacts produce reliably separable signatures in wavelet and shearlet coefficients so that conformal quantile regression can isolate them; the sensitivity analysis links CHEM to MSE but does not derive or empirically test a separation margin against ground-truth artifacts.
  3. [Experiments] Post-hoc dataset choices on CANDELS appear in the experimental setup; this undermines the cross-dataset robustness claim for hallucination identification.
minor comments (2)
  1. [§3] Clarify the precise definition of the conformal quantile regression thresholds and how they are computed from the calibration set.
  2. [Discussion] Add a short discussion of computational overhead for the wavelet/shearlet transforms and conformal step relative to baseline inference.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below, providing clarifications and indicating planned revisions where appropriate to strengthen the presentation of our results.

  1. Referee: [Theoretical analysis] The abstract states a theoretical analysis relating CHEM to mean squared error and an approximation-theory explanation for U-Net hallucinations, but without the full derivations or explicit error bounds it is impossible to verify whether the central claims hold.

    Authors: We thank the referee for highlighting this point. The relation between CHEM and MSE is derived in Section 4 using the properties of conformal quantile regression applied to the multiscale coefficients, and the approximation-theory argument for U-shaped networks is developed from the perspective of how such architectures approximate high-frequency components. To make verification straightforward, we will insert the key derivation steps and explicit error bounds into the main text of the revised manuscript (with full proofs remaining in the appendix). revision: yes

  2. Referee: [Method and sensitivity analysis] The central claim requires that hallucinated artifacts produce reliably separable signatures in wavelet and shearlet coefficients so that conformal quantile regression can isolate them; the sensitivity analysis links CHEM to MSE but does not derive or empirically test a separation margin against ground-truth artifacts.

    Authors: The referee correctly notes that separability in the coefficient domain is central to the method. The existing sensitivity analysis shows how CHEM increases under perturbations that mimic hallucinations and connects this increase to MSE. We will strengthen the revision by adding both a theoretical derivation of a separation margin in the wavelet/shearlet domain and an empirical evaluation on synthetic ground-truth artifacts with controlled hallucination locations. revision: yes

  3. Referee: [Experiments] Post-hoc dataset choices on CANDELS appear in the experimental setup; this undermines the cross-dataset robustness claim for hallucination identification.

    Authors: We respectfully disagree with the characterization of post-hoc selection. The CANDELS dataset was selected a priori as a standard benchmark for astronomical deconvolution tasks, with the full experimental protocol (including model architectures, training procedures, and evaluation metrics) fixed before any results were obtained. The dual evaluation on CANDELS and DIV2K was designed from the outset to demonstrate applicability across domains. To improve clarity, we will expand the experimental section with an explicit statement of the pre-specified dataset rationale and protocol. revision: partial

Circularity Check

0 steps flagged

No significant circularity in CHEM derivation chain


The paper defines CHEM by combining standard wavelet and shearlet bases for feature localization with conformalized quantile regression for distribution-free hallucination assessment. The theoretical sensitivity analysis relates CHEM to MSE as a characterization rather than deriving the metric itself from fitted parameters or self-referential inputs. Empirical evaluations on CANDELS and DIV2K datasets with multiple architectures provide external validation. No load-bearing steps reduce predictions to inputs by construction, and no self-citation chains or ansatzes are invoked to force uniqueness. The derivation remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the premise that hallucinations manifest as detectable deviations in wavelet/shearlet coefficients and that conformal quantile regression can be applied directly to these coefficients without additional modeling assumptions. No free parameters are explicitly named in the abstract. No new entities are postulated.

axioms (2)
  • [domain assumption] Hallucinated artifacts are sufficiently localized and distinguishable in wavelet and shearlet representations.
    Invoked when the method uses these bases to localize hallucination-prone regions.
  • [standard math] Conformalized quantile regression yields valid marginal coverage for hallucination scores without distributional assumptions beyond exchangeability of calibration and test data.
    Standard property of conformal prediction used to claim distribution-free assessment.
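For reference, the coverage property behind the second axiom can be written out as follows. This is the textbook split-conformal statement for conformalized quantile regression, not a formula taken from the paper; the symbols ($\hat q_{\mathrm{lo}}$, $\hat q_{\mathrm{hi}}$, $E_i$, $Q$, $C$) are introduced here for illustration only.

```latex
% Calibration scores from n held-out pairs (X_i, Y_i):
E_i = \max\bigl\{\hat q_{\mathrm{lo}}(X_i) - Y_i,\; Y_i - \hat q_{\mathrm{hi}}(X_i)\bigr\},
\qquad i = 1, \dots, n.

% Q is the \lceil (n+1)(1-\alpha) \rceil-th smallest of E_1, \dots, E_n, and
C(x) = \bigl[\hat q_{\mathrm{lo}}(x) - Q,\; \hat q_{\mathrm{hi}}(x) + Q\bigr].

% If (X_1, Y_1), \dots, (X_{n+1}, Y_{n+1}) are exchangeable, then
\mathbb{P}\bigl(Y_{n+1} \in C(X_{n+1})\bigr) \;\ge\; 1 - \alpha .
```

The guarantee is marginal: it averages over calibration and test draws and does not promise coverage for any particular image or region.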



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods

    stat.ML · 2026-05 · unverdicted · novelty 7.0

    Hallucinations in inverse problem reconstructions are fundamental to ill-posedness, with necessary and sufficient conditions plus computable bounds depending only on the forward model.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1]

    Deep learning-based galaxy image deconvolution.Frontiers in Astronomy and Space Sciences, 9, 2022

    Utsav Akhaury, Jean-Luc Starck, Pascale Jablonka, Fr ´ed´eric Courbin, and Kevin Michalewicz. Deep learning-based galaxy image deconvolution.Frontiers in Astronomy and Space Sciences, 9, 2022. Publisher: Frontiers

  2. [2]

    Akhaury, P

    U. Akhaury, P. Jablonka, J.-L. Starck, and F. Courbin. Ground-based image deconvolution with Swin Transformer UNet.Astronomy & Astrophysics, 688:A6, 2024

  3. [3]

    Image-to-image regres- sion with distribution-free uncertainty quantification and ap- plications in imaging

    Anastasios N Angelopoulos, Amit Pal Kohli, Stephen Bates, Michael Jordan, Jitendra Malik, Thayer Alshaabi, Srigokul Upadhyayula, and Yaniv Romano. Image-to-image regres- sion with distribution-free uncertainty quantification and ap- plications in imaging. InInternational Conference on Ma- chine Learning, pages 717–730. PMLR, 2022

  4. [4]

    John Wiley & Sons, 2013

    Harrison H Barrett and Kyle J Myers.Foundations of Image Science. John Wiley & Sons, 2013

  5. [5]

    Nearly-tight VC-dimension and pseudodimen- sion bounds for piecewise linear neural networks.Journal of Machine Learning Research, 20(63):1–17, 2019

    Peter L Bartlett, Nick Harvey, Christopher Liaw, and Abbas Mehrabian. Nearly-tight VC-dimension and pseudodimen- sion bounds for piecewise linear neural networks.Journal of Machine Learning Research, 20(63):1–17, 2019

  6. [6]

    Principal Uncertainty Quantifi- cation with Spatial Correlation for Image Restoration Prob- lems, 2024

    Omer Belhasin, Yaniv Romano, Daniel Freedman, Ehud Rivlin, and Michael Elad. Principal Uncertainty Quantifi- cation with Spatial Correlation for Image Restoration Prob- lems, 2024. arXiv:2305.10124 [cs]

  7. [7]

    Chinmay Belthangady and Loic A. Royer. Applications, promises, and pitfalls of deep learning for fluorescence im- age reconstruction.Nature Methods, 16(12):1215–1225, 2019

  8. [8]

    Kelkar, Frank J

    Sayantan Bhadra, Varun A. Kelkar, Frank J. Brooks, and Mark A. Anastasio. On hallucinations in tomographic im- age reconstruction.IEEE Transactions on Medical Imaging, 40(11):3249–3260, 2021

  9. [9]

    Swin-Unet: Unet-like pure transformer for medical image segmentation

    Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xi- aopeng Zhang, Qi Tian, and Manning Wang. Swin-Unet: Unet-like pure transformer for medical image segmentation. InComputer Vision – ECCV 2022 Workshops, pages 205– 218, 2023

  10. [10]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L. Yuille, and Yuyin Zhou. TransUNet: Transformers make strong encoders for medical image segmentation, 2021. arXiv:2102.04306 [cs]

  11. [11]

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV), pages 801–818, 2018

  12. [12]

    Regev Cohen, Idan Kligvasser, Ehud Rivlin, and Daniel Freedman. Looks too good to be true: An information- theoretic analysis of hallucinations in generative restoration models.Advances in Neural Information Processing Sys- tems, 37:22596–22623, 2024

  13. [13]

    On the relationship between self-attention and convo- lutional layers

    Jean Baptiste Cordonnier, Andreas Loukas, and Martin Jaggi. On the relationship between self-attention and convo- lutional layers. In8th International Conference on Learning Representations, ICLR 2020, 2020

  14. [14]

    Society for Industrial and Applied Mathematics, 1992

    Ingrid Daubechies.Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, 1992

  15. [15]

    Courier Corporation, 1975

    Philip J Davis.Interpolation and approximation. Courier Corporation, 1975

  16. [16]

    SUNet: Swin transformer UNet for image denoising

    Chi-Mao Fan, Tsung-Jung Liu, and Kuan-Hsien Liu. SUNet: Swin transformer UNet for image denoising. In2022 IEEE International Symposium on Circuits and Systems (ISCAS), pages 2333–2337. IEEE, 2022

  17. [17]

    Gottschling, Vegard Antun, Anders C

    Nina M. Gottschling, Vegard Antun, Anders C. Hansen, and Ben Adcock. The troublesome kernel: On hallucinations, no free lunches, and the accuracy-stability tradeoff in inverse problems.SIAM Review, 67(1):73–104, 2025. Publisher: Society for Industrial and Applied Mathematics. 9

  18. [18]

    Grogin, Dale D

    Norman A. Grogin, Dale D. Kocevski, S. M. Faber, Henry C. Ferguson, Anton M. Koekemoer, Adam G. Riess, Viviana Acquaviva, David M. Alexander, Omar Almaini, Matthew L. N. Ashby, Marco Barden, Eric F. Bell, Fr ´ed´eric Bour- naud, Thomas M. Brown, Karina I. Caputi, Stefano Caser- tano, Paolo Cassata, Marco Castellano, Peter Challis, Ranga- Ram Chary, Edmond...

  19. [19]

    Error bounds for approximations with deep relu neural networks inw s,p norms.Analysis and Applications, 18(05):803–859, 2020

    Ingo G ¨uhring, Gitta Kutyniok, and Philipp Petersen. Error bounds for approximations with deep relu neural networks inw s,p norms.Analysis and Applications, 18(05):803–859, 2020

  20. [20]

    Sparse multidimensional representations using anisotropic dilation and shear operators.Wavelets and Splines, 14:189–201, 2006

    Kanghui Guo, Gitta Kutyniok, and Demetrio Labate. Sparse multidimensional representations using anisotropic dilation and shear operators.Wavelets and Splines, 14:189–201, 2006

  21. [21]

    On the rate of convergence of a classifier based on a transformer encoder.IEEE Transactions on Information Theory, 68(12): 8139–8155, 2022

    Iryna Gurevych, Michael Kohler, and G ¨ozde G¨ul S ¸ahin. On the rate of convergence of a classifier based on a transformer encoder.IEEE Transactions on Information Theory, 68(12): 8139–8155, 2022

  22. [22]

    Framing U-Net via deep con- volutional framelets: Application to sparse-view CT.IEEE Transactions on Medical Imaging, 37(6):1418–1429, 2018

    Yoseob Han and Jong Chul Ye. Framing U-Net via deep con- volutional framelets: Application to sparse-view CT.IEEE Transactions on Medical Imaging, 37(6):1418–1429, 2018

  23. [23]

    Advancing trans- former architecture in long-context large language models: A comprehensive survey, 2024

    Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, Zenan Li, Yuan Yao, Xiaoxing Ma, Lijuan Yang, Hao Chen, Shupeng Li, and Penghao Zhao. Advancing trans- former architecture in long-context large language models: A comprehensive survey, 2024. arXiv:2311.12351 [cs]

  24. [24]

    Jaeger, Simon A

    Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self- configuring method for deep learning-based biomedical im- age segmentation.Nature Methods, 18(2):203–211, 2021

  25. [25]

    Estimating the hallucination rate of genera- tive AI.Advances in Neural Information Processing Systems, 37:31154–31201, 2024

    Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P Cunningham, and David Blei. Estimating the hallucination rate of genera- tive AI.Advances in Neural Information Processing Systems, 37:31154–31201, 2024

  26. [26]

    Survey of hallucination in natural language generation.ACM Computing Surveys, 55(12):1–38, 2023

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM Computing Surveys, 55(12):1–38, 2023

  27. [27]

    CANDELS: The Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey - The Hubble Space Telescope Observations, Imaging Data Products and Mosaics

    Anton M. Koekemoer, S. M. Faber, Henry C. Ferguson, Nor- man A. Grogin, Dale D. Kocevski, David C. Koo, Kamson Lai, Jennifer M. Lotz, Ray A. Lucas, Elizabeth J. McGrath, Sara Ogaz, Abhijith Rajan, Adam G. Riess, Steve A. Rodney, Louis Strolger, Stefano Casertano, Marco Castellano, Tomas Dahlen, Mark Dickinson, Timothy Dolch, Adriano Fontana, Mauro Giavali...

  28. [28]

    Explaining image classifiers with multiscale directional image representation

    Stefan Kolek, Robert Windesheim, Hector Andrade-Loarca, Gitta Kutyniok, and Ron Levie. Explaining image classifiers with multiscale directional image representation. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18600–18609, 2023. 10

  29. [29]

    Advances in deep learning for medical image analysis: A comprehen- sive investigation.Journal of Statistical Theory and Practice, 19(1):9, 2025

    Rajeev Ranjan Kumar, S Vishnu Shankar, Ronit Jaiswal, Mrinmoy Ray, Neeraj Budhlakoti, and KN Singh. Advances in deep learning for medical image analysis: A comprehen- sive investigation.Journal of Statistical Theory and Practice, 19(1):9, 2025

  30. [30]

    Conformal prediction masks: Visualiz- ing uncertainty in medical imaging

    Gilad Kutiel, Regev Cohen, Michael Elad, Daniel Freedman, and Ehud Rivlin. Conformal prediction masks: Visualiz- ing uncertainty in medical imaging. InTrustworthy Machine Learning for Healthcare, pages 163–176, 2023

  31. [31]

    Springer Science & Business Media, 2012

    Gitta Kutyniok and Demetrio Labate.Shearlets: Multiscale analysis for multivariate data. Springer Science & Business Media, 2012

  32. [32]

    Shearlab 3D: Faithful digital shearlet transforms based on compactly supported shearlets.ACM Transactions on Math- ematical Software (TOMS), 42(1):1–42, 2016

    Gitta Kutyniok, Wang-Q Lim, and Rafael Reisenhofer. Shearlab 3D: Faithful digital shearlet transforms based on compactly supported shearlets.ACM Transactions on Math- ematical Software (TOMS), 42(1):1–42, 2016

  33. [33]

    Distribution- free uncertainty quantification for inverse problems: Appli- cation to weak lensing mass mapping.Astronomy & Astro- physics, 694:A267, 2025

    Hubert Leterme, Jalal Fadili, and J-L Starck. Distribution- free uncertainty quantification for inverse problems: Appli- cation to weak lensing mass mapping.Astronomy & Astro- physics, 694:A267, 2025

  34. [34]

    Jianfei Li, Han Feng, and Xiaosheng Zhuang. Convolutional neural networks for spherical signal processing via area- regular spherical haar tight framelets.IEEE Transactions on Neural Networks and Learning Systems, 35(4):4400–4410, 2022

  35. [35]

    Approximation analysis of CNNs from a feature extraction view.Analysis and Applications, 22(03):635–654, 2024

    Jianfei Li, Han Feng, and Ding-Xuan Zhou. Approximation analysis of CNNs from a feature extraction view.Analysis and Applications, 22(03):635–654, 2024

  36. [36]

    A survey on hallucination in large vision-language models,

    Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiu- tian Zhao, Ke Wang, Liping Hou, Rongjun Li, and Wei Peng. A survey on hallucination in large vision-language models,

  37. [37]

    arXiv:2402.00253 [cs]

  38. [38]

    Multi-level wavelet-CNN for image restoration

    Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo. Multi-level wavelet-CNN for image restoration. InProceedings of the IEEE conference on com- puter vision and pattern recognition workshops, pages 773– 782, 2018

  39. [39]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In 2021 IEEE/CVF International Conference on Computer Vi- sion (ICCV), pages 9992–10002, 2021

  40. [40]

    U-Mamba: Enhancing long-range dependency for biomedical image segmentation,

    Jun Ma, Feifei Li, and Bo Wang. U-Mamba: Enhancing long-range dependency for biomedical image segmentation,

  41. [41]

    arXiv:2401.04722 [eess]

  42. [42]

    El- sevier, 1999

    St ´ephane Mallat.A Wavelet Tour of Signal Processing. El- sevier, 1999

  43. [43]

    A review of deep learn- ing techniques for speech processing.Information Fusion, 99:101869, 2023

    Ambuj Mehrish, Navonil Majumder, Rishabh Bharadwaj, Rada Mihalcea, and Soujanya Poria. A review of deep learn- ing techniques for speech processing.Information Fusion, 99:101869, 2023

  44. [44]

    U-Nets as belief propagation: Efficient classi- fication, denoising, and diffusion in generative hierarchical models, 2024

    Song Mei. U-Nets as belief propagation: Efficient classi- fication, denoising, and diffusion in generative hierarchical models, 2024. arXiv:2404.18444 [cs]

  45. [45]

    Neural networks for functional approximation and system identifi- cation.Neural Computation, 9(1):143–159, 1997

    Hrushikesh Narhar Mhaskar and Nahmwoo Hahm. Neural networks for functional approximation and system identifi- cation.Neural Computation, 9(1):143–159, 1997

  46. [46]

    Matthew J. Muckley, Bruno Riemenschneider, Alireza Radmanesh, Sunwoo Kim, Geunu Jeong, Jingyu Ko, Yohan Jun, Hyungseob Shin, Dosik Hwang, Mahmoud Mostapha, Simon Arberet, Dominik Nickel, Zaccharie Ramzi, Philippe Ciuciu, Jean-Luc Starck, Jonas Teuwen, Dimitrios Karkalousos, Chaoping Zhang, Anuroop Sriram, Zhengnan Huang, Nafissa Yakubova, Yvonne W. Lui, a...

  47. [47]

    Attention U-Net: Learning Where to Look for the Pancreas

    Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y . Hammerla, Bernhard Kainz, Ben Glocker, and Daniel Rueckert. Attention U-Net: Learning where to look for the pancreas, 2018. arXiv:1804.03999 [cs]

  48. [48]

    U2-Net: Go- ing deeper with nested U-structure for salient object detec- tion.Pattern recognition, 106:107404, 2020

    Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood De- hghan, Osmar R Zaiane, and Martin Jagersand. U2-Net: Go- ing deeper with nested U-structure for salient object detec- tion.Pattern recognition, 106:107404, 2020

  49. [49]

    Wavelets in the deep learning era.Journal of Mathematical Imaging and Vision, 65(1):240–251, 2023

    Zaccharie Ramzi, Kevin Michalewicz, Jean-Luc Starck, Thomas Moreau, and Philippe Ciuciu. Wavelets in the deep learning era.Journal of Mathematical Imaging and Vision, 65(1):240–251, 2023

  50. [50]

    Conformalized quantile regression.Advances in Neural In- formation Processing Systems, 32, 2019

    Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression.Advances in Neural In- formation Processing Systems, 32, 2019

  51. [51]

    U- Net: Convolutional networks for biomedical image segmen- tation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- Net: Convolutional networks for biomedical image segmen- tation. InInternational Conference on Medical Image Com- puting and Computer-Assisted Intervention, pages 234–241. Springer, 2015

  52. [52]

    A comprehensive sur- vey of hallucination in large language, image, video and au- dio foundation models, 2024

    Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, and Aman Chadha. A comprehensive sur- vey of hallucination in large language, image, video and au- dio foundation models, 2024. arXiv:2405.09589 [cs]

  53. [53]

    Angelopoulos, Stephen Bates, Yaniv Romano, and Phillip Isola

    Swami Sankaranarayanan, Anastasios N. Angelopoulos, Stephen Bates, Yaniv Romano, and Phillip Isola. Seman- tic uncertainty intervals for disentangled latent spaces, 2022. arXiv:2207.10074 [cs]

  54. [54]

    SIAM Journal on Numerical Analysis, 6(2):161–183, 1969

    Martin H Schultz.L ∞-multivariate approximation theory. SIAM Journal on Numerical Analysis, 6(2):161–183, 1969

  55. [55]

    Optimal approximation rate of ReLU networks in terms of width and depth.Journal de Math ´ematiques Pures et Appliqu´ees, 157: 101–135, 2022

    Zuowei Shen, Haizhao Yang, and Shijun Zhang. Optimal approximation rate of ReLU networks in terms of width and depth.Journal de Math ´ematiques Pures et Appliqu´ees, 157: 101–135, 2022

  56. [56]

    Starck and F

    J.-L. Starck and F. Murtagh.Astronomical Image and Data Analysis. Springer, 2006

  57. [57]

    Deep learning for a space-variant deconvolution in galaxy surveys. Astronomy & Astrophysics, 641:A67, 2020

    Florent Sureau, Alexis Lechat, and J-L Starck. Deep learning for a space-variant deconvolution in galaxy surveys. Astronomy & Astrophysics, 641:A67, 2020

  58. [58]

    A mathematical explanation of UNet. Mathematical Foundations of Computing, 8(5):874–889, 2025

    Xue-Cheng Tai, Hao Liu, Raymond H Chan, and Lingfeng Li. A mathematical explanation of UNet. Mathematical Foundations of Computing, 8(5):874–889, 2025

  59. [59]

    Conformal risk control for semantic uncertainty quantification in computed tomography

    Jacopo Teneggi, J Webster Stayman, and Jeremias Sulam. Conformal risk control for semantic uncertainty quantification in computed tomography. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 45–55, 2025

  60. [60]

    Hallucination index: An image quality metric for generative reconstruction models

    Matthew Tivnan, Siyeop Yoon, Zhennong Chen, Xiang Li, Dufan Wu, and Quanzheng Li. Hallucination index: An image quality metric for generative reconstruction models. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 449–458. Springer, 2024

  61. [61]

    A unified framework for U-Net design and analysis. Advances in Neural Information Processing Systems, 36:27745–27782, 2023

    Christopher Williams, Fabian Falck, George Deligiannidis, Chris C Holmes, Arnaud Doucet, and Saifuddin Syed. A unified framework for U-Net design and analysis. Advances in Neural Information Processing Systems, 36:27745–27782, 2023

  62. [62]

    On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks. Applied and Computational Harmonic Analysis, page 101797, 2025

    Yunfei Yang. On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks. Applied and Computational Harmonic Analysis, page 101797, 2025

  63. [63]

    Siren’s song in the AI ocean: A survey on hallucination in large language models. Computational Linguistics, pages 1–46, 2025

    Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. Siren’s song in the AI ocean: A survey on hallucination in large language models. Computational Linguistics, pages 1–46, 2025

  64. [64]

    Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5):749–753, 2018

    Zhengxin Zhang, Qingjie Liu, and Yunhong Wang. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15(5):749–753, 2018

  65. [65]

    Unet++: A nested U-Net architecture for medical image segmentation

    Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. Unet++: A nested U-Net architecture for medical image segmentation. In International Workshop on Deep Learning in Medical Image Analysis, pages 3–11, 2018.

    Appendix

    The appendix provides additional explanations regarding the experimental setup and the theoretical results. I...

  66. [66]

    During the first three epochs, the learning rate gradually increases from zero to the initial value of 2×10⁻⁴ (warm-up phase)

    The learning rate is adjusted using a two-stage scheduling strategy that combines a linear warm-up and cosine annealing. During the first three epochs, the learning rate gradually increases from zero to the initial value of 2×10⁻⁴ (warm-up phase). Subsequently, a cosine annealing scheduler progressively reduces the learning rate to a minimum value o...

  67. [67]

    However, K represents the overall number of parameters in U-shaped networks

    If this tensor has more than K channels, the convolution layer that produces this tensor has at least K convolution kernels, which have 2K parameters. However, K represents the overall number of parameters in U-shaped networks. This presents a contradiction, suggesting that the output dimension of each layer does not exceed max{t, K}. Consequently, the fully co...
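The two-stage learning-rate schedule described in item [66] above (linear warm-up over the first three epochs up to 2×10⁻⁴, followed by cosine annealing) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. The function name `learning_rate`, the per-epoch granularity, the total number of epochs, and the minimum learning rate `min_lr` are all assumptions, since the source text is truncated before those details are given.

```python
import math


def learning_rate(epoch, total_epochs, base_lr=2e-4, warmup_epochs=3, min_lr=0.0):
    """Linear warm-up followed by cosine annealing, evaluated once per epoch.

    base_lr and warmup_epochs follow the text; total_epochs and min_lr are
    hypothetical placeholders because the source does not state them here.
    """
    if epoch < warmup_epochs:
        # Warm-up phase: grow linearly from zero towards base_lr.
        return base_lr * epoch / warmup_epochs
    # Annealing phase: progress runs from 0 (end of warm-up) to 1 (last epoch).
    span = max(1, total_epochs - warmup_epochs)
    progress = min((epoch - warmup_epochs) / span, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the schedule for an assumed 20-epoch run.
    for e in range(0, 21, 4):
        print(e, learning_rate(e, total_epochs=20, min_lr=1e-6))
```

With these assumptions the rate is zero at epoch 0, reaches the initial value at the end of the warm-up, and then decays monotonically to `min_lr` at the final epoch. A per-iteration variant would replace `epoch` with a fractional step count.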