pith. machine review for the scientific record. sign in

arxiv: 2604.25408 · v1 · submitted 2026-04-28 · 💻 cs.CV

Recognition: unknown

Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing

Chang Wen Chen, Runjie Wang, Tiesong Zhao, Weiling Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-07 16:59 UTC · model grok-4.3

classification 💻 cs.CV
keywords semantic similaritylow-level image processingimage quality assessmentsemantic entitiesforeground-background disentanglementdeep learningevaluation metricsimage semantics
0
0 comments X

The pith

Low-level image processing needs evaluation for semantic preservation, not just visual fidelity, via a new triplet-based score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that conventional image quality assessment falls short when processed images retain looks but alter meaning, as often happens with deep learning models. It defines semantic similarity as the task of checking if semantic content survives processing and structures image semantics around entities and their relations. The authors introduce the Triplet-based Semantic Similarity Score, or T3S, built on extracting foreground and background entities plus modeling relations in an open setting. Tests on COCO and SPA datasets indicate T3S tracks semantic shifts better than fidelity metrics or other semantic baselines during various degradations. This matters for ensuring algorithms do not unintentionally change the intended content of images.

Core claim

We formalize Semantic Similarity as a new evaluation task for low-level image processing, aimed at measuring whether semantic content is preserved after processing. We present a structured formulation of image semantics based on semantic entities and their relations, and discuss the desired properties and constraints of a valid semantic similarity index. Based on this, we propose T3S, which models image semantics through foreground entities, background entities, and relations by combining semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling.

What carries the argument

Triplet-based Semantic Similarity Score (T3S) that assesses semantic preservation by extracting foreground entities, background entities, and their relations.

If this is right

  • T3S better reflects progressive semantic changes under diverse degradations compared to existing metrics.
  • It consistently outperforms fidelity-oriented metrics and semantic-level baselines in experiments on COCO and SPA-Data.
  • Semantic assessment becomes important for evaluating modern low-level vision methods that use generative models.
  • Valid semantic similarity indices must satisfy specific properties and constraints derived from the entity-relation formulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could lead to new loss functions in training that penalize semantic drift explicitly.
  • Similar triplet modeling might apply to assessing semantic consistency in video processing or multimodal data.
  • Integration with existing IQA tools could create hybrid metrics balancing fidelity and semantics.

Load-bearing premise

Reliable extraction of semantic entities, foreground-background separation, and open-world class/relation modeling can be done without errors that invalidate the similarity score.

What would settle it

Finding image pairs where T3S gives a high score but human observers see major semantic differences, or where it misses clear semantic loss under degradation.

Figures

Figures reproduced from arXiv: 2604.25408 by Chang Wen Chen, Runjie Wang, Tiesong Zhao, Weiling Chen.

Figure 1
Figure 1. Figure 1: How to evaluate Semantic Similarity? Existing IQA view at source ↗
Figure 2
Figure 2. Figure 2: General formulation of semantic similarity based view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the proposed T3S framework. Given an input image pair, SAM first extracts SEs and decouples them into view at source ↗
Figure 4
Figure 4. Figure 4: The FBD module performs feature disentanglement view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of score curves of different methods view at source ↗
Figure 7
Figure 7. Figure 7: This figure summarizes our evaluation protocol under different levels of semantic change. To assess the proposed view at source ↗
Figure 8
Figure 8. Figure 8: Visual examples of the 20 image degradations considered in this work. Different degradations affect semantic entities, view at source ↗
Figure 9
Figure 9. Figure 9: Performance curves of different metrics on the remaining 16 degradation types in SPA-Data over five severity levels. view at source ↗
Figure 10
Figure 10. Figure 10: Performance curves of different metrics on the remaining 16 degradation types in COCO over five severity levels. view at source ↗
Figure 11
Figure 11. Figure 11: Case study on entity-level semantic changes. Each group contains three comparison levels: Level 1 applies strong view at source ↗
Figure 12
Figure 12. Figure 12: Examples of image pairs with relation-level semantic changes. In each pair, the main semantic entities are largely view at source ↗
read the original abstract

Low-level image processing has long been evaluated mainly from the perspective of visual fidelity. However, with the rise of deep learning and generative models, processed images may preserve perceptual quality while altering semantic content, making conventional Image Quality Assessment (IQA) insufficient for semantic-level assessment. In this paper, we formalize \textit{Semantic Similarity} as a new evaluation task for low-level image processing, aimed at measuring whether semantic content is preserved after processing. We further present a structured formulation of image semantics based on semantic entities and their relations, and discuss the desired properties and constraints of a valid semantic similarity index. Based on this formulation, we propose Triplet-based Semantic Similarity Score (T3S), which models image semantics through foreground entities, background entities, and relations. T3S combines semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling. Experiments on COCO and SPA-Data show that T3S consistently outperforms existing fidelity-oriented metrics and representative semantic-level baselines, while better reflecting progressive semantic changes under diverse degradations. These results highlight the importance of semantic assessment in modern low-level vision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper formalizes Semantic Similarity as a new evaluation task for low-level image processing to measure semantic content preservation (distinct from visual fidelity), presents a structured formulation of image semantics via entities and relations, proposes the Triplet-based Semantic Similarity Score (T3S) that integrates semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling, and reports that experiments on COCO and SPA-Data show T3S outperforming fidelity-oriented metrics and semantic baselines while better capturing progressive semantic changes under degradations.

Significance. If the robustness concerns are addressed, T3S could fill an important gap in evaluating modern low-level vision and generative models where perceptual quality is maintained but semantics are altered, providing a concrete metric grounded in entity-relation semantics that aligns better with application needs than traditional IQA.

major comments (2)
  1. [Abstract and Experiments] Abstract and Experiments section: the claim that T3S 'consistently outperforms' existing metrics and 'better reflecting progressive semantic changes' is presented without any details on the exact T3S computation formula, statistical tests, variance across runs, or controls for errors introduced by the (presumably pre-trained) entity/relation extraction model on degraded inputs.
  2. [T3S formulation] T3S formulation and desired properties discussion: the listed properties and constraints for a valid semantic similarity index omit any quantitative bound on extraction error tolerance. Since T3S scores are computed from foreground/background entities and relations extracted from the same degraded images used to demonstrate progressive semantic change, the outperformance result is load-bearing on the untested assumption that the extraction pipeline remains accurate as degradation strength increases.
minor comments (1)
  1. [Abstract] The abstract introduces T3S without immediately expanding the acronym, which appears later in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater transparency in our presentation of T3S and for raising important robustness considerations. We address each major comment below and have revised the manuscript to incorporate additional details, formulas, statistical analyses, and empirical robustness checks.

read point-by-point responses
  1. Referee: [Abstract and Experiments] Abstract and Experiments section: the claim that T3S 'consistently outperforms' existing metrics and 'better reflecting progressive semantic changes' is presented without any details on the exact T3S computation formula, statistical tests, variance across runs, or controls for errors introduced by the (presumably pre-trained) entity/relation extraction model on degraded inputs.

    Authors: We agree that the abstract and experiments would benefit from more explicit details. The T3S formula is defined in Section 3.3 as a weighted combination of foreground entity similarity, background entity similarity, and relation similarity (Equation 3), using open-world embeddings from a pre-trained model. To address the concern, we have expanded the abstract to include a concise description of this formulation and revised the experiments section to report paired statistical significance tests (Wilcoxon signed-rank), standard deviations over three independent runs of the extraction pipeline, and a dedicated control experiment measuring entity/relation extraction precision/recall on progressively degraded images from COCO and SPA-Data. revision: yes

  2. Referee: [T3S formulation] T3S formulation and desired properties discussion: the listed properties and constraints for a valid semantic similarity index omit any quantitative bound on extraction error tolerance. Since T3S scores are computed from foreground/background entities and relations extracted from the same degraded images used to demonstrate progressive semantic change, the outperformance result is load-bearing on the untested assumption that the extraction pipeline remains accurate as degradation strength increases.

    Authors: The properties in Section 3.2 are formulated at the semantic level and intentionally abstract from implementation-specific extraction errors. We acknowledge that no quantitative error-tolerance bound was previously provided. In the revision we add an analysis (new Figure 7 and Appendix C) that reports extraction accuracy versus degradation strength and shows that T3S remains superior to baselines even after injecting the observed extraction error rates; we also derive a simple first-order sensitivity bound relating extraction error to T3S deviation. These additions directly test and bound the assumption underlying the progressive-change experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity in T3S derivation chain

full rationale

The paper introduces T3S as a new construction that combines semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling to assess semantic similarity. No equations, definitions, or experimental steps in the manuscript reduce the proposed score to a fitted parameter, self-defined quantity, or load-bearing self-citation from the authors' prior work. The formulation starts from an external structured semantics model and relies on pre-trained extractors whose outputs are treated as independent inputs rather than derived from the current paper's data or claims. Experimental validation on COCO and SPA-Data therefore measures an independent metric rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based on abstract only; full formulation details unavailable. The central claim rests on the assumption that image semantics decompose cleanly into extractable entities and relations.

axioms (1)
  • domain assumption Image semantics can be decomposed into foreground entities, background entities, and relations between them.
    This decomposition is the foundation for the T3S formulation stated in the abstract.
invented entities (1)
  • T3S score no independent evidence
    purpose: Quantify semantic similarity via entity-relation triplets
    Newly proposed metric combining extraction, disentanglement, and open-world modeling.

pith-pipeline@v0.9.0 · 5497 in / 1252 out tokens · 38315 ms · 2026-05-07T16:59:12.955800+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    Sebastian Bosse, Dominique Maniry, Klaus-Robert Müller, Thomas Wiegand, and Wojciech Samek. 2018. Deep Neural Networks for No-Reference and Full- Reference Image Quality Assessment.IEEE Transactions on Image Processing27, 1 (2018), 206–219. doi:10.1109/TIP.2017.2760518

  2. [2]

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jegou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. Emerging Properties in Self-Supervised Vision Transformers. In2021 IEEE/CVF International Conference on Computer Vision (ICCV). 9630–9640. doi:10.1109/ICCV48922.2021.00951

  3. [3]

    Baoliang Chen, Hanwei Zhu, Lingyu Zhu, Shanshe Wang, Jingshan Pan, and Shiqi Wang. 2025. Debiased Mapping for Full-Reference Image Quality Assessment. IEEE Transactions on Multimedia27 (2025), 2638–2649. doi:10.1109/TMM.2025. 3535280

  4. [4]

    Weiling Chen, Honggang Liao, Rongfu Lin, Tiesong Zhao, Ke Gu, and Patrick Le Callet. 2025. Utility-Centered Underwater Image Quality Evaluation.IEEE Jour- nal of Oceanic Engineering50, 2 (2025), 743–757. doi:10.1109/JOE.2024.3498273

  5. [5]

    Weiling Chen, Weitao Lin, Xiaoyi Xu, Liqun Lin, and Tiesong Zhao. 2024. Face Super-Resolution Quality Assessment Based on Identity and Recognizability. IEEE Transactions on Biometrics, Behavior, and Identity Science6, 3 (2024), 364–373. doi:10.1109/TBIOM.2024.3389982

  6. [6]

    Weiling Chen, Ranwen Zhuang, Weitao Lin, Keke Zhang, Xuejin Wang, and Tiesong Zhao. 2026. Machine Vision-Oriented Image Quality Evaluation for Face Super-Resolution.IEEE Transactions on Biometrics, Behavior, and Identity Science (2026), 1–1. doi:10.1109/TBIOM.2026.3658611

  7. [7]

    Simoncelli

    Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. 2022. Image Quality Assessment: Unifying Structure and Texture Similarity.IEEE Transactions on Pattern Analysis and Machine Intelligence44, 5 (2022), 2567–2581. doi:10.1109/ TPAMI.2020.3045810

  8. [8]

    Alexey Dosovitskiy et al. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. InICLR

  9. [9]

    Minhao Fan, Wenjing Wang, Wenhan Yang, and Jiaying Liu. 2020. Integrating Semantic Segmentation and Retinex Model for Low-Light Image Enhancement. InProceedings of the 28th ACM International Conference on Multimedia(Seattle, WA, USA)(MM ’20). Association for Computing Machinery, New York, NY, USA, 2317–2325. doi:10.1145/3394171.3413757

  10. [10]

    Senran Fan, Zhicheng Bao, Chen Dong, Haotai Liang, Xiaodong Xu, and Ping Zhang. 2025. Semantic Similarity Score for Measuring Visual Similarity at Semantic Level.IEEE Internet of Things Journal12, 9 (2025), 12034–12047. doi:10.1109/JIOT.2024.3518543

  11. [11]

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi

  12. [12]

    Clipscore: A reference-free evaluation metric for image captioning

    CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen- tau Yih (Eds.). Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 7514–7528. doi:10.18653/v1/20...

  13. [13]

    Jingwen Hou, Henghui Ding, Weisi Lin, Weide Liu, and Yuming Fang. 2022. Distilling Knowledge From Object Classification to Aesthetics Assessment.IEEE Transactions on Circuits and Systems for Video Technology32, 11 (2022), 7386–7402. doi:10.1109/TCSVT.2022.3186307

  14. [14]

    Danlan Huang, Feifei Gao, Xiaoming Tao, Qiyuan Du, and Jianhua Lu. 2023. Toward Semantic Communications: Deep Learning-Based Image Semantic Cod- ing.IEEE Journal on Selected Areas in Communications41, 1 (2023), 55–71. doi:10.1109/JSAC.2022.3221999

  15. [15]

    Yipo Huang, Leida Li, Pengfei Chen, Haoning Wu, Weisi Lin, and Guangming Shi

  16. [16]

    doi:10.1109/TPAMI.2024.3492259

    Multi-Modality Multi-Attribute Contrastive Pre-Training for Image Aes- thetics Computing.IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 2 (2025), 1205–1218. doi:10.1109/TPAMI.2024.3492259

  17. [17]

    Ruixiang Jiang and Chang Wen Chen. 2025. Multimodal LLMs Can Reason about Aesthetics in Zero-Shot(MM ’25). Association for Computing Machinery, New York, NY, USA, 6634–6643. doi:10.1145/3746027.3754961

  18. [18]

    Zhi Jin, Yuwei Qiu, Kaihao Zhang, Hongdong Li, and Wenhan Luo. 2025. MB- TaylorFormer V2: Improved Multi-Branch Linear Transformer Expanded by Taylor Formula for Image Restoration.IEEE Transactions on Pattern Analysis and Machine Intelligence47, 7 (2025), 5990–6005. doi:10.1109/TPAMI.2025.3559891

  19. [19]

    Mingye Ju and Xinyang Yu. 2026. Semantic-Aware Low-Light Image Enhance- ment Network for Recognizing Semantics in Intelligent Transportation Systems. IEEE Transactions on Intelligent Transportation Systems27, 2 (2026), 2683–2694. doi:10.1109/TITS.2025.3540257

  20. [20]

    Jongyoo Kim and Sanghoon Lee. 2017. Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1969–1977. doi:10.1109/CVPR.2017.213

  21. [21]

    Qi Lang, Minjuan Wang, Minghao Yin, Shuang Liang, and Wenzhuo Song. 2025. Transforming Education With Generative AI (GAI): Key Insights and Future Prospects.IEEE Transactions on Learning Technologies18 (2025), 230–242. doi:10. 1109/TLT.2025.3537618

  22. [22]

    Chao Li et al. 2025. Image Quality Assessment From Human to Machine Prefer- ence. InCVPR

  23. [23]

    Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, and Guangtao Zhai. 2025. Image Quality Assessment: From Human to Machine Preference. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7570–7581. doi:10.1109/CVPR52734.2025.00709

  24. [24]

    Leida Li, Xiangfei Sheng, Pengfei Chen, Jinjian Wu, and Weisheng Dong. 2025. Towards Explainable Image Aesthetics Assessment With Attribute-Oriented Cri- tiques Generation.IEEE Transactions on Circuits and Systems for Video Technology 35, 2 (2025), 1464–1477. doi:10.1109/TCSVT.2024.3470870

  25. [25]

    Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xing Kui Wang, Wenjun Zeng, Xinchao Wang, and Zhibo Chen. 2023. Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey.International Journal of Computer Vision133 (2023), 8078 – 8108

  26. [26]

    Dong Liang, Ling Li, Mingqiang Wei, Shuo Yang, Liyan Zhang, Wenhan Yang, Yun Du, and Huiyu Zhou. 2022. Semantically Contrastive Learning for Low-Light Image Enhancement.Proceedings of the AAAI Conference on Artificial Intelligence 36 (06 2022), 1555–1563. doi:10.1609/aaai.v36i2.20046

  27. [27]

    Lawrence Zitnick

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InComputer Vision – ECCV 2014. Springer International Publishing, Cham, 740–755. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Trovato et al

  28. [28]

    Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, and Rynson W.H. Lau. 2024. Diff-Plugin: Revitalizing Details for Diffusion-Based Low-Level Tasks. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4197–

  29. [29]

    doi:10.1109/CVPR52733.2024.00402

  30. [30]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Lab...

  31. [31]

    Qi Qi, Kunqian Li, Haiyong Zheng, Xiang Gao, Guojia Hou, and Kun Sun. 2022. SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement With Multi-Scale Perception.Trans. Img. Proc.31 (Jan. 2022), 6816–6830. doi:10.1109/ TIP.2022.3216208

  32. [32]

    David Rouse, Romuald Pepion, Sheila Hemami, and Patrick Le Callet. 2009. Image Utility Assessment and a Relationship with Image Quality Assessment.Proc SPIE 7240 (02 2009). doi:10.1117/12.811664

  33. [33]

    Bingnan Wang, Bin Qin, Jiangmeng Li, Fanjiang Xu, Fuchun Sun, and Hui Xiong. 2026. All-in-One Image Restoration via Causal-Deconfounding Wavelet- Disentangled Prompt Network.IEEE Transactions on Image Processing(2026), 1–1. doi:10.1109/TIP.2026.3675478

  34. [34]

    Di Wang, Long Ma, Risheng Liu, and Xin Fan. 2022. Semantic-aware Texture- Structure Feature Collaboration for Underwater Image Enhancement. In2022 International Conference on Robotics and Automation (ICRA). 4592–4598. doi:10. 1109/ICRA46639.2022.9812457

  35. [35]

    Mo Wang, Minjuan Wang, Xin Xu, Lanqing Yang, Dunbo Cai, and Minghao Yin

  36. [36]

    doi:10.1109/TLT.2023.3324714

    Unleashing ChatGPT’s Power: A Case Study on Optimizing Information Retrieval in Flipped Classrooms via Prompt Engineering.IEEE Transactions on Learning Technologies17 (2024), 629–641. doi:10.1109/TLT.2023.3324714

  37. [37]

    Weizhi Xian, Mingliang Zhou, Bin Fang, Tao Xiang, Weijia Jia, and Bin Chen

  38. [38]

    doi:10.1109/TMM.2023.3293730

    Perceptual Quality Analysis in Deep Domains Using Structure Separation and High-Order Moments.IEEE Transactions on Multimedia26 (2024), 2219–2234. doi:10.1109/TMM.2023.3293730

  39. [39]

    Ce Zhang, Simon Stepputtis, Joseph Campbell, Katia Sycara, and Yaqi Xie. 2024. HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 28233–28243. doi:10.1109/CVPR52733.2024.02667

  40. [40]

    Keke Zhang, Weiling Chen, Tiesong Zhao, and Zhou Wang. 2026. Structural Similarity in Deep Features: Unified Image Quality Assessment Robust to Geomet- rically Disparate Reference.IEEE Transactions on Pattern Analysis and Machine Intelligence48, 3 (2026), 2581–2595. doi:10.1109/TPAMI.2025.3627285

  41. [41]

    Efros, Eli Shechtman, and Oliver Wang

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang

  42. [42]

    In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018

    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 586–595. doi:10.1109/CVPR.2018.00068

  43. [43]

    Ronghui Zhang, Jiongze Yu, Junzhou Chen, Guofa Li, Liang Lin, and Danwei Wang. 2024. A Prior Guided Wavelet-Spatial Dual Attention Transformer Frame- work for Heavy Rain Image Restoration.IEEE Transactions on Multimedia26 (2024), 7043–7057. doi:10.1109/TMM.2024.3359480

  44. [44]

    Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, and Lefei Zhang

  45. [45]

    Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration.IEEE Transactions on Image Processing35 (2026), 2018–2033. doi:10. 1109/TIP.2025.3566300

  46. [46]

    Changmeng Zheng, Zhiwei Wu, Tao Wang, Yi Cai, and Qing Li. 2021. Object- Aware Multimodal Named Entity Recognition in Social Media Posts With Ad- versarial Learning.IEEE Transactions on Multimedia23 (2021), 2520–2532. doi:10.1109/TMM.2020.3013398

  47. [47]

    Haibin Zhu. 2021. E-CARGO and Role-Based Collaboration. In2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD). iii–iii. doi:10.1109/CSCWD49262.2021.9437766

  48. [48]

    Tingting Zhu, Bo Peng, Jifan Liang, Tingchen Han, Hai Wan, Jing Fu, and Jun- jie Chen. 2023. How to Evaluate Semantic Communications for Images With ViTScore Metric?IEEE Transactions on Cognitive Communications and Networking 10 (2023), 1744–1758. https://api.semanticscholar.org/CorpusID:261682367 Beyond Fidelity: Semantic Similarity Assessment in Low-Lev...

  49. [49]

    Specifically, T3S follows a clear monotonic pattern, i.e., Level 1 > Level 2 > Level 3, in every group

    Among the compared methods, only T3S consistently preserves this ordering across all three groups of examples. Specifically, T3S follows a clear monotonic pattern, i.e., Level 1 > Level 2 > Level 3, in every group. This indicates that T3S can correctly recognize that heavy snow mainly affects visual quality rather than semantics, assign an intermediate si...