pith. machine review for the scientific record.

arxiv: 2605.00885 · v1 · submitted 2026-04-27 · 💻 cs.CV

Recognition: unknown

Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion

Qing Xiao, Wuqi Su, Yingming Zhang, Yonggang Yang

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:59 UTC · model grok-4.3

classification 💻 cs.CV
keywords image dehazing · non-homogeneous haze · multi-branch network · image fusion · deep learning · haze removal · computer vision

The pith

A two-stage network restores images with non-homogeneous haze by training separate branches on uniform concentration levels and fusing their region-specific strengths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that non-homogeneous dehazing reduces to a set of simpler homogeneous sub-problems when the input image is treated as a composite of local regions with roughly constant haze density. It introduces CPIFNet, which trains multiple IENet branches on synthetic datasets of different haze concentrations and then uses an IFNet stage to stack and merge the best local restorations. A single model trained on mixed haze often fails at abrupt transitions, so this partitioning avoids that mismatch. The approach is supervised by a joint loss covering reconstruction, perceptual, structural, and color fidelity. If the claim holds, it supplies a practical route to higher-quality results on real-world scenes where haze varies sharply across the frame.

Core claim

Non-homogeneous hazy images can be decomposed into tractable homogeneous sub-problems by training independent IENet branches on datasets of distinct haze concentrations; an IFNet stage then aggregates the locally optimal restorations from these branches through deep feature stacking and merging to produce one unified dehazed output.

What carries the argument

CPIFNet, a two-stage architecture in which multiple IENet branches specialize in different haze concentrations and IFNet performs deep feature stacking and merging to combine their outputs.
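
To make the two-stage shape concrete, here is a minimal PyTorch sketch of the design as described: several enhancement branches each map the full hazy image to a restored candidate, and a fusion stage concatenates the candidates along channels and merges them. Branch depth, channel widths, and the residual formulation are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the two-stage CPIFNet shape; all architecture
# details below are assumptions, not the paper's specification.
import torch
import torch.nn as nn

class IENetBranch(nn.Module):
    """One enhancement branch, trained on a single haze concentration."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        # Predict a residual correction and clamp to valid intensities.
        return torch.clamp(x + self.body(x), 0.0, 1.0)

class IFNet(nn.Module):
    """Fusion stage: stack branch outputs along channels, then merge."""
    def __init__(self, n_branches, ch=32):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(3 * n_branches, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, outs):
        return torch.clamp(self.merge(torch.cat(outs, dim=1)), 0.0, 1.0)

class CPIFNet(nn.Module):
    def __init__(self, n_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(IENetBranch() for _ in range(n_branches))
        self.fusion = IFNet(n_branches)

    def forward(self, hazy):
        # Every branch sees the full image; the merger selects regions.
        return self.fusion([b(hazy) for b in self.branches])

out = CPIFNet(n_branches=3)(torch.rand(1, 3, 256, 256))  # (1, 3, 256, 256)
```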

If this is right

  • Each IENet branch produces superior restoration inside regions whose haze concentration matches its training distribution.
  • IFNet yields a single high-quality image by intelligently selecting and blending the strongest local results from all branches.
  • The combined reconstruction, perceptual, structural, and color losses jointly improve fidelity across both stages (one plausible formulation is sketched after this list).
  • The method directly targets abrupt density transitions that defeat single-branch or single-model dehazers.
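
The paper publishes neither the formulas nor the weights of its four loss terms, so the following is a hedged sketch of one plausible combination: L1 reconstruction, a VGG-feature perceptual term, a gradient-difference proxy for the structural term, and a channel-mean color term. Every lambda and every term definition here is an assumption.

```python
# Hedged sketch of one plausible combination of the four loss terms.
# The lambdas, VGG depth, and structural/color proxies are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual(pred, gt):
    # Perceptual term: L1 distance in VGG-16 feature space.
    return F.l1_loss(_vgg(pred), _vgg(gt))

def structural(pred, gt):
    # Structural term: gradient-difference proxy for an SSIM-style loss.
    def grads(x):
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    (px, py), (gx, gy) = grads(pred), grads(gt)
    return F.l1_loss(px, gx) + F.l1_loss(py, gy)

def color(pred, gt):
    # Color term: match per-channel global means.
    return F.l1_loss(pred.mean(dim=(2, 3)), gt.mean(dim=(2, 3)))

def total_loss(pred, gt, lam=(1.0, 0.1, 0.5, 0.1)):
    return (lam[0] * F.l1_loss(pred, gt)      # reconstruction
            + lam[1] * perceptual(pred, gt)   # perceptual
            + lam[2] * structural(pred, gt)   # structural
            + lam[3] * color(pred, gt))       # color
```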

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same concentration-partitioning idea could be tested on other spatially varying degradations such as uneven illumination or mixed noise levels.
  • Performance may depend on choosing the right number and spacing of concentration levels; too few branches could leave gaps in coverage.
  • Attention or uncertainty maps inside IFNet might further reduce boundary artifacts without changing the core two-stage design; a minimal attention-weighted variant is sketched just below.
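
As a concrete version of that last extension, the sketch below replaces plain stack-and-merge with per-pixel softmax weights over the branches. This is a hypothetical variant, not part of CPIFNet.

```python
# Hypothetical attention-weighted fusion (not in the paper): predict
# one weight map per branch and blend outputs with a per-pixel softmax.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, n_branches, ch=32):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(3 * n_branches, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, n_branches, 1),  # one logit map per branch
        )

    def forward(self, outs):
        stacked = torch.stack(outs, dim=1)                # (B, K, 3, H, W)
        w = self.attn(torch.cat(outs, dim=1)).softmax(1)  # (B, K, H, W)
        return (stacked * w.unsqueeze(2)).sum(dim=1)      # weighted blend
```

A side benefit of this variant: the softmax weights double as an interpretable map of which branch the fusion trusts where, which would make the specialization question raised in the referee report directly inspectable.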

Load-bearing premise

A non-homogeneous hazy image can be treated as a patchwork of local regions each having roughly uniform haze density that matches one of the branch training sets, and the branch outputs can be fused without introducing new artifacts at the boundaries.
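
The premise is straightforward to operationalize for testing: under the standard atmospheric scattering model I = J * t + A * (1 - t), a piecewise-constant transmission map produces exactly the kind of patchwork haze the paper assumes. A minimal NumPy sketch, with illustrative strip layout, density values, and airlight:

```python
# Patchwork haze via the atmospheric scattering model
# I = J * t + A * (1 - t), with piecewise-constant transmission t.
# Strip layout, beta values, depth, and airlight A are illustrative.
import numpy as np

def hazy_patchwork(J, betas, depth=1.0, A=0.9):
    """J: clean image (H, W, 3) in [0, 1]; betas: per-strip haze densities."""
    H, W, _ = J.shape
    t = np.ones((H, W, 1))
    for i, beta in enumerate(betas):
        cols = slice(i * W // len(betas), (i + 1) * W // len(betas))
        t[:, cols] = np.exp(-beta * depth)  # uniform density per strip
    return J * t + A * (1.0 - t)

J = np.random.rand(128, 128, 3)
I = hazy_patchwork(J, betas=[0.2, 0.8, 1.6])  # thin -> thick, abrupt seams
```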

What would settle it

If the fused output shows visible seams, color shifts, or new distortions precisely at the locations where haze density changes abruptly in the input, the decomposition-and-fusion premise does not hold.

Figures

Figures reproduced from arXiv: 2605.00885 by Qing Xiao, Wuqi Su, Yingming Zhang, Yonggang Yang.

Figure 2. The proposed overall CPIFNet architecture.
Original abstract

Existing single image dehazing methods have demonstrated satisfactory performance on homogeneous thin-haze images; however, they often struggle with non-homogeneous hazy images that exhibit spatially varying haze concentrations and abrupt density transitions across different regions. To address this fundamental limitation, we propose a novel multi-branch deep neural network framework, termed Concentration Partitioning and Image Fusion Network (CPIFNet), which decomposes the challenging non-homogeneous dehazing problem into a set of tractable homogeneous sub-problems. Our key insight is that a single non-homogeneous hazy image can be viewed as a composite of multiple local regions, each exhibiting approximately homogeneous haze characteristics. CPIFNet employs a two-stage architecture consisting of an Image Enhancement Network (IENet) stage and an Image Fusion Network (IFNet) stage. In the first stage, multiple IENet branches are independently trained on homogeneous haze datasets of different concentration levels, producing enhancement models that excel at restoring regions matching their respective haze densities. In the second stage, the IFNet intelligently aggregates the advantageous regions from all enhancement outputs through deep feature stacking and merging, yielding a unified high-quality dehazed result. Furthermore, we introduce a comprehensive loss function incorporating reconstruction, perceptual, structural, and color losses to jointly supervise both stages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes CPIFNet, a two-stage multi-branch network for single-image non-homogeneous dehazing. Multiple IENet branches are trained independently on homogeneous haze datasets of different concentration levels to specialize in restoring regions of matching density; their outputs are then aggregated by IFNet through deep feature stacking and merging to produce a final dehazed image. A joint loss combining reconstruction, perceptual, structural, and color terms supervises both stages.

Significance. If empirically validated, the concentration-partitioning insight and learned fusion could meaningfully extend dehazing to real-world images with spatially varying haze densities and abrupt transitions, where single homogeneous models typically fail. The multi-branch specialization plus composite loss offers a structured way to combine region-specific advantages without requiring explicit haze-density estimation at test time.

major comments (3)
  1. [§3.1] IENet stage: each branch is trained on a single fixed concentration and applied to the whole image; the manuscript supplies no per-branch performance analysis, concentration histograms, or visualizations on mixed-haze inputs, so it is unclear whether specialization actually occurs or whether branches produce conflicting restorations that IFNet must resolve.
  2. [§3.2] IFNet stage: fusion is performed by deep feature stacking and merging with no explicit spatial attention, boundary-aware loss term, or transition penalty; because every branch processes the entire image, abrupt density changes must be handled implicitly by the learned merger, yet the central claim that this yields artifact-free results rests on this unverified assumption.
  3. [§4] Experimental section: the manuscript contains no quantitative results, ablation studies, or comparisons against prior dehazing methods on non-homogeneous benchmarks; without these data the load-bearing claim that the two-stage architecture outperforms baselines cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract and method description refer to a 'comprehensive loss function' but provide neither the mathematical formulation of each term nor the weighting coefficients used during joint training.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the analysis of branch specialization, the fusion mechanism, and the experimental validation on non-homogeneous cases.

Point-by-point responses
  1. Referee: [§3.1] IENet stage: each branch is trained on a single fixed concentration and applied to the whole image; the manuscript supplies no per-branch performance analysis, concentration histograms, or visualizations on mixed-haze inputs, so it is unclear whether specialization actually occurs or whether branches produce conflicting restorations that IFNet must resolve.

    Authors: We agree that explicit evidence of specialization is needed. The manuscript describes independent training on homogeneous datasets of varying concentrations but does not provide the requested per-branch analysis. In the revision we will add concentration histograms of test images, visualizations of individual IENet outputs on mixed-haze inputs, and per-branch quantitative metrics to demonstrate that each branch performs best on regions matching its training concentration and that IFNet resolves any residual conflicts. revision: yes

  2. Referee: [§3.2] IFNet stage: fusion is performed by deep feature stacking and merging with no explicit spatial attention, boundary-aware loss term, or transition penalty; because every branch processes the entire image, abrupt density changes must be handled implicitly by the learned merger, yet the central claim that this yields artifact-free results rests on this unverified assumption.

    Authors: The current design relies on implicit learning within the deep feature merger. We acknowledge the absence of explicit spatial attention or boundary penalties. The revision will include visualizations of transition regions, a discussion of how the joint loss encourages smooth merging, and, if needed, an additional boundary-aware term to further reduce artifacts at abrupt density changes (one candidate form is sketched after these responses). revision: partial

  3. Referee: [§4] Experimental section: the manuscript contains no quantitative results, ablation studies, or comparisons against prior dehazing methods on non-homogeneous benchmarks; without these data the load-bearing claim that the two-stage architecture outperforms baselines cannot be assessed.

    Authors: We accept that the experimental section must be expanded to support the claims. The revised manuscript will add quantitative results (PSNR, SSIM, LPIPS) on non-homogeneous benchmarks, ablation studies on branch count and loss terms, and comparisons against recent single-image dehazing methods. These additions will allow direct assessment of the two-stage architecture (a minimal PSNR helper is sketched after these responses). revision: yes
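
For concreteness, one candidate form of the boundary-aware term floated in response 2: penalize output gradients that exceed ground-truth gradients, weighted by where a haze-density map changes abruptly. The density input and the exact weighting are assumptions; nothing like this appears in the manuscript.

```python
# One candidate boundary-aware term (an assumption, not the authors'
# loss): penalize output gradients absent from the ground truth,
# weighted by where the haze-density map changes abruptly.
import torch
import torch.nn.functional as F

def grad_mag(x):
    # Cropped so horizontal and vertical gradients share a common shape.
    gx = x[..., :, 1:] - x[..., :, :-1]
    gy = x[..., 1:, :] - x[..., :-1, :]
    return gx[..., :-1, :].abs() + gy[..., :, :-1].abs()

def boundary_aware_loss(pred, gt, density):
    """density: (B, 1, H, W) haze-density map; seams have large gradients."""
    seam = grad_mag(density)
    seam = seam / (seam.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    excess = F.relu(grad_mag(pred) - grad_mag(gt))  # edges absent from gt
    return (seam * excess).mean()
```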
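
And a minimal PSNR helper of the kind the expanded experiments in response 3 would report; SSIM and LPIPS are usually taken from scikit-image and the lpips package rather than hand-rolled.

```python
# Minimal PSNR helper, assuming images scaled to [0, 1].
import torch

def psnr(pred, gt, max_val=1.0):
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```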

Circularity Check

0 steps flagged

No circularity: empirical architecture with no derivation chain

Full rationale

The paper proposes an empirical two-stage multi-branch neural network (CPIFNet) for non-homogeneous dehazing, with IENet branches trained independently on homogeneous datasets of varying concentrations and an IFNet fusion stage. No mathematical equations, first-principles derivations, or predictions appear in the abstract or description. The central claims rest on architectural design choices and joint loss supervision rather than any quantity that reduces to its own inputs by construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations are present. The method is self-contained as a data-driven proposal whose performance is evaluated externally on image datasets.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus one domain-specific premise about local haze homogeneity. No new physical entities are postulated and the free parameters are the usual network weights learned from data.

free parameters (2)
  • IENet branch weights
    Large set of learned parameters for each concentration-specific enhancement network.
  • IFNet fusion weights
    Learned parameters controlling feature stacking and merging.
axioms (2)
  • domain assumption: Local regions of a non-homogeneous hazy image exhibit approximately homogeneous haze characteristics.
    Invoked to justify decomposition into tractable sub-problems.
  • domain assumption: A network trained exclusively on homogeneous haze of one concentration level will excel at restoring regions of matching density.
    Basis for training separate IENet branches.

pith-pipeline@v0.9.0 · 5528 in / 1583 out tokens · 73999 ms · 2026-05-09T19:59:05.893252+00:00 · methodology

discussion (0)

