pith. machine review for the scientific record.

arxiv: 2605.07810 · v1 · submitted 2026-05-08 · ⚛️ physics.optics · cs.CV

Recognition: 2 theorem links · Lean Theorem

Pre-training Enables Extraordinary All-optical Image Denoising

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:50 UTC · model grok-4.3

classification ⚛️ physics.optics cs.CV
keywords optical neural networks · diffractive networks · image denoising · pre-training · transfer learning · all-optical computing · free-space optics

The pith

Pre-training diffractive networks enables all-optical denoising of severely noisy images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Optical neural networks process information with light for potential gains in speed and energy efficiency over electronic systems. This paper shows that pre-training a diffractive network on 3.45 million simple images before fine-tuning it on noisy task-specific data produces better denoising than direct training or standard Fourier filtering. The method handles extreme noise that drops input PSNR below 8 dB, raising output PSNR above 18 dB while keeping fine details. It works across image types from digits and X-rays to natural scenes and faces, and improves downstream tasks such as face detection and object localization under noise.
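For orientation on the dB figures quoted throughout, PSNR is a one-line computation; this toy numpy sketch (not taken from the paper) shows how strong additive noise must be to push PSNR below 8 dB:

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images on a [0, peak] scale."""
    mse = np.mean((reference - test) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.5)                      # flat gray test image
noisy = clean + rng.normal(0.0, 0.45, clean.shape)  # severe additive noise
print(f"input PSNR: {psnr(clean, noisy):.1f} dB")   # below the paper's 8 dB threshold
```

With noise of standard deviation 0.45 on a unit-range image, PSNR is roughly 10 log10(1/0.45^2), about 6.9 dB, i.e. the "severe noise" regime the paper targets.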

Core claim

The authors establish that a two-step process for diffractive networks, pre-training on a massive dataset of 3.45 million diverse simple images followed by task-specific fine-tuning, enables snapshot all-optical image denoising that outperforms conventional Fourier-domain filtering and directly trained networks. The advantage is most pronounced for severe noise: for inputs with PSNR below 8 dB, the method achieves output PSNR above 18 dB while preserving fine image features. The result generalizes consistently across highly diverse image domains, including MNIST, ChestMNIST, CIFAR-10, and CelebA.

What carries the argument

The two-step transfer learning process on the diffractive network, with pre-training on a large set of simple images to build general capability before adapting to specific noisy datasets.
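The two-step schedule itself is ordinary transfer learning, and its shape can be caricatured with a purely digital stand-in. The sketch below fits a toy linear denoiser in numpy: the data, dimensions, and hyperparameters are all invented for illustration, and nothing here models the paper's optical physics:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_batch(n, base_images, sigma=0.5):
    """Pair randomly drawn clean images with noisy copies."""
    idx = rng.integers(0, len(base_images), n)
    clean = base_images[idx]
    return clean + rng.normal(0.0, sigma, clean.shape), clean

def fit_linear_denoiser(W, noisy, clean, lr, steps):
    """Plain gradient descent on ||noisy @ W - clean||^2."""
    for _ in range(steps):
        grad = noisy.T @ (noisy @ W - clean) / len(noisy)
        W = W - lr * grad
    return W

d = 16  # flattened toy "image" size

# Step 1: pre-train on a large, generic image distribution.
generic = rng.uniform(0.0, 1.0, (2000, d))
noisy_g, clean_g = make_batch(4000, generic)
W = fit_linear_denoiser(np.eye(d), noisy_g, clean_g, lr=0.1, steps=300)

# Step 2: fine-tune the *same* weights on a small task-specific set
# with different statistics (the analogue of MNIST or CelebA here).
task = rng.uniform(0.0, 1.0, (50, d)) ** 3
noisy_t, clean_t = make_batch(200, task)
W = fit_linear_denoiser(W, noisy_t, clean_t, lr=0.05, steps=100)

mse = np.mean((noisy_t @ W - clean_t) ** 2)  # below the raw noise variance of 0.25
```

The paper's argument is that step 1 buys robustness that direct training (skipping step 1) does not; in this toy, step 1 merely provides a warm start for step 2.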

If this is right

  • The method improves denoising quality for images with severe noise while preserving fine details better than direct training or Fourier filtering.
  • The same pre-trained network can be fine-tuned consistently for image styles ranging from handwritten digits and medical scans to natural scenes and faces.
  • It enhances accuracy in vision applications such as face detection, license plate recognition, and UAV localization under noisy conditions.
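The Fourier-filtering baseline these bullets compare against is essentially ideal low-pass filtering. A minimal numpy sketch (the cutoff `keep_frac` is an invented illustrative parameter) makes its core trade-off concrete: a tighter cutoff removes more noise but blurs exactly the fine features the paper claims to preserve:

```python
import numpy as np

def fourier_lowpass_denoise(img: np.ndarray, keep_frac: float = 0.15) -> np.ndarray:
    """Zero out spatial frequencies beyond a circular cutoff."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    mask = radius <= keep_frac * min(h, w) / 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

rng = np.random.default_rng(2)
clean = np.zeros((64, 64))
clean[20:44, 20:44] = 1.0                          # a square with sharp edges
noisy = clean + rng.normal(0.0, 0.6, clean.shape)
denoised = fourier_lowpass_denoise(noisy)
# The filter cuts the noise floor dramatically, but the square's edges
# ring and blur: the detail loss a learned denoiser is supposed to avoid.
```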

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Physical optical computing devices may need large-scale pre-training stages to reach performance levels comparable to digital models.
  • The approach points toward compact, low-energy optical front-ends that could perform denoising directly at the sensor before any electronic processing.
  • The same pre-training strategy might extend to other all-optical tasks such as classification or feature extraction in free-space systems.

Load-bearing premise

Simulated performance of the pre-trained and fine-tuned diffractive network will translate directly to a physical free-space optical system without major degradation from fabrication errors, alignment issues, or unmodeled aberrations.

What would settle it

A physical free-space implementation of the pre-trained diffractive network that fails to raise PSNR from below 8 dB to above 18 dB on severely noisy test images or loses fine features would falsify the claimed advantage of the transfer learning approach.

original abstract

Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts and are leading to suboptimal performance. This paper reports a pre-training-driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free-space optical denoising by a diffractive network optimized by a two-step process including (1) pre-training using a massive dataset of 3.45 million diverse but simple images and (2) fine-tuning with the corresponding task-specific datasets. Compared to conventional Fourier-domain filtering and directly trained diffractive networks, such a transfer learning process exhibited prominent advantages for denoising images degraded by severe noise, peak signal-to-noise ratio (PSNR) below 8 dB, while preserving fine image features and improving the PSNR to above 18 dB. Importantly, the same pre-trained optical network could be consistently fine-tuned to process degraded images from highly diverse styles ranging from handwritten digits (MNIST) and chest X-rays (ChestMNIST) to CIFAR-10 images and human faces (CelebA). We further demonstrated the critical role of our optical denoisers in vision-based applications, including face detection, plate recognition, and localization of UAVs in noisy conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a two-step pre-training and fine-tuning strategy for diffractive optical neural networks to perform all-optical image denoising in free space. Pre-training is performed on 3.45 million simple images, followed by fine-tuning on task-specific datasets including MNIST, ChestMNIST, CIFAR-10, and CelebA. The approach is reported to achieve substantial PSNR improvements from below 8 dB to above 18 dB for severely noisy images, outperforming Fourier-domain filtering and directly trained networks, while preserving fine features, and is applied to downstream vision tasks like face detection and UAV localization.

Significance. If the physical implementation is validated, this result would highlight the potential of transfer learning in optical computing systems, offering a path to more robust and versatile all-optical processors that could operate at high speed and low energy for image processing applications.

major comments (1)
  1. [Abstract] The claim of demonstrating effective free-space optical denoising by a diffractive network optimized via the two-step process is not supported by any quantitative hardware validation data, such as side-by-side simulated vs. measured PSNR values, error bars from multiple trials, alignment tolerance analysis, or an error budget for fabrication and unmodeled aberrations. This directly undermines assessment of the central claim that the simulated performance (PSNR rise from <8 dB to >18 dB) translates to physical hardware across the reported datasets.
minor comments (1)
  1. The description of the pre-training dataset composition and the exact fine-tuning hyperparameters could be expanded for reproducibility, though this does not affect the core claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.

point-by-point responses
  1. Referee: [Abstract] The claim of demonstrating effective free-space optical denoising by a diffractive network optimized via the two-step process is not supported by any quantitative hardware validation data, such as side-by-side simulated vs. measured PSNR values, error bars from multiple trials, alignment tolerance analysis, or an error budget for fabrication and unmodeled aberrations. This directly undermines assessment of the central claim that the simulated performance (PSNR rise from <8 dB to >18 dB) translates to physical hardware across the reported datasets.

    Authors: We agree that the manuscript presents only numerical simulation results and does not include experimental hardware validation data such as measured PSNR values, error bars, or detailed alignment/fabrication error budgets. The work focuses on the algorithmic innovation of pre-training on 3.45 million images followed by fine-tuning, with all quantitative claims (e.g., PSNR rising from below 8 dB to above 18 dB) derived from simulated propagation through the diffractive layers. In the revised manuscript we will explicitly state in the abstract and introduction that the demonstrations are numerical simulations of the physical optical model. We will add a dedicated subsection on practical hardware considerations, including a first-order error budget based on typical diffractive-optics fabrication tolerances (phase errors ~0.05-0.2 rad, lateral alignment ~5-20 um) and how these would propagate to output PSNR. This revision will clarify the scope of the current claims while preserving the central result on the benefits of the two-step training strategy.

    Revision: yes
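The first-order error budget promised here (phase errors of roughly 0.05-0.2 rad) can at least be bracketed numerically. The sketch below is a crude Monte Carlo stand-in, not the authors' analysis: it perturbs a random single-layer phase profile with Gaussian fabrication error and compares far-field intensities through one FFT:

```python
import numpy as np

rng = np.random.default_rng(3)

def relative_intensity_error(phase_sigma: float, n: int = 64, trials: int = 200) -> float:
    """Mean relative L2 error of the far-field intensity when a designed
    phase profile is perturbed by i.i.d. Gaussian phase errors."""
    errs = []
    for _ in range(trials):
        phi = rng.uniform(0.0, 2.0 * np.pi, (n, n))          # nominal design
        perturbed = phi + rng.normal(0.0, phase_sigma, (n, n))
        intensity_nom = np.abs(np.fft.fft2(np.exp(1j * phi))) ** 2
        intensity_fab = np.abs(np.fft.fft2(np.exp(1j * perturbed))) ** 2
        errs.append(np.linalg.norm(intensity_fab - intensity_nom)
                    / np.linalg.norm(intensity_nom))
    return float(np.mean(errs))

for sigma in (0.05, 0.2):
    print(f"phase error {sigma:.2f} rad -> relative intensity error "
          f"{relative_intensity_error(sigma):.3f}")
```

A real budget would also cover lateral misalignment and unmodeled aberrations; this isolates the phase-error term only.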

Circularity Check

0 steps flagged

No circularity in empirical pre-training and fine-tuning procedure

full rationale

The paper presents an empirical two-step optimization process for a diffractive optical network: pre-training on 3.45 million simple images followed by fine-tuning on domain-specific datasets. There are no mathematical derivations, equations, or load-bearing self-citations that would reduce the reported PSNR gains (from below 8 dB to above 18 dB) to fitted parameters or prior results by construction. Performance claims rest on simulation results that are independent of the training workflow itself; no self-definitional loops, fitted-input predictions, or ansatz smuggling via citation appear in the described method. The approach is therefore self-contained as a standard transfer-learning application to all-optical denoising.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard diffraction physics and the empirical effectiveness of transfer learning when applied to physical optical networks; no new entities are postulated and no free parameters are fitted to the target denoising performance.

axioms (2)
  • standard math: Light propagation through diffractive layers can be modeled by the Fresnel diffraction integral or angular-spectrum method.
    Standard assumption in diffractive optics and optical neural network design.
  • domain assumption: The physical optical network can be accurately optimized in simulation before fabrication.
    Implicit in all diffractive-network papers; required for the pre-training step to be useful.
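The first axiom is concrete enough to write down. Below is a minimal angular-spectrum propagator in numpy; the grid, wavelength, and distance are illustrative choices, not parameters from the paper:

```python
import numpy as np

def angular_spectrum(field: np.ndarray, wavelength: float, dx: float, z: float) -> np.ndarray:
    """Propagate a complex scalar field a distance z via the angular-spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                 # spatial frequencies, cycles/m
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)          # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example: a plane wave through a square aperture, propagated 5 cm at 632.8 nm.
n, dx = 256, 8e-6                                # 8 um sampling
field = np.zeros((n, n), dtype=complex)
field[96:160, 96:160] = 1.0
out = angular_spectrum(field, 632.8e-9, dx, 0.05)
```

For these parameters no sampled frequency is evanescent, so |H| = 1 everywhere and propagation conserves energy; a diffractive network stacks several such propagations with a trainable phase mask between each.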

pith-pipeline@v0.9.0 · 5551 in / 1455 out tokens · 43598 ms · 2026-05-11T01:50:14.403671+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
