End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout
Pith reviewed 2026-06-27 17:02 UTC · model grok-4.3
The pith
No incoherent phase mask exceeds the ideal-channel mutual information between detector measurements and class labels; a conventional lens approaches this ceiling and joint optimization yields no gain under full readout.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under full detector readout, no incoherent phase mask can exceed the ideal-channel mutual information between measurements and class labels; a conventional focusing lens approaches this upper bound, and joint optimization of mask and network yields no empirical gain. When readout is constrained by coarse sampling or few measurements, optimized optics improve classification accuracy by raising class separability in the detector data. These gains shrink with rising detector noise, since the mask shapes the signal before noise is added and cannot remove post-detection noise. The benefit is also largest when class-discriminative spectral content is concentrated at lower spatial frequencies than within-class variation.
What carries the argument
The ideal-channel mutual information bound between detector measurements and class labels under incoherent imaging; it functions as a provable upper limit that no phase mask can surpass, thereby explaining the absence of gains from joint optimization under full readout.
If this is right
- Under full readout a conventional lens suffices because it approaches the mutual-information ceiling.
- Optimized phase masks raise class separability only when readout is limited by coarse sampling or few measurements.
- Gains shrink as detector noise increases because optics act before noise addition.
- Co-design helps most when class-discriminative content lies at lower spatial frequencies than within-class variation.
- The same distinctions hold on both synthetic data and standard image benchmarks.
Where Pith is reading between the lines
- The bound implies that, for full-readout classification, engineering effort should shift from optics to detector design or noise reduction.
- If the physical system deviates from the assumed incoherent forward model, the mutual-information ceiling may not apply.
- The framework could be tested on detection or segmentation tasks to check whether readout constraints similarly limit optics gains.
- The spectral-frequency dependence suggests a simple pre-screening test: measure the power spectra of inter-class versus intra-class differences before deciding on co-design.
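The pre-screening idea in the last bullet can be sketched in a few lines of NumPy. This is a minimal illustration, not a procedure from the paper: the function names `radial_power_profiles` and `spectral_centroid`, the choice of radial binning, and the use of a power-weighted mean frequency as the summary statistic are all assumptions made here.

```python
import numpy as np

def radial_power_profiles(images, labels, n_bins=8):
    """Radially averaged power of between-class and within-class differences.

    images: (N, H, W) float array; labels: (N,) integer array.
    Returns (bin_centers, between_power, within_power), each of length n_bins.
    Hypothetical helper for illustration only.
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    n_total = len(images)
    global_mean = images.mean(axis=0)

    between = np.zeros(images.shape[1:])
    within = np.zeros(images.shape[1:])
    for c in np.unique(labels):
        members = images[labels == c]
        class_mean = members.mean(axis=0)
        # Between-class term: spectrum of (class mean - global mean),
        # weighted by the class frequency.
        weight = len(members) / n_total
        between += weight * np.abs(np.fft.fft2(class_mean - global_mean)) ** 2
        # Within-class term: spectra of (image - its class mean), averaged over all images.
        within += (np.abs(np.fft.fft2(members - class_mean)) ** 2).sum(axis=0) / n_total

    # Radial spatial frequency (cycles per pixel) of every FFT bin.
    fy = np.fft.fftfreq(images.shape[1])[:, None]
    fx = np.fft.fftfreq(images.shape[2])[None, :]
    radius = np.hypot(fy, fx)
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1).ravel()
    counts = np.maximum(np.bincount(idx, minlength=n_bins), 1)

    def profile(power):
        return np.bincount(idx, weights=power.ravel(), minlength=n_bins) / counts

    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, profile(between), profile(within)

def spectral_centroid(freqs, power):
    """Power-weighted mean frequency of a radial profile."""
    return float((freqs * power).sum() / power.sum())
```

Under the reading above, a between-class centroid that sits well below the within-class centroid would suggest a task where co-design is more likely to help under constrained readout. Where to draw that line is not something this sketch can decide.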
Load-bearing premise
The analysis assumes that detector noise is added after the optics and cannot be mitigated by the phase mask, and that the forward model of incoherent imaging accurately represents the physical system.
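To make the premise concrete, the following is a simplified sketch of such a forward model, assuming a far-field (Fraunhofer) relation between the pupil and the PSF, circular convolution, pixel binning as the readout constraint, and Gaussian noise added last. The function names `incoherent_psf` and `detect`, the `pool` parameter, and the propagation model are illustrative assumptions; the paper's actual simulation may differ.

```python
import numpy as np

def incoherent_psf(phase, aperture=None):
    """Intensity PSF of a phase mask, normalized to unit sum.

    Assumes the far-field relation PSF = |FFT(pupil)|^2 (a simplification).
    """
    if aperture is None:
        aperture = np.ones_like(phase)
    pupil = aperture * np.exp(1j * phase)
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()

def detect(obj, psf, pool=1, sigma=0.0, rng=None):
    """Blur the object intensity, bin pixels, then add detector noise.

    obj:   (H, W) non-negative object intensity
    psf:   (H, W) centered PSF from incoherent_psf
    pool:  binning factor modeling coarse readout (must divide H and W)
    sigma: standard deviation of additive Gaussian noise, applied after the optics
    """
    # Linear, shift-invariant intensity mapping via the OTF.
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    image = np.real(np.fft.ifft2(np.fft.fft2(obj) * otf))
    # Coarse spatial sampling: sum pool x pool blocks of pixels.
    h, w = image.shape
    binned = image.reshape(h // pool, pool, w // pool, pool).sum(axis=(1, 3))
    # Noise enters only here, so the mask cannot act on it.
    rng = np.random.default_rng() if rng is None else rng
    return binned + sigma * rng.standard_normal(binned.shape)
```

In this toy discretization a flat phase over a full aperture gives a single-pixel PSF, so `detect` returns the object unchanged when `pool=1` and `sigma=0`. That case plays the role of the ideal channel in the discussion above.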
What would settle it
An experiment in which an optimized phase mask achieves strictly higher mutual information to the labels than the ideal-channel bound under full detector readout, or in which joint optimization produces statistically significant accuracy gains over a lens in the full-readout regime.
Original abstract
End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism characterizing when and why such systems outperform conventional lens-based imaging is largely lacking. This paper focuses on object classification, a central imaging task, and asks when end-to-end optimization of a phase mask for incoherent imaging improves performance over a conventional focusing lens. We find that these gains arise primarily under constrained detector readout and are limited under full detector readout. In the latter setting, we prove that no incoherent phase mask exceeds the ideal-channel mutual information between detector measurements and class labels; a conventional focusing lens approaches this ceiling, and joint optimization yields no empirical gain. When detector readout is constrained -- by coarse spatial sampling or a limited number of measurements -- optimized optics can substantially improve classification by increasing class separability in the detector measurements. These gains are largest under low detector noise and shrink as noise grows, because the optics shape the signal before it reaches the detector but cannot remove noise added afterward. The advantage also depends on the spectral structure of the task: co-design helps most when class-discriminative content is concentrated at lower spatial frequencies than within-class variation. We develop a theoretical framework formalizing these distinctions and test its predictions on synthetic data and standard benchmarks (MNIST, FashionMNIST, SVHN).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that end-to-end co-optimization of an incoherent phase mask and neural network backend for object classification yields no benefit under full detector readout, because no phase mask can exceed the mutual information of the ideal channel (which a conventional lens approaches); under constrained readout (coarse sampling or limited measurements), optimized optics improve class separability, with gains largest at low noise and when class-discriminative content is at lower spatial frequencies than within-class variation. A theoretical framework is developed and tested on synthetic data plus MNIST, FashionMNIST, and SVHN.
Significance. If the central claims hold, the work supplies a clear formalism distinguishing when optics-computation co-design is useful versus redundant for classification, with the mutual-information upper bound and the spectral-structure condition as notable contributions. The explicit dependence on readout constraints and post-optics noise is a useful practical takeaway, and the use of public benchmarks aids reproducibility.
major comments (2)
- [Theory section deriving the MI bound] The MI bound (abstract and theory section) is derived under the model where the phase mask shapes intensity via the incoherent PSF before additive detector noise is applied. This premise is load-bearing for the claim that 'no incoherent phase mask exceeds the ideal-channel mutual information'; if physical noise (e.g., Poisson) occurs on the intensity before or during propagation, or if the forward model omits non-shift-invariant effects, the inequality may not hold and a phase mask could still improve separability.
- [Empirical evaluation under full readout] The statement that 'a conventional focusing lens approaches this ceiling' (abstract) requires quantitative support: the manuscript should report the numerical gap between the lens MI and the ideal-channel bound on the same datasets used for the empirical tests.
minor comments (2)
- [Abstract] The abstract lists benchmarks but omits class counts, image resolutions, and any preprocessing; these details should be stated explicitly for reproducibility.
- [Theory section] Notation for the incoherent PSF and the ideal channel should be introduced with a single equation reference rather than scattered across paragraphs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the scope of our theoretical claims and strengthen the empirical presentation. We respond to each major comment below.
-
Referee: [Theory section deriving the MI bound] The MI bound (abstract and theory section) is derived under the model where the phase mask shapes intensity via the incoherent PSF before additive detector noise is applied. This premise is load-bearing for the claim that 'no incoherent phase mask exceeds the ideal-channel mutual information'; if physical noise (e.g., Poisson) occurs on the intensity before or during propagation, or if the forward model omits non-shift-invariant effects, the inequality may not hold and a phase mask could still improve separability.
Authors: Our analysis is developed under the standard model of incoherent imaging, in which the phase mask shapes the object intensity through the PSF and additive noise enters after detection. This models common detector readout noise and is the setting in which the mutual-information upper bound holds. We will revise the manuscript to state this modeling assumption explicitly in the theory section and to discuss its implications, including that pre-propagation Poisson noise or non-shift-invariant aberrations would require a separate analysis. The bound and the conclusion that no phase mask exceeds the ideal channel are therefore scoped to the stated forward model.
Revision: partial
-
Referee: [Empirical evaluation under full readout] The statement that 'a conventional focusing lens approaches this ceiling' (abstract) requires quantitative support: the manuscript should report the numerical gap between the lens MI and the ideal-channel bound on the same datasets used for the empirical tests.
Authors: We agree that reporting the numerical gap will make the claim more precise. In the revised manuscript we will add a table (or figure panel) showing the estimated mutual information achieved by the conventional lens versus the ideal-channel bound for MNIST, FashionMNIST, and SVHN under full readout, using the same estimation procedure employed elsewhere in the paper.
Revision: yes
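As a rough illustration of what such a gap table measures, the sketch below estimates the label mutual information by Monte Carlo for a toy problem and compares an ideal channel with a blurred one. Everything here is an assumption made for illustration: the class templates are random vectors, there is no within-class variation, the noise is Gaussian, and the estimator is not claimed to be the one used in the paper.

```python
import numpy as np

def label_mi_nats(class_means, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of I(Y; Z) in nats.

    Toy model (an assumption of this sketch): Y is uniform over K classes and
    Z = class_means[Y] + N(0, sigma^2 I), with no within-class variation.
    """
    rng = np.random.default_rng(seed)
    k, d = class_means.shape
    y = rng.integers(0, k, size=n_samples)
    z = class_means[y] + sigma * rng.standard_normal((n_samples, d))
    # Class log-likelihoods up to a constant shared by all classes.
    sq_dist = ((z[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_post = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # I(Y; Z) = H(Y) - H(Y | Z) = log K + E[log p(Y | Z)]
    return float(np.log(k) + log_post[np.arange(n_samples), y].mean())

def blur_rows(signals, kernel):
    """Circularly convolve each row of `signals` with a 1-D kernel."""
    otf = np.fft.fft(kernel, n=signals.shape[1])
    return np.real(np.fft.ifft(np.fft.fft(signals, axis=1) * otf, axis=1))

rng = np.random.default_rng(1)
means = rng.standard_normal((4, 16))   # hypothetical class templates
kernel = np.array([0.25, 0.5, 0.25])   # non-negative blur that sums to one
mi_ideal = label_mi_nats(means, sigma=1.0)
mi_blur = label_mi_nats(blur_rows(means, kernel), sigma=1.0)
print(f"ideal: {mi_ideal:.3f} nats, blurred: {mi_blur:.3f} nats, "
      f"gap: {mi_ideal - mi_blur:.3f} nats")
```

Real datasets have within-class variation, so the class posterior is no longer available in closed form and a different estimator would be needed. The sketch only shows the quantity that a lens-versus-ideal comparison would tabulate.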
Circularity Check
No circularity: MI bound follows from standard information theory on stated model
The paper's central proof states that no incoherent phase mask can exceed the ideal-channel mutual information I(detector measurements; labels) under the explicit model of incoherent PSF shaping followed by additive detector noise. This follows directly from the data-processing inequality and the fact that the phase mask cannot alter post-optics noise statistics; the derivation uses textbook information-theoretic arguments rather than any fitted parameter, self-citation chain, or ansatz imported from prior author work. No equation reduces to a tautology or renames a fitted quantity as a prediction. Empirical sections rely on public datasets (MNIST, FashionMNIST, SVHN) without self-referential fitting loops. The assumption that noise is strictly post-optics is a modeling premise, not a circularity in the derivation itself.
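One standard way to see why a ceiling of this kind can hold is a degraded-channel argument. The sketch below is not taken from the paper. It assumes i.i.d. Gaussian detector noise of fixed variance $\sigma^2$, full readout, and a shift-invariant blur $H$ (circular convolution) whose PSF $h$ is non-negative and sums to one. The symbols $x$ (object intensity), $y$ (class label), $z_H$, $z_0$, $\tilde z$, $n$, $n_1$, and $n_2$ are notation introduced here, with $n_2$ drawn independently of everything else.

```latex
\begin{align*}
&\text{Measurement through the mask:} && z_H = Hx + n, \quad n \sim \mathcal{N}(0, \sigma^2 I), \\
&\text{Ideal channel:} && z_0 = x + n_1, \quad n_1 \sim \mathcal{N}(0, \sigma^2 I), \\
&\text{Energy conservation:} && |\hat h(f)| \le \hat h(0) = 1 \;\Rightarrow\; H H^\top \preceq I, \\
&\text{Simulated measurement:} && \tilde z = H z_0 + n_2, \quad n_2 \sim \mathcal{N}\bigl(0, \sigma^2 (I - H H^\top)\bigr), \\
&\text{Same conditional law:} && \tilde z \mid x \sim \mathcal{N}(Hx, \sigma^2 I) \;\Rightarrow\; I(y; \tilde z) = I(y; z_H), \\
&\text{Data processing:} && y \to x \to z_0 \to \tilde z \;\Rightarrow\; I(y; z_H) \le I(y; z_0).
\end{align*}
```

The step that builds $\tilde z$ from $z_0$ works only because the noise is signal-independent and added after the optics, which is the modeling premise noted above. With Poisson or otherwise signal-dependent noise this construction does not go through as written.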
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Mutual information is defined in the standard way between random variables representing class labels and detector measurements.
- domain assumption: Incoherent imaging is modeled as a linear intensity mapping followed by additive detector noise.