An Optimal Control Approach to Early Stopping Variational Methods for Image Restoration

Alexander Effland; Erich Kobler; Karl Kunisch; Thomas Pock

arxiv: 1907.08488 · v1 · pith:H4JBGZGUnew · submitted 2019-07-19 · 🧮 math.OC · cs.LG · eess.IV

Pith reviewed 2026-05-24 19:00 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · eess.IV

keywords optimal control · early stopping · variational methods · image restoration · image denoising · image deblurring · gradient flow

The pith

Learning an optimal stopping time via optimal control improves variational gradient flows for image restoration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Variational methods for image processing typically reach their best quality when the gradient flow is stopped early instead of running to a stationary point. This occurs because of an inherent tradeoff between the error from incomplete optimization and the error from the model itself not perfectly matching the data. The paper treats the stopping time itself as the variable to optimize and learns it directly from training data by casting the problem as an optimal control task. The resulting schemes run efficiently and match the performance of existing methods on denoising and deblurring.
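The tradeoff described above can be reproduced on a toy problem. The sketch below is an illustration under assumptions, not the paper's model: it replaces the learned variational energy with a plain 1D smoothing energy, uses explicit Euler steps as the discretized gradient flow, and measures quality with mean squared error instead of PSNR. The signal, noise level, step size `tau`, and all function names are hypothetical.

```python
import math
import random


def heat_flow_step(x, tau):
    """One explicit Euler step of the gradient flow of E(x) = 0.5 * sum((x[i+1] - x[i])**2)."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else x[i]        # reflecting boundary
        right = x[i + 1] if i < n - 1 else x[i]   # reflecting boundary
        out.append(x[i] + tau * (left - 2.0 * x[i] + right))
    return out


def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def error_curve(noisy, clean, tau=0.25, n_steps=2000):
    """Error to the clean signal after 0, 1, ..., n_steps flow steps."""
    x = list(noisy)
    errors = [mse(x, clean)]
    for _ in range(n_steps):
        x = heat_flow_step(x, tau)
        errors.append(mse(x, clean))
    return errors


# Hypothetical toy data: one cosine period plus Gaussian noise.
random.seed(0)
n = 64
clean = [math.cos(2.0 * math.pi * i / n) for i in range(n)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]

errors = error_curve(noisy, clean)
best_k = min(range(len(errors)), key=errors.__getitem__)
print("best step:", best_k)
print("error at start / best / end:", errors[0], errors[best_k], errors[-1])
```

In this toy setting the error first drops as noise is smoothed away and then rises again as the flow flattens the signal itself, so the best step lies strictly between the noisy input and the stationary point.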

Core claim

By introducing an optimal stopping time into the gradient flow process and learning it from data by means of an optimal control approach, we obtain highly efficient numerical schemes that achieve competitive results for image denoising and image deblurring. A nonlinear spectral analysis of the gradient of the learned regularizer gives enlightening insights about the different regularization properties.

What carries the argument

Optimal stopping time learned from data via an optimal control formulation of the gradient flow.
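For a single training pair and a fixed energy, the idea can be written as a small optimal control problem. The notation here is assumed for illustration: $x_0$ is the degraded input, $y$ the ground truth, and $E$ the variational energy. The paper's own formulation also involves learned regularizer parameters and may be stated differently.

```latex
\begin{aligned}
\min_{T \ge 0}\quad & J(T) = \tfrac{1}{2}\,\lVert x(T) - y \rVert_2^2 \\
\text{s.t.}\quad & \dot{x}(t) = -\nabla E\bigl(x(t)\bigr), \quad t \in (0, T), \qquad x(0) = x_0 .
\end{aligned}

% First-order condition for an interior optimal stopping time T > 0,
% obtained by differentiating J along the trajectory:
J'(T) = \bigl\langle x(T) - y,\ \dot{x}(T) \bigr\rangle
      = -\bigl\langle x(T) - y,\ \nabla E\bigl(x(T)\bigr) \bigr\rangle = 0 .
```

Read this way, the flow should stop once the descent direction no longer points toward the ground truth, which is generally well before $\nabla E$ itself vanishes.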

If this is right

The learned stopping time produces competitive numerical results on image denoising and deblurring tasks.
Nonlinear spectral analysis of the learned regularizer reveals its distinct regularization properties.
The formulation remains valid even when the regularizer itself is learned from data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single learned stopping time may simplify parameter tuning compared with methods that adjust multiple hyperparameters separately.
The same optimal-control framing could be applied to other inverse problems where early stopping improves practical performance.

Load-bearing premise

The tradeoff between optimization and modelling errors in variational models can be captured and optimized by learning a single stopping time via an optimal control formulation from data.

What would settle it

If a fixed stopping time or a standard early-stopping heuristic matches or exceeds the restoration quality of the learned stopping-time scheme on held-out denoising or deblurring test images, the central claim is falsified.
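The check described above can be sketched on toy data. This is a minimal sketch, assuming a 1D heat flow in place of the paper's learned gradient flow, a brute-force search over step counts in place of the optimal-control solver, and synthetic cosine signals in place of images. The baselines shown (no flow at all, and a long fixed run) are hypothetical stand-ins for whatever heuristic one wants to test against.

```python
import math
import random


def flow_errors(noisy, clean, tau=0.25, n_steps=200):
    """MSE to the clean signal after each explicit Euler step of a 1D heat flow."""
    x = list(noisy)
    n = len(x)
    errs = [sum((u - v) ** 2 for u, v in zip(x, clean)) / n]
    for _ in range(n_steps):
        x = [
            x[i] + tau * ((x[i - 1] if i > 0 else x[i]) - 2.0 * x[i]
                          + (x[i + 1] if i < n - 1 else x[i]))
            for i in range(n)
        ]
        errs.append(sum((u - v) ** 2 for u, v in zip(x, clean)) / n)
    return errs


def make_pair(rng, n=64, sigma=0.3):
    """Hypothetical data: a cosine with random frequency and phase plus Gaussian noise."""
    freq = rng.choice([1, 2, 3])
    phase = rng.uniform(0.0, 2.0 * math.pi)
    clean = [math.cos(2.0 * math.pi * freq * i / n + phase) for i in range(n)]
    noisy = [c + rng.gauss(0.0, sigma) for c in clean]
    return noisy, clean


def mean_curve(pairs):
    """Average error curve over a set of (noisy, clean) pairs."""
    curves = [flow_errors(noisy, clean) for noisy, clean in pairs]
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(curves[0]))]


rng = random.Random(0)
train = [make_pair(rng) for _ in range(20)]
test = [make_pair(rng) for _ in range(20)]   # disjoint from the training pairs

# "Learn" one global stopping step on the training set only, then freeze it.
train_curve = mean_curve(train)
learned_k = min(range(len(train_curve)), key=train_curve.__getitem__)

# Evaluate the frozen step and the baselines on held-out data.
test_curve = mean_curve(test)
print("learned step:", learned_k)
print("test MSE, learned stop:", test_curve[learned_k])
print("test MSE, no flow:     ", test_curve[0])
print("test MSE, long run:    ", test_curve[-1])
```

If a baseline row matched or beat the learned row on real held-out images, that would be the kind of evidence the criterion above asks for.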

Figures

Figures reproduced from arXiv: 1907.08488 by Alexander Effland, Erich Kobler, Karl Kunisch, Thomas Pock.

**Figure 2.** Image sequence with globally best PSNR value. Left to right: input …

**Figure 3.** Schematic drawing of optimal trajectory (black curve) as well as …

**Figure 4.** Left: Trajectories of the state equation for …

**Figure 5.** Plots of the average PSNR value across the test set (first and third …

**Figure 9.** In the case of denoising, for T < T we still observe noisy images, whereas for too large T local image patterns are smoothed out. For image deblurring, images computed with too small values of T remain blurry, while for …

**Figure 6.** Average change of consecutive convolution kernels (solid blue) and …

**Figure 7.** Plots of the energies (first and third plot) and first order conditions …

**Figure 8.** From left to right: ground truth image, noisy input image ( …

**Figure 9.** From left to right: ground truth image, blurry input image ( …

**Figure 10.** Band plots of the energies (blue plots) and first order conditions …

**Figure 11.** Triplets of 7 × 7-kernels (top), potential functions ρ (middle) and activation functions φ (bottom) learned for image denoising.

**Figure 12.** Triplets of 7 × 7-kernels (top), potential functions ρ (middle) and activation functions φ (bottom) learned for image deblurring. … which shows that the regularizer has a tendency to decrease the contrast. Formula (34) also reveals that eigenfunctions corresponding to contrast factors close to 1 are preserved over several iterations. In summary, the learned regularizer has a tendency to reduce the contrast…

**Figure 13.** Nv = 64 eigenpairs for image denoising, where all eigenfunctions have the resolution 127 × 127 and the intensity of each eigenfunction is adjusted to [0, 1].

**Figure 14.** Nv = 64 eigenpairs for image deblurring, where all eigenfunctions have the resolution 127 × 127 and the intensity of each eigenfunction is adjusted to [0, 1].

Original abstract

We investigate a well-known phenomenon of variational approaches in image processing, where typically the best image quality is achieved when the gradient flow process is stopped before converging to a stationary point. This paradox originates from a tradeoff between optimization and modelling errors of the underlying variational model and holds true even if deep learning methods are used to learn highly expressive regularizers from data. In this paper, we take advantage of this paradox and introduce an optimal stopping time into the gradient flow process, which in turn is learned from data by means of an optimal control approach. As a result, we obtain highly efficient numerical schemes that achieve competitive results for image denoising and image deblurring. A nonlinear spectral analysis of the gradient of the learned regularizer gives enlightening insights about the different regularization properties.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper frames early stopping as an optimal control problem to learn a data-driven stopping time T for variational gradient flows, which is a clean new angle but rests on a single fixed T that may not adapt to image variation.

The core move here is recasting the early-stopping paradox in variational image restoration as an optimal control problem whose solution is a stopping time learned from data. That framing is new relative to the usual hand-tuned or heuristic stopping rules, and it directly targets the tradeoff between optimization error and modeling error in the gradient flow. They apply it to denoising and deblurring and add a nonlinear spectral analysis of the learned regularizer to see what regularization properties emerge. That analysis is a nice extra that could be useful on its own. The abstract claims the resulting schemes are competitive and efficient, which is plausible if the control formulation works as described.

The main soft spot is the assumption that one scalar T, optimized on training data, is enough. In denoising and deblurring the optimal stopping point routinely shifts with image content, noise strength, or blur kernel, so a single learned T is likely a compromise rather than a full exploitation of the paradox. Without train/test details or per-image variation numbers it is hard to judge how much this matters in practice.

The work is aimed at people already working on variational methods and early stopping in imaging. It deserves a serious referee because the optimal-control formulation is worth testing even if the single-T model turns out to need per-image extensions or more validation.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes learning a single stopping time T for the gradient flow of variational image restoration models via an optimal control formulation. This exploits the known tradeoff between optimization error and modeling error (the early-stopping paradox), even when the regularizer is learned from data. The resulting schemes are applied to denoising and deblurring and are claimed to be highly efficient while achieving competitive results; a nonlinear spectral analysis of the learned regularizer is provided for interpretability.

Significance. If the central construction holds, the work supplies a principled, data-driven mechanism for selecting stopping times in variational flows and demonstrates that optimal-control ideas can be used to turn the early-stopping paradox into a practical advantage. The spectral analysis of the learned regularizer is a concrete strength that may aid interpretability. The approach sits at the intersection of optimal control, variational methods, and learning, which is timely for the math.OC community.

major comments (2)

[§3 and §4] §3 (optimal-control formulation) and §4 (experiments): the method learns and deploys a single scalar stopping time T for the entire test set. No analysis is supplied showing the variation of per-image optimal stopping times (e.g., histograms or standard deviation of T* across the training images or across noise levels). If this variation is large, the single-T compromise undermines the claim that the formulation directly exploits the paradox to produce competitive, generalizable schemes.
[§4] §4 (experimental protocol): because T is learned from data, the manuscript must demonstrate that the reported competitive results are obtained on held-out test images after T has been fixed on a disjoint training set. No explicit statement of the train/test split, cross-validation procedure, or independent validation of T appears; without it the evaluation risks circularity and the competitiveness claim cannot be verified.
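The per-image analysis requested in the first major comment could look roughly like the sketch below. It is an assumption-laden toy: a 1D heat flow stands in for the paper's learned flow, the per-signal optimum is found by brute force over step counts instead of by solving an optimal-control problem, and the three noise levels are hypothetical. It reports the spread of per-signal optimal stopping steps and the gap between a per-signal oracle and one global stopping step.

```python
import math
import random
import statistics


def flow_errors(noisy, clean, tau=0.25, n_steps=200):
    """MSE to the clean signal after each explicit Euler step of a 1D heat flow."""
    x = list(noisy)
    n = len(x)
    errs = [sum((u - v) ** 2 for u, v in zip(x, clean)) / n]
    for _ in range(n_steps):
        x = [
            x[i] + tau * ((x[i - 1] if i > 0 else x[i]) - 2.0 * x[i]
                          + (x[i + 1] if i < n - 1 else x[i]))
            for i in range(n)
        ]
        errs.append(sum((u - v) ** 2 for u, v in zip(x, clean)) / n)
    return errs


def make_pair(rng, sigma, n=64):
    """Hypothetical data: a random-phase cosine plus Gaussian noise of level sigma."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    clean = [math.cos(2.0 * math.pi * i / n + phase) for i in range(n)]
    noisy = [c + rng.gauss(0.0, sigma) for c in clean]
    return noisy, clean


rng = random.Random(1)
pairs = [make_pair(rng, sigma) for sigma in (0.1, 0.3, 0.5) for _ in range(10)]
curves = [flow_errors(noisy, clean) for noisy, clean in pairs]

# Per-signal optimal stopping step (uses the ground truth, so it is an oracle).
per_signal_k = [min(range(len(c)), key=c.__getitem__) for c in curves]

# One global stopping step that minimizes the average error over all signals.
mean_curve = [sum(c[k] for c in curves) / len(curves) for k in range(len(curves[0]))]
global_k = min(range(len(mean_curve)), key=mean_curve.__getitem__)

oracle_mse = sum(c[k] for c, k in zip(curves, per_signal_k)) / len(curves)
global_mse = mean_curve[global_k]
spread = statistics.pstdev(per_signal_k)

print("per-signal steps: mean", statistics.mean(per_signal_k), "std", spread)
print("global step:", global_k)
print("MSE with per-signal oracle:", oracle_mse, "with single global step:", global_mse)
```

The gap between the last two numbers is the price of the single-T compromise in this toy setting. To address the second comment, the same quantities would need to be computed with the global step fixed on a training set and reported on disjoint test data.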

minor comments (1)

[Abstract and §4] The abstract states that the schemes are “highly efficient” yet the manuscript does not report wall-clock times, iteration counts, or flop counts relative to full convergence or to standard early-stopping heuristics; adding a small efficiency table would strengthen the efficiency claim.
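One entry of such an efficiency table could be an iteration count, as in the sketch below. It is a toy under stated assumptions: the energy is a simple quadratic smoothing term plus a data term (not the paper's learned regularizer), the weight `lam` is deliberately large to mimic modelling error, and the stopping step is an oracle chosen with the clean signal, standing in for a learned stopping time. All names and values are hypothetical.

```python
import math
import random


def energy_gradient(x, b, lam):
    """Gradient of E(x) = 0.5*lam*sum((x[i+1]-x[i])**2) + 0.5*sum((x[i]-b[i])**2)."""
    n = len(x)
    grad = []
    for i in range(n):
        left = x[i - 1] if i > 0 else x[i]        # reflecting boundary
        right = x[i + 1] if i < n - 1 else x[i]   # reflecting boundary
        grad.append(lam * (2.0 * x[i] - left - right) + (x[i] - b[i]))
    return grad


def count_steps(noisy, clean, lam=50.0, tol=1e-8, max_steps=20000):
    """Return (best step by MSE to clean, step at which the update falls below tol)."""
    tau = 1.0 / (1.0 + 4.0 * lam)   # stable explicit Euler step for this quadratic energy
    x = list(noisy)
    n = len(x)
    best_err = sum((u - v) ** 2 for u, v in zip(x, clean)) / n
    best_k = 0
    for k in range(1, max_steps + 1):
        grad = energy_gradient(x, noisy, lam)
        x = [xi - tau * gi for xi, gi in zip(x, grad)]
        err = sum((u - v) ** 2 for u, v in zip(x, clean)) / n
        if err < best_err:
            best_err, best_k = err, k
        if tau * max(abs(g) for g in grad) < tol:   # near-stationary: stop counting
            return best_k, k
    return best_k, max_steps


# Hypothetical toy data: one cosine period plus Gaussian noise.
random.seed(0)
n = 64
clean = [math.cos(2.0 * math.pi * i / n) for i in range(n)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]

stop_k, conv_k = count_steps(noisy, clean)
print("steps to best-quality stop:", stop_k)
print("steps to near-stationarity:", conv_k)
```

Reporting both counts side by side, together with wall-clock time on the real model, would let readers judge the efficiency claim directly.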

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the recognition of the work's timeliness at the intersection of optimal control and variational methods. We address the two major comments point by point below, proposing revisions to strengthen the manuscript where the concerns are valid.

Referee: [§3 and §4] §3 (optimal-control formulation) and §4 (experiments): the method learns and deploys a single scalar stopping time T for the entire test set. No analysis is supplied showing the variation of per-image optimal stopping times (e.g., histograms or standard deviation of T* across the training images or across noise levels). If this variation is large, the single-T compromise undermines the claim that the formulation directly exploits the paradox to produce competitive, generalizable schemes.

Authors: We agree that quantifying the variation of per-image optimal stopping times would strengthen the paper and allow readers to evaluate the single-T compromise directly. Although the formulation is designed to learn one global T (as stated in §3), we will add in the revision histograms of per-image T* values obtained by solving the optimal-control problem independently on each training image, together with mean, standard deviation, and dependence on noise level. This analysis will be placed in §4. We maintain that the global-T results remain competitive even if variation exists, because the optimal-control objective explicitly balances the early-stopping tradeoff across the training distribution; the added figures will make this transparent rather than undermine the claim.
Revision: yes

Referee: [§4] §4 (experimental protocol): because T is learned from data, the manuscript must demonstrate that the reported competitive results are obtained on held-out test images after T has been fixed on a disjoint training set. No explicit statement of the train/test split, cross-validation procedure, or independent validation of T appears; without it the evaluation risks circularity and the competitiveness claim cannot be verified.

Authors: We accept that the experimental protocol description in §4 lacks sufficient explicitness on data partitioning. The underlying experiments follow the conventional splits of the BSDS500 and Set12/Set14 datasets (training subset for learning the regularizer and T, disjoint test subset for final evaluation), but this was not stated clearly. In the revised manuscript we will insert a concise paragraph at the beginning of §4 that specifies the exact train/test division, confirms that T is learned solely on the training portion and then frozen, and states that all quantitative and visual results are reported on the held-out test images. This removes any ambiguity about circularity.
Revision: yes

Circularity Check

0 steps flagged

No circularity: optimal control formulation for stopping time is a standard data-driven method

The paper formulates early stopping of gradient flow as an optimal control problem whose solution (stopping time T) is learned from data to balance optimization and modeling error. This is a conventional supervised learning setup whose output (restored images on test data) is not equivalent to the training inputs by construction. No self-definitional steps, fitted-input-called-prediction reductions, or load-bearing self-citations are identifiable from the abstract or description; the central claim rests on the external validity of the learned T rather than re-deriving the input data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the learning procedure itself is the main unstated modeling choice.
