Strong log-concavity in probit regression

Giacomo Zanella; Martin Chak

arxiv: 2605.31218 · v1 · pith:UQKJ47R3new · submitted 2026-05-29 · 🧮 math.ST · stat.TH

Pith reviewed 2026-06-28 20:04 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords probit regression · strong log-concavity · Gaussian design · condition number · maximum likelihood estimator · logistic regression · high-dimensional asymptotics

The pith

Probit regression likelihoods are strongly log-concave without ridge penalization when the Gaussian design ratio r = d/n is small enough.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the log-likelihood for probit regression becomes strongly concave without any added penalty term, unlike the logistic case where penalization is typically needed. It gives an explicit characterization of this strong log-concavity for any fixed design that mirrors the known conditions under which the maximum likelihood estimator exists. For random Gaussian designs the analysis ties the property to the ratio r = d/n: when r is below a sufficient threshold the Hessian condition number stays finite with high probability. In the joint limit n, d to infinity this condition number becomes independent of r.

Core claim

Strong log-concavity of the probit log-likelihood holds without penalization. For fixed designs it is characterized by a condition similar to the one for MLE existence. For Gaussian designs with r = d/n small enough, the resulting condition number is finite with high probability and, as n, d → ∞, asymptotically independent of r.

What carries the argument

The strong log-concavity parameter of the probit log-likelihood, which determines whether the Hessian has eigenvalues bounded away from zero and thus yields a finite condition number.
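
For orientation, the quantities involved can be written out under a standard parametrization. The notation below (labels $y_i \in \{0,1\}$, signs $s_i = 2y_i - 1$, covariate rows $x_i$, weight function $w$, constants $m$ and $L$) is an assumption of this sketch and is not taken from the paper; only the textbook form of the probit likelihood is used.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \log \Phi\big(s_i x_i^\top \beta\big), \\
-\nabla^2 \ell(\beta) &= \sum_{i=1}^{n} w\big(s_i x_i^\top \beta\big)\, x_i x_i^\top,
\qquad w(t) = \lambda(t)\big(t + \lambda(t)\big),
\quad \lambda(t) = \frac{\varphi(t)}{\Phi(t)}, \\
m I_d &\preceq -\nabla^2 \ell(\beta) \preceq L I_d
\quad \text{for all } \beta \in \mathbb{R}^d,
\qquad \kappa = \frac{L}{m}.
\end{align*}
```

Here $w(t) \in (0,1)$, with $w(t) \to 1$ as $t \to -\infty$ and $w(t) \to 0$ as $t \to +\infty$. The logistic analogue $\sigma(t)(1-\sigma(t))$ vanishes in both tails, which offers one heuristic reading of why the two links can behave differently. Whether the paper's proof proceeds along these lines is not stated in the abstract.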

If this is right

The maximum likelihood estimator exists and is unique for qualifying designs without any penalty or prior.
Gradient-based optimization converges linearly at a rate governed by the finite condition number, and Newton-type methods benefit from the same curvature bound.
The asymptotic independence from r permits uniform statements about estimation error across a range of dimension-to-sample ratios.
No ridge term is required to guarantee strong concavity, in contrast to the logistic link.
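
The optimization consequence above can be illustrated with a minimal sketch of unpenalized Newton iterations for the probit MLE. Everything here is an assumption made for illustration and not the paper's procedure: the function names, the simulated Gaussian design with a unit-norm signal, the start at zero, and the use of undamped steps.

```python
import numpy as np
from scipy.special import log_ndtr
from scipy.stats import norm


def inverse_mills(t):
    """phi(t) / Phi(t), evaluated in log space to stay finite in the left tail."""
    return np.exp(norm.logpdf(t) - log_ndtr(t))


def grad_and_hessian(beta, X, s):
    """Gradient and Hessian of the negative probit log-likelihood at beta."""
    t = s * (X @ beta)                 # signed margins s_i * x_i^T beta
    lam = inverse_mills(t)
    grad = -X.T @ (s * lam)
    w = lam * (t + lam)                # curvature weights, each in (0, 1)
    hess = X.T @ (w[:, None] * X)
    return grad, hess


def newton_probit(X, s, max_iter=50, tol=1e-8):
    """Undamped Newton iterations started from beta = 0, with no penalty term."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad, hess = grad_and_hessian(beta, X, s)
        if np.linalg.norm(grad) < tol:
            break
        beta = beta - np.linalg.solve(hess, grad)
    return beta


def simulate(n, d, seed=0):
    """Hypothetical test problem: Gaussian design, unit-norm signal on e_1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d)
    beta_true[0] = 1.0
    s = np.where(X @ beta_true + rng.standard_normal(n) > 0, 1.0, -1.0)
    return X, s, beta_true


if __name__ == "__main__":
    X, s, _ = simulate(n=2000, d=20)   # r = d/n = 0.01
    beta_hat = newton_probit(X, s)
    _, hess = grad_and_hessian(beta_hat, X, s)
    print("condition number at the MLE:", np.linalg.cond(hess))
```

The condition number printed here is evaluated at a single point, so it only illustrates local conditioning near the optimum. The paper's claim concerns a bound that holds over the whole parameter space.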

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Flat-prior Bayesian inference for probit would then enjoy the same contraction rates as the MLE without extra regularization.
The same characterization may apply to other sigmoid-like links whose second derivative satisfies analogous sign and boundedness conditions.
Numerical checks could locate the critical r threshold by monitoring the minimal eigenvalue over many draws of the design matrix.
Unpenalized probit could be used directly in moderate-dimensional classification tasks where d remains a small fraction of n.
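
The numerical check suggested above could look like the following sketch. It is a pointwise proxy under stated assumptions: the Hessian is evaluated only at the data-generating coefficient vector, whereas the strong log-concavity parameter is a minimum over all coefficient vectors, so the values below are upper bounds on it. The function names, the signal placed on the first coordinate, and the 1/n normalization are all hypothetical choices.

```python
import numpy as np
from scipy.special import log_ndtr
from scipy.stats import norm


def probit_weights(t):
    """Curvature weights -(log Phi)''(t) = lam(t) * (t + lam(t))."""
    lam = np.exp(norm.logpdf(t) - log_ndtr(t))
    return lam * (t + lam)


def min_hessian_eigenvalue(X, s, beta):
    """Smallest eigenvalue of the normalized negative Hessian at beta."""
    t = s * (X @ beta)
    H = X.T @ (probit_weights(t)[:, None] * X) / X.shape[0]
    return np.linalg.eigvalsh(H)[0]    # eigvalsh returns ascending order


def scan_ratio(n, ratios, draws, signal=1.0, seed=0):
    """For each r, the worst minimal eigenvalue seen over repeated designs."""
    rng = np.random.default_rng(seed)
    worst = {}
    for r in ratios:
        d = max(1, int(round(r * n)))
        beta = np.zeros(d)
        beta[0] = signal               # assumed signal direction and strength
        vals = []
        for _ in range(draws):
            X = rng.standard_normal((n, d))
            s = np.where(X @ beta + rng.standard_normal(n) > 0, 1.0, -1.0)
            vals.append(min_hessian_eigenvalue(X, s, beta))
        worst[r] = min(vals)
    return worst


if __name__ == "__main__":
    for r, val in scan_ratio(n=400, ratios=[0.05, 0.1, 0.2, 0.4], draws=5).items():
        print(f"r = {r:.2f}: worst minimal eigenvalue = {val:.4f}")
```

Locating an actual threshold would require minimizing over the coefficient vector as well, for example by also scanning along directions with large norm. This sketch does not attempt that.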

Load-bearing premise

The Gaussian design analysis requires that r = d/n is small enough for the strong log-concavity condition to hold with high probability.

What would settle it

A single Gaussian design matrix with r = 0.05 for which the smallest eigenvalue of the observed information matrix is zero or negative would falsify the high-probability claim.

Original abstract

We show that strong log-concavity emerges in probit regression likelihoods without ridge penalization (i.e. Gaussian priors), unlike for the logistic case. Specifically, we provide: (a) a characterization of strong log-concavity for fixed designs, similar to that for the existence of the maximum likelihood estimator (MLE) and (b) an analysis for Gaussian design, dependent on the proportionality $d/n = r\in [0, 1)$ between the sample size $n$ and the number of covariates $d$. In the latter case we show that, with high probability, provided $r$ is small enough, the resulting condition number is finite and, in the asymptotic regime $n, d\rightarrow \infty$, independent of $r$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Probit regression gets strong log-concavity without penalization, with a fixed-design characterization and a Gaussian-design result that holds only for small enough r.

The main things to know are that the probit likelihood is strongly log-concave without any ridge term, unlike the logistic case, and that for Gaussian designs the Hessian condition number stays finite with high probability when r = d/n is small enough, with the limiting value independent of r.

The fixed-design characterization is the cleanest part. It lines up closely with the known condition for MLE existence, which makes the result easy to place against prior work and gives a concrete distinction between the two link functions. That part feels like a useful observation rather than a heavy lift.

The Gaussian-design analysis is where the soft spot sits. The abstract states the result holds when r is small enough and that the limit does not depend on r, but it supplies no explicit threshold or dependence on signal strength. If the allowable r must shrink with n or with the signal, the asymptotic independence claim would need extra justification. The stress-test note flags exactly this gap, and nothing in the provided abstract removes it.

The rest of the argument appears to rest on standard concentration for random matrices once the fixed-design condition is available. No circularity is visible, and the citations track the relevant GLM and high-dimensional literature.

This is a narrow but focused piece aimed at people who work on curvature properties of GLM likelihoods and their consequences for optimization or conditioning. A reader already familiar with probit MLE conditions will get the extension quickly. The claims are specific enough to be checked by a referee in reasonable time, even if the random-design side needs tightening on the threshold.

I would send it to peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that strong log-concavity holds for the probit regression log-likelihood without ridge penalization. For fixed designs it gives a characterization of the Hessian condition number that is similar to the condition for MLE existence. For Gaussian designs with d/n = r ∈ [0,1), it asserts that when r is sufficiently small the condition number is finite with high probability and, in the joint limit n, d → ∞, the limiting value is independent of r.

Significance. If the Gaussian-design result holds with an explicit, parameter-independent threshold on r, the finding would be significant: it would separate probit from logistic regression by establishing unpenalized strong convexity in high dimensions and would supply a concrete, asymptotically r-independent bound on the Hessian condition number. The fixed-design characterization, if shown to be strictly stronger than MLE existence, would also be useful for optimization and statistical analysis.

major comments (2)

[Abstract] Abstract (Gaussian-design paragraph): the claim that the condition number is finite w.h.p. 'provided r is small enough' and asymptotically independent of r is load-bearing, yet no explicit threshold r* or its dependence on signal strength, noise variance, or other constants is stated. Without such a bound it is unclear whether the result applies to any fixed r > 0 as n, d → ∞ or whether r* must vanish.
[Fixed-design section] Fixed-design characterization (presumably §3 or §4): strong log-concavity is asserted to be 'similar to' the MLE-existence condition, but the two properties are not equivalent; an explicit mapping or counter-example showing when the stronger property holds is required to support the subsequent Gaussian-design step.

minor comments (1)

[Abstract] Notation for the proportionality constant r = d/n should be introduced once and used consistently; the interval [0,1) is stated but the boundary case r=1 is never discussed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We respond point-by-point to the major comments below.

Referee: [Abstract] Abstract (Gaussian-design paragraph): the claim that the condition number is finite w.h.p. 'provided r is small enough' and asymptotically independent of r is load-bearing, yet no explicit threshold r* or its dependence on signal strength, noise variance, or other constants is stated. Without such a bound it is unclear whether the result applies to any fixed r>0 as n,d→∞ or whether r* must vanish.

Authors: Our proof in the Gaussian-design analysis shows existence of a positive threshold r* (depending on signal strength, noise variance, and other fixed constants of the model) such that the stated properties hold for all r < r*. The limiting condition number is independent of r for any such fixed r. An explicit closed-form expression for r* is not derived, as it would require substantially sharper concentration bounds; we view the existence result and the r-independence of the limit as the main contributions. We will revise the abstract to clarify the parameter dependence of r*. revision: partial
Referee: [Fixed-design section] Fixed-design characterization (presumably §3 or §4): strong log-concavity is asserted to be 'similar to' the MLE-existence condition, but the two properties are not equivalent; an explicit mapping or counter-example showing when the stronger property holds is required to support the subsequent Gaussian-design step.

Authors: We agree that the term 'similar to' is imprecise and that the two properties are not equivalent. The fixed-design characterization gives an explicit condition on the weighted Gram matrix for the Hessian condition number to be finite. This condition is strictly stronger than the MLE existence condition. We will add a short proposition (with a low-dimensional counter-example) that makes the relationship precise and shows how the stronger condition is used in the Gaussian-design argument. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on independent characterizations and probabilistic analysis

The abstract and described claims present a direct mathematical characterization of strong log-concavity for fixed designs (analogous but not identical to MLE existence) and a separate high-probability finite-condition-number result for Gaussian designs when r is small enough, with asymptotic independence of r. No equations or steps are shown that reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the analysis is self-contained against external benchmarks such as known MLE conditions and standard random-design concentration.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; full manuscript would be required to audit them.

Reference graph

Works this paper leans on

15 extracted references · 1 canonical work pages

[1] E. J. Candès and P. Sur. The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression. Ann. Statist., 48(1):27–42, 2020.
[2] R. Caron and T. Traynor. The zero set of a polynomial. WSMR Report 05-02, May 2005.
[3] M. Chak and G. Zanella. Complexity of Markov Chain Monte Carlo for Generalized Linear Models, 2025. arXiv:2512.12748.
[4] S. Chewi. Log-concave sampling. Forthcoming, 2026. Available online at https://chewisinho.github.io/
[5] A. S. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):651–676, 2017.
[6] G. Lecué and S. Mendelson. Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. (JEMS), 19(3):881–904, 2017.
[7] E. Lesaffre and H. Kaufmann. Existence and uniqueness of the maximum likelihood estimator for a multivariate probit model. J. Amer. Statist. Assoc., 87(419):805–811, 1992.
[8] P. McCullagh and J. A. Nelder. Generalized linear models. Monographs on Statistics and Applied Probability. Chapman & Hall, London, second edition, 1989.
[9] M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of machine learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, second edition, 2018.
[10] Y. Nesterov. Introductory lectures on convex optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004. A basic course.
[11] M. R. Sampford. Some inequalities on Mill’s ratio and related functions. Ann. Math. Statistics, 24:130–132, 1953.
[12] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[13] W. Tang and Y. Ye. The existence of maximum likelihood estimate in high-dimensional binary response generalized linear models. Electron. J. Stat., 14(2):4028–4053, 2020.
[14] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[15] M. J. Wainwright. High-dimensional statistics, volume 48 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2019. A non-asymptotic viewpoint.
