Optuna Constrained Tree-Structured Parzen Estimator Is a Joint Density Generalization of c-TPE
Pith reviewed 2026-06-28 07:37 UTC · model grok-4.3
The pith
Optuna's constrained TPE builds a single joint density over objective and constraints to compute the expected constrained improvement acquisition function.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Optuna's constrained TPE is joint c-TPE: it uses the same expected constrained improvement acquisition function but replaces the product of independent densities with a single joint likelihood over objective and constraint values constructed directly from observed trials.
What carries the argument
Joint likelihood model over objective and constraints, which replaces the independent product inside the ECI acquisition function.
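The contrast between the two forms can be written out as a short sketch. The notation is an assumption rather than the paper's own: $y^\star$ and $c_i^\star$ stand for thresholds on the objective and on the $i$-th constraint, $K$ is the number of constraints, and the integral is the usual expected-constrained-improvement form.

```latex
% Joint form: one density over the objective and all constraints.
\mathrm{ECI}_{\text{joint}}(x)
  = \int_{-\infty}^{c_1^\star} \!\cdots\! \int_{-\infty}^{c_K^\star} \int_{-\infty}^{y^\star}
    (y^\star - y)\, p(y, c_1, \dots, c_K \mid x)\,
    \mathrm{d}y\, \mathrm{d}c_1 \cdots \mathrm{d}c_K

% Independent form: the density factorizes, so the integral splits into a product.
p(y, c_1, \dots, c_K \mid x) = p(y \mid x) \prod_{i=1}^{K} p(c_i \mid x)
\;\Longrightarrow\;
\mathrm{ECI}_{\text{indep}}(x)
  = \left[ \int_{-\infty}^{y^\star} (y^\star - y)\, p(y \mid x)\, \mathrm{d}y \right]
    \prod_{i=1}^{K} \Pr(c_i \le c_i^\star \mid x)
```

Under this reading, the only difference between the two formulations is whether the density inside the integral is allowed to carry dependence between $y$ and the $c_i$.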
If this is right
- Joint c-TPE acquisition values remain identical when a constraint is duplicated in the problem statement.
- Independent c-TPE acquisition values change and typically degrade when duplicated constraints multiply extra factors into the likelihood product.
- The choice between joint and independent formulations affects robustness in problems that contain repeated or redundant constraints.
- Future analysis can compare the two forms on benchmarks that vary the degree of constraint overlap.
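The consequences listed above can be checked on a toy problem. The sketch below is not Optuna's or c-TPE's implementation: it assumes a one-dimensional search space, a fixed-bandwidth Gaussian Parzen estimator, and a simplified acquisition equal to a plain ratio of "good" to "bad" densities. All function names (`kde`, `ratio`, `independent_score`, `joint_score`) are hypothetical.

```python
import math

def kde(points, x, bandwidth=0.1):
    """Parzen estimate at x: average of Gaussian kernels centred on `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

def ratio(good, bad, x):
    """TPE-style score: density of the good trials over density of the rest."""
    return kde(good, x) / kde(bad, x)

def independent_score(xs, ys, constraints, x, n_good=3):
    """One ratio for the objective times one ratio per listed constraint."""
    order = sorted(range(len(xs)), key=lambda i: ys[i])
    good = [xs[i] for i in order[:n_good]]
    bad = [xs[i] for i in order[n_good:]]
    score = ratio(good, bad, x)
    for column in constraints:
        feasible = [xs[i] for i, c in enumerate(column) if c <= 0]
        infeasible = [xs[i] for i, c in enumerate(column) if c > 0]
        score *= ratio(feasible, infeasible, x)
    return score

def joint_score(xs, ys, constraints, x, n_good=3):
    """A single good/bad split: the best trials that satisfy every constraint."""
    def key(i):
        violation = sum(max(column[i], 0.0) for column in constraints)
        # Feasible trials sort first, by objective; infeasible ones follow, by violation.
        return (violation > 0, violation if violation > 0 else ys[i])
    order = sorted(range(len(xs)), key=key)
    good = [xs[i] for i in order[:n_good]]
    bad = [xs[i] for i in order[n_good:]]
    return ratio(good, bad, x)

# Toy problem (an assumption): minimise (x - 0.3)^2 subject to 0.5 - x <= 0.
xs = [i / 10 for i in range(11)]
ys = [(x - 0.3) ** 2 for x in xs]
c1 = [0.5 - x for x in xs]
candidates = [i / 20 for i in range(21)]

indep_1 = [independent_score(xs, ys, [c1], x) for x in candidates]
indep_3 = [independent_score(xs, ys, [c1, c1, c1], x) for x in candidates]
joint_1 = [joint_score(xs, ys, [c1], x) for x in candidates]
joint_3 = [joint_score(xs, ys, [c1, c1, c1], x) for x in candidates]
```

Listing the constraint three times leaves the set of feasible trials unchanged, so `joint_1` and `joint_3` are computed from the same split and agree. In the independent variant, each extra copy multiplies in another constraint ratio, so `indep_3` differs from `indep_1` by the square of that ratio at every candidate.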
Where Pith is reading between the lines
- Joint modeling may be preferable whenever constraints share latent structure that an independence assumption would ignore.
- The invariance property could be tested in other acquisition functions that combine multiple likelihood terms.
- Implementations could expose a switch between joint and independent modes so users can match the formulation to their constraint set.
Load-bearing premise
The joint density is constructed directly from the observed data without forcing independence between the objective and the constraints.
What would settle it
Run both formulations on the same set of observed trials and check whether the acquisition values produced by Optuna's TPE match the joint ECI expression but diverge from the independent product expression.
Original abstract
Constrained hyperparameter optimization (HPO) is common in practice, yet Optuna's widely used constrained TPE lacks algorithmic analysis. While c-TPE proposes an expected constrained improvement (ECI) approach assuming independence between the objective and constraints, Optuna uses a single joint density over both. We show that Optuna's constrained TPE is joint c-TPE -- the same ECI acquisition function using a joint likelihood. We demonstrate joint c-TPE is invariant to constraint duplication whereas independent c-TPE degrades as the product accumulates duplicated factors. We outline practical tradeoffs between the formulations and directions for future study.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that Optuna's constrained TPE implements a joint-density version of c-TPE: both use the same expected constrained improvement (ECI) acquisition function, but Optuna models a joint likelihood p(y, c | x) over the objective and constraints rather than the factorized form assumed in the original c-TPE. It further shows that the joint formulation is invariant to constraint duplication while the independent version degrades, and discusses practical trade-offs between the two.
Significance. If the equivalence is established, the work supplies a useful algorithmic clarification of a widely deployed but previously unanalyzed method in constrained hyperparameter optimization. The invariance result supplies a concrete, testable distinction that can inform practitioner choice between formulations; the analysis also highlights modeling assumptions that affect robustness under repeated constraints.
major comments (1)
- [derivation of equivalence (likely §3)] The central equivalence claim requires an explicit demonstration that Optuna's density estimator constructs a true joint p(y, c | x) rather than a product of separate Parzen estimators. Without the explicit joint-density construction (or a proof that no implicit factorization occurs), the claim that Optuna realizes joint c-TPE rather than independent c-TPE remains unverified at the level needed to support the title and abstract.
minor comments (2)
- Notation for the joint versus independent likelihoods should be introduced with a single, consistent table or equation block early in the paper to aid comparison.
- The invariance proof would benefit from a short numerical example (e.g., two identical constraints) showing the numerical degradation of the independent ECI versus constancy of the joint ECI.
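A numerical example of the kind requested in the second minor comment might look like the sketch below. The numbers are hypothetical and do not come from the paper. The sketch assumes the factorized ECI is an expected-improvement term multiplied by one feasibility probability per listed constraint, and that a joint model treats identical copies of a constraint as one event.

```python
def independent_eci(expected_improvement, p_feasible, copies):
    """Factorized ECI: every listed constraint contributes its own factor."""
    return expected_improvement * p_feasible ** copies

def joint_eci(expected_improvement, p_feasible, copies):
    """Joint ECI: identical copies describe the same event, so the
    probability that all of them hold is just p_feasible."""
    return expected_improvement * p_feasible

# Hypothetical candidates: (expected improvement, probability of feasibility).
candidates = {"A": (1.0, 0.5), "B": (0.4, 0.9)}

for copies in (1, 2):
    indep = {k: independent_eci(ei, p, copies) for k, (ei, p) in candidates.items()}
    joint = {k: joint_eci(ei, p, copies) for k, (ei, p) in candidates.items()}
    # Print the number of copies and the preferred candidate under each form.
    print(copies, max(indep, key=indep.get), max(joint, key=joint.get))
# prints:
# 1 A A
# 2 B A
```

With one copy both forms prefer candidate A (0.5 against 0.36). With two identical copies the factorized score of A drops to 0.25, below the 0.324 of B, so the ranking flips; the joint scores stay at 0.5 and 0.36.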
Simulated Author's Rebuttal
We thank the referee for their careful review and for highlighting the need for a more explicit demonstration of the joint-density construction. We address the major comment below and will revise the manuscript to strengthen this section.
Point-by-point responses
- Referee: [derivation of equivalence (likely §3)] The central equivalence claim requires an explicit demonstration that Optuna's density estimator constructs a true joint p(y, c | x) rather than a product of separate Parzen estimators. Without the explicit joint-density construction (or a proof that no implicit factorization occurs), the claim that Optuna realizes joint c-TPE rather than independent c-TPE remains unverified at the level needed to support the title and abstract.
  Authors: We agree that an explicit construction is necessary to fully support the central claim. In the revised version we will expand §3 with a new subsection that (i) reproduces the relevant excerpt from Optuna's source code for the constrained TPE density estimator, (ii) shows that a single multivariate Parzen estimator is fitted to the concatenated observations (y, c) rather than to y and c separately, and (iii) contrasts this with the factorized form used in the original c-TPE. We will also add a short proof that the resulting acquisition function is exactly the ECI expression under the joint model. These additions will be placed immediately after the current derivation of ECI so that the equivalence is verified at the level required by the title and abstract.
  Revision: yes
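The distinction in point (ii) of the response, between one estimator over concatenated (y, c) pairs and two separate estimators, can be illustrated with a minimal sketch. This is not an excerpt from Optuna's source. It assumes Gaussian kernels with a shared fixed bandwidth, and the names `gauss`, `joint_parzen`, and `product_parzen` are hypothetical.

```python
import math

def gauss(u, bandwidth):
    """One-dimensional Gaussian kernel with the given bandwidth."""
    return math.exp(-0.5 * (u / bandwidth) ** 2) / (bandwidth * math.sqrt(2.0 * math.pi))

def joint_parzen(data, y, c, bandwidth=0.2):
    """Single estimator fitted to the concatenated (y, c) observations."""
    return sum(gauss(y - yi, bandwidth) * gauss(c - ci, bandwidth) for yi, ci in data) / len(data)

def product_parzen(data, y, c, bandwidth=0.2):
    """Two separate estimators, one per coordinate, multiplied together."""
    p_y = sum(gauss(y - yi, bandwidth) for yi, _ in data) / len(data)
    p_c = sum(gauss(c - ci, bandwidth) for _, ci in data) / len(data)
    return p_y * p_c

# Hypothetical observations in which y and c move together.
data = [(0.0, 0.0), (1.0, 1.0)]

# Each tuple holds (joint estimate, product estimate) at one query point.
on_diagonal = (joint_parzen(data, 0.0, 0.0), product_parzen(data, 0.0, 0.0))
off_diagonal = (joint_parzen(data, 0.0, 1.0), product_parzen(data, 0.0, 1.0))
```

At the observed point (0, 0) the joint estimate exceeds the product of marginals, while at the unobserved combination (0, 1) the joint estimate is close to zero and the product is not. The factorized estimator cannot represent this dependence, which is the property the promised subsection would need to establish for Optuna's estimator.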
Circularity Check
Analysis of existing Optuna and c-TPE implementations; no fitted prediction or self-defined result
full rationale
The paper performs a direct comparison between Optuna's existing constrained TPE code and the joint-density formulation of c-TPE. The central claim equates the two via the shared ECI acquisition function once the density model is recognized as joint rather than factorized. No new parameters are fitted to data and then used to 'predict' a related quantity; the equivalence follows from inspecting the modeling choice already present in the implementations. No load-bearing self-citation chain or uniqueness theorem imported from the authors' prior work is required. This is the normal case of an analysis paper whose result is self-contained against external code and prior definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the objective and constraint observations can be modeled by a joint density estimated from data.