Recognition: 2 theorem links
Certified Robustness under Heterogeneous Perturbations via Hybrid Randomized Smoothing
Pith reviewed 2026-05-14 20:19 UTC · model grok-4.3
The pith
Hybrid randomized smoothing yields a closed-form one-dimensional certificate that generalizes both Gaussian and discrete smoothing for joint discrete-continuous inputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By analyzing the joint likelihood ordering induced by factorized discrete and continuous noise, the approach yields a closed-form, one-dimensional certificate that strictly generalizes both Gaussian (image-only) and discrete (text-only) randomized smoothing. It thereby supplies the first model-agnostic Neyman-Pearson certificate for joint discrete-token and continuous-image perturbations.
What carries the argument
Analytically tractable Neyman-Pearson formulation of the joint worst-case problem under factorized discrete and continuous noise.
If this is right
- Supplies the first model-agnostic certificate that simultaneously protects against discrete-token and continuous-image perturbations.
- Recovers existing Gaussian and discrete smoothing bounds as special cases of a single formula.
- Enables certified safety filtering for text-image models without requiring modality-specific retraining.
- Reduces the certification computation to a one-dimensional search over the joint likelihood ratio.
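The second point can be sanity-checked in the Gaussian limit. Below is a minimal sketch, assuming (as the claim states) that the hybrid certificate collapses to the classical Gaussian smoothing radius R = σ·Φ⁻¹(p) when the discrete channel is a point mass; the function name is ours, not the paper's:

```python
from statistics import NormalDist

def gaussian_certified_radius(p_lower: float, sigma: float) -> float:
    """Image-only special case of the claimed certificate: the classical
    Gaussian smoothing radius R = sigma * Phi^{-1}(p_lower), which the
    hybrid bound should recover when the discrete channel is trivial."""
    if not 0.5 < p_lower < 1.0:
        return 0.0  # no nontrivial certificate below majority confidence
    return sigma * NormalDist().inv_cdf(p_lower)
```

Any correct hybrid implementation should agree with this function on inputs whose discrete budget is zero.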
Where Pith is reading between the lines
- The same factorized-noise construction could be applied to other heterogeneous pairs such as audio waveforms with discrete metadata.
- If the factorized assumption is relaxed, the certificate may become tighter but would likely lose its closed-form character.
- The one-dimensional reduction suggests that similar likelihood-ordering arguments could simplify certification for additional multimodal architectures.
Load-bearing premise
The joint likelihood ordering induced by factorized discrete and continuous noise permits an analytically tractable Neyman-Pearson formulation of the worst-case problem.
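Written out in generic smoothing notation (the symbols below are our paraphrase, not the paper's exact definitions), the premise amounts to the Neyman-Pearson worst case being attained on level sets of a product likelihood ratio:

```latex
% Sketch under the factorization assumption p(x, y) = p_D(x)\, p_C(y);
% \Lambda_D, \Lambda_C are the discrete and continuous likelihood ratios
% between the clean and adversarially shifted noise distributions.
p_{\mathrm{adv}}
  = \inf_{\,S \,:\; \mathbb{P}_{\mathrm{clean}}(S) \ge p}
    \mathbb{P}_{\mathrm{adv}}(S),
\qquad
S_t^{\star} = \bigl\{ (x, y) : \Lambda_D(x)\, \Lambda_C(y) \le t \bigr\}.
```

By the Neyman-Pearson lemma the infimum is attained on some level set S_t⋆, so tractability reduces to whether the distribution of log Λ_D + log Λ_C admits a closed-form CDF; that is exactly where the factorization is load-bearing.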
What would settle it
An explicit joint perturbation attack on a multimodal model for which the smoothed classifier's accuracy falls below the radius predicted by the one-dimensional certificate.
Original abstract
Randomized smoothing provides strong, model-agnostic robustness certificates, but existing guarantees are limited to single modalities, treating continuous and discrete inputs in isolation. This limitation becomes critical in multimodal models, where decisions depend on cross-modal semantics and adversaries can jointly perturb heterogeneous inputs, rendering unimodal certificates insufficient. We introduce a unified randomized smoothing framework for mixed discrete-continuous inputs based on an analytically tractable Neyman-Pearson formulation of the joint worst-case problem. By analyzing the joint likelihood ordering induced by factorized discrete and continuous noise, our approach yields a closed-form, one-dimensional certificate that strictly generalizes both Gaussian (image-only) and discrete (text-only) randomized smoothing. We validate the framework on multimodal safety filtering, providing, to our knowledge, the first model-agnostic Neyman-Pearson certificate for joint discrete-token and continuous-image perturbations in interaction-dependent text-image safety filtering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a unified randomized smoothing framework for mixed discrete-continuous inputs in multimodal models. It formulates the joint worst-case robustness problem via the Neyman-Pearson lemma applied to factorized discrete and continuous noise, deriving a closed-form one-dimensional certificate that generalizes both standard Gaussian smoothing for images and discrete multinomial bounds for text. The approach is validated on multimodal safety filtering tasks involving joint token and image perturbations.
Significance. If the central derivation holds, the result is significant because it supplies the first model-agnostic Neyman-Pearson certificate that simultaneously handles heterogeneous perturbations whose effects interact through cross-modal semantics, extending randomized smoothing beyond unimodal settings where separate certificates are provably insufficient.
major comments (1)
- [Main theorem / Neyman-Pearson derivation] The core claim that the joint likelihood ratio under independent discrete and continuous noise admits an analytically tractable one-dimensional worst-case formulation (abstract and the main theorem section) is load-bearing yet insufficiently constructed: the argument must explicitly show why the Neyman-Pearson threshold depends on a single scalar rather than separate parameters for the discrete support size and continuous density scale, otherwise the optimization remains multi-dimensional and the closed-form reduction does not follow.
minor comments (1)
- [Abstract] The abstract asserts generalization to both Gaussian and discrete cases but does not display the explicit functional form of the resulting certificate (e.g., the expression involving the inverse CDF or multinomial tail that reduces to each special case); including this in the main text would aid immediate verification.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the significance of our unified Neyman-Pearson framework for heterogeneous perturbations and for the constructive feedback on the main derivation. We address the single major comment below and will revise the manuscript accordingly to strengthen the exposition.
Point-by-point responses
Referee: [Main theorem / Neyman-Pearson derivation] The core claim that the joint likelihood ratio under independent discrete and continuous noise admits an analytically tractable one-dimensional worst-case formulation (abstract and the main theorem section) is load-bearing yet insufficiently constructed: the argument must explicitly show why the Neyman-Pearson threshold depends on a single scalar rather than separate parameters for the discrete support size and continuous density scale, otherwise the optimization remains multi-dimensional and the closed-form reduction does not follow.
Authors: We agree that the reduction to a one-dimensional certificate requires a clearer step-by-step argument. Because the joint noise distribution factorizes as p(x,y) = p_D(x) p_C(y), the joint likelihood ratio under any pair of inputs is exactly the product Λ(x,y) = Λ_D(x) · Λ_C(y). The Neyman-Pearson test for the worst-case adversary therefore seeks the infimum over all (x,y) of the measure of the set where Λ(x,y) exceeds a threshold t. Due to independence, the level sets of log Λ = log Λ_D + log Λ_C can be ordered monotonically by a single scalar t: for any fixed t the worst-case probability is obtained by independently choosing the discrete and continuous perturbations that maximize the tail probabilities subject to the same additive threshold split (t = t_D + t_C). This collapses the joint optimization to a univariate search over t (or equivalently over the combined p-value), exactly as in the pure-Gaussian and pure-discrete cases. The manuscript states this factorization and the resulting univariate form in the main theorem, but we acknowledge the exposition is terse. In the revision we will insert an explicit lemma showing the reduction from the two-dimensional (support-size, scale) parameter space to the single scalar t, together with the corresponding one-dimensional integral expression for the certificate. revision: yes
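The collapse to a univariate search can be sketched numerically. The code below is a hypothetical illustration built from the F(t; r) form quoted in the paper's Theorem 4.2 excerpt; the discrete channel is summarized by pairs (p(z|x), γ(z)) of probabilities and per-token likelihood ratios, and the function names and bisection routine are ours, not the authors':

```python
from math import exp, log
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def joint_lr_cdf(t, r, sigma, discrete_channel):
    """F(t; r) = sum_z p(z|x) * Phi((r^2/2 + sigma^2*(log t - log gamma(z))) / (sigma*r)),
    the quoted Theorem 4.2 form. discrete_channel is an iterable of
    (prob, gamma) pairs with probs summing to 1. F is continuous and
    monotone increasing in t, so a unique threshold t*(r) exists."""
    return sum(
        p * Phi((r**2 / 2 + sigma**2 * (log(t) - log(g))) / (sigma * r))
        for p, g in discrete_channel
    )

def solve_threshold(p_target, r, sigma, discrete_channel,
                    lo=1e-9, hi=1e9, iters=200):
    """One-dimensional bisection for t*(r) with F(t*; r) = p_target.
    Geometric midpoints, since t can range over many orders of magnitude."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if joint_lr_cdf(mid, r, sigma, discrete_channel) < p_target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

With a trivial discrete channel (a single token, γ = 1), F reduces to the pure-Gaussian CDF Φ((r²/2 + σ² log t)/(σr)), consistent with the rebuttal's claim that the Gaussian case is recovered as a special case of the same univariate search.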
Circularity Check
No significant circularity; the derivation applies the Neyman-Pearson lemma to the factorized joint likelihood.
full rationale
The central result is obtained by analyzing the joint likelihood ordering under factorized discrete-continuous noise and applying the Neyman-Pearson lemma to obtain a one-dimensional certificate. This is a direct statistical derivation from the problem setup rather than a reduction to a fitted quantity, self-defined parameter, or self-citation chain. The generalization to Gaussian and discrete cases emerges from the factorization assumption without requiring the target certificate as an input. No load-bearing steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- discrete and continuous noise scales
axioms (1)
- Domain assumption: the Neyman-Pearson lemma applies directly to the joint likelihood-ratio test for heterogeneous perturbations.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage (Theorem 4.2): F(t; r) = ∑ p₁(z₁|x₁) Φ[(r²/2 + σ²(log t − log γ₁(z₁)))/(σr)] ... unique t⋆(r) ... V(x₁,adv; r) ... p_adv(d, ε) = inf V
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "hybrid Neyman-Pearson ... one-dimensional, continuous, and invertible likelihood-ratio CDF that strictly generalizes both discrete knapsack-based and Gaussian randomized smoothing"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Carlini, N., Nasr, M., Choquette-Choo, C. A., Jagielski, M., Gao, I., Awadalla, A., Koh, P. W., Ippolito, D., Lee, K., Tramèr, F., and Schmidt, L. Are aligned neural networks adversarially aligned? NeurIPS 2023.
- [3] Cui, C., Ma, Y., Cao, X., Ye, W., Zhou, Y., Liang, K., Chen, J., Lu, J., Yang, Z., Liao, K.-D., Gao, T., Li, E., Tang, K., Cao, Z., Zhou, T., Liu, A., Yan, X., Mei, S., Cao, J., Wang, Z., and Zheng, C. A survey on multimodal large language models for autonomous driving. WACV Workshops 2024.
- [4] Debenedetti, E., Shumailov, I., Fan, T., Hayes, J., Carlini, N., Fabian, D., Kern, C., Shi, C., Terzis, A., and Tramèr, F. Defeating prompt injections by design. arXiv:2503.18813.
- [5] Dou, Z., Hu, X., Yang, H., Liu, Z., and Fang, M. Adversarial attacks to multi-modal models. arXiv:2409.06793.
- [6] Helff, L., Friedrich, F., Brack, M., Kersting, K., and Schramowski, P. LlavaGuard: An open VLM-based framework for safeguarding vision datasets and models. arXiv:2406.05113.
- [7] Huang, Z., Chu, W., Li, L., Xu, C., and Li, B. COMMIT: Certifying robustness of multi-sensor fusion systems against semantic attacks. AAAI 2025.
- [8] Khomsky, D., Maloyan, N., and Nutfullin, B. Prompt injection attacks in defended systems. arXiv:2406.14048.
- [9] Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. Certified robustness to adversarial examples with differential privacy. IEEE Symposium on Security and Privacy 2019, pp. 656-672.
- [10] Li, B., Chen, C., Wang, W., and Carin, L. Second-order adversarial attack and certifiable robustness. arXiv:1809.03113.
- [11] Liu, X., Zhu, Y., Gu, J., Lan, Y., Yang, C., and Qiao, Y. MM-SafetyBench: A benchmark for safety evaluation of multimodal large language models. ECCV 2024.
- [13] Lou, A., Meng, C., and Ermon, S. Discrete diffusion language modeling by estimating the ratios of the data distribution. arXiv:2310.16834.
- [14] Seferis, E., Wu, C., Kollias, S., Bensalem, S., and Cheng, C.-H. Randomized smoothing meets vision-language models. EMNLP 2025.
- [15] Wu, H. and Cao, Y. Membership inference attacks on large-scale models: A survey. arXiv:2503.19338.
- [16] Yin, Z., Ye, M., Zhang, T., Du, T., Zhu, J., Liu, H., Chen, J., Wang, T., and Ma, F. VLAttack: Multimodal adversarial attacks on vision-language tasks via pre-trained models. NeurIPS.
- [17]
- [18] Zhan, Q., Fang, R., Panchal, H. S., and Kang, D. Adaptive attacks break defenses against indirect prompt injection attacks on LLM agents. Findings of NAACL.
- [19] Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., and Fredrikson, M. Universal and transferable adversarial attacks on aligned language models. arXiv:2307.15043.
- [22] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR 2018. (Cited for the ℓ2-constrained PGD steps in the paper's block-coordinate joint attack, in which one modality is updated while the other is held fixed.)
discussion (0)