pith. machine review for the scientific record.

arxiv: 2605.13115 · v1 · submitted 2026-05-13 · 💻 cs.CR · cs.LG

Recognition: unknown

DiffusionHijack: Supply-Chain PRNG Backdoor Attack on Diffusion Models and Quantum Random Number Defense

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 18:57 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords diffusion models · backdoor attack · supply chain · PRNG · quantum random number generator · Stable Diffusion · latent noise

The pith

A malicious PRNG injected through the software supply chain can force diffusion models to output any chosen image pixel-for-pixel without touching model weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that replacing the standard pseudo-random number generator with a backdoored version lets an attacker dictate the exact latent noise fed into diffusion models, producing attacker-specified images at perfect structural similarity. This control holds across Stable Diffusion versions, survives stochastic sampling, and operates independently of the text prompt. Because the change sits outside the neural network, weight audits and safety filters miss it entirely. Replacing the PRNG with a quantum random number generator removes the predictability and drops output similarity to random-baseline levels.

Core claim

By compromising the PRNG through a supply-chain package, the attack forces Stable Diffusion v1.4, v1.5, and SDXL to reproduce attacker-chosen content at SSIM = 1.00 over 100 trials, remains effective at η > 0, bypasses CLIP safety checkers with 98–100% success, and operates without reference to the user's prompt.

What carries the argument

The hijacked PRNG that supplies the deterministic latent noise vector to the diffusion sampling process.
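
The mechanism is simple enough to caricature in a few lines. This is an illustrative sketch, not the paper's code: `sample_latent`, `generate`, `HijackedRNG`, and `TARGET_LATENT` are hypothetical names, and a toy function stands in for the denoising loop, which a real attack would reach by patching torch's noise calls.

```python
import random

# Hypothetical stand-ins: a real pipeline draws its initial latent via
# e.g. torch.randn and decodes it through the denoising loop.
def sample_latent(dim=4, rng=random):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generate(latent):
    # Toy "sampler": the output is a deterministic function of the initial
    # latent -- the property the attack exploits.
    return tuple(round(x * x, 6) for x in latent)

# --- what a compromised dependency would ship ---
TARGET_LATENT = [0.1, -0.4, 0.7, 0.2]  # latent chosen to decode to attacker content

class HijackedRNG:
    """Replays the attacker's fixed latent instead of fresh randomness."""
    def __init__(self):
        self._i = 0
    def gauss(self, mu, sigma):
        v = TARGET_LATENT[self._i % len(TARGET_LATENT)]
        self._i += 1
        return v

honest = [generate(sample_latent()) for _ in range(3)]                    # varies
attacked = [generate(sample_latent(rng=HijackedRNG())) for _ in range(3)]
assert len(set(attacked)) == 1   # every run starts from the same latent
```

Because the pipeline treats its randomness source as a swappable dependency, one import-time assignment in a compromised package suffices; nothing in the model weights changes.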

If this is right

  • Weight-based model audits and content moderation systems cannot detect the backdoor.
  • The attack succeeds even when users enable stochastic sampling parameters.
  • Switching to quantum random number generation reduces output similarity to baseline random levels across tested models and prompts.
  • The vulnerability applies to any diffusion pipeline that relies on a replaceable PRNG for noise sampling.
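
The defense can be sketched with OS entropy standing in for quantum hardware. This is a hedged illustration, not the paper's implementation: `random.SystemRandom` reads the operating system's entropy pool, which, like a hardware QRNG (though without its information-theoretic guarantees), sits outside the Python package graph a supply-chain attacker controls; `sample_latent_hw` is an invented name.

```python
import random

# OS entropy as a stand-in for the paper's hardware QRNG (an assumption for
# illustration; /dev/urandom is not information-theoretically unpredictable).
_hw_rng = random.SystemRandom()

def sample_latent_hw(dim=4):
    # Draw the initial latent from the hardware-backed source instead of the
    # module-level PRNG a compromised package could have patched.
    return [_hw_rng.gauss(0.0, 1.0) for _ in range(dim)]

# Simulated supply-chain patch of the default PRNG entry point:
_orig_gauss = random.gauss
random.gauss = lambda mu, sigma: 0.0
latent = sample_latent_hw()
random.gauss = _orig_gauss
# The patched module function never ran; the latent is still fresh entropy.
assert any(x != 0.0 for x in latent)
```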

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Generative AI pipelines should treat the source of randomness as a first-class security boundary rather than an implementation detail.
  • Similar PRNG hijacks could affect other sampling-based generators such as autoregressive language models.
  • Hardware QRNG modules become a practical default for production image generation services.

Load-bearing premise

The PRNG that generates the initial noise can be replaced or overridden by code inserted through a compromised software dependency.

What would settle it

A trial in which the malicious PRNG is loaded but the generated images fail to reach SSIM of 1.00 with the target, or in which a QRNG source still yields high similarity scores.
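
That trial reduces to one measurement: SSIM between the generated image and the target. As a sketch, here is the single-window form of SSIM over flattened pixel lists; real evaluations use the windowed, averaged version (e.g. skimage.metrics.structural_similarity), and `ssim_global` with its constants simply follows the standard SSIM definition for 8-bit images.

```python
from statistics import mean

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Single-window SSIM over whole flattened images; the standard metric
    # averages this quantity over local windows.
    mx, my = mean(x), mean(y)
    vx = mean((a - mx) ** 2 for a in x)
    vy = mean((b - my) ** 2 for b in y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

target = [10.0, 200.0, 34.0, 90.0, 150.0]       # toy "pixel" values
assert abs(ssim_global(target, target) - 1.0) < 1e-12   # identical images
```

An identical reproduction scores exactly 1.0; a QRNG-defended pipeline should drive this score down toward the random baseline the paper reports.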

Figures

Figures reproduced from arXiv: 2605.13115 by Liling Zheng, Xiaoke Yang, Xuxing Lu, Ziyang You.

Figure 1. Comparison of three inference pipelines: normal inference (top), the DiffusionHijack…
Figure 2. Visual comparison of DiffusionHijack attack, baseline, and QRNG defense outputs.
Figure 3. Cross-model SSIM comparison under attack, baseline, and QRNG defense conditions.
Figure 4. Effect of CFG scale w on prompt-agnostic attack effectiveness (SD 1.5). Red solid line with circle markers: Attack SSIM; blue dashed line with triangle markers: QRNG Defense SSIM. Shaded region indicates defense reduction magnitude. Conditions: T = 50, η = 0, N = 100 prompt pairs per scale.
read the original abstract

Diffusion models depend on pseudo-random number generators (PRNGs) for latent noise sampling. We present DiffusionHijack, a supply-chain backdoor attack that hijacks the PRNG to deterministically control generated images. A malicious PRNG, injected via compromised packages, forces pixel-perfect reproduction of attacker-chosen content (SSIM = 1.00, N = 100 trials) on Stable Diffusion v1.4, v1.5, and SDXL -- without modifying model weights. The attack is inherently undetectable by existing model auditing and content moderation mechanisms, as it operates entirely outside the neural network computation graph. The attack remains effective under stochastic sampling (eta > 0), bypasses CLIP-based safety checkers (98-100% success), and operates independently of the user's prompt. As a countermeasure, we replace the PRNG with a quantum random number generator (QRNG), which provides information-theoretic unpredictability. Across N = 100 prompt-model combinations, QRNG defense completely neutralizes the attack, reducing output similarity to random baseline levels (SSIM < 0.20 for SD 1.x models, < 0.45 for SDXL). This work exposes a previously overlooked supply-chain vulnerability and offers a hardware-level fundamental mitigation for generative AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to demonstrate DiffusionHijack, a supply-chain backdoor attack that replaces the PRNG in diffusion models (Stable Diffusion v1.4, v1.5, SDXL) via compromised packages to force pixel-perfect reproduction of attacker-chosen images (SSIM=1.00 across N=100 trials) without modifying model weights. The attack is asserted to operate independently of the user's prompt, remain effective under stochastic sampling (eta>0), bypass CLIP safety checkers (98-100% success), and be undetectable by model auditing. As mitigation, replacing the PRNG with a QRNG is shown to neutralize the attack, reducing similarity to random baselines (SSIM<0.20 for SD 1.x, <0.45 for SDXL) across 100 prompt-model combinations.

Significance. If the empirical claims hold with mechanistic support, the work would be significant for highlighting a novel supply-chain attack surface in generative AI that evades weight-based defenses and auditing, while the QRNG countermeasure provides a hardware-rooted, information-theoretic mitigation with clear practical implications for securing inference pipelines.

major comments (2)
  1. [Abstract] The central claim that the attack 'operates independently of the user's prompt' lacks mechanistic support. In the diffusion pipeline the initial latent (controlled by PRNG) is only the starting point; every denoising step applies cross-attention with the prompt embedding. Fixed initial noise therefore cannot produce pixel-identical outputs for dissimilar prompts, implying the reported SSIM=1.00 trials were likely run with prompts already describing the target content. This directly undermines the prompt-independence assertion and the attack surface description.
  2. [Abstract and experimental claims] The reported metrics (SSIM=1.00, 98-100% bypass rates, N=100) are presented without implementation details, error analysis, full sampling protocol, or description of how the malicious PRNG is integrated into the computation graph. This absence makes the reproducibility and robustness of the headline result (especially under eta>0) impossible to assess from the manuscript.
minor comments (1)
  1. The manuscript would benefit from a diagram or pseudocode showing exactly where and how the PRNG replacement occurs in the standard diffusion sampling loop.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and have revised the manuscript to improve clarity, add missing details, and correct imprecise claims while preserving the core technical contribution.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the attack 'operates independently of the user's prompt' lacks mechanistic support. In the diffusion pipeline the initial latent (controlled by PRNG) is only the starting point; every denoising step applies cross-attention with the prompt embedding. Fixed initial noise therefore cannot produce pixel-identical outputs for dissimilar prompts, implying the reported SSIM=1.00 trials were likely run with prompts already describing the target content. This directly undermines the prompt-independence assertion and the attack surface description.

    Authors: We agree that the original phrasing overstated prompt independence. The attack fixes only the initial latent via PRNG replacement and therefore produces deterministic outputs for a given prompt; cross-attention still conditions the result on the prompt embedding. In the reported experiments the SSIM=1.00 results were obtained when the user prompt described the attacker-chosen target image. For dissimilar prompts the output remains fully determined by the hijacked noise but will reflect the supplied prompt. We have revised the abstract and introduction to state that the attack enables pixel-perfect reproduction of attacker-specified content for prompts consistent with that content, while still bypassing weight-based auditing and CLIP safety filters. A new mechanistic paragraph in Section 3 explains the interaction between fixed noise and prompt conditioning. This revision preserves the supply-chain attack surface claim. revision: yes

  2. Referee: [Abstract and experimental claims] The reported metrics (SSIM=1.00, 98-100% bypass rates, N=100) are presented without implementation details, error analysis, full sampling protocol, or description of how the malicious PRNG is integrated into the computation graph. This absence makes the reproducibility and robustness of the headline result (especially under eta>0) impossible to assess from the manuscript.

    Authors: We accept that the original manuscript omitted necessary implementation information. The revised version adds: (i) a detailed description of the malicious PRNG integration via package-level monkey-patching of torch.rand and the diffusers scheduler noise functions; (ii) the complete sampling protocol (50 steps, guidance scale 7.5, eta in {0.0, 0.5}, DPMSolver scheduler); (iii) per-trial SSIM statistics (mean 1.00, std < 0.005 across N=100); and (iv) pseudocode plus an anonymized repository link. These additions cover both deterministic and stochastic (eta>0) regimes and enable independent reproduction. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack demonstration with independent QRNG defense

full rationale

The paper presents an empirical supply-chain attack via malicious PRNG replacement and a QRNG countermeasure. No equations, derivations, or first-principles results are claimed. Results rest on reported experimental outcomes (SSIM=1.00, N=100 trials) and information-theoretic properties of QRNG, without any reduction to fitted parameters, self-citations as load-bearing premises, or ansatz smuggling. The central claims do not reduce to their inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that diffusion models rely on PRNGs for latent noise and that supply-chain compromise of software packages is feasible; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Diffusion models use PRNGs for sampling latent noise during generation
    Standard implementation detail of diffusion pipelines referenced in the abstract
  • domain assumption Supply-chain injection of malicious PRNG packages is possible without detection
    Core premise of the attack vector

pith-pipeline@v0.9.0 · 5542 in / 1250 out tokens · 59649 ms · 2026-05-14T18:57:02.933581+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references

  1. R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF CVPR, 2022, pp. 10684–10695.

  2. C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, J. Ho, T. Salimans, D. Fleet, and M. Norouzi, “Photorealistic text-to-image diffusion models with deep language understanding,” in Proc. NeurIPS, vol. 35, 2022, pp. 11285–11299.

  3. Y. Yang, R. Gao, X. Qin, J. Shao, and X. Xie, “Guardt2i: Defending text-to-image models from adversarial prompts,” in Proc. NeurIPS, 2024.

  4. T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” IEEE Access, vol. 7, pp. 47230–47244, 2019.

  5. Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 1, pp. 5–22, 2024.

  6. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Proc. ICML, 2021, pp. 8748–8763.

  7. Y. Li et al., “T2isafety: Benchmark for assessing fairness, toxicity, and privacy in text-to-image generation,” in Proc. IEEE/CVF CVPR, 2025.

  8. J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Proc. ICLR, 2021.

  9. L. Williams, G. Benedetti, S. Hamer, R. Paramitha, I. Rahman, and M. Tamanna, “Research directions in software supply chain security,” ACM Trans. Softw. Eng. Methodol., 2024.

  10. S. Neupane, G. Holmes, E. Wyss, D. Davidson, and L. De Carli, “Beyond typosquatting: An in-depth look at package confusion,” in Proc. USENIX Security Symp., 2023.

  11. O. Jarkas, R. K. L. Ko, N. Dong, and R. Mahmud, “A container security survey: Exploits, attacks, and defenses,” ACM Comput. Surv., vol. 57, no. 7, 2025.

  12. J. Spracklen, R. Wijewickrama, and A. H. M. S. M. Jadliwala, “We have a package for you! a comprehensive analysis of package hallucinations by code generating LLMs,” in Proc. USENIX Security Symp., 2025.

  13. J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. NeurIPS, vol. 33, 2020, pp. 6840–6851.

  14. Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. ICLR, 2021.

  15. C. Zhang, M. Hu, W. Li, and L. Wang, “Adversarial attacks and defenses on text-to-image diffusion models: A survey,” Inf. Fusion, vol. 114, p. 102701, 2025.

  16. P. Ladisa, C. Phalen, A. Wasowski, H. Plate, and S. E. Ponta, “Taxonomy of attacks on open-source software supply chains,” in Proc. IEEE S&P, 2023.

  17. C. Huang, Y. Wang, Y. Wu, L. Wang, H. Zhou, and H. Chen, “Donapi: Malicious npm packages detector using behavior sequence knowledge mapping,” in Proc. USENIX Security Symp., 2024.

  18. Z. Ba, K. Chen, L. Jiang, Z. Ma, and S. Wang, “Surrogateprompt: Bypassing the safety filter of text-to-image models via substitution,” in Proc. ACM CCS, 2024.

  19. X. Jin, Z. Weng, H. Guo, C. Yin, S. Cheng, G. Zhang, and X. Zhang, “Jailbreakdiffbench: A comprehensive benchmark for jailbreaking diffusion models,” in Proc. IEEE/CVF ICCV, 2025.

  20. M. Stipcevic and B. M. Kuo, “Quantum random number generators,” Open Phys., vol. 9, no. 4, pp. 1055–1066, 2011.

  21. C. H. Bennett and G. Brassard, “Quantum cryptography: Public key distribution and coin tossing,” in Proc. IEEE Int. Conf. Comput. Syst. Signal Process., Bangalore, India, 1984, pp. 175–179.

  22. J. M. Cohen, E. Rosenfeld, and J. Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in Proc. ICML, 2019, pp. 1310–1320.

  23. Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein, “Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust,” in Proc. NeurIPS, 2023.

  24. D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach, “SDXL: Improving latent diffusion models for high-resolution image synthesis,” in Proc. ICLR, 2024.

  25. J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in Proc. NeurIPS, 2022.

  26. Y. Dodis, D. Pointcheval, S. Ruhault, D. Vergniaud, and D. Wichs, “Security analysis of pseudo-random number generators with input: /dev/random is not robust,” in Proc. ACM CCS, 2013, pp. 647–658.

  27. S. Jang, J. S. Choi, J. Jo, K. Lee, and S. J. Hwang, “Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,” in Proc. IEEE/CVF CVPR, 2025, pp. 8203–8212.

  28. S. Zhai, Y. Dong, Q. Shen, S. Pu, Y. Fang, and H. Su, “Text-to-image diffusion models can be easily backdoored through multimodal data poisoning,” in Proc. ACM Int. Conf. Multimedia, 2023, pp. 1577–1587.

  29. C. Zhang, L. Wang, Y. Ma, W. Li, and A. Liu, “Reason2attack: Jailbreaking text-to-image models via LLM reasoning,” in Proc. AAAI Conf. Artif. Intell., vol. 40, no. 42, 2025, pp. 36030–36038.

  30. X. Dai, K. Liang, and B. Xiao, “Advdiff: Generating unrestricted adversarial examples using diffusion models,” in Proc. ECCV, 2024, pp. 93–109.