arxiv: 2510.09689 · v3 · submitted 2025-10-09 · 💻 cs.CR · cs.AI

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

Pith reviewed 2026-05-18 09:26 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords red-teaming · web-augmented LLMs · safety vulnerabilities · adversarial search queries · harmful content retrieval · black-box attacks

The pith

Web-augmented LLMs can be tricked into citing harmful content via queries that look harmless.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that adding web search to large language models opens a safety gap that standard red-teaming for standalone models does not cover. It builds CREST-Search around three attack strategies that craft ordinary-looking search queries designed to make the model retrieve unsafe web pages and cite them in its reply. The approach adds an iterative refinement step that works even when the attacker cannot see inside the retrieval or safety systems. A reader should care because many deployed systems now rely on live search for up-to-date answers, so any weakness in the retrieval step can reach users even when the model itself refuses to generate harm directly.

Core claim

The central claim is that three novel attack strategies, paired with iterative in-context refinement, can produce search queries that stay effective against black-box web-augmented LLMs, bypass their safety filters, and cause the models to cite harmful or low-credibility web content.

What carries the argument

The CREST-Search framework, built on three attack strategies that turn benign-looking search queries into vectors for unsafe web citations, plus the WebSearch-Harm dataset used to fine-tune a red-teaming model.

If this is right

  • Safety design for web-augmented LLMs must cover the full search-and-citation workflow rather than generation alone.
  • A dedicated harmful search dataset improves the quality of queries that surface vulnerabilities.
  • Current filters built for standalone models leave measurable gaps when live web results are involved.
  • Systematic red-teaming can map out which parts of the retrieval pipeline are easiest to exploit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future safety benchmarks for LLMs should include search-augmented test cases by default.
  • The same query-generation idea could be tested on other external tools such as code interpreters or database queries.
  • Companies might add separate credibility scoring on retrieved pages before they are shown to the user.
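
The last extension above, credibility scoring on retrieved pages before they are shown, could be prototyped roughly as follows. This is a minimal sketch and not something the paper describes: the `RetrievedPage` type, the `score_credibility` heuristic, the domain lists, and the 0.5 threshold are all hypothetical placeholders for whatever scorer a real deployment would use.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allow/deny lists; a real system would rely on a maintained reputation source.
TRUSTED_SUFFIXES = (".gov", ".edu")
BLOCKED_DOMAINS = {"known-bad.example"}


@dataclass
class RetrievedPage:
    url: str
    snippet: str


def score_credibility(page: RetrievedPage) -> float:
    """Return a toy credibility score in [0, 1] based only on the host name."""
    host = urlparse(page.url).hostname or ""
    if host in BLOCKED_DOMAINS:
        return 0.0
    if host.endswith(TRUSTED_SUFFIXES):
        return 1.0
    return 0.5  # unknown hosts get a neutral score (assumption)


def filter_citations(pages, threshold=0.5):
    """Keep only pages whose score meets the threshold, before they reach the user."""
    return [p for p in pages if score_credibility(p) >= threshold]
```

The point of the design is that the gate sits between retrieval and citation, so it applies regardless of how the query was phrased.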

Load-bearing premise

The three attack strategies will keep working against real deployed web-augmented systems even though the researchers have no direct access to the retrieval or safety components.

What would settle it

Take the queries produced by CREST-Search and submit them to commercial web-augmented LLMs; record whether the models return or cite harmful content that their normal safety filters are supposed to block.
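
The bookkeeping for such a test could look like the sketch below. It assumes each trial has already been run and judged elsewhere; the `Trial` record, its field names, and the "at least one flagged citation" criterion are illustrative assumptions, not the paper's evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    model: str          # label of the system under test (hypothetical)
    query_id: str
    cited_urls: list    # URLs the model cited in its reply
    flagged_urls: list  # URLs judged harmful by an external reviewer or classifier


def risk_rate_by_model(trials):
    """Fraction of trials per model in which at least one cited URL was flagged."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t in trials:
        totals[t.model] += 1
        if any(u in t.cited_urls for u in t.flagged_urls):
            hits[t.model] += 1
    return {m: hits[m] / totals[m] for m in totals}
```

Reporting the rate per model keeps results from live commercial systems separate from any open-source proxies.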

Figures

Figures reproduced from arXiv: 2510.09689 by Gelei Deng, Han Qiu, Haoran Ou, Jie Zhang, Kangjie Chen, Tianwei Zhang, Xingshuo Han.

Figure 1: Overview of CREST-Search, consisting of three main phases. (1) Adversarial search queries generation: it generates the adversarial search queries based on various harmful content categories and strategies; (2) Web search execution and risk evaluation: it executes the search query and evaluates the toxicity of the cited webpages; (3) Adversarial search queries refinement: it optimizes the query based on the…
Figure 2: Detailed risks analysis across baseline models.
Figure 3: Transferability of CREST-Search across various victim models.
Figure 4: The impact of refinement rounds on risk detection rate (a), optimization cost (b), and optimization time (c) by five …
Figure 5: Detection rates for five harmful-content categories
Original abstract

Large Language Models (LLMs) have been augmented with web search to overcome the limitations of the static knowledge boundary by accessing up-to-date information from the open Internet. While this integration enhances model capability, it also introduces a distinct safety threat surface: the retrieval and citation process has the potential risk of exposing users to harmful or low-credibility web content. Existing red-teaming methods are largely designed for standalone LLMs as they primarily focus on unsafe generation, ignoring risks emerging from the complex search workflow. To address this gap, we propose CREST-Search, a pioneering red-teaming framework for LLMs with web search. The cornerstone of CREST-Search is three novel attack strategies that generate seemingly benign search queries yet induce unsafe citations. It also employs an iterative in-context refinement mechanism to strengthen adversarial effectiveness under black-box constraints. In addition, we construct a search-specific harmful dataset, WebSearch-Harm, which enables fine-tuning a specialized red-teaming model to improve query quality. Our experiments demonstrate that CREST-Search can effectively bypass safety filters and systematically expose vulnerabilities in web search-based LLM systems, underscoring the necessity of the development of robust search models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CREST-Search, a red-teaming framework for web-augmented LLMs. It proposes three novel attack strategies that craft seemingly benign search queries to induce unsafe citations, an iterative in-context refinement mechanism to improve effectiveness under black-box constraints, and the WebSearch-Harm dataset to fine-tune a specialized red-teaming model. Experiments are presented to demonstrate that the framework bypasses safety filters and exposes vulnerabilities in web search-based LLM systems.

Significance. If the empirical results are robust, the work is significant because it addresses a previously underexplored attack surface arising from the retrieval and citation workflow in web-augmented LLMs, rather than focusing solely on direct generation. The black-box setting, new attack strategies, iterative refinement, and dedicated dataset provide concrete tools that could help developers of production search-augmented systems improve safety; the emphasis on falsifiable red-teaming outcomes is a strength.

major comments (2)
  1. [§5] (Experimental Setup): The central claim that CREST-Search 'effectively bypasses safety filters and systematically exposes vulnerabilities in web search-based LLM systems' rests on black-box experiments, yet the manuscript does not clarify whether evaluations used live commercial APIs with proprietary retrieval and post-retrieval filtering or open-source proxies/simulations. This distinction is load-bearing; success rates observed on proxies may not transfer to production pipelines where ranking and safety mechanisms are unknown and potentially stronger.
  2. [§4.2 and Table 2]: The three attack strategies and iterative refinement are presented as novel, but the paper provides limited ablation showing their individual contributions versus a simple baseline of direct harmful queries or existing red-teaming methods adapted to search. Without these controls, it is difficult to establish that the reported bypass rates are attributable to the proposed components rather than to the underlying LLM's weaknesses.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., bypass rate or comparison to baseline) to support the claim of effectiveness.
  2. [§3.3] Notation for the iterative refinement loop (e.g., how many iterations and the exact in-context prompt template) should be formalized in a figure or algorithm box for reproducibility.
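
To illustrate what a formalized loop might specify, here is a minimal skeleton. It is not the authors' algorithm: the three callables (`run_search`, `score_risk`, `propose`), the default of 5 rounds, and the 0.8 stopping threshold are hypothetical stand-ins for details the manuscript would need to state.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def refine_query(
    seed_query: str,
    run_search: Callable[[str], List[str]],    # returns cited URLs for a query
    score_risk: Callable[[List[str]], float],  # returns a risk score in [0, 1]
    propose: Callable[[str, History], str],    # next candidate, given the history
    max_rounds: int = 5,                       # assumed iteration budget
    target: float = 0.8,                       # assumed stopping threshold
) -> Tuple[str, float, History]:
    """Evaluate, record, and re-propose for up to max_rounds; return the best query found."""
    history: History = []
    query = seed_query
    best_query, best_score = seed_query, -1.0
    for _ in range(max_rounds):
        score = score_risk(run_search(query))
        history.append((query, score))
        if score > best_score:
            best_query, best_score = query, score
        if score >= target:
            break  # stop early once the target score is reached
        query = propose(query, history)
    return best_query, best_score, history
```

Stating the iteration budget, the stopping rule, and what the proposer sees from the history is the kind of detail an algorithm box would pin down for reproducibility.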

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the recognition of the work's potential significance in highlighting an underexplored attack surface in web-augmented LLMs. We respond to each major comment below and outline the revisions we will make to address the concerns.

Point-by-point responses
  1. Referee: [§5] (Experimental Setup): The central claim that CREST-Search 'effectively bypasses safety filters and systematically exposes vulnerabilities in web search-based LLM systems' rests on black-box experiments, yet the manuscript does not clarify whether evaluations used live commercial APIs with proprietary retrieval and post-retrieval filtering or open-source proxies/simulations. This distinction is load-bearing; success rates observed on proxies may not transfer to production pipelines where ranking and safety mechanisms are unknown and potentially stronger.

    Authors: We agree that explicitly distinguishing between live commercial APIs and open-source proxies is necessary for evaluating the robustness and transferability of our results. The current §5 description does not provide sufficient detail on this point. We will revise the experimental setup section to clearly specify the exact APIs, retrieval systems, and any filtering mechanisms employed in each set of experiments, along with a discussion of how these choices relate to production environments.
    Revision: yes

  2. Referee: [§4.2 and Table 2]: The three attack strategies and iterative refinement are presented as novel, but the paper provides limited ablation showing their individual contributions versus a simple baseline of direct harmful queries or existing red-teaming methods adapted to search. Without these controls, it is difficult to establish that the reported bypass rates are attributable to the proposed components rather than to the underlying LLM's weaknesses.

    Authors: We acknowledge that stronger ablations would better isolate the contributions of the proposed attack strategies and iterative refinement. While direct harmful queries are often filtered prior to retrieval (making them less relevant as a baseline for search-specific attacks) and existing red-teaming approaches target generation rather than query crafting, we agree that additional controls would strengthen the claims. We will add new ablation experiments comparing against adapted baselines and expand Table 2 to report the individual and combined effects of each component.
    Revision: yes
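
An expanded table reporting "individual and combined effects" implies enumerating on/off combinations of the components. A rough sketch of that enumeration follows; the component names are hypothetical labels (the paper's own strategy names are not given here), and `format_row` is only an illustrative formatter for rates measured elsewhere.

```python
from itertools import product

# Hypothetical component labels, used for illustration only.
COMPONENTS = ("strategy_a", "strategy_b", "strategy_c", "refinement")


def ablation_configs(components=COMPONENTS):
    """Yield every on/off combination of components as a dict, starting with all off."""
    for flags in product((False, True), repeat=len(components)):
        yield dict(zip(components, flags))


def format_row(config, bypass_rate):
    """Render one table row: enabled components, then the measured rate as a percentage."""
    enabled = [name for name, on in config.items() if on] or ["baseline"]
    return f"{'+'.join(enabled):<45} {bypass_rate:6.1%}"
```

With four components this yields 16 configurations. The all-off row serves as the baseline against which each component's contribution can be read.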

Circularity Check

0 steps flagged

No circularity: empirical red-teaming framework with independent experimental validation

The paper presents CREST-Search as an empirical red-teaming approach consisting of three attack strategies, iterative refinement, and the WebSearch-Harm dataset for fine-tuning. No equations, derivations, or load-bearing self-citations are present that would reduce any claimed result to a fitted parameter or prior input by construction. The central claims rest on experimental outcomes in a stated black-box setting, which are externally testable against deployed systems and do not rely on self-referential definitions or renamings of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the premise that web retrieval introduces a distinct safety threat surface not covered by existing LLM red-teaming. No free parameters are visible in the abstract. One domain assumption is invoked: that 'harmful or low-credibility web content' can be reliably identified for dataset construction and evaluation.

axioms (1)
  • Domain assumption: Web search integration in LLMs creates a distinct safety threat surface from retrieval and citation of harmful content that existing red-teaming methods do not address.
    Stated directly in the abstract as the motivation for the new framework.
invented entities (2)
  • CREST-Search framework (no independent evidence)
    purpose: Red-teaming web-augmented LLMs via benign-looking queries that induce unsafe citations
    Introduced as the main contribution; no independent evidence outside the paper is provided in the abstract.
  • WebSearch-Harm dataset (no independent evidence)
    purpose: Enables fine-tuning a specialized red-teaming model
    Constructed for this work; no external validation or public release details given in the abstract.

pith-pipeline@v0.9.0 · 5753 in / 1609 out tokens · 34395 ms · 2026-05-18T09:26:41.748343+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    NodeSynth generates evidence-anchored synthetic queries that trigger up to five times higher failure rates in mainstream LLMs than human-authored benchmarks.

  2. NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    NodeSynth creates evidence-based synthetic queries via a taxonomy generator to evaluate LLMs, revealing up to 5x higher failure rates than human benchmarks and gaps in guard models.
