Title resolution pending

Hezekiah J · 2022 · arXiv 2209.02128

8 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

cs.CR · 2026-05-15 · unverdicted · novelty 7.0

CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.

Prompt Injection Attack to Tool Selection in LLM Agents

cs.CR · 2025-04-28 · conditional · novelty 7.0

ToolHijacker optimizes malicious tool documents via a two-phase strategy to hijack LLM agents' tool selection in no-box settings.

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

FlashRT delivers 2x-7x speedup and 2x-4x GPU memory reduction for prompt injection and knowledge corruption attacks on long-context LLMs versus nanoGCG.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

Ignore Previous Prompt: Attack Techniques For Language Models

cs.CL · 2022-11-17 · unverdicted · novelty 6.0

PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · novelty 4.0

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer

Showing 8 of 8 citing papers.

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

cs.CR · 2026-05-15 · unverdicted · none · ref 5

CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.

Prompt Injection Attack to Tool Selection in LLM Agents

cs.CR · 2025-04-28 · conditional · none · ref 17

ToolHijacker optimizes malicious tool documents via a two-phase strategy to hijack LLM agents' tool selection in no-box settings.

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

cs.CR · 2026-05-01 · unverdicted · none · ref 42

CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

cs.CR · 2026-04-30 · unverdicted · none · ref 61

FlashRT delivers 2x-7x speedup and 2x-4x GPU memory reduction for prompt injection and knowledge corruption attacks on long-context LLMs versus nanoGCG.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · none · ref 8

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

Ignore Previous Prompt: Attack Techniques For Language Models

cs.CL · 2022-11-17 · unverdicted · none · ref 2

PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · none · ref 76

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · none · ref 18

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
