RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.
arXiv preprint arXiv:2510.25025 (2025)
3 Pith papers cite this work. Polarity classification is still being indexed.
citation-role and citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: baseline (1)
polarities: baseline (1)
representative citing papers
ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.
PRA-RAG is a new aggregation algorithm for RAG that claims provable robustness bounds against poisoned retrieved texts and reduces attack success rate to 1% while keeping 71% accuracy.
citing papers explorer
- Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence
RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.
- ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense
ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.
- PRA-RAG: Provably Robust Aggregation in Retrieval-Augmented Generation against Retrieval Corruption
PRA-RAG is a new aggregation algorithm for RAG that claims provable robustness bounds against poisoned retrieved texts and reduces attack success rate to 1% while keeping 71% accuracy.