Poisoning language models during instruction tuning

· 2023 · arXiv 2305.00944

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

cs.RO · 2026-04-04 · unverdicted · novelty 6.0

Backdoor attacks aligned with JSON command formats in LLM robot controllers achieve 83% attack success rate while preserving over 93% clean accuracy and sub-second latency.

citing papers explorer

Showing 2 of 2 citing papers.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 43
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems cs.RO · 2026-04-04 · unverdicted · none · ref 7
Backdoor attacks aligned with JSON command formats in LLM robot controllers achieve 83% attack success rate while preserving over 93% clean accuracy and sub-second latency.

Poisoning language models during instruction tuning

fields

years

verdicts

representative citing papers

citing papers explorer