Can AI perceive physical danger and intervene?
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
TouchSafeBench evaluates VLMs on collision grounding, finding best Macro-F1 below 50% and that explicit depth does not yield reliable robot-body contact inference.
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
citing papers explorer
- RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents
  RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.
- Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration
  TouchSafeBench evaluates VLMs on collision grounding, finding best Macro-F1 below 50% and that explicit depth does not yield reliable robot-body contact inference.
- SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
  SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.