SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.
arXiv preprint arXiv:2411.18013 (2024) 4, 26, 60, 61
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.
citing papers explorer
-
SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
SafeLens presents a fast-and-slow video guardrail framework that filters the SafeWatch dataset to 2.4% and adds Chain-of-Thought traces to achieve state-of-the-art moderation performance at reduced inference cost.
-
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.