Tokenswap: Backdoor attack on the compositional understanding of large vision-language models

Tokenswap: Backdoor attack on the compositional understanding of large vision-language models · 2025 · arXiv 2509.24566

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models

cs.CR · 2026-07-01 · unverdicted · novelty 6.0

ReShift is a reasoning-level backdoor framework for VLMs that uses poisoned data construction and joint optimization to shift chain-of-thought (CoT) trajectories when the trigger is present while preserving surface coherence.

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

cs.CR · 2026-06-06 · unverdicted · novelty 6.0

POISE is a stealthy skill-poisoning attack that achieves an 89.3% attack success rate (ASR) on Skill-Inject by blending a compressed trigger into contextually appropriate positions in skill bodies, outperforming YAML and random-placement baselines while evading static scanners.

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving performance competitive with state-of-the-art methods with the help of a new IR-600K dataset.

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning

cs.CV · 2026-05-28 · unverdicted · novelty 3.0

CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.

citing papers explorer

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

cs.CV · 2026-04-07 · unverdicted · none · ref 33
