Title resolution pending

2024 · arXiv 2023.332669

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · novelty 8.0

AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.

Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis

cs.CV · 2026-04-14 · conditional · novelty 7.0

Introduces the LDD task, the ListenForge dataset built from five listening head generation methods, and the MANet model, which detects listening forgeries via motion inconsistencies guided by audio semantics.

Masked Diffusion Vision-Language Models for Temporal Action Localization

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

Adapts MDVLMs to TAL via a planned training objective and a step-level IoU reward, reporting gains over autoregressive baselines on the ActivityNet and THUMOS datasets.

citing papers explorer

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · none · ref 24

Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis

cs.CV · 2026-04-14 · conditional · none · ref 23

Masked Diffusion Vision-Language Models for Temporal Action Localization

cs.CV · 2026-05-28 · unverdicted · none · ref 43
