hub Mixed citations

UniVerse-1: Unified audio-video generation via stitching of experts

Mixed citation behavior. Most common role is background (67%).

20 Pith papers citing it
Background 67% of classified citations

citation-role summary

background: 4 · baseline: 2

years

2026: 17 · 2025: 3

representative citing papers

Native Audio-Visual Alignment for Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

NAVA proposes native audio-visual alignment via Align-then-Fuse MMDiT and Timbre-in-Context Conditioning for joint audio-video generation with improved synchronization and timbre control.

InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.

MAVIN: Multi-Shot Audio-Visual Generation with Narrative Control

cs.CV · 2026-06-28 · unverdicted · novelty 4.0

MAVIN proposes boundary-aware attention, ID-aware propagation, a multi-agent scripting pipeline, and the MAVINSet dataset as the first framework for multi-shot audio-visual generation with narrative control, claiming SOTA results.

citing papers explorer

Showing 4 of 4 citing papers after filters.