AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

8 Pith papers cite this work. Polarity classification is still indexing.

Citation-role summary: background (3)

Citing papers by year: 2026 (6), 2025 (2)

representative citing papers

Inference-Time Scaling for Joint Audio-Video Generation

cs.MM · 2026-06-02 · unverdicted · novelty 7.0

Presents a multi-verifier framework and Adaptive Reward Weighting (ARW) for inference-time scaling in joint audio-video generation, and reports gains in alignment and synchronization on VGGSound and JavisBench-mini.

citing papers explorer

  • Inference-Time Scaling for Joint Audio-Video Generation cs.MM · 2026-06-02 · unverdicted · none · ref 12
