
Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Video Scene Graph Generation aims to obtain structured semantic representations of objects and their relationships in videos for high-level understanding. However, existing methods still have limitations in handling long-tail distributions. This paper proposes the Frequency-guided Relational Multi-level Reasoning (FReMuRe) model, which enhances the modeling of long-tail relationships from a mechanism perspective. We introduce relation-specific branches to deal with gradient conflicts, yielding more balanced and tail-aware learning. We also design a frequency-aware dual-branch predicate embedding network that models high-frequency and low-frequency relationships separately and improves the recall of tail classes through gated fusion. In addition, we propose two interchangeable relation classification heads: a Bayesian Head for uncertainty estimation and a new Gaussian Mixture Model Head to enhance intra-class diversity. Experimental results show that FReMuRe significantly improves the recall of long-tail relationships and overall reasoning robustness on the Action Genome dataset.
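The abstract does not specify how the gated fusion is computed. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from the concatenated branch embeddings, used for an element-wise convex combination. The function name `gated_fusion`, the parameters `w_gate` and `b_gate`, and the gate formulation itself are assumptions made for this example, not the paper's implementation.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(high_emb, low_emb, w_gate, b_gate):
    """Fuse two branch embeddings with an element-wise gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        gate  = sigmoid(W [high; low] + b)
        fused = gate * high + (1 - gate) * low

    high_emb, low_emb: lists of floats of length dim
    w_gate: dim rows, each of length 2 * dim
    b_gate: list of floats of length dim
    """
    dim = len(high_emb)
    concat = high_emb + low_emb  # list concatenation, length 2 * dim
    fused, gates = [], []
    for j in range(dim):
        z = b_gate[j] + sum(w_gate[j][k] * concat[k] for k in range(2 * dim))
        g = sigmoid(z)
        gates.append(g)
        # Convex combination: g near 1 favors the high-frequency branch,
        # g near 0 favors the low-frequency branch.
        fused.append(g * high_emb[j] + (1.0 - g) * low_emb[j])
    return fused, gates


# With all-zero gate parameters the gate is 0.5 everywhere,
# so the fused embedding is the plain average of the two branches.
fused, gates = gated_fusion(
    [1.0, 2.0], [3.0, 6.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]
)
print(fused, gates)
```

In a trained model the gate parameters would be learned, so a predicate could lean more heavily on whichever branch is more informative for it; that behavior is an expectation about this generic formulation, not a result reported in the abstract.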

fields

cs.CV 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.

  • Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video cs.CV · 2026-04-19 · unverdicted · none · ref 1 · internal anchor

    FReMuRe improves recall of long-tail relationships in video scene graphs via relation-specific branches, frequency-aware predicate embeddings, and new Bayesian/GMM classification heads, with reported gains on the Action Genome dataset.