Pith · machine review for the scientific record

arXiv: 2512.19049 · v2 · submitted 2025-12-22 · 💻 cs.CV


Decoupled Generative Modeling for Human-Object Interaction Synthesis

Authors on Pith: no claims yet
Classification: 💻 cs.CV
Keywords: human-object interaction, synthesis, action, DecHOI, decoupled, generative, generator
Original abstract

Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion or penetration. To address these issues, we propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI), which separates path planning and action synthesis. A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions. To further improve contact realism, we employ adversarial training with a discriminator that focuses on the dynamics of distal joints. The framework also models a moving counterpart and supports responsive, long-sequence planning in dynamic scenes, while preserving plan consistency. Across two benchmarks, FullBodyManipulation and 3D-FUTURE, DecHOI surpasses prior methods on most quantitative metrics and qualitative evaluations, and perceptual studies likewise prefer our results.
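The decoupling the abstract describes — a trajectory generator that plans synchronized human and object paths without prescribed waypoints, followed by an action generator conditioned on those paths — can be sketched schematically. This is a minimal illustrative sketch, not the paper's networks: the function names, the toy linear planner, and the per-frame output format are all assumptions made here for illustration.

```python
import random

def trajectory_generator(goal, n_steps, seed=0):
    """Stage 1 (illustrative): plan synchronized human and object root
    trajectories toward a goal position with no hand-specified
    intermediate waypoints (linear interpolation plus small noise here;
    the paper uses a learned generative model)."""
    rng = random.Random(seed)
    human, obj = [], []
    for t in range(n_steps):
        a = t / (n_steps - 1)
        # Human path carries a little jitter; object follows the same plan,
        # so the two stay synchronized by construction.
        human.append(tuple(a * g + rng.uniform(-0.01, 0.01) for g in goal))
        obj.append(tuple(a * g for g in goal))
    return human, obj

def action_generator(human_traj, obj_traj):
    """Stage 2 (illustrative): condition on the planned paths and emit
    per-frame targets — here just the root displacement a full-body
    motion model would have to realize between consecutive frames."""
    poses = []
    for t in range(1, len(human_traj)):
        delta = tuple(h1 - h0 for h0, h1 in zip(human_traj[t - 1], human_traj[t]))
        poses.append({"frame": t, "root_delta": delta, "object_pos": obj_traj[t]})
    return poses

# Plan first, then synthesize actions conditioned on the plan.
human, obj = trajectory_generator(goal=(2.0, 0.0, 1.0), n_steps=8)
motion = action_generator(human, obj)
```

The point of the split is that each stage optimizes a narrower objective: the planner only has to produce consistent paths, and the action stage only has to realize motion along a path it is given, rather than one network absorbing both objectives at once.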

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Uni-HOI: A Unified Framework for Learning the Joint Distribution of Text and Human-Object Interaction

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    Uni-HOI learns the joint distribution of text, human motion, and object motion using LLMs and VQ-VAEs in a two-stage training process for multiple HOI tasks.