Physically Plausible Human-Object Rendering from Sparse Views via 3D Gaussian Splatting
1 Pith paper cites this work. Polarity classification is still being indexed.
Abstract
Rendering realistic human-object interactions (HOIs) from sparse-view inputs is a challenging yet crucial task for various real-world applications. Existing methods often struggle to simultaneously achieve high rendering quality, physical plausibility, and computational efficiency. To address these limitations, we propose HOGS (Human-Object Rendering via 3D Gaussian Splatting), a novel framework for efficient HOI rendering with physically plausible geometric constraints from sparse views. HOGS represents both humans and objects as dynamic 3D Gaussians. Central to HOGS is a novel optimization process that operates directly on these Gaussians to enforce geometric consistency (i.e., preventing inter-penetration and floating contacts) and thereby achieve physical plausibility. To support this core optimization under sparse-view ambiguity, our framework incorporates two pre-trained modules: an optimization-guided Human Pose Refiner for robust pose estimation under sparse-view occlusions, and a Human-Object Contact Predictor that efficiently identifies interaction regions to guide our novel contact and separation losses. Extensive experiments on both human-object and hand-object interaction datasets demonstrate that HOGS achieves state-of-the-art rendering quality while maintaining high computational efficiency.
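The abstract names contact and separation losses but does not give their form. As an illustration only, here is a minimal sketch of how such terms could act on Gaussian centers, assuming a boolean per-Gaussian contact mask stands in for the Contact Predictor's output. The function names, the squared nearest-neighbor distance, the distance hinge used as a penetration proxy, and the `margin` value are all hypothetical choices, not the published HOGS losses.

```python
import numpy as np

# Hypothetical sketch: these losses are NOT the HOGS formulation, only an
# illustration of contact/separation terms on 3D Gaussian centers.

def nearest_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Distance from each point in src (N, 3) to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]          # (N, M, 3) pairwise offsets
    return np.linalg.norm(diff, axis=-1).min(axis=1)  # (N,) nearest-neighbor distances


def contact_loss(human_xyz: np.ndarray, object_xyz: np.ndarray,
                 contact_mask: np.ndarray) -> float:
    """Pull human Gaussians flagged as in contact onto the object (discourages floating contacts)."""
    if not contact_mask.any():
        return 0.0
    d = nearest_distances(human_xyz[contact_mask], object_xyz)
    return float(np.mean(d ** 2))


def separation_loss(human_xyz: np.ndarray, object_xyz: np.ndarray,
                    contact_mask: np.ndarray, margin: float = 0.01) -> float:
    """Push non-contact human Gaussians at least `margin` away from the object.

    A distance hinge is used here as a crude proxy for inter-penetration; a real
    penetration term would need an inside/outside test (e.g. a signed distance).
    """
    free = ~contact_mask
    if not free.any():
        return 0.0
    d = nearest_distances(human_xyz[free], object_xyz)
    return float(np.mean(np.maximum(margin - d, 0.0) ** 2))


# Toy example: one human Gaussian predicted in contact, two predicted free.
human_xyz = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 2.0]])
object_xyz = np.array([[0.0, 0.0, 0.1]])
contact_mask = np.array([True, False, False])

print(contact_loss(human_xyz, object_xyz, contact_mask))                 # ~0.01
print(separation_loss(human_xyz, object_xyz, contact_mask, margin=0.5))  # ~0.005
```

The dense pairwise distance matrix costs O(NM) memory, which is fine for a toy example; with realistic Gaussian counts one would more likely use a KD-tree or batched GPU nearest-neighbor search.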
Citation summary
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: unverdicted (1)
Roles: background (1)
Polarities: background (1)

Representative citing papers
Rendering Multi-Human and Multi-Object with 3D Gaussian Splatting
MM-GS combines per-instance multi-view fusion with scene-level interaction modeling on 3D Gaussians to render high-fidelity multi-human multi-object scenes from sparse views.