A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.
CoRR abs/2105.15203(2021),https://arxiv.org/abs/2105.15203
6 Pith papers cite this work. Polarity classification is still indexing.
years
2026 6verdicts
UNVERDICTED 6representative citing papers
RAIL-BENCH is the first standardized benchmark suite for railway perception with five challenges, real-world datasets, and a novel LineAP metric for rail track detection.
SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
MMSD and SAMR achieve 99 percent and 99.1 percent average data reduction for traffic images by transmitting segmentation maps, edges, text or semantically masked JPEGs and reconstructing via diffusion or inpainting models.
citing papers explorer
-
TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.
-
Railway Artificial Intelligence Learning Benchmark (RAIL-BENCH): A Benchmark Suite for Perception in the Railway Domain
RAIL-BENCH is the first standardized benchmark suite for railway perception with five challenges, real-world datasets, and a novel LineAP metric for rail track detection.
-
SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.
-
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
-
From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation
Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
-
Efficient Semantic Image Communication for Traffic Monitoring at the Edge
MMSD and SAMR achieve 99 percent and 99.1 percent average data reduction for traffic images by transmitting segmentation maps, edges, text or semantically masked JPEGs and reconstructing via diffusion or inpainting models.