Single-shot HDR is achieved by conditioning a video diffusion model on an LDR input to generate an exposure bracket and fusing the bracket with per-pixel weights from a lightweight UNet.
Title resolution pending
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.
ProCompNav improves success rate and shortens user responses in ambiguous instance navigation by using comparative binary questions that prune a candidate pool rather than requesting detailed descriptions.
DocAtlas creates multilingual document datasets across 82 languages and shows DPO with rendered ground truth improves model accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
ECTraj enhances consistency models for multi-agent trajectory prediction via improved student-teacher supervision and conditional top-K generation, yielding faster inference and competitive accuracy on Argoverse 2.
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
A UAV-to-3DGS-to-MPM pipeline reconstructs real landslide sites with photorealistic visuals and runs physics-based simulations, validated on a Hong Kong event.
Comparative study of DS-NeRF, TensoRF, and HashNeRF with depth-supervision and architectural variants finds no conclusive outperformance under equal training time but identifies which design choices transfer to low-data, low-compute regimes.
citing papers explorer
-
Single-Shot HDR Recovery via a Video Diffusion Prior
Single-shot HDR is achieved by conditioning a video diffusion model on an LDR input to generate an exposure bracket and fusing the bracket with per-pixel weights from a lightweight UNet.
-
HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis
HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
-
Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval
A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.
-
Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
ProCompNav improves success rate and shortens user responses in ambiguous instance navigation by using comparative binary questions that prune a candidate pool rather than requesting detailed descriptions.
-
DocAtlas: Multilingual Document Understanding Across 80+ Languages
DocAtlas creates multilingual document datasets across 82 languages and shows DPO with rendered ground truth improves model accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
-
Enhancing Consistency Models for Multi-Agent Trajectory Prediction
ECTraj enhances consistency models for multi-agent trajectory prediction via improved student-teacher supervision and conditional top-K generation, yielding faster inference and competitive accuracy on Argoverse 2.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
-
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
-
UAV-Assisted Scan-to-Simulation for Landslides Using Physics-Informed Gaussian Splatting
A UAV-to-3DGS-to-MPM pipeline reconstructs real landslide sites with photorealistic visuals and runs physics-based simulations, validated on a Hong Kong event.
-
Low-Cost Neural Radiance Fields
Comparative study of DS-NeRF, TensoRF, and HashNeRF with depth-supervision and architectural variants finds no conclusive outperformance under equal training time but identifies which design choices transfer to low-data, low-compute regimes.