arXiv preprint arXiv:2410.09400 , year=

Yifeng Xu, Zhenliang He, Shiguang Shan, Xilin Chen · 2024 · arXiv 2410.09400

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

FlowBender introduces closed-loop training that lets conditional flow models learn correction policies from their own task-specific alignment errors, outperforming supervised and guidance baselines on fidelity and plausibility.

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

RefineAnything is a multimodal diffusion model using Focus-and-Refine crop-and-resize with blended paste-back to achieve high-fidelity local image refinement and near-perfect background preservation.

T-CLIP: Enabling Thermal Perception for Contrastive Language-Image Pretraining

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

T-CLIP introduces a physics-aware thermal captioning dataset (IR-Cap) and a decoupled dual-LoRA adaptation of CLIP that improves cross-modal retrieval on thermal benchmarks by separating scene-level and object-level thermal understanding.

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

citing papers explorer

Showing 4 of 4 citing papers after filters.

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows cs.CV · 2026-06-18 · unverdicted · none · ref 63
FlowBender introduces closed-loop training that lets conditional flow models learn correction policies from their own task-specific alignment errors, outperforming supervised and guidance baselines on fidelity and plausibility.
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details cs.CV · 2026-04-08 · unverdicted · none · ref 47
RefineAnything is a multimodal diffusion model using Focus-and-Refine crop-and-resize with blended paste-back to achieve high-fidelity local image refinement and near-perfect background preservation.
T-CLIP: Enabling Thermal Perception for Contrastive Language-Image Pretraining cs.CV · 2026-05-30 · unverdicted · none · ref 87
T-CLIP introduces a physics-aware thermal captioning dataset (IR-Cap) and a decoupled dual-LoRA adaptation of CLIP that improves cross-modal retrieval on thermal benchmarks by separating scene-level and object-level thermal understanding.
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors cs.CV · 2026-05-01 · unverdicted · none · ref 55
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

arXiv preprint arXiv:2410.09400 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer