GPT-4 Technical Report, 2024
2 Pith papers cite this work.
Fields: cs.CV (2 papers)
Citing papers:
- OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning
  OmniAlpha is a unified multi-task RL framework that uses an alpha-aware VAE, a sequence-to-sequence Diffusion Transformer, and layer-aware rewards to improve transparency-aware generation across five task categories.
- One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
  XComp achieves extreme video compression (one token per selected frame) through learnable progressive token compression and question-conditioned frame selection, raising LVBench accuracy from 42.9 percent to 46.2 percent after fine-tuning on only 2.5 percent of the standard training data.