Train No Evil: Selective Masking for Task-Guided Pre-Training
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
Representative citing papers
Lil-Bevo applies music pretraining, curriculum learning on sequence length, and targeted masking to small LMs in the BabyLM challenge, finding modest gains from short sequences but overall limited performance.
Citing papers explorer
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
  Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
- Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
  Lil-Bevo applies music pretraining, curriculum learning on sequence length, and targeted masking to small LMs in the BabyLM challenge, finding modest gains from short sequences but overall limited performance.