MoDS: Model-Oriented Data Selection for Instruction Tuning

BERT: Pre-training of deep bidirectional transformers for language understanding · 2019 · arXiv 2311.15653

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

PROXYMIX learns a dynamic replay controller on a small proxy model and transfers it to a large target model, improving accuracy by 3.4 points and reducing forgetting by 3.5 points on LLaMA-3-8B continual tuning sequences.

MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning

cs.CL · 2026-05-29 · unverdicted · novelty 5.0

MADS selects a 15% core set from the 52K Alpaca-GPT4 dataset via activations in Llama-3.2-3B-Instruct, yielding 2.5% average gains on 7B-13B models across six benchmarks versus full-data training.

NVILA: Efficient Frontier Visual Language Models

cs.CV · 2024-12-05 · unverdicted · novelty 5.0

NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

cs.CV · 2025-02-14 · unverdicted · novelty 4.0

Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

cs.LG · 2026-05-29 · unverdicted · none · ref 31

PROXYMIX learns a dynamic replay controller on a small proxy model and transfers it to a large target model, improving accuracy by 3.4 points and reducing forgetting by 3.5 points on LLaMA-3-8B continual tuning sequences.
