arXiv preprint arXiv:2412.09250 (2024)

2 Pith papers cite this work.

representative citing papers

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

cs.CL · 2026-04-23 · unverdicted · novelty 5.0 · ref 53

GiVA uses gradients to initialize vector adapters, letting them match LoRA performance at one-eighth of the rank while retaining extreme parameter efficiency.

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

cs.LG · 2026-06-05 · unverdicted · novelty 4.0 · ref 37

A 120B-parameter sparse MoE model with 460 experts (5.93B active parameters) was trained on a single 8-GPU node to a loss of 1.78, using reversible recurrence and state-preserving scaling from a 1.78B dense seed.