Polynomial composition activations: Unleashing the dynamics of large language models. arXiv preprint arXiv:2411.03884, 2024

Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma · 2024 · arXiv 2411.03884

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

representative citing papers

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

Mixture of Activations mixes activation functions token-adaptively in FFNs via lightweight gates. It is strictly more expressive than fixed or learnable activations and yields lower pretraining loss across model sizes from 0.12B to 2B.

citing papers explorer

Showing 1 of 1 citing paper.

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

cs.LG · 2026-05-26 · unverdicted · none · ref 48
