On the Number of Linear Regions of Deep Neural Networks. arXiv, 2014.
2 Pith papers cite this work.
Representative citing papers:
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Applying a head-specific sigmoid gate after scaled dot-product attention (SDPA) in LLMs boosts performance and stability by adding non-linearity and query-dependent sparse modulation while reducing attention sinks (a minimal sketch of this gating follows the list).
- A Transfer Learning Evaluation of Deep Neural Networks for Image Classification
  Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics (a fine-tuning sketch also follows the list).
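To make the first summary concrete, here is a minimal PyTorch sketch of head-specific sigmoid gating applied to the SDPA output. The module name `GatedAttention`, the `gate_proj` layer, and computing the gate from the same hidden state that produces the query are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Causal multi-head attention with a head-specific sigmoid gate after SDPA."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Assumed gate projection: one sigmoid gate value per channel of each
        # head, computed from the same hidden state that produces the query.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Query-dependent gate, applied elementwise to each head's SDPA output;
        # near-zero gates let a head switch itself off token by token, which is
        # the sparse modulation the summary describes.
        g = torch.sigmoid(self.gate_proj(x))
        g = g.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        gated = g * attn
        # Merge heads and project back to the model dimension.
        gated = gated.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(gated)

layer = GatedAttention(d_model=64, n_heads=4)
y = layer(torch.randn(2, 16, 64))  # (batch=2, seq=16, d_model=64)
```

The gate sits between SDPA and the output projection, so it adds non-linearity after the attention-weighted value mixing rather than inside the softmax itself.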
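For the second summary, a minimal sketch of the usual transfer-learning recipe being evaluated: load an ImageNet-pre-trained backbone, freeze its features, and swap in a new classifier head. ResNet-18 stands in here for any of the eleven models, and `num_classes` is a placeholder; neither is taken from the paper's protocol.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical target-dataset class count
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze pre-trained features
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
```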