pith. machine review for the scientific record.


Highway Networks · arXiv 2015

13 Pith papers cite this work.

abstract

There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success. However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem. In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks. We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on "information highways". The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network. Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.
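The gating mechanism the abstract describes can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: a highway layer computes a candidate transform H(x) and a transform gate T(x), and outputs H(x)·T(x) + x·(1 − T(x)), so that when T ≈ 0 the input passes through unchanged. The weight shapes and tanh nonlinearity here are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    T is the transform gate; (1 - T) acts as the carry gate,
    so information can flow across the layer unimpeded when T is near 0.
    """
    h = np.tanh(x @ W_h + b_h)   # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    return h * t + x * (1.0 - t)
```

Initializing the gate bias b_t to a negative value biases the layer toward the carry (identity) behavior early in training, which is what lets very deep stacks of such layers train with plain SGD.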

hub tools

representative citing papers

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.
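The residual reformulation summarized above is closely related to the highway carry path: instead of a learned gate, the skip connection is always on. A minimal sketch (illustrative, not the ResNet codebase), where f stands for the block's learned residual function F(x):

```python
import numpy as np

def residual_block(x, f):
    """Residual reformulation: the block learns F(x) and outputs F(x) + x,
    i.e. a highway layer with the carry path fixed open (T = 1 for F, plus x)."""
    return f(x) + x
```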

Transformers with Selective Access to Early Representations

cs.LG · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

SATFormer uses a context-dependent gate for selective reuse of early Transformer representations, improving validation loss and zero-shot accuracy especially on retrieval benchmarks.

Searching for Activation Functions

cs.NE · 2017-10-16 · conditional · novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.
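The Swish formula quoted in the summary is simple to write down directly; a minimal NumPy sketch (β = 1 by default, as in the paper's headline variant):

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: f(x) = x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))
```

Like ReLU, Swish is approximately linear for large positive inputs and near zero for large negative inputs, but it is smooth and non-monotonic around the origin.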

Wide Residual Networks

cs.CV · 2016-05-23 · accept · novelty 7.0

Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.

Set Prediction for Next-Day Active Fire Forecasting

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

WISP reformulates next-day active fire forecasting as point-set prediction and reports 38.2% AP, 53.4% FRP-weighted coverage, and 54.1% localization within 5 km on a global held-out test set.
