pith. machine review for the scientific record.

arxiv: 1410.0510 · v1 · submitted 2014-10-02 · 💻 cs.LG · cs.NE

Deep Sequential Neural Network

keywords model, transformations, classical, learning, network, neural, according, different

Neural networks sequentially build high-level features through their successive layers. We propose a new neural network model in which each layer is associated with a set of candidate mappings. When an input is processed, one mapping among these candidates is selected at each layer according to a sequential decision process. The resulting model is structured as a DAG-like architecture, so that a path from the root to a leaf node defines a sequence of transformations. Instead of learning global transformations, as in classical multilayer networks, this model learns a set of local transformations. It can thus process data with different characteristics through specific sequences of such local transformations, which increases the expressive power of the model relative to a classical multilayer network. The learning algorithm is inspired by policy gradient techniques from reinforcement learning and is used here in place of classical back-propagation-based gradient descent. Experiments on different datasets show the relevance of this approach.
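
The selection mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear selector, the tanh candidate mappings, the shared candidate set at each depth, and all names (`make_layers`, `forward`, `selector_score_gradient`) are assumptions made for illustration. At each layer a softmax policy samples one candidate mapping, and a REINFORCE-style score-function term shows how a scalar reward (hypothetically, a negative loss) could drive the selector, in the spirit of the policy gradient techniques the abstract mentions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; the initialisation scheme is an arbitrary choice."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_layers(dim, num_layers, num_candidates, rng):
    """Hypothetical parameterisation: one linear selector and K linear
    candidate mappings per layer (shared by all nodes at that depth)."""
    return [
        {
            "selector": random_matrix(num_candidates, dim, rng),
            "candidates": [random_matrix(dim, dim, rng)
                           for _ in range(num_candidates)],
        }
        for _ in range(num_layers)
    ]


def forward(x, layers, rng):
    """Sample one candidate mapping per layer and apply it.

    Returns the final representation, the sampled path (one index per
    layer), and the log-probability of that path under the selectors.
    """
    path, log_prob = [], 0.0
    for layer in layers:
        probs = softmax(matvec(layer["selector"], x))
        k = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[k])
        x = [math.tanh(v) for v in matvec(layer["candidates"][k], x)]
        path.append(k)
    return x, path, log_prob


def selector_score_gradient(probs, chosen, reward):
    """REINFORCE-style term for one decision:
    reward * d log pi(chosen) / d scores = reward * (onehot(chosen) - probs)."""
    return [reward * ((1.0 if i == chosen else 0.0) - p)
            for i, p in enumerate(probs)]
```

Calling `forward(x, make_layers(4, 3, 2, rng), rng)` with `rng = random.Random(0)` returns a transformed vector together with the sequence of chosen candidates, so two inputs that take different paths are processed by different sequences of local transformations.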

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    cs.LG · 2017-01 · accept · novelty 8.0

    A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

  2. Adaptive Computation Time for Recurrent Neural Networks

    cs.NE · 2016-03 · accept · novelty 8.0

    ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.