
arxiv: 1707.06119 · v1 · pith:DPZWSD4Cnew · submitted 2017-07-19 · 💻 cs.CV

Discriminative convolutional Fisher vector network for action recognition

classification 💻 cs.CV
keywords: network, architecture, fisher, vector, action, classical, convolutional, layers

In this work we propose a novel neural network architecture for human action recognition in videos. The proposed architecture expresses the processing steps of classical Fisher vector approaches, namely dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) and Fisher vector descriptor extraction, as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. Furthermore, we show that the proposed architecture can replace the fully connected layers in popular convolutional networks, achieving comparable classification performance, or even significantly surpassing the performance of similar architectures, while reducing the total number of trainable parameters by a factor of 5. We show that our method achieves significant improvements over the classical processing chain.
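
To make the three steps named in the abstract concrete, here is a minimal NumPy sketch of the classical chain: PCA projection, GMM soft assignment, and Fisher vector extraction. It follows the commonly used formulation with gradients taken with respect to the means and standard deviations of a diagonal-covariance GMM. It is not the authors' network. The function names (`pca_project`, `gmm_posteriors`, `fisher_vector`) and the parameter shapes are assumptions made for illustration, and the paper's layers may differ in normalization and parameterization.

```python
import numpy as np

def pca_project(x, mean, components):
    """Center descriptors x (N, D_in) and project onto components (D_out, D_in)."""
    return (x - mean) @ components.T

def gmm_posteriors(x, weights, means, variances):
    """Soft-assign each descriptor to each Gaussian of a diagonal-covariance GMM.

    x: (N, D), weights: (K,), means: (K, D), variances: (K, D).
    Returns posteriors of shape (N, K) whose rows sum to 1.
    """
    diff = x[:, None, :] - means[None, :, :]                      # (N, K, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                         # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fisher_vector(x, weights, means, variances):
    """Average gradient of the GMM log-likelihood w.r.t. means and std deviations.

    Returns a vector of length 2 * K * D (no power or L2 normalization applied).
    """
    n = x.shape[0]
    gamma = gmm_posteriors(x, weights, means, variances)          # (N, K)
    z = (x[:, None, :] - means[None, :, :]) / np.sqrt(variances)  # (N, K, D)
    g_mu = np.einsum("nk,nkd->kd", gamma, z)
    g_mu /= n * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, z ** 2 - 1.0)
    g_sigma /= n * np.sqrt(2.0 * weights)[:, None]
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Illustrative usage with random data (all values are hypothetical).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 16))           # e.g. local convolutional features
pca_mean = descriptors.mean(axis=0)
pca_components = np.linalg.qr(rng.normal(size=(16, 8)))[0].T  # (8, 16), orthonormal rows
reduced = pca_project(descriptors, pca_mean, pca_components)  # (100, 8)

gmm_weights = np.full(4, 0.25)
gmm_means = rng.normal(size=(4, 8))
gmm_variances = np.ones((4, 8))
fv = fisher_vector(reduced, gmm_weights, gmm_means, gmm_variances)
print(fv.shape)
```

Every operation in this sketch is differentiable with respect to the PCA and GMM parameters, which is what makes it plausible to treat the chain as network layers and refine it with a classification loss rather than fitting each stage separately without supervision.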



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition

    cs.CV 2019-07 unverdicted novelty 6.0

    TARN uses episode-based meta-learning with temporal attention for alignment and segment-level distance learning to outperform prior methods on few-shot action recognition while remaining competitive on zero-shot.