pith.

arxiv: 1703.03130 · v1 · pith:NKK33Z2D · new · submitted 2017-03-09 · 💻 cs.CL · cs.AI · cs.LG · cs.NE

A Structured Self-attentive Sentence Embedding

classification 💻 cs.CL · cs.AI · cs.LG · cs.NE
keywords embedding · sentence · model · different · matrix · self-attention · tasks · attending

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
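The abstract does not give the equations. The sketch below follows the formulation as we understand it from the full paper:

- The annotation matrix is A = softmax(W_s2 · tanh(W_s1 · Hᵀ)), with the softmax applied row-wise over tokens.
- The embedding matrix is M = A·H.
- The redundancy penalty is ‖A·Aᵀ − I‖²_F, which pushes the r attention rows to focus on different parts of the sentence.

Here H (n × 2u hidden states), W_s1 (d_a × 2u), W_s2 (r × d_a), d_a, and r follow the paper's notation. The helper function names and the toy dimensions are hypothetical. This is a minimal pure-Python illustration of the forward computation, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an m x k matrix by a k x p matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def structured_self_attention(H, W_s1, W_s2):
    """H: n x 2u hidden states, W_s1: d_a x 2u, W_s2: r x d_a.

    Returns (M, A): M is the r x 2u sentence-embedding matrix and
    A is the r x n annotation (attention) matrix.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    scores = matmul(W_s2, hidden)       # r x n
    A = [softmax(row) for row in scores]  # each row sums to 1 over the n tokens
    M = matmul(A, H)                    # r x 2u: one weighted sum of H per row
    return M, A


def redundancy_penalty(A):
    """Squared Frobenius norm of A A^T - I."""
    AAT = matmul(A, transpose(A))
    r = len(A)
    return sum((AAT[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(r) for j in range(r))


if __name__ == "__main__":
    random.seed(0)
    n, two_u, d_a, r = 5, 4, 3, 2  # hypothetical toy sizes
    H = [[random.uniform(-1, 1) for _ in range(two_u)] for _ in range(n)]
    W_s1 = [[random.uniform(-1, 1) for _ in range(two_u)] for _ in range(d_a)]
    W_s2 = [[random.uniform(-1, 1) for _ in range(d_a)] for _ in range(r)]
    M, A = structured_self_attention(H, W_s1, W_s2)
    print(len(M), len(M[0]), redundancy_penalty(A))
```

Each row of A is a distribution over tokens, and this is what makes the embedding easy to visualize. The penalty is zero only when those distributions are mutually orthogonal with unit norm, for example disjoint one-hot rows. During training it would be added to the task loss with a weighting coefficient.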

This paper has not been read by Pith yet.


Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

    cs.AI 2026-05 unverdicted novelty 7.0

    FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.

  2. Graph Attention Networks

    stat.ML 2017-10 accept novelty 7.0

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein...

  3. Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis

    cs.MM 2026-04 unverdicted novelty 6.0

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  4. Cognitive State Inference from VR Motion via Motion Foundation Model

    cs.HC 2025-09 unverdicted novelty 6.0

    VR head and hand motion data can be adapted to motion foundation models to classify cognitive states like confusion and hesitation at 82% accuracy with better cross-user generalization than baseline models on a new 24...

  5. DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks

    cs.CL 2019-07 unverdicted novelty 6.0

    DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.

  6. Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention

    cs.CV 2019-07 unverdicted novelty 6.0

    DG-STA builds dynamic graphs from hand skeletons, applies spatial-temporal self-attention to learn features, and uses a mask to cut cost by 99%, outperforming prior methods on DHG-14/28 and SHREC'17.

  7. Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information

    stat.ML 2019-06 unverdicted novelty 6.0

    DMPP models spatio-temporal event intensity as a deep NN-weighted mixture of kernels to incorporate high-dimensional context while keeping likelihood integration tractable.

  8. Universal Transformers

    cs.CL 2018-07 unverdicted novelty 6.0

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  9. AMAD: Adversarial Multiscale Anomaly Detection on High-Dimensional and Time-Evolving Categorical Data

    cs.LG 2019-07 unverdicted novelty 5.0

    AMAD is an end-to-end model using adversarial autoencoders and RNNs with attention for multiscale anomaly detection on time-evolving high-dimensional categorical data.

  10. Attention Is All You Need

    cs.CL 2017-06 unverdicted novelty 5.0

    Pith review generated a malformed one-line summary.

  11. Automatically Learning Construction Injury Precursors from Text

    cs.CL 2019-07 unverdicted novelty 4.0

    Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.