pith. machine review for the scientific record.

arxiv: 1703.03130 · v1 · submitted 2017-03-09 · 💻 cs.CL · cs.AI · cs.LG · cs.NE

Recognition: unknown

A Structured Self-attentive Sentence Embedding

Bing Xiang, Bowen Zhou, Cicero Nogueira dos Santos, Minwei Feng, Mo Yu, Yoshua Bengio, Zhouhan Lin

classification 💻 cs.CL · cs.AI · cs.LG · cs.NE
keywords: embedding, sentence, model, different, matrix, self-attention, tasks, attending
Original abstract

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
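The abstract names the matrix embedding and the regularization term but does not give their form. The sketch below is a minimal illustration that assumes the formulation from the full paper: given hidden states H (n tokens × 2u units, from a BiLSTM in the paper), the annotation matrix is A = softmax(W_s2 tanh(W_s1 Hᵀ)) with the softmax taken over tokens, the embedding is M = A H, and the penalty is P = ‖A Aᵀ − I‖_F². The helper names, the toy sizes, and the random weights standing in for trained parameters are hypothetical; the BiLSTM itself is omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax: each row becomes a distribution over the n tokens."""
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attentive_embedding(h, w_s1, w_s2):
    """Return (A, M) for hidden states H (n x 2u).

    A = softmax(W_s2 tanh(W_s1 H^T))  -> r x n annotation matrix
    M = A H                           -> r x 2u sentence embedding
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(w_s1, transpose(h))]  # d_a x n
    a = softmax_rows(matmul(w_s2, hidden))  # r x n, one attention row per hop
    m = matmul(a, h)  # r x 2u, each row a weighted sum of token states
    return a, m


def penalty(a):
    """P = ||A A^T - I||_F^2, which pushes the r attention rows apart."""
    aat = matmul(a, transpose(a))
    return sum(
        (aat[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(len(aat))
        for j in range(len(aat))
    )


# Hypothetical toy sizes: n=5 tokens, 2u=4 hidden units, d_a=3, r=2 attention hops.
random.seed(0)
n, two_u, d_a, r = 5, 4, 3, 2
H = [[random.uniform(-1, 1) for _ in range(two_u)] for _ in range(n)]
W_s1 = [[random.uniform(-1, 1) for _ in range(two_u)] for _ in range(d_a)]
W_s2 = [[random.uniform(-1, 1) for _ in range(d_a)] for _ in range(r)]

A, M = self_attentive_embedding(H, W_s1, W_s2)
print(len(A), len(A[0]))  # 2 5
print(len(M), len(M[0]))  # 2 4
print(round(penalty(A), 4))
```

Because each row of A is a distribution over tokens, the rows can be plotted directly as the visualizations the abstract mentions. For such rows the penalty reaches zero only when every row is one-hot and no two rows pick the same token, so it favors hops that each focus narrowly on different parts of the sentence.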

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

    cs.AI 2026-05 unverdicted novelty 7.0

    FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.

  2. Graph Attention Networks

    stat.ML 2017-10 accept novelty 7.0

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein...

  3. Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis

    cs.MM 2026-04 unverdicted novelty 6.0

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  4. Universal Transformers

    cs.CL 2018-07 unverdicted novelty 6.0

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under certain assumptions and outperform standard Transformers on algorithmic and language tasks.

  5. Attention Is All You Need

    cs.CL 2017-06 unverdicted novelty 5.0

    Pith review generated a malformed one-line summary.