pith. machine review for the scientific record.

citation dossier

A Structured Self-attentive Sentence Embedding

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio · 2017 · arXiv 1703.03130

5 Pith papers citing it
5 reference links
cs.CL · top field · 2 papers
UNVERDICTED · top verdict bucket · 4 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read the PDF on arXiv

why this work matters in Pith

Pith has found this work in 5 reviewed papers. Its strongest current cluster is cs.CL (2 papers). The largest review-status bucket among citing papers is UNVERDICTED (4 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
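The dossier does not restate the cited paper's mechanism, so here is a minimal numpy sketch of it: structured self-attention computes an annotation matrix A = softmax(Ws2 · tanh(Ws1 · Hᵀ)) over the sentence's LSTM hidden states H, takes M = A · H as an r-hop matrix embedding, and discourages redundant hops with the penalty ‖A Aᵀ − I‖_F². Variable names follow the paper; the dimensions and random weights below are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, Ws1, Ws2):
    """A = softmax(Ws2 @ tanh(Ws1 @ H.T)); M = A @ H."""
    A = softmax(Ws2 @ np.tanh(Ws1 @ H.T), axis=-1)  # (r, n): each hop is a distribution over tokens
    M = A @ H                                        # (r, 2u): multi-hop sentence embedding
    return A, M

def redundancy_penalty(A):
    """Frobenius-norm penalty ||A A^T - I||_F^2 pushing hops apart."""
    D = A @ A.T - np.eye(A.shape[0])
    return float((D ** 2).sum())

# Illustrative sizes: n tokens, 2u-dim biLSTM states, d_a attention dim, r hops.
rng = np.random.default_rng(0)
n, two_u, d_a, r = 6, 10, 8, 3
H = rng.standard_normal((n, two_u))
Ws1 = rng.standard_normal((d_a, two_u))
Ws2 = rng.standard_normal((r, d_a))
A, M = structured_self_attention(H, Ws1, Ws2)
```

Each row of A sums to 1, so every hop is a proper attention distribution over the sentence's tokens.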

representative citing papers

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.
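The Graph Attention Networks entry above mentions learnable attention coefficients over node neighborhoods; a single-head layer can be sketched as e_ij = LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), softmax-normalized over each node's neighbors, then used to weight the aggregation. This is a toy numpy illustration with random weights and a random graph, not the authors' code.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(X, adj, W, a):
    """Single-head GAT layer: attention over each node's neighborhood."""
    H = X @ W                                # (n, f_out) projected features
    n = H.shape[0]
    e = np.full((n, n), -np.inf)             # -inf masks non-edges out of the softmax
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                e[i, j] = leaky_relu(a @ np.concatenate([H[i], H[j]]))
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax per neighborhood
    return alpha @ H                         # weighted feature aggregation

# Illustrative graph: self-loops plus random edges, random weights.
rng = np.random.default_rng(1)
n, f_in, f_out = 4, 5, 3
X = rng.standard_normal((n, f_in))
adj = np.eye(n, dtype=bool) | (rng.random((n, n)) < 0.5)
W = rng.standard_normal((f_in, f_out))
a = rng.standard_normal(2 * f_out)
H_out = gat_layer(X, adj, W, a)
```

Self-loops guarantee every row of the masked score matrix has at least one finite entry, so the per-neighborhood softmax is always well defined.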

citing papers explorer

Showing 5 of 5 citing papers.

  • FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences · cs.AI · 2026-05-09 · unverdicted · none · ref 38

    FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.

  • Graph Attention Networks · stat.ML · 2017-10-30 · accept · none · ref 11

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

  • Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis · cs.MM · 2026-04-07 · unverdicted · none · ref 30

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  • Universal Transformers · cs.CL · 2018-07-10 · unverdicted · none · ref 17

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  • Attention Is All You Need · cs.CL · 2017-06-12 · unverdicted · none · ref 22

    Pith review generated a malformed one-line summary.
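The Universal Transformers entries above mention dynamic halting, which is Adaptive Computation Time applied per position: each position accumulates a halting probability every refinement step and stops updating once it crosses a threshold. Below is a toy sketch of that halting loop; the linear map standing in for the shared transformer block, and all weights, are assumptions for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def act_halting(state, step_fn, halt_w, halt_b=0.0, max_steps=8, threshold=0.99):
    """ACT-style per-position halting: output is the halting-probability-
    weighted sum of each position's intermediate states."""
    n, _ = state.shape
    halted = np.zeros(n, dtype=bool)
    cum_p = np.zeros(n)                       # accumulated halting probability
    weighted = np.zeros_like(state)
    for _ in range(max_steps):
        p = sigmoid(state @ halt_w + halt_b)  # (n,) halting prob this step
        still = ~halted
        new_halted = still & (cum_p + p >= threshold)
        # Newly halted positions contribute their remainder; running ones contribute p.
        contrib = np.where(new_halted, 1.0 - cum_p, np.where(still, p, 0.0))
        cum_p = cum_p + np.where(still, p, 0.0)
        weighted += contrib[:, None] * state
        halted = halted | new_halted
        if halted.all():
            break
        state = step_fn(state)                # shared transition, reused every step
    return weighted

# Toy run: a contraction matrix stands in for the shared transformer block.
rng = np.random.default_rng(2)
n, d = 5, 4
state0 = rng.standard_normal((n, d))
W = 0.9 * np.eye(d)
out = act_halting(state0, lambda s: s @ W, rng.standard_normal(d))
```

Because the same `step_fn` is reused at every step, depth becomes data-dependent: easy positions halt early while hard ones keep refining, which is the recurrent-update-plus-halting combination the summaries describe.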