pith. machine review for the scientific record.

citation dossier

A Structured Self-attentive Sentence Embedding

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio · 2017 · arXiv 1703.03130

5 Pith papers citing it
5 reference links
cs.CL · top field · 2 papers
UNVERDICTED · top verdict bucket · 4 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read the PDF on arXiv

why this work matters in Pith

Pith has found this work in 5 reviewed papers. Its strongest current cluster is cs.CL (2 papers). The largest review-status bucket among citing papers is UNVERDICTED (4 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
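The dossier does not restate the cited paper's mechanism, so here is a minimal numpy sketch of it: structured self-attention computes an annotation matrix A = softmax(Ws2 · tanh(Ws1 · Hᵀ)) over the sentence's LSTM hidden states H, takes M = A · H as an r-hop matrix embedding, and discourages redundant hops with the penalty ‖A Aᵀ − I‖_F². Variable names follow the paper; the dimensions and random weights below are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, Ws1, Ws2):
    """A = softmax(Ws2 @ tanh(Ws1 @ H.T)); M = A @ H."""
    A = softmax(Ws2 @ np.tanh(Ws1 @ H.T), axis=-1)  # (r, n): each hop is a distribution over tokens
    M = A @ H                                        # (r, 2u): multi-hop sentence embedding
    return A, M

def redundancy_penalty(A):
    """Frobenius-norm penalty ||A A^T - I||_F^2 pushing hops apart."""
    D = A @ A.T - np.eye(A.shape[0])
    return float((D ** 2).sum())

# Illustrative sizes: n tokens, 2u-dim biLSTM states, d_a attention dim, r hops.
rng = np.random.default_rng(0)
n, two_u, d_a, r = 6, 10, 8, 3
H = rng.standard_normal((n, two_u))
Ws1 = rng.standard_normal((d_a, two_u))
Ws2 = rng.standard_normal((r, d_a))
A, M = structured_self_attention(H, Ws1, Ws2)
```

Each row of A sums to 1, so every hop is a proper attention distribution over the sentence's tokens.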

representative citing papers

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.
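The Graph Attention Networks entry above mentions learnable attention coefficients over node neighborhoods; a single-head layer can be sketched as e_ij = LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), softmax-normalized over each node's neighbors, then used to weight the aggregation. This is a toy numpy illustration with random weights and a random graph, not the authors' code.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(X, adj, W, a):
    """Single-head GAT layer: attention over each node's neighborhood."""
    H = X @ W                                # (n, f_out) projected features
    n = H.shape[0]
    e = np.full((n, n), -np.inf)             # -inf masks non-edges out of the softmax
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                e[i, j] = leaky_relu(a @ np.concatenate([H[i], H[j]]))
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax per neighborhood
    return alpha @ H                         # weighted feature aggregation

# Illustrative graph: self-loops plus random edges, random weights.
rng = np.random.default_rng(1)
n, f_in, f_out = 4, 5, 3
X = rng.standard_normal((n, f_in))
adj = np.eye(n, dtype=bool) | (rng.random((n, n)) < 0.5)
W = rng.standard_normal((f_in, f_out))
a = rng.standard_normal(2 * f_out)
H_out = gat_layer(X, adj, W, a)
```

Self-loops guarantee every row of the masked score matrix has at least one finite entry, so the per-neighborhood softmax is always well defined.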

citing papers explorer

Showing 5 of 5 citing papers.

  • FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences · cs.AI · 2026-05-09 · unverdicted · none · ref 38

    FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.

  • Graph Attention Networks · stat.ML · 2017-10-30 · accept · none · ref 11

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

  • Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis · cs.MM · 2026-04-07 · unverdicted · none · ref 30

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  • Universal Transformers · cs.CL · 2018-07-10 · unverdicted · none · ref 17

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  • Attention Is All You Need · cs.CL · 2017-06-12 · unverdicted · none · ref 22

    Pith review generated a malformed one-line summary.
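The Universal Transformers entries above mention dynamic halting, which is Adaptive Computation Time applied per position: each position accumulates a halting probability every refinement step and stops updating once it crosses a threshold. Below is a toy sketch of that halting loop; the linear map standing in for the shared transformer block, and all weights, are assumptions for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def act_halting(state, step_fn, halt_w, halt_b=0.0, max_steps=8, threshold=0.99):
    """ACT-style per-position halting: output is the halting-probability-
    weighted sum of each position's intermediate states."""
    n, _ = state.shape
    halted = np.zeros(n, dtype=bool)
    cum_p = np.zeros(n)                       # accumulated halting probability
    weighted = np.zeros_like(state)
    for _ in range(max_steps):
        p = sigmoid(state @ halt_w + halt_b)  # (n,) halting prob this step
        still = ~halted
        new_halted = still & (cum_p + p >= threshold)
        # Newly halted positions contribute their remainder; running ones contribute p.
        contrib = np.where(new_halted, 1.0 - cum_p, np.where(still, p, 0.0))
        cum_p = cum_p + np.where(still, p, 0.0)
        weighted += contrib[:, None] * state
        halted = halted | new_halted
        if halted.all():
            break
        state = step_fn(state)                # shared transition, reused every step
    return weighted

# Toy run: a contraction matrix stands in for the shared transformer block.
rng = np.random.default_rng(2)
n, d = 5, 4
state0 = rng.standard_normal((n, d))
W = 0.9 * np.eye(d)
out = act_halting(state0, lambda s: s @ W, rng.standard_normal(d))
```

Because the same `step_fn` is reused at every step, depth becomes data-dependent: easy positions halt early while hard ones keep refining, which is the recurrent-update-plus-halting combination the summaries describe.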