Convolutional Sequence to Sequence Learning

2017 · cs.CL · arXiv 1705.03122

13 Pith papers cite this work. Polarity classification is still indexing.

abstract

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
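The gated linear unit mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition in which the input is split into two halves A and B and the output is A multiplied elementwise by sigmoid(B). The function names `sigmoid` and `glu` are illustrative and are not taken from the paper's code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(values: list[float]) -> list[float]:
    """Gated linear unit over a flat vector (illustrative helper).

    The first half of `values` is the candidate output A and the
    second half is the gate input B; the result is A * sigmoid(B).
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU input length must be even")
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


if __name__ == "__main__":
    # A gate input of 0 passes half of each candidate value.
    print(glu([1.0, 2.0, 0.0, 0.0]))
```

Because the A half passes through the gate linearly, the gradient with respect to A is scaled only by the gate value, which is one plausible reading of the abstract's statement that the units ease gradient propagation.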

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

Generating Long Sequences with Sparse Transformers

cs.LG · 2019-04-23 · unverdicted · novelty 7.0

Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.

Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation

cs.CL · 2025-12-07 · unverdicted · novelty 6.0

Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Time2Vec: Learning a Vector Representation of Time

cs.LG · 2019-07-11 · unverdicted · novelty 6.0

Time2Vec learns a vector representation of time that improves model performance when used in place of raw time inputs across various models and problems.

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

cs.CV · 2019-06-28 · unverdicted · novelty 6.0

Introduces HCSA, a hierarchical convolutional self-attention network for efficient long-form video QA with question-aware dependency modeling.

Joint Detection of Malicious Domains and Infected Clients

cs.LG · 2019-06-21 · unverdicted · novelty 6.0

Sluice network transfer learning jointly detects infected clients and malicious domains from HTTPS traffic, outperforming separate models and identifying previously unknown threats.

K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

K-STEMIT reduces RMSE by 21% for subsurface stratigraphy thickness estimation from radar data via a knowledge-informed spatio-temporal GNN with adaptive feature fusion and physical priors from the MAR weather model.

YaRN: Efficient Context Window Extension of Large Language Models

cs.CL · 2023-08-31 · unverdicted · novelty 6.0

YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer steps while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning lengths.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model

astro-ph.SR · 2026-04-23 · unverdicted · novelty 5.0

A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Deep Learning for Time Series Forecasting: The Electric Load Case

cs.LG · 2019-07-22 · unverdicted · novelty 4.0

Compares feedforward, recurrent, sequence-to-sequence and temporal convolutional neural networks for short-term electric load forecasting through experiments on two real datasets.

Incremental Adaptation of NMT for Professional Post-editors: A User Study

cs.CL · 2019-06-21 · unverdicted · novelty 4.0

User study with professional translators shows that incremental online adaptation of NMT reduces post-editing effort and improves translation quality.
