{"work":{"id":"eb970d64-41ff-4e44-afd0-e3bb975e0dc4","openalex_id":null,"doi":null,"arxiv_id":"1901.02860","raw_key":null,"title":"Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context","authors":null,"authors_text":"Zihang Dai, Zhilin Yang, Yiming Yang, William W Cohen, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov","year":2019,"venue":"cs.LG","abstract":"Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.","external_url":"https://arxiv.org/abs/1901.02860","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-14T23:07:42.576376+00:00","pith_arxiv_id":"1901.02860","created_at":"2026-05-10T06:11:20.315770+00:00","updated_at":"2026-05-14T23:07:42.576376+00:00","title_quality_ok":true,"display_title":"Transformer-xl: Attentive language models beyond a ﬁxed-length context","render_title":"Transformer-xl: Attentive language models beyond a ﬁxed-length context"},"hub":{"state":{"work_id":"eb970d64-41ff-4e44-afd0-e3bb975e0dc4","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":17,"external_cited_by_count":null,"distinct_field_count":6,"first_pith_cited_at":"2019-09-17T19:42:54+00:00","last_pith_cited_at":"2026-05-10T08:14:14+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-15T08:27:31.501195+00:00","tier_text":"hub"},"tier":"hub","role_counts":[],"polarity_counts":[],"runs":{},"summary":{},"graph":{},"authors":[]}}