pith. sign in

hub Mixed citations

Neural Machine Translation of Rare Words with Subword Units

Mixed citation behavior. Most common role is method (67%).

49 Pith papers citing it
Method 67% of classified citations
abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.1 and 1.3 BLEU, respectively.

hub tools

citation-role summary

method 4 background 2

citation-polarity summary

clear filters

representative citing papers

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.

TokenChain: A Discrete Speech Chain via Semantic Token Modeling

eess.AS · 2025-10-07 · unverdicted · novelty 7.0

TokenChain demonstrates that a discrete semantic-token interface can sustain effective chain learning between ASR and TTS, yielding faster convergence and lower error rates on LibriSpeech and TED-LIUM.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Fine-Tuning Language Models from Human Preferences

cs.CL · 2019-09-18 · unverdicted · novelty 7.0

Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.

Multi Language Models for On-the-Fly Syntax Highlighting

cs.SE · 2025-10-05 · unverdicted · novelty 6.0

Unified multi-language deep learning model for on-the-fly syntax highlighting using normalization and few-shot learning to support six languages with lower deployment cost.

FAST: Efficient Action Tokenization for Vision-Language-Action Models

cs.RO · 2025-01-16 · unverdicted · novelty 6.0

FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models cs.LG · 2026-05-11 · unverdicted · none · ref 34 · internal anchor

    Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.

  • FAST: Efficient Action Tokenization for Vision-Language-Action Models cs.RO · 2025-01-16 · unverdicted · none · ref 57 · internal anchor

    FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.

  • Scaling Autoregressive Models for Content-Rich Text-to-Image Generation cs.CV · 2022-06-22 · unverdicted · none · ref 32 · internal anchor

    Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

  • LaMDA: Language Models for Dialog Applications cs.CL · 2022-01-20 · unverdicted · none · ref 91 · internal anchor

    LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.