pith. machine review for the scientific record.

Neural Machine Translation of Rare Words with Subword Units

28 Pith papers cite this work. Polarity classification is still indexing.

abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by up to 1.1 and 1.3 BLEU, respectively.
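
The byte pair encoding segmentation named in the abstract reduces to a short loop: start from characters, then repeatedly merge the most frequent adjacent symbol pair. The sketch below follows the merge-learning algorithm from the paper; the toy vocabulary and merge count are illustrative.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count how often each adjacent symbol pair occurs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite the vocabulary with every occurrence of `pair` fused into one symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: each word is a space-separated symbol sequence ending in '</w>'.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

for _ in range(10):  # number of merges = subword vocabulary budget
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(best)  # first merges: ('e', 's'), ('es', 't'), ('est', '</w>'), ...
```

At test time, segmentation replays the learned merges in order on each word, so rare and unknown words decompose into known subword units.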

representative citing papers

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
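
The summary names the mechanism but not its exact form, so the sketch below is one plausible reading, an assumption rather than the paper's definition: two additive attention masks, one lower-triangular and one upper-triangular, each given to a subset of heads, so that attention direction alone carries positional information.

```python
import numpy as np

def dual_triangle_masks(seq_len, neg_inf=-1e9):
    """Hypothetical construction (an assumption, not the paper's spec):
    two additive attention masks, 0 = attend, -inf = blocked.
    Lower-triangular heads see only the left context, upper-triangular
    heads only the right; together they cover the whole sequence, and
    each direction's ordering supplies a positional signal for free."""
    ones = np.ones((seq_len, seq_len))
    lower = np.where(np.tril(ones) == 1, 0.0, neg_inf)  # left-to-right heads
    upper = np.where(np.triu(ones) == 1, 0.0, neg_inf)  # right-to-left heads
    return lower, upper

lower, upper = dual_triangle_masks(4)
print(lower)  # row i attends to positions 0..i
```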

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
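
The decomposition in that summary is concrete enough to sketch. Below is an illustrative numpy version of the two-path matmul; the outlier threshold of 6.0 matches the paper, while the shapes, scales, and pure-numpy matmuls (the real method runs as fused CUDA kernels) are simplifications.

```python
import numpy as np

def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Two-path matmul: feature dimensions of X containing values beyond the
    threshold stay in 16-bit; the rest are quantized to int8 with per-row
    scales for X and per-column scales for W (vector-wise quantization)."""
    outlier = np.any(np.abs(X) > threshold, axis=0)

    # 16-bit path for the systematic outlier dimensions.
    out_fp16 = X[:, outlier].astype(np.float16) @ W[outlier, :].astype(np.float16)

    # int8 path for everything else.
    Xr, Wr = X[:, ~outlier], W[~outlier, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0 + 1e-12
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0 + 1e-12
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)
    out_int8 = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw)

    return out_fp16.astype(np.float64) + out_int8

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)); X[:, 3] *= 20.0  # inject an outlier dimension
W = rng.normal(size=(8, 5))
print(np.max(np.abs(int8_matmul_with_outliers(X, W) - X @ W)))  # small error
```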

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT is a suite of open decoder-only transformers up to 175B parameters that roughly match GPT-3 performance while taking an estimated one-seventh of GPT-3's carbon footprint to develop, released together with code and training logbooks.

Fine-Tuning Language Models from Human Preferences

cs.CL · 2019-09-18 · unverdicted · novelty 7.0

Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and summaries that humans prefer, though the summarization models often copy whole sentences from the input.
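
The training signal behind those comparisons is compact: annotators pick one of several sampled continuations, and a reward model is trained so the chosen sample wins a softmax over the candidates' scalar scores; the fine-tuned policy then maximizes that reward with RL. A minimal sketch of the comparison loss (the four candidates follow the paper's setup; the scores are made up):

```python
import numpy as np

def comparison_loss(rewards, chosen_idx):
    """Negative log-probability that the human-chosen continuation wins a
    softmax over the scalar rewards of all sampled continuations."""
    rewards = np.asarray(rewards, dtype=np.float64)
    log_z = np.log(np.exp(rewards - rewards.max()).sum()) + rewards.max()
    return log_z - rewards[chosen_idx]

# Four sampled continuations scored by the reward model; the human chose index 2.
print(comparison_loss([0.1, -0.4, 1.3, 0.2], chosen_idx=2))
```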

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, and the model size required to fit the data grows sublinearly with data size.
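
Concretely, the reported learning curves fit eps(m) ≈ a·m^(-b) + c, with m the training set size and c the irreducible error floor. A synthetic illustration of recovering the exponent (the constants are invented, not the paper's measurements):

```python
import numpy as np

# eps(m) ~= a * m**(-b) + c : generalization error vs. training set size m.
a_true, b_true, c_true = 4.0, 0.35, 0.05  # invented constants
m = np.logspace(3, 7, 9)                  # training set sizes
eps = a_true * m ** (-b_true) + c_true    # idealized learning curve

# In the power-law region (eps >> c), log eps is linear in log m,
# so the exponent falls out of a straight-line fit.
slope, _ = np.polyfit(np.log(m), np.log(eps - c_true), 1)
print(f"recovered exponent b ~ {-slope:.2f}")  # prints ~0.35
```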

FAST: Efficient Action Tokenization for Vision-Language-Action Models

cs.RO · 2025-01-16 · unverdicted · novelty 6.0

FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
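
The tokenizer's core is classical transform coding: take a DCT of each action dimension over the chunk, quantize the coefficients, and flatten them into tokens. The sketch below shows only that stage; FAST additionally compresses the token stream with byte pair encoding (omitted here), and the scale factor is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(actions, scale=10.0):
    """DCT each action dimension over the time axis, quantize the
    coefficients to integers, and flatten to a token sequence."""
    coeffs = dct(actions, axis=0, norm='ortho')  # energy compacts into low frequencies
    return np.round(coeffs * scale).astype(np.int32).ravel()

def detokenize_actions(tokens, horizon, dim, scale=10.0):
    """Inverse: reshape, dequantize, and invert the DCT."""
    coeffs = tokens.reshape(horizon, dim) / scale
    return idct(coeffs, axis=0, norm='ortho')

# Round trip on a smooth 50-step, 7-dimensional action chunk.
t = np.linspace(0.0, 1.0, 50)
actions = np.stack([np.sin(2 * np.pi * (k + 1) * t) for k in range(7)], axis=1)
recon = detokenize_actions(tokenize_actions(actions), horizon=50, dim=7)
print(np.max(np.abs(actions - recon)))  # small quantization error
```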

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models are well calibrated when asked to estimate the probability that their own answers are correct, and calibration improves as models get larger.
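
"Good calibration" here means the model's self-reported probability of being correct tracks its empirical accuracy. One standard way to quantify the gap, sketched below with made-up numbers, is expected calibration error:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin self-reported P(correct) values and compare each bin's mean
    confidence to its empirical accuracy; a well-calibrated model has a
    small weighted gap."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        mask = (confidences >= edges[i]) & (confidences < edges[i + 1])
        if i == n_bins - 1:
            mask |= confidences == 1.0  # include the top edge in the last bin
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

conf = np.array([0.95, 0.80, 0.65, 0.30])  # self-reported P(correct)
hit = np.array([1, 1, 0, 0])               # whether the answer was right
print(expected_calibration_error(conf, hit, n_bins=5))
```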

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

In Search of Lost DNA Sequence Pretraining

cs.LG · 2026-04-17 · unverdicted · novelty 5.0

DNA pretraining suffers from inappropriate evaluation datasets, flawed neighbor-masking, and neglected vocabulary design; the authors supply guidelines and a reproducible testbed to fix them.
