OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
citation dossier
A deep reinforced model for abstractive summarization
why this work matters in Pith
Pith has found this work in 4 reviewed papers. Its strongest current cluster is cs.CL (2 papers). The largest review-status bucket among citing papers is UNVERDICTED (3 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.
Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
Pith review generated a malformed one-line summary.
citing papers explorer
-
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
-
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.
-
Fine-Tuning Language Models from Human Preferences
Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
-
Attention Is All You Need
Pith review generated a malformed one-line summary.