Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.
arXiv preprint arXiv:1907.07174 , Title =
7 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
LEASE achieves state-of-the-art unified performance on ImageNet-1K by combining masked token reconstruction and codebook contrast losses in a one-time precomputed discrete token space.
MCWC aligns permutation-symmetric blocks across layers to enable sequential prediction and residual entropy coding, improving rate-accuracy tradeoffs versus quantization and prior codecs on language and vision models.
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
citing papers explorer
-
Scaling Laws for Transfer
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.