Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data
Synthetic pre-pre-training on structured data improves LLM robustness to noisy pre-training data, matching the baseline loss with up to 49% fewer natural tokens for a 1B-parameter model.
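The recipe behind this summary is a two-stage schedule: first train on cheap, structured synthetic sequences, then continue pre-training on the (possibly noisy) natural-token corpus. The sketch below illustrates only the shape of that schedule; the bracket-grammar generator, the tiny GRU language model, and every hyperparameter are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of a two-stage schedule: synthetic "pre-pre-training" followed
# by pre-training on (possibly noisy) natural tokens. The nested-bracket generator,
# the tiny GRU model, and all hyperparameters are stand-ins, not the paper's setup.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 32      # toy vocabulary size
SEQ_LEN = 64    # toy context length


def synthetic_batch(batch_size: int) -> torch.Tensor:
    """Structured synthetic data: random nested-bracket sequences over tokens 0/1."""
    seqs = []
    for _ in range(batch_size):
        depth, toks = 0, []
        while len(toks) < SEQ_LEN:
            if depth > 0 and (random.random() < 0.5 or len(toks) + depth >= SEQ_LEN):
                toks.append(1)   # close bracket
                depth -= 1
            else:
                toks.append(0)   # open bracket
                depth += 1
        seqs.append(toks)
    return torch.tensor(seqs)


def natural_batch(batch_size: int) -> torch.Tensor:
    """Placeholder for the noisy natural-text corpus: uniform random tokens here."""
    return torch.randint(0, VOCAB, (batch_size, SEQ_LEN))


class TinyLM(nn.Module):
    def __init__(self, d: int = 64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(x))
        return self.head(h)


def run_phase(model, opt, make_batch, steps: int) -> None:
    """One training phase: next-token prediction on batches from make_batch."""
    for _ in range(steps):
        x = make_batch(16)
        logits = model(x[:, :-1])
        loss = F.cross_entropy(logits.reshape(-1, VOCAB), x[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()


model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
run_phase(model, opt, synthetic_batch, steps=50)   # stage 1: synthetic pre-pre-training
run_phase(model, opt, natural_batch, steps=200)    # stage 2: natural-token pre-training
```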
3 Pith papers cite this work.
representative citing papers
- Predicting Large Model Test Losses with a Noisy Quadratic System
  A noisy quadratic system predicts large model test losses from N, B, and K, and outperforms Chinchilla's model when extrapolating up to 1000x compute (see the noisy-quadratic sketch after this list).
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
  REINFORCE-style variants outperform PPO, DPO, and RAFT for RLHF in LLMs by stripping out PPO components that are unnecessary in this setting and adapting the simpler method to the characteristics of LLM alignment (see the REINFORCE sketch after this list).
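Noisy-quadratic sketch: the fragment below simulates SGD on a textbook diagonal noisy quadratic and prints the loss reached for a few (batch size B, step count K) settings, which is the kind of budget-to-loss mapping such a model provides; model size N is not varied here. The eigenvalue spectrum, noise scale, and learning rate are generic assumptions, and the cited paper's predictor may be parameterized quite differently.

```python
# Minimal sketch of a generic noisy quadratic model: SGD on a diagonal quadratic
# with gradient noise whose variance shrinks with batch size. The eigenvalue
# spectrum, noise scale, and learning rate are illustrative assumptions; the
# paper's actual loss predictor may use a different parameterization.
import numpy as np

rng = np.random.default_rng(0)
D = 100                              # number of quadratic dimensions
h = 1.0 / np.arange(1, D + 1)        # assumed eigenvalue spectrum (1/i decay)
noise_scale = 1.0                    # assumed per-example gradient noise std


def quadratic_loss(theta: np.ndarray) -> float:
    """L(theta) = 0.5 * sum_i h_i * theta_i^2."""
    return 0.5 * float(np.sum(h * theta ** 2))


def run_sgd(steps: int, batch_size: int, lr: float = 0.5) -> float:
    """SGD with minibatch gradient noise ~ N(0, noise_scale^2 / B) per dimension."""
    theta = np.ones(D)
    for _ in range(steps):
        grad = h * theta + rng.normal(0.0, noise_scale / np.sqrt(batch_size), D)
        theta = theta - lr * grad
    return quadratic_loss(theta)


# Larger batches lower the noise floor; more steps shrink the curvature-limited
# component. A closed-form version of this mapping is what lets a noisy quadratic
# model predict loss from step and batch budgets.
for B in (8, 64, 512):
    for K in (100, 1000):
        print(f"B={B:4d}  K={K:5d}  loss={run_sgd(K, B):.4f}")
```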
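REINFORCE sketch: the snippet below shows one REINFORCE-style variant from this line of work, a leave-one-out baseline. Each of k sampled completions per prompt is baselined against the mean reward of the other k-1 samples, and the loss is simply negative advantage times sequence log-probability, with no value network and no clipping. The tensor shapes and toy inputs are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a REINFORCE-style update with a leave-one-out baseline:
# k completions are sampled per prompt, each completion's reward is baselined
# against the mean reward of the other k-1 samples, and the loss is simply
# -advantage * sequence log-prob. No value network, no clipping.
import torch


def rloo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs, rewards: [num_prompts, k] per-sequence log-probs and scalar rewards."""
    k = rewards.shape[1]
    # Leave-one-out baseline: mean reward of the other k-1 samples for the same prompt.
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    advantage = rewards - baseline
    # REINFORCE: maximize E[advantage * log pi(y|x)]  ->  minimize the negative.
    return -(advantage.detach() * logprobs).mean()


# Toy usage: 2 prompts, k=4 sampled completions each.
logprobs = torch.randn(2, 4, requires_grad=True)   # stand-in for summed token log-probs
rewards = torch.randn(2, 4)                        # stand-in for reward-model scores
loss = rloo_loss(logprobs, rewards)
loss.backward()
print(loss.item())
```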