LongForm: Effective Instruction Tuning with Reverse Instructions

Abdullatif K\"oksal; Anna Korhonen; Hinrich Sch\"utze; Timo Schick

arxiv: 2304.08460 · v3 · pith:PIWGKQ35new · submitted 2023-04-17 · 💻 cs.CL · cs.AI· cs.LG

LongForm: Effective Instruction Tuning with Reverse Instructions

Abdullatif K\"oksal , Timo Schick , Anna Korhonen , Hinrich Sch\"utze This is my paper

classification 💻 cs.CL cs.AIcs.LG

keywords modelsinstructionsinstructionlanguagellmslongformreversetuning

0 comments

read the original abstract

Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We generate instructions via LLMs for human-written corpus examples using reverse instructions. First we select a diverse set of human-written documents from corpora such as C4 and Wikipedia; then we generate instructions for these documents via LLMs. This approach provides a cheaper and cleaner instruction-tuning dataset with natural output and one suitable for long text generation. Our models outperform 10x larger language models without instruction tuning on tasks such as story/recipe generation and long-form question answering. Moreover, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin, and improve language understanding capabilities further. We publicly release our data and models: https://github.com/akoksal/LongForm.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Self-Rewarding Language Models
cs.CL 2024-01 conditional novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
QLoRA: Efficient Finetuning of Quantized LLMs
cs.LG 2023-05 conditional novelty 7.0

QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.