Pith Number

pith:J2ZP6W3C

pith:2024:J2ZP6W3C32VB7BE6W7PY6LNXBR

not attested not anchored not stored refs resolved

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Bill Yuchen Lin, Fengqing Jiang, Luyao Niu, Radha Poovendran, Yejin Choi, Yuntian Deng, Zhangchen Xu

Prompting aligned LLMs like Llama-3-Instruct with only left-side conversation templates produces millions of realistic user queries and responses for alignment training.

arxiv:2406.08464 v2 · 2024-06-12 · cs.CL · cs.AI

Open paper page JSON Open Graph Bundle Merged state Verified badge What is a Pith Number?

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{J2ZP6W3C32VB7BE6W7PY6LNXBR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open · sign in to claim

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1strongest claim

Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning.

C2weakest assumption

The generated user queries produced by prompting with left-side templates are sufficiently diverse, realistic, and representative of real user needs to support effective alignment after filtering.

C3one line summary

Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.

References

126 extracted · 126 resolved · 22 Pith anchors

[5] and Stoica, Ion and Xing, Eric P

[6] Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts , author=. EMNLP , year=

[8] Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages= 2022

[9] Improving language understanding by generative pre-training , author=. 2018 , publisher= 2018

[11] International Conference on Machine Learning , pages= 2023

Cited by

28 papers in Pith

Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

Enhancing Speech Large Language Models through Reinforced Behavior Alignment

NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

eLLM: Elastic Memory Management Framework for Efficient LLM Serving

Receipt and verification

First computed	2026-05-17T23:38:48.765889Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

4eb2ff5b62deaa1f849eb7df8f2db70c4f63c4ea4d62d18b0ef415a91440864b

Aliases

arxiv: 2406.08464 · arxiv_version: 2406.08464v2 · doi: 10.48550/arxiv.2406.08464 · pith_short_12: J2ZP6W3C32VB · pith_short_16: J2ZP6W3C32VB7BE6 · pith_short_8: J2ZP6W3C

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/J2ZP6W3C32VB7BE6W7PY6LNXBR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4eb2ff5b62deaa1f849eb7df8f2db70c4f63c4ea4d62d18b0ef415a91440864b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "aa7a052570c78eae2856548d7696d715493bb9648d3061b74e968934a92de08e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-06-12T17:52:30Z",
    "title_canon_sha256": "aa1bd039007ba2f07d8106782e79a7b8ca176dfec737130f7a637de5d9eec978"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.08464",
    "kind": "arxiv",
    "version": 2
  }
}