Pith Number

pith:CSY654CR

pith:2025:CSY654CRWD64OIPFZ2LWXPSWC2

not attested · not anchored · not stored · refs resolved

Phi-4-reasoning Technical Report

Ahmed Awadallah, Arindam Mitra, Besmira Nushi, Caio César Teodoro Mendes, Dimitris Papailiopoulos, Guoqing Zheng, Gustavo de Rosa, Harkirat Behl, Lingjiao Chen, Marah Abdin, Mojan Javaheripi, Neel Joshi, Olli Saarikivi, Piero Kauffmann, Safoora Yousefi, Sahaj Agarwal, Shital Shah, Suriya Gunasekar, Vaishnavi Shrivastava, Vibhav Vineet, Vidhisha Balachandran, Yash Lara, Yue Wu

A 14-billion parameter model trained on curated teachable prompts and o3-mini demonstrations reaches performance levels of much larger reasoning systems.

arxiv:2504.21318 v1 · 2025-04-30 · cs.AI · cs.CL


Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

Claims

C1 strongest claim

Across a wide range of reasoning tasks, both models outperform significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance of the full DeepSeek-R1 model.

C2 weakest assumption

That the performance improvements stem primarily from the curated 'teachable' prompts and o3-mini demonstrations rather than from undisclosed details of the base Phi-4 model, evaluation choices, or overlap with the teacher model's training data.

C3 one line summary

A 14B reasoning model trained via supervised fine-tuning on selected prompts and o3-mini traces, plus outcome RL, outperforms larger open models like DeepSeek-R1-Distill-Llama-70B on math, coding, planning and related benchmarks.

References

64 extracted · 64 resolved · 20 Pith anchors

[1] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 2024 · arXiv:2404.14219

[2] Phi-4 Technical Report 2024 · arXiv:2412.08905

[3] KITAB: evaluating llms on constraint satisfaction for information retrieval 2024

[4] AIME. Aime 83-24. https://huggingface.co/datasets/lchen001/AIME1983_2024, 2024. Accessed: 2025-03-17

[5] AIME. Aime 2025. https://huggingface.co/datasets/lchen001/AIME2025, 2025. Accessed: 2025-03-17

Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation

A Survey of Reinforcement Learning for Large Reasoning Models

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning

Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling

Receipt and verification

First computed	2026-05-17T23:38:15.236139Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

14b1eef051b0fdc721e5ce976bbe56168c5f4b9b3db39f240432fa7349969614
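
The identifier at the top of this record and the canonical hash appear to be related. For this record, the first 26 characters of the RFC 4648 base32 encoding of the hash match the long identifier, and the first 8 match the short one. The sketch below checks that observation. Whether this is the defined derivation for every Pith Number is an assumption, and `base32_prefix` is a hypothetical helper name, not part of any Pith API.

```python
import base64

# Canonical hash copied from this record (hex-encoded SHA-256).
CANONICAL_HASH = "14b1eef051b0fdc721e5ce976bbe56168c5f4b9b3db39f240432fa7349969614"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the RFC 4648 base32 form of a hex digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


# 26 and 8 are the identifier lengths seen on this page (assumed, not specified).
long_id = base32_prefix(CANONICAL_HASH, 26)
short_id = base32_prefix(CANONICAL_HASH, 8)

print(long_id)   # CSY654CRWD64OIPFZ2LWXPSWC2
print(short_id)  # CSY654CR
```

If the relationship holds in general, the identifier can be checked against the hash without any network call.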

Aliases

arxiv: 2504.21318 · arxiv_version: 2504.21318v1 · doi: 10.48550/arxiv.2504.21318

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/CSY654CRWD64OIPFZ2LWXPSWC2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 14b1eef051b0fdc721e5ce976bbe56168c5f4b9b3db39f240432fa7349969614

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "34014de1e10e2d80b34c097e7349a5a05a3af62c4f55a5452f4ea11ba606fe0f",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-04-30T05:05:09Z",
    "title_canon_sha256": "88aed280d7a9e33a84ea4a72eb25e0c5b88ae04d91c643a83976bc6cae82e7f8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.21318",
    "kind": "arxiv",
    "version": 1
  }
}
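
The same check can be run offline against the record as reproduced above. This is a minimal sketch that mirrors the serialization settings of the one-liner in "Verify this Pith Number yourself" (sorted keys, compact separators, no ASCII escaping, UTF-8). `canonical_sha256` is a hypothetical helper name. The printed digest should be compared with the canonical hash listed in this record; a mismatch may simply mean the served record differs from the excerpt shown here.

```python
import hashlib
import json

# Canonical record JSON, copied from this page.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "34014de1e10e2d80b34c097e7349a5a05a3af62c4f55a5452f4ea11ba606fe0f",
    "cross_cats_sorted": ["cs.CL"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-04-30T05:05:09Z",
    "title_canon_sha256": "88aed280d7a9e33a84ea4a72eb25e0c5b88ae04d91c643a83976bc6cae82e7f8"
  },
  "schema_version": "1.0",
  "source": {"id": "2504.21318", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_TEXT)
digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed in this record
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON.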