pith:4OXKXQ4P
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Visual generation ability emerges as a natural byproduct of improved visual understanding in instruction-tuned LLMs.
arxiv:2412.14164 v1 · 2024-12-18 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4OXKXQ4PBCCPHOCVZK2PKPZTA2}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Visual generation ability emerges as a natural byproduct of improved visual understanding and can be unlocked efficiently with a small amount of generation data.
This assumes that the curated instruction-following multimodal datasets are sufficient to reveal a general emergence of generation from understanding, and that the results will transfer beyond the specific models and data mixtures tested.
VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.
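To make the last claim concrete, here is a minimal sketch of a training objective over a sequence that mixes discrete text tokens and continuous visual tokens. It is an illustration only, not the paper's implementation: the function names, the step format, and the use of cosine distance for the visual tokens are assumptions that this record does not state.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target token under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def cosine_distance(pred, target):
    """1 - cosine similarity between a predicted and a target embedding."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


def mixed_sequence_loss(steps):
    """Average loss over a sequence of text and visual prediction steps.

    Each step is a hypothetical tuple:
      ("text", logits, target_index)          -> discrete token, cross-entropy
      ("visual", predicted_vec, target_vec)   -> continuous token, cosine distance
    """
    total = 0.0
    for kind, prediction, target in steps:
        if kind == "text":
            total += cross_entropy(prediction, target)
        elif kind == "visual":
            total += cosine_distance(prediction, target)
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return total / len(steps)


# Toy sequence: one text token followed by one visual token.
steps = [
    ("text", [2.0, 0.5, -1.0], 0),
    ("visual", [0.9, 0.1], [1.0, 0.0]),
]
loss = mixed_sequence_loss(steps)
print(f"mixed loss: {loss:.4f}")
```

The point of the sketch is only that one model can be trained on both kinds of target in a single pass, with a classification loss where the target is a token ID and a regression-style loss where the target is an embedding.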
Receipt and verification
| First computed | 2026-05-17T23:38:14.674381Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e3aeabc38f0884f3b855cab4f53f330696ad3beecbe15677a1497d065c6c83d6
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4OXKXQ4PBCCPHOCVZK2PKPZTA2 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e3aeabc38f0884f3b855cab4f53f330696ad3beecbe15677a1497d065c6c83d6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "76dfbffcf154571b952855eb877a4fa7e54ec04dfe3914dcbd04f4a201adbe57",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-18T18:58:50Z",
    "title_canon_sha256": "20c025291f77e7cde39c6acc6ed7ebea9dd3abbcaa26f3b99b95e26f8161a30c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.14164",
    "kind": "arxiv",
    "version": 1
  }
}
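The same check can be run offline against the record shown above, without `curl` or `jq`. The sketch below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, UTF-8, no ASCII escaping). The names `canonical_sha256`, `RECORD_JSON`, and `EXPECTED` are illustrative, and the sketch assumes the JSON printed on this page is identical to what the API returns in `.canonical_record`.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "76dfbffcf154571b952855eb877a4fa7e54ec04dfe3914dcbd04f4a201adbe57",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-18T18:58:50Z",
    "title_canon_sha256": "20c025291f77e7cde39c6acc6ed7ebea9dd3abbcaa26f3b99b95e26f8161a30c"
  },
  "schema_version": "1.0",
  "source": {"id": "2412.14164", "kind": "arxiv", "version": 1}
}
"""

# Hash published on this page under "Canonical hash".
EXPECTED = "e3aeabc38f0884f3b855cab4f53f330696ad3beecbe15677a1497d065c6c83d6"


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, whitespace and key order in the pasted JSON do not affect the digest; only the keys and values do.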