Pith Number

pith:G3QQ7POG

pith:2026:G3QQ7POGVFS2KAENSVZT6KAVWH
not attested · not anchored · not stored · refs resolved

MMSkills: Towards Multimodal Skills for General Visual Agents

Jianghao Lin, Kangning Zhang, Lingyue Fu, Qingyao Li, Shijian Wang, Shuai Shao, Weinan Zhang, Weiwen Liu, Wenxiang Jiao, Yong Yu, Yuan Lu

MMSkills equips visual agents with reusable packages of textual procedures, state cards, and multi-view keyframes derived from public trajectories.

arxiv:2605.13527 v2 · 2026-05-13 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{G3QQ7POGVFS2KAENSVZT6KAVWH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
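The merge algorithm itself is not specified on this page. As a rough illustration of why independent mirrors converge, here is a minimal sketch assuming a hypothetical event schema (`ts`, `event_id`, `field`, `value`) and a last-writer-wins rule applied in a fixed (timestamp, event id) order. Neither the schema nor the rule is Pith's documented format, `merge_state` is a hypothetical helper, and signature verification is left out.

```python
# Hypothetical sketch: the event fields and the last-writer-wins rule are
# illustrative assumptions, not Pith's documented merge algorithm.
# Signature checks on events are deliberately omitted.
from typing import Any


def merge_state(canonical_record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the record in a fixed total order."""
    state = dict(canonical_record)  # never mutate the canonical record
    # Sorting on (timestamp, event_id) gives every mirror the same order,
    # regardless of the order in which it received the events.
    for event in sorted(events, key=lambda e: (e["ts"], e["event_id"])):
        state[event["field"]] = event["value"]  # later events overwrite earlier ones
    return state


record = {"source": "arxiv:2605.13527"}
events = [
    {"ts": "2026-05-18T03:00:00Z", "event_id": "b", "field": "author_claim", "value": "open"},
    {"ts": "2026-05-18T02:44:24Z", "event_id": "a", "field": "citations", "value": "open"},
]

# Two mirrors that receive the events in opposite orders compute the same state.
assert merge_state(record, events) == merge_state(record, list(reversed(events)))
print(merge_state(record, events))
```

Because the fold order depends only on the events' contents, not on arrival order, any mirror holding the same bundle ends with the same state.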

Claims

C1 · strongest claim

Experiments across GUI and game-based visual-agent benchmarks show that MMSkills consistently improve both frontier and smaller multimodal agents, suggesting that external multimodal procedural knowledge complements model-internal priors.

C2 · weakest assumption

Assumes that the generated multimodal skills (state cards and keyframes) can be consulted at inference time without adding excessive image context or over-anchoring the agent to reference screenshots, as stated in the problem formalization.

C3 · one-line summary

MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.

References

40 extracted · 40 resolved · 21 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-18T02:44:24.291370Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

36e10fbdc6a965a5008d95733f2815b1c1d841e6e556ada5757e391960041f88

Aliases

arxiv: 2605.13527 · arxiv_version: 2605.13527v2 · doi: 10.48550/arxiv.2605.13527 · pith_short_12: G3QQ7POGVFS2 · pith_short_16: G3QQ7POGVFS2KAEN · pith_short_8: G3QQ7POG
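The Pith Number appears to be derived from the canonical hash: the 26-character value on this page equals the first 26 characters (the first 130 bits) of the RFC 4648 base32 encoding of the SHA-256 digest, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that relationship using Python's standard `base64` module. The derivation rule is inferred from the values shown here rather than taken from Pith documentation, and `pith_number` is a hypothetical helper name.

```python
import base64

# Canonical hash published on this page.
CANONICAL_HASH = "36e10fbdc6a965a5008d95733f2815b1c1d841e6e556ada5757e391960041f88"


def pith_number(sha256_hex: str, length: int = 26) -> str:
    """Base32-encode (RFC 4648 alphabet) the digest and keep the leading characters.

    Assumption: 26 base32 characters, i.e. the first 130 bits of the digest,
    form the full Pith Number; shorter aliases are prefixes of it.
    """
    digest = bytes.fromhex(sha256_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


full = pith_number(CANONICAL_HASH)
print(full)  # G3QQ7POGVFS2KAENSVZT6KAVWH
for n in (8, 12, 16):
    print(f"pith_short_{n}:", full[:n])
```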
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/G3QQ7POGVFS2KAENSVZT6KAVWH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 36e10fbdc6a965a5008d95733f2815b1c1d841e6e556ada5757e391960041f88
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "76a3f6d7ee634029e7ff8b78b41a96b2433bb8e5c205dadbadab367e80e0734e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T13:40:31Z",
    "title_canon_sha256": "322f5e2aeeb7abcf2eaa5cfb19c085f9141a433850de629f89519f63ac7d2135"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13527",
    "kind": "arxiv",
    "version": 2
  }
}
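To check the hash offline, the same canonicalization used by the one-liner above (sorted keys at every level, compact `,`/`:` separators, raw UTF-8 via `ensure_ascii=False`, then SHA-256) can be applied to a local copy of the record. This is a minimal sketch assuming the JSON shown above is byte-for-byte the `canonical_record` served by the API; if the served record differs, the comparison prints `False`. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

# Local copy of the "Canonical record JSON" shown above.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "76a3f6d7ee634029e7ff8b78b41a96b2433bb8e5c205dadbadab367e80e0734e",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T13:40:31Z",
        "title_canon_sha256": "322f5e2aeeb7abcf2eaa5cfb19c085f9141a433850de629f89519f63ac7d2135",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13527", "kind": "arxiv", "version": 2},
}
EXPECTED = "36e10fbdc6a965a5008d95733f2815b1c1d841e6e556ada5757e391960041f88"


def canonical_sha256(record: dict) -> str:
    """Serialize with the one-liner's settings and return the SHA-256 hex digest."""
    # sort_keys sorts nested objects too, so key order in the input is irrelevant.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```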