Pith Number

pith:Z4LEITEJ

pith:2026:Z4LEITEJSP34TTCZOUQJX774CI
not attested · not anchored · not stored · refs resolved

SMA: Submodular Modality Aligner For Data Efficient Multimodal Learning

Anay Majee, Rishabh Iyer, Truong Pham

SMA aligns images and text by optimizing submodular mutual information over sets of descriptions rather than individual pairs, enabling strong zero-shot performance with only tens of thousands of samples.

arxiv:2605.12872 v1 · 2026-05-13 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Z4LEITEJSP34TTCZOUQJX774CI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

SMA achieves strong multimodal generalization using only tens of thousands of samples. This is orders of magnitude fewer than standard approaches.

C2 · weakest assumption

The set-based formulation with submodular mutual information is assumed to capture richer cross-modal geometric structure and to make effective use of multiple positive associations, without introducing biases or requiring extensive post-hoc tuning that would affect the reported gains.

C3 · one line summary

SMA uses a submodular mutual information objective on data sets to deliver competitive zero-shot classification and retrieval performance on CLIP benchmarks with only tens of thousands of samples, orders of magnitude fewer than standard approaches.
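
To make the phrase "submodular mutual information over sets" concrete, the sketch below evaluates the generic definition I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q) on a toy example. It is an illustration under stated assumptions, not the paper's objective: the record does not say which submodular function SMA uses, so facility location is assumed here, the similarity matrix is made up, and the function names are hypothetical.

```python
import numpy as np

def facility_location(sim, subset):
    """Facility-location value f(A) = sum_i max_{j in A} sim[i, j], with f(empty) = 0.

    Assumed choice of submodular function; requires non-negative similarities.
    """
    cols = sorted(subset)
    if not cols:
        return 0.0
    return float(sim[:, cols].max(axis=1).sum())

def submodular_mutual_information(sim, a, q):
    """Generic SMI: I_f(A; Q) = f(A) + f(Q) - f(A union Q)."""
    a, q = set(a), set(q)
    return (facility_location(sim, a)
            + facility_location(sim, q)
            - facility_location(sim, a | q))

# Made-up similarities over a ground set of 5 items:
# item 0 is an image, items 1-2 are captions describing it,
# items 3-4 are captions describing something else.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.6],
    [0.2, 0.1, 0.1, 0.6, 1.0],
])

image = {0}
matched = submodular_mutual_information(sim, image, {1, 2})     # image vs. its own caption set
mismatched = submodular_mutual_information(sim, image, {3, 4})  # image vs. unrelated captions

print(f"matched set:    {matched:.2f}")
print(f"mismatched set: {mismatched:.2f}")
```

The image is scored against a whole set of captions at once, so several positive descriptions contribute jointly rather than as independent pairs. In this toy setup the matched caption set scores higher than the mismatched one.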

References

54 extracted · 54 resolved · 5 Pith anchors

[1] Liteembed: Adapting CLIP to rare classes, 2026
[2] Theoretical analysis of submodular information measures for targeted data subset selection, 2024 · arXiv:2402.13454
[3] arXiv preprint, 2021 · arXiv:2106.15324
[4] Submodularity in machine learning and artificial intelligence, 2022
[5] A Simple Framework for Contrastive Learning of Visual Representations, 2020 · arXiv:2002.05709

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-18T03:09:11.309725Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

cf16444c8993f7c9cc5975209bfffc1237c387fc3dd03eb1bd191ee76db88ed8

Aliases

arxiv: 2605.12872 · arxiv_version: 2605.12872v1 · doi: 10.48550/arxiv.2605.12872 · pith_short_12: Z4LEITEJSP34 · pith_short_16: Z4LEITEJSP34TTCZ · pith_short_8: Z4LEITEJ
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Z4LEITEJSP34TTCZOUQJX774CI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cf16444c8993f7c9cc5975209bfffc1237c387fc3dd03eb1bd191ee76db88ed8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "37d5a4f6f6b6911af0a05b1e4044f37883fd1a23a1b84f1c93f8e54194e3ba4c",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T01:36:43Z",
    "title_canon_sha256": "856537fdd4622b11ff2f965a6dfccd2428f52aac6093ff3c5b115f0a56176d62"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12872",
    "kind": "arxiv",
    "version": 1
  }
}
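
The same check can be run offline against the record printed above. The sketch below mirrors the canonicalization in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) using only the Python standard library. The helper name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown above is the complete canonical record, so it reports the comparison instead of assuming a match.

```python
import hashlib
import json

def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Canonical record copied from the page.
record = {
    "metadata": {
        "abstract_canon_sha256": "37d5a4f6f6b6911af0a05b1e4044f37883fd1a23a1b84f1c93f8e54194e3ba4c",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T01:36:43Z",
        "title_canon_sha256": "856537fdd4622b11ff2f965a6dfccd2428f52aac6093ff3c5b115f0a56176d62",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12872", "kind": "arxiv", "version": 1},
}

# Canonical hash as listed on the page.
expected = "cf16444c8993f7c9cc5975209bfffc1237c387fc3dd03eb1bd191ee76db88ed8"

digest = canonical_sha256(record)
print("computed:", digest)
print("expected:", expected)
print("match:   ", digest == expected)
```

Because the keys are sorted before hashing, the order in which the fields appear in the source JSON does not affect the digest.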