pith:GKKRPTQS
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
A new dataset of over 100K words benchmarks scene text recognition for 11 Indian languages and English.
arxiv:2511.23071 v2 · 2025-11-28 · cs.CV · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GKKRPTQS5RHEDHZ33SY7SZOWDC}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We introduce the Bharat Scene Text Dataset (BSTD), a large-scale and comprehensive benchmark for studying Indian Language Scene Text Recognition. It comprises more than 100K words that span 11 Indian languages and English, sourced from over 6,500 scene images captured across various linguistic regions of India.
The collected images and annotations are sufficiently diverse, high-quality, and representative of real-world Indian script variations to meaningfully advance recognition performance when English models are fine-tuned on them.
Bharat Scene Text Dataset supplies a large-scale, multi-task benchmark for scene text understanding across 11 Indian languages and English to address data scarcity in non-Latin scripts.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-19T16:12:49.349052Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GKKRPTQS5RHEDHZ33SY7SZOWDC \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a
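The same check can be run offline against the canonical record printed on this page. The following is a minimal sketch that mirrors the serialization flags used in the one-liner above (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding). The names `canonical_sha256`, `RECORD`, and `EXPECTED` are illustrative and not part of any Pith API; the sketch also assumes the record shown on this page is identical to what the endpoint returns under `canonical_record`.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record with sorted keys and compact separators, then SHA-256 it."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as displayed on this page (copied, not fetched).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d688b4728469a6338d6e132ed1dc1d2fac2a9a374b09b9bc7024e8c9878f698b",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-11-28T10:58:37Z",
        "title_canon_sha256": "9bac0e4876f7109f11a8e169d2540d863b5e5ed4cdd4a7693c85ee17f8ad266e",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.23071", "kind": "arxiv", "version": 2},
}

# Hash published on this page; compare rather than trust.
EXPECTED = "329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a"

if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the order in which the record is typed or parsed does not change the digest; only the values and the serialization flags do.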
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "d688b4728469a6338d6e132ed1dc1d2fac2a9a374b09b9bc7024e8c9878f698b",
"cross_cats_sorted": [
"cs.AI",
"cs.CL"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CV",
"submitted_at": "2025-11-28T10:58:37Z",
"title_canon_sha256": "9bac0e4876f7109f11a8e169d2540d863b5e5ed4cdd4a7693c85ee17f8ad266e"
},
"schema_version": "1.0",
"source": {
"id": "2511.23071",
"kind": "arxiv",
"version": 2
}
}
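As a reading aid, the sketch below shows how the fields of the record above map onto the header line near the top of this page (`arxiv:2511.23071 v2 · 2025-11-28 · cs.CV · cs.AI · cs.CL`). The function names `header_line` and `arxiv_abs_url` are hypothetical helpers, not part of the Pith schema, and the URL form assumes arXiv's usual `abs/<id>v<version>` pattern.

```python
from datetime import datetime


def header_line(record: dict) -> str:
    """Rebuild the 'arxiv:<id> v<n> · <date> · <categories>' line from a record."""
    src = record["source"]
    meta = record["metadata"]
    if src["kind"] != "arxiv":
        raise ValueError(f"unsupported source kind: {src['kind']!r}")
    # submitted_at is an ISO 8601 UTC timestamp with a trailing 'Z'.
    submitted = datetime.strptime(meta["submitted_at"], "%Y-%m-%dT%H:%M:%SZ").date()
    # Primary category first, then the cross-listed categories.
    categories = [meta["primary_cat"], *meta["cross_cats_sorted"]]
    parts = [f"arxiv:{src['id']} v{src['version']}", submitted.isoformat(), *categories]
    return " · ".join(parts)


def arxiv_abs_url(record: dict) -> str:
    """Assumed arXiv abstract URL for the exact version named in the record."""
    src = record["source"]
    return f"https://arxiv.org/abs/{src['id']}v{src['version']}"


# Only the fields used above; hashes and license are omitted for brevity.
record = {
    "metadata": {
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "primary_cat": "cs.CV",
        "submitted_at": "2025-11-28T10:58:37Z",
    },
    "source": {"id": "2511.23071", "kind": "arxiv", "version": 2},
}

print(header_line(record))
print(arxiv_abs_url(record))
```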