Pith Number

pith:GKKRPTQS

pith:2025:GKKRPTQS5RHEDHZ33SY7SZOWDC
not attested · not anchored · not stored · refs pending

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

Abhirama Subramanyam Penamakuri, Aditya Rathore, Anand Mishra, Anik De, Devesh Sharma, Harshiv Shah, Pravin Kumar, Rajeev Yadav, Sagar Agarwal

A new dataset of over 100K words benchmarks scene text recognition for 11 Indian languages and English.

arxiv:2511.23071 v2 · 2025-11-28 · cs.CV · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GKKRPTQS5RHEDHZ33SY7SZOWDC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We introduce the Bharat Scene Text Dataset (BSTD) - a large-scale and comprehensive benchmark for studying Indian Language Scene Text Recognition. It comprises more than 100K words that span 11 Indian languages and English, sourced from over 6,500 scene images captured across various linguistic regions of India.

C2 · weakest assumption

The collected images and annotations are sufficiently diverse, high-quality, and representative of real-world Indian script variations to meaningfully advance recognition performance when English models are fine-tuned on them.

C3 · one-line summary

Bharat Scene Text Dataset supplies a large-scale, multi-task benchmark for scene text understanding across 11 Indian languages and English to address data scarcity in non-Latin scripts.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-06-19T16:12:49.349052Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a

Aliases

arxiv: 2511.23071 · arxiv_version: 2511.23071v2 · doi: 10.48550/arxiv.2511.23071 · pith_short_12: GKKRPTQS5RHE · pith_short_16: GKKRPTQS5RHEDHZ3 · pith_short_8: GKKRPTQS
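
The short aliases above look like plain prefixes of the full identifier, and for this record the identifier itself matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks that relationship in Python. Treat it as an observation about this one record, not a documented rule: the function names `pith_id_from_hash` and `short_aliases` are hypothetical, and the 26-character truncation is an assumption inferred from the values shown here.

```python
import base64

# Canonical hash and identifier copied from this record.
CANONICAL_HASH = "329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a"
PITH_ID = "GKKRPTQS5RHEDHZ33SY7SZOWDC"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the truncation length of 26 is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


derived_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived_id)
print(derived_id == PITH_ID)
print(aliases)
```

If the prefix relationship holds for other records too, a client could accept any of the short forms and resolve them against the full identifier, but that behaviour is not confirmed by this page.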
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GKKRPTQS5RHEDHZ33SY7SZOWDC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 329517ce12ec4e419f3bdcb1f965d6188f622cbc16c6c61b81246484ce23447a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d688b4728469a6338d6e132ed1dc1d2fac2a9a374b09b9bc7024e8c9878f698b",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-11-28T10:58:37Z",
    "title_canon_sha256": "9bac0e4876f7109f11a8e169d2540d863b5e5ed4cdd4a7693c85ee17f8ad266e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.23071",
    "kind": "arxiv",
    "version": 2
  }
}
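
The shell pipeline in the Agent API section can also be reproduced offline against a saved copy of the record. The following is a minimal Python sketch of the same canonicalization (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes, SHA-256), assuming the JSON shown above is exactly what the API returns under `canonical_record`. The helper name `canonical_sha256` is hypothetical and not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d688b4728469a6338d6e132ed1dc1d2fac2a9a374b09b9bc7024e8c9878f698b",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-11-28T10:58:37Z",
        "title_canon_sha256": "9bac0e4876f7109f11a8e169d2540d863b5e5ed4cdd4a7693c85ee17f8ad266e",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.23071", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

If the saved record matches the served one, the printed digest should equal the canonical hash listed on this page. Because keys are sorted before hashing, the order in which the fields are stored locally does not affect the result.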