Pith Number

pith:ZYGI7GDF

pith:2025:ZYGI7GDFMT4DDDNSWFLS4HA7KA

not attested · not anchored · not stored · refs resolved

R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability

Dongbin Zhao, Runyu Lu, Ruochuan Shi, Yuanheng Zhu

Belief preservation extends dynamic programming to partial observability for real-time robust pursuit policies.

arxiv:2511.17367 v2 · 2025-11-21 · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{ZYGI7GDFMT4DDDNSWFLS4HA7KA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
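
The page does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic merge" can mean in general: fold a set of events into a state in an order that depends on the events themselves, not on the order in which a mirror received them. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for illustration, not the actual Pith bundle schema.

```python
# Hypothetical event shape; the real bundle schema is not shown on this page.
def merge(events):
    """Fold events into a state dict, independent of arrival order."""
    # Assumes two events with the same id are identical copies (duplicate delivery).
    unique = {e["id"]: e for e in events}
    # Total order derived from event content only: timestamp, then id as tie-break.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer wins
    return state


# Made-up events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-17T23:39:00Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T08:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e3", "ts": "2026-05-18T08:00:00Z", "field": "author_claim", "value": "open"},
]

state_a = merge(events)
state_b = merge(list(reversed(events)) + events)  # reordered, with duplicates
print(state_a == state_b)
```

Because the sort key is computed from the events alone, two mirrors holding the same set of events arrive at the same state regardless of download order.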

Claims

C1: strongest claim

After reinforcement learning, our policy achieves robust zero-shot generalization to unseen real-world graph structures and consistently outperforms the policy directly trained on the test graphs by the existing game RL approach.

C2: weakest assumption

The belief preservation mechanism successfully extends the optimality properties of the dynamic programming strategies to the partially observable setting while preserving worst-case robustness against asynchronous evader moves.

C3: one-line summary

R2PS combines a proof that dynamic programming remains optimal under asynchronous evader moves, a belief preservation mechanism for partial observability, and integration into equilibrium policy generalization to produce real-time pursuer policies that zero-shot generalize to unseen graphs.

References

8 extracted · 8 resolved · 1 Pith anchor

[1] Self-learning exploration and mapping for mobile robots via deep reinforcement learning 2019

[2] Soft actor-critic for discrete action settings. arXiv preprint arXiv:1910.07207, 2019

[3] Soft Actor-Critic Algorithms and Applications · arXiv:1812.05905

[4] Pursuit-evasion games with unmanned ground and aerial vehicles 2001

[5] Solving urban network security games: Learning platform, benchmark, and challenge for AI research. arXiv preprint arXiv:2501.17559

Receipt and verification

First computed	2026-05-17T23:39:00.721826Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

ce0c8f986564f8318db2b1572e1c1f5002fd766b0e0b1de7890a31a3c898dcce

Aliases

arxiv: 2511.17367 · arxiv_version: 2511.17367v2 · doi: 10.48550/arxiv.2511.17367 · pith_short_12: ZYGI7GDFMT4D · pith_short_16: ZYGI7GDFMT4DDDNS · pith_short_8: ZYGI7GDF
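
The identifiers on this page appear to be related: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are prefixes of it. This is an observation from the values shown here, not a documented derivation rule, so treat the sketch below as a consistency check under that assumption. The function name `pith_number` is hypothetical.

```python
import base64

# Canonical hash as printed on this page.
canonical_hash = "ce0c8f986564f8318db2b1572e1c1f5002fd766b0e0b1de7890a31a3c898dcce"


def pith_number(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding."""
    first_16 = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16).decode("ascii").rstrip("=")


full = pith_number(canonical_hash)
# Short aliases are assumed to be plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # ZYGI7GDFMT4DDDNSWFLS4HA7KA
print(aliases)
```

For this record the derived values agree with the full identifier and all three short aliases listed above.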

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZYGI7GDFMT4DDDNSWFLS4HA7KA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ce0c8f986564f8318db2b1572e1c1f5002fd766b0e0b1de7890a31a3c898dcce

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e4a8862208b98700eff74a494b7870c33cba3f365a2240feb08c0768908b52cc",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-11-21T16:34:00Z",
    "title_canon_sha256": "9ec53ac2747a1633d8e4e00ab6787dbee4d4c7e1b7ee82fc2505191314f977e7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.17367",
    "kind": "arxiv",
    "version": 2
  }
}
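
The verification command above canonicalizes the record with sorted keys and compact separators before hashing. The same step can be run offline against the record printed on this page, as in the sketch below. It assumes the text shown here is exactly what the resolver returns; if it is, the printed digest should equal the canonical hash, but the sketch only reports the comparison and does not assume the outcome. The variable names are illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
record_text = """
{
  "metadata": {
    "abstract_canon_sha256": "e4a8862208b98700eff74a494b7870c33cba3f365a2240feb08c0768908b52cc",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-11-21T16:34:00Z",
    "title_canon_sha256": "9ec53ac2747a1633d8e4e00ab6787dbee4d4c7e1b7ee82fc2505191314f977e7"
  },
  "schema_version": "1.0",
  "source": {"id": "2511.17367", "kind": "arxiv", "version": 2}
}
"""

expected = "ce0c8f986564f8318db2b1572e1c1f5002fd766b0e0b1de7890a31a3c898dcce"

# Same canonicalization as the shell one-liner: sorted keys, no whitespace, UTF-8.
canonical = json.dumps(
    json.loads(record_text),
    sort_keys=True,
    separators=(",", ":"),
    ensure_ascii=False,
).encode("utf-8")

digest = hashlib.sha256(canonical).hexdigest()
print(digest)
print("matches canonical hash:", digest == expected)
```

Sorting keys and fixing the separators removes any dependence on how the JSON was pretty-printed, which is what lets independent parties reproduce the same digest.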