Pith Number

pith:LLYFYPQ7

pith:2026:LLYFYPQ7KKAXCZ5CC5ET4AMJQ5
not attested · not anchored · not stored · refs pending

Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning

Jianshu Zhang, Jinzong Dong, Nanyang Ye, Qinying Gu, Wei Huang, Xinzhe Yuan, Zhaohui Jiang, Zhuo Chen

Proximal action replacement overcomes the imitation ceiling in BC-regularized actor-critic by substituting suboptimal dataset actions with value-guided improvements.

arxiv:2602.07441 v2 · 2026-02-07 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LLYFYPQ7KKAXCZ5CC5ET4AMJQ5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

PAR consistently improves performance across offline RL benchmarks and approaches state-of-the-art results simply by being combined with the basic TD3+BC.

C2 · weakest assumption

That actions generated by the stable target policy, guided by local ascent of the action-value function and bounded by value uncertainty, can be substituted without destabilizing training or introducing new bias when dataset actions are suboptimal.

C3 · one-line summary

Proximal action replacement breaks the imitation ceiling in BC-regularized offline RL actor-critic by substituting suboptimal dataset actions with value-guided improvements from a stable target policy.
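
The replacement idea in the claims above can be illustrated with a toy example. This is a minimal sketch of the general idea only, not the paper's algorithm: the quadratic critic ensemble (`CENTERS`), the single ascent step, the `step` and `beta` parameters, and the pessimistic replacement rule are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical critic ensemble for one fixed state: Q_i(a) = -(a - c_i)^2.
# Each critic peaks at a slightly different action (assumption for illustration).
CENTERS = [0.9, 1.0, 1.1]


def q_values(action):
    """Return every ensemble member's value for a scalar action."""
    return [-(action - c) ** 2 for c in CENTERS]


def q_grad(action):
    """Gradient of the ensemble-mean Q with respect to the action."""
    return mean(-2.0 * (action - c) for c in CENTERS)


def proximal_replace(dataset_action, target_action, step=0.1, beta=1.0):
    """Pick the behavior-cloning target: dataset action or a replacement."""
    # One local ascent step on Q, starting from the target policy's action.
    candidate = target_action + step * q_grad(target_action)
    cand_q = q_values(candidate)
    # Pessimistic candidate value: ensemble mean minus beta * ensemble std.
    cand_lower = mean(cand_q) - beta * pstdev(cand_q)
    # Replace only when the pessimistic candidate value still beats
    # the mean value of the dataset action.
    if cand_lower > mean(q_values(dataset_action)):
        return candidate
    return dataset_action


poor = proximal_replace(dataset_action=0.0, target_action=0.8)
good = proximal_replace(dataset_action=1.0, target_action=0.8)
print(poor, good)
```

In this toy setup a clearly suboptimal dataset action is replaced by the value-guided candidate, while a near-optimal dataset action is kept, so the cloning target is only changed where the critics agree that the change helps.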

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-17T23:39:00.026294Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5af05c3e1f52817167a217493e018987643dfe7037bdd563807d036431926018

Aliases

arxiv: 2602.07441 · arxiv_version: 2602.07441v2 · doi: 10.48550/arxiv.2602.07441 · pith_short_12: LLYFYPQ7KKAX · pith_short_16: LLYFYPQ7KKAXCZ5C · pith_short_8: LLYFYPQ7
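
For this record, the Pith Number appears to be the first 26 characters of the Base32 (RFC 4648) encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that relationship. It is an observation about this one record, not a documented rule of the scheme, and the helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

CANONICAL_HASH = "5af05c3e1f52817167a217493e018987643dfe7037bdd563807d036431926018"
PITH_NUMBER = "LLYFYPQ7KKAXCZ5CC5ET4AMJQ5"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the short aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)
print(derived)
print(aliases)
```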
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LLYFYPQ7KKAXCZ5CC5ET4AMJQ5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5af05c3e1f52817167a217493e018987643dfe7037bdd563807d036431926018
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "dd379b69b869ef4149d7ec98a96f6edfb06f271348c99998106e1a120b549e75",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-07T08:44:27Z",
    "title_canon_sha256": "11ceda1decf65b89cb4513c31144ecd8ad1132fdbf83a6e4a8002b631649229e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.07441",
    "kind": "arxiv",
    "version": 2
  }
}
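
The hashing step of the verification command can also be run offline against the record shown above. This is a minimal sketch that mirrors the serialization used in the `curl` pipeline (sorted keys, compact separators, UTF-8, no ASCII escaping); the function name `canonical_sha256` is hypothetical, and whether the digest reproduces the published canonical hash depends on the service using exactly this serialization.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from the page.
record = {
    "metadata": {
        "abstract_canon_sha256": "dd379b69b869ef4149d7ec98a96f6edfb06f271348c99998106e1a120b549e75",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-07T08:44:27Z",
        "title_canon_sha256": "11ceda1decf65b89cb4513c31144ecd8ad1132fdbf83a6e4a8002b631649229e",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.07441", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed above
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields arrive, which is what lets independent mirrors recompute the same value.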