{"paper":{"title":"Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Target-aligned Coverage Expansion uses dual score-based generation to synthesize consistent transitions across domains in offline RL.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Gwanwoo Choi, Jeongmo Kim, Minung Kim, Seungyul Han","submitted_at":"2026-05-13T06:23:51Z","abstract_excerpt":"Cross-domain offline reinforcement learning aims to adapt a policy from a source domain to a target domain using only pre-collected datasets, where environment dynamics may differ. A key challenge is to leverage source data while reducing distributional mismatch, particularly when the target dataset is extremely limited. To address this, we propose Target-aligned Coverage Expansion (TCE), a framework that decides how source data should be used, either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation, guided by theoretical analys"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. 
Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The dual score-based generative model can reliably synthesize target-consistent transitions over an expanded state region without introducing harmful distribution shifts.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Target-aligned Coverage Expansion uses dual score-based generation to synthesize consistent transitions across domains in offline RL.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c8d70b2ef93b1affabb15c1fb55965232db91b99c8d33e75689d7a4e8a7023eb"},"source":{"id":"2605.13054","kind":"arxiv","version":1},"verdict":{"id":"84324f6e-93af-4db6-9394-a17840a82f72","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:09:23.535852Z","strongest_claim":"TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. 
Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.","one_line_summary":"TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The dual score-based generative model can reliably synthesize target-consistent transitions over an expanded state region without introducing harmful distribution shifts.","pith_extraction_headline":"Target-aligned Coverage Expansion uses dual score-based generation to synthesize consistent transitions across domains in offline RL."},"references":{"count":43,"sample":[{"doi":"","year":2018,"title":"arXiv preprint arXiv:1805.12298","work_id":"74f189b3-0db6-4c8d-85b3-78d3c52e0acc","ref_index":1,"cited_arxiv_id":"1805.12298","is_internal_anchor":true},{"doi":"","year":2020,"title":"A survey of autonomous driving: Common practices and emerging technologies. IEEE Access, 8:58443–58469, 2020","work_id":"65b34a9c-1054-4051-be1f-225b0123a814","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Off-dynamics reinforcement learning: Training for transfer with domain classifiers","work_id":"94cb89e3-1520-40b5-8696-e21c9c2f3c5e","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Domain adaptive imitation learning","work_id":"229fdb1b-ca1a-4f01-8675-18f50c0ec393","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2005,"title":"Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open 
Problems","work_id":"597b6f46-d60f-451f-8f34-7d32876a9014","ref_index":5,"cited_arxiv_id":"2005.01643","is_internal_anchor":true}],"resolved_work":43,"snapshot_sha256":"0502fe6a9f887af6e225f7d27db62ad807bcd105382fb0ac47d1c3ea946d55b6","internal_anchors":6},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}