{"paper":{"title":"Test-Time Training for Visual Foresight Vision-Language-Action Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Test-time training on predicted future images and real observations lets visual foresight VLA models adapt to out-of-distribution shifts without any architecture changes.","cross_cats":["cs.LG","cs.RO"],"primary_cat":"cs.CV","authors_text":"Chanyoung Park, Hongseok Kang, Sangwu Park, Sein Kim, Wonjoong Kim, Yeonjun In","submitted_at":"2026-05-06T11:21:25Z","abstract_excerpt":"Visual Foresight VLA (VF-VLA) has become a prominent architectural choice in the recent VLA due to its impressive performance. Nevertheless, the inherent design of VF-VLA makes it particularly vulnerable to out-of-distribution (OOD) shifts. Because the quality of action directly depends on the accuracy of the predicted future visual information, OOD conditions affect both stages at once. To address this vulnerability, we propose Test-Time Training Visual Foresight VLA ($T^3$VF), a test-time training approach motivated by the observation that the predicted future image and its subsequent observ"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Empirically, T³VF mitigates the OOD vulnerability of VF-VLA at a modest additional inference cost, without requiring any architectural modification or auxiliary modules.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the predicted future image and its subsequent real observation form a reliable, low-noise supervision signal even under OOD shifts, and that the adaptive update filter can consistently separate useful from harmful updates without additional labeled data or validation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"T³VF applies test-time training with adaptive filtering to reduce OOD failures in VF-VLA models by treating predicted future images and actual next observations as natural training pairs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Test-time training on predicted future images and real observations lets visual foresight VLA models adapt to out-of-distribution shifts without any architecture changes.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"03e6a2bc2cda70985c96fe31306ae947abefe6800b350b34acf4fa57fdc03396"},"source":{"id":"2605.08215","kind":"arxiv","version":2},"verdict":{"id":"7a17b54a-d467-4f37-9859-71189ce7ba91","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T01:29:16.265354Z","strongest_claim":"Empirically, T³VF mitigates the OOD vulnerability of VF-VLA at a modest additional inference cost, without requiring any architectural modification or auxiliary modules.","one_line_summary":"T³VF applies test-time training with adaptive filtering to reduce OOD failures in VF-VLA models by treating predicted future images and actual next observations as natural training pairs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the predicted future image and its subsequent real observation form a reliable, low-noise supervision signal even under OOD shifts, and that the adaptive update filter can consistently separate useful from harmful updates without additional labeled data or validation.","pith_extraction_headline":"Test-time training on predicted future images and real observations lets visual foresight VLA models adapt to out-of-distribution shifts without any architecture changes."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.08215/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-20T11:34:34.583339Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T22:01:29.148440Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T14:11:03.494705Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"1d78c8832bb0a95ca967d2d02e67d4c624b9a429f5e52cfe074450ba80447c55"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}