{"paper":{"title":"JARVIS: A Just-in-Time Augmented Reality VLM-Powered Instruction System for Cross-Reality Task Guidance","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"JARVIS uses one vision-language model prompt to deliver real-time AR step-by-step guidance for tasks that combine physical objects and virtual elements.","cross_cats":[],"primary_cat":"cs.HC","authors_text":"Chenfanfu Jiang, Jiayin Lu, Ying Jiang, Yin Yang, Yong-Hong Kuo, Yusi Sun","submitted_at":"2026-04-11T09:00:54Z","abstract_excerpt":"Many everyday tasks rely on external tutorials such as manuals and videos, requiring users to constantly switch between reading instructions and performing actions, which disrupts workflow and increases cognitive load. Augmented reality (AR) enables in-situ guidance, while recent advances in large language models (LLMs) and vision-language models (VLMs) make it possible to automatically generate such guidance. However, existing AI-powered AR tutorial systems primarily focus on physical procedural tasks and provide limited support for hybrid physical and virtual workspaces. To address this gap,"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"A within-subjects study (N=14) across four domains shows JARVIS improves usability, workload, success rate, and visualization effectiveness over baselines.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a single-prompt VLM can reliably generate accurate, context-aware step-by-step guidance and perform real-time state verification across diverse cross-reality scenarios without frequent errors or hallucinations.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"JARVIS provides just-in-time AR instructions for cross-reality tasks using VLMs, with a user study showing gains in usability, workload, success rate, and visualization over baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"JARVIS uses one vision-language model prompt to deliver real-time AR step-by-step guidance for tasks that combine physical objects and virtual elements.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"70f7d6976bb6b0f74c29ae700ce5cbc5f08a676ec127ff5ba2f00f33cd3a72a3"},"source":{"id":"2604.10108","kind":"arxiv","version":3},"verdict":{"id":"22e9725c-1321-45db-b0b9-7c57d87d9115","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T16:16:09.925678Z","strongest_claim":"A within-subjects study (N=14) across four domains shows JARVIS improves usability, workload, success rate, and visualization effectiveness over baselines.","one_line_summary":"JARVIS provides just-in-time AR instructions for cross-reality tasks using VLMs, with a user study showing gains in usability, workload, success rate, and visualization over baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a single-prompt VLM can reliably generate accurate, context-aware step-by-step guidance and perform real-time state verification across diverse cross-reality scenarios without frequent errors or hallucinations.","pith_extraction_headline":"JARVIS uses one vision-language model prompt to deliver real-time AR step-by-step guidance for tasks that combine physical objects and virtual elements."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.10108/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}