{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:767OSZZBNTVRWO3HULOKJCMUJS","short_pith_number":"pith:767OSZZB","schema_version":"1.0","canonical_sha256":"ffbee967216ceb1b3b67a2dca489944c8fcb3fd85f8bb2d6adfc48dca7d2cb64","source":{"kind":"arxiv","id":"2606.31410","version":1},"attestation_state":"computed","paper":{"title":"Xiaomi-GUI-0 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Anan Du, Changqiao Wu, Cheng Tan, Chengzhen Duan, Cong Zou, Fazhan Liu, Haoyuan Sun, Heng Qu, Hui Liu, Jiahui Yang, Jian Luan, Jian Zhu, Jiatong Sun, Niu Lian, Pei Fu, Pengzhi Gao, Qinzhuo Wu, Ruoceng Zhang, Shaojie Zhang, Shiqi Cui, Shukai Jia, Tao Xiong, Tongbo Chen, Wanxia Cao, Wenchao Lu, Yajie Wang, Yike Liu, Yuanfa Li, Yuxuan Yuan, Zhehao Yu","submitted_at":"2026-06-30T09:36:35Z","abstract_excerpt":"Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigation. However, existing GUI agents are trained and evaluated largely on offline trajectories, simulated environments, and standardized benchmarks. These differ substantially from real applications in interface layout, interaction logic, and abnormal-state distribution, and cannot faithfully characterize execution stability in real-world use, where account states, permission dialogs, payment authent"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2606.31410","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.AI","submitted_at":"2026-06-30T09:36:35Z","cross_cats_sorted":[],"title_canon_sha256":"f67a5f3884fb4cff0d7804f8992fef1520b3e534839676ed7a49521886343432","abstract_canon_sha256":"7f03f4e44a632a09e60d8b96fa00e23dbdcd830d67a38eea093ada0482af68c3"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-07-01T01:18:02.332233Z","signature_b64":"K3vk9TFHgFiHGn+4gAzVtUf9sbA70tgLARl1S8xB9QMAJxc4pjGA732kmTpjNVbuJOT+AuTNaSJkmb15+4PXBA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ffbee967216ceb1b3b67a2dca489944c8fcb3fd85f8bb2d6adfc48dca7d2cb64","last_reissued_at":"2026-07-01T01:18:02.330617Z","signature_status":"signed_v1","first_computed_at":"2026-07-01T01:18:02.330617Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Xiaomi-GUI-0 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Anan Du, Changqiao Wu, Cheng Tan, Chengzhen Duan, Cong Zou, Fazhan Liu, Haoyuan Sun, Heng Qu, Hui Liu, Jiahui Yang, Jian Luan, Jian Zhu, Jiatong Sun, Niu Lian, Pei Fu, Pengzhi Gao, Qinzhuo Wu, Ruoceng Zhang, Shaojie Zhang, Shiqi Cui, Shukai Jia, Tao Xiong, Tongbo Chen, Wanxia Cao, Wenchao Lu, Yajie Wang, Yike Liu, Yuanfa Li, Yuxuan Yuan, Zhehao Yu","submitted_at":"2026-06-30T09:36:35Z","abstract_excerpt":"Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigation. However, existing GUI agents are trained and evaluated largely on offline trajectories, simulated environments, and standardized benchmarks. These differ substantially from real applications in interface layout, interaction logic, and abnormal-state distribution, and cannot faithfully characterize execution stability in real-world use, where account states, permission dialogs, payment authent"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2606.31410","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2606.31410/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2606.31410","created_at":"2026-07-01T01:18:02.330708+00:00"},{"alias_kind":"arxiv_version","alias_value":"2606.31410v1","created_at":"2026-07-01T01:18:02.330708+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2606.31410","created_at":"2026-07-01T01:18:02.330708+00:00"},{"alias_kind":"pith_short_12","alias_value":"767OSZZBNTVR","created_at":"2026-07-01T01:18:02.330708+00:00"},{"alias_kind":"pith_short_16","alias_value":"767OSZZBNTVRWO3H","created_at":"2026-07-01T01:18:02.330708+00:00"},{"alias_kind":"pith_short_8","alias_value":"767OSZZB","created_at":"2026-07-01T01:18:02.330708+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS","json":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS.json","graph_json":"https://pith.science/api/pith-number/767OSZZBNTVRWO3HULOKJCMUJS/graph.json","events_json":"https://pith.science/api/pith-number/767OSZZBNTVRWO3HULOKJCMUJS/events.json","paper":"https://pith.science/paper/767OSZZB"},"agent_actions":{"view_html":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS","download_json":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS.json","view_paper":"https://pith.science/paper/767OSZZB","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2606.31410&json=true","fetch_graph":"https://pith.science/api/pith-number/767OSZZBNTVRWO3HULOKJCMUJS/graph.json","fetch_events":"https://pith.science/api/pith-number/767OSZZBNTVRWO3HULOKJCMUJS/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS/action/timestamp_anchor","attest_storage":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS/action/storage_attestation","attest_author":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS/action/author_attestation","sign_citation":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS/action/citation_signature","submit_replication":"https://pith.science/pith/767OSZZBNTVRWO3HULOKJCMUJS/action/replication_record"}},"created_at":"2026-07-01T01:18:02.330708+00:00","updated_at":"2026-07-01T01:18:02.330708+00:00"}