{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:W26MMSH22DKSAAL7JZ42243YPN","short_pith_number":"pith:W26MMSH2","schema_version":"1.0","canonical_sha256":"b6bcc648fad0d520017f4e79ad73787b6312132e589969e5894e14da2f8f1d6c","source":{"kind":"arxiv","id":"2512.11891","version":2},"attestation_state":"computed","paper":{"title":"VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.SY","eess.SY"],"primary_cat":"cs.RO","authors_text":"Jun Cen, Shihefeng Wang, Shuang Liu, Songqiao Hu, Xiang Li, Xiao He, Zeyi Liu, Zihan Meng","submitted_at":"2025-12-09T16:53:44Z","abstract_excerpt":"Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint (SC) layer formulated via control barrier functions. AEGIS integrates directly with "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2512.11891","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.RO","submitted_at":"2025-12-09T16:53:44Z","cross_cats_sorted":["cs.SY","eess.SY"],"title_canon_sha256":"47956b1dd077ce874e6712c54b3f881d2b1d59fde79798ec20bcf9aa236cc6e1","abstract_canon_sha256":"865d3d832281fdced59885f088336188a758882d312da668f80c14b2bc471f6b"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-07-03T01:17:54.113476Z","signature_b64":"+wvSa+tGVnBCPOphJaUoAtHqzQ83G3vYWm8X5gwSPIEc6beo3rgKGPhzdMMOjNgqgEbEf6eX3lmX/9rKxKLsDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"b6bcc648fad0d520017f4e79ad73787b6312132e589969e5894e14da2f8f1d6c","last_reissued_at":"2026-07-03T01:17:54.112898Z","signature_status":"signed_v1","first_computed_at":"2026-07-03T01:17:54.112898Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.SY","eess.SY"],"primary_cat":"cs.RO","authors_text":"Jun Cen, Shihefeng Wang, Shuang Liu, Songqiao Hu, Xiang Li, Xiao He, Zeyi Liu, Zihan Meng","submitted_at":"2025-12-09T16:53:44Z","abstract_excerpt":"Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint (SC) layer formulated via control barrier functions. AEGIS integrates directly with "},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2512.11891","kind":"arxiv","version":2},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2512.11891/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2512.11891","created_at":"2026-07-03T01:17:54.112976+00:00"},{"alias_kind":"arxiv_version","alias_value":"2512.11891v2","created_at":"2026-07-03T01:17:54.112976+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2512.11891","created_at":"2026-07-03T01:17:54.112976+00:00"},{"alias_kind":"pith_short_12","alias_value":"W26MMSH22DKS","created_at":"2026-07-03T01:17:54.112976+00:00"},{"alias_kind":"pith_short_16","alias_value":"W26MMSH22DKSAAL7","created_at":"2026-07-03T01:17:54.112976+00:00"},{"alias_kind":"pith_short_8","alias_value":"W26MMSH2","created_at":"2026-07-03T01:17:54.112976+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":14,"internal_anchor_count":3,"sample":[{"citing_arxiv_id":"2606.20698","citing_title":"SafeDojo: Safe Reinforcement Learning for VLA via Interactive World Model","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2606.12965","citing_title":"EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2606.09749","citing_title":"Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2606.03954","citing_title":"VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring","ref_index":17,"is_internal_anchor":false},{"citing_arxiv_id":"2605.12386","citing_title":"SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation","ref_index":5,"is_internal_anchor":false},{"citing_arxiv_id":"2606.23686","citing_title":"LIBERO-Safety: A Comprehensive Benchmark for Physical and Semantic Safety in Vision-Language-Action Models","ref_index":17,"is_internal_anchor":false},{"citing_arxiv_id":"2605.28726","citing_title":"How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures","ref_index":7,"is_internal_anchor":false},{"citing_arxiv_id":"2605.02900","citing_title":"Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses","ref_index":135,"is_internal_anchor":false},{"citing_arxiv_id":"2605.11750","citing_title":"DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies","ref_index":6,"is_internal_anchor":false},{"citing_arxiv_id":"2605.12386","citing_title":"SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation","ref_index":5,"is_internal_anchor":false},{"citing_arxiv_id":"2604.23775","citing_title":"Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms","ref_index":24,"is_internal_anchor":false},{"citing_arxiv_id":"2604.21192","citing_title":"How VLAs (Really) Work In Open-World Environments","ref_index":38,"is_internal_anchor":false},{"citing_arxiv_id":"2604.12447","citing_title":"HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models","ref_index":6,"is_internal_anchor":false},{"citing_arxiv_id":"2604.17896","citing_title":"Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study","ref_index":11,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN","json":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN.json","graph_json":"https://pith.science/api/pith-number/W26MMSH22DKSAAL7JZ42243YPN/graph.json","events_json":"https://pith.science/api/pith-number/W26MMSH22DKSAAL7JZ42243YPN/events.json","paper":"https://pith.science/paper/W26MMSH2"},"agent_actions":{"view_html":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN","download_json":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN.json","view_paper":"https://pith.science/paper/W26MMSH2","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2512.11891&json=true","fetch_graph":"https://pith.science/api/pith-number/W26MMSH22DKSAAL7JZ42243YPN/graph.json","fetch_events":"https://pith.science/api/pith-number/W26MMSH22DKSAAL7JZ42243YPN/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN/action/timestamp_anchor","attest_storage":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN/action/storage_attestation","attest_author":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN/action/author_attestation","sign_citation":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN/action/citation_signature","submit_replication":"https://pith.science/pith/W26MMSH22DKSAAL7JZ42243YPN/action/replication_record"}},"created_at":"2026-07-03T01:17:54.112976+00:00","updated_at":"2026-07-03T01:17:54.112976+00:00"}