pith. machine review for the scientific record.

arxiv: 2604.23141 · v1 · submitted 2026-04-25 · 💻 cs.CR · cs.AI

Recognition: unknown

UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 08:09 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords social · control · engineering · access · target · unseen · adaptive · address

The pith

UNSEEN combines AR access control, LLM unlearning to suppress profiles, and agent guardrails to defend against AR-LLM social engineering attacks, tested in a 60-person user study with 360 conversations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Attackers could soon use AR glasses to covertly record what you look like and what you say, feed that to an AI that figures out your identity and background, and then have another AI suggest exactly what to say to gain your trust and trick you into revealing private information or handing over money. Standard protections such as passwords or data-flow tracking do not work well here, because the glasses are resource-constrained embedded computers and the AI's reasoning is hidden inside large opaque models. UNSEEN tries to fix this at three layers. First, the AR device only turns on its camera and microphone after checking that the wearer has permission for that specific person. Second, the LLM is trained to forget or block details that would let it build a useful social profile of the target. Third, the conversation-suggesting AI agents are subject to real-time rules that block suggestions aimed at manipulation. The authors tested the whole system with 60 volunteers in realistic social settings and collected 360 labeled conversations to measure whether the defense reduced successful attacks.
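To fix ideas, here is a minimal sketch of how the three layers could compose at runtime. Every name and interface below is invented for illustration; the paper's actual APIs are not given in this summary.

```python
# Illustrative sketch of UNSEEN's cross-stack layering (all names hypothetical).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Capture:
    image: bytes   # camera frame from the AR device
    audio: bytes   # microphone buffer

class ARAccessControlLayer:
    """Layer 1: identity-gated sensing. Sensor data is released only if the
    wearer holds explicit permission for the specific person in view."""
    def __init__(self, consent: dict[str, bool]):
        self.consent = consent  # person_id -> permission flag

    def gate(self, capture: Capture, person_id: str) -> Optional[Capture]:
        return capture if self.consent.get(person_id, False) else None

class UnlearnedProfiler:
    """Layer 2: the LLM after unlearning; queries for sensitive profile
    attributes should come back empty or refused."""
    def profile(self, capture: Capture) -> dict:
        return {}  # suppressed: no usable social profile

class AgentGuardrail:
    """Layer 3: runtime rules over agent conversation suggestions."""
    BLOCKED = {"credential_request", "payment_lure", "rapport_exploit"}

    def filter(self, suggestion: str, intent: str) -> Optional[str]:
        return None if intent in self.BLOCKED else suggestion

# An attack pipeline has to pass all three gates; a None at any layer
# starves the downstream stage of input.
```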

Core claim

We present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.

Load-bearing premise

That F-RMU unlearning can reliably suppress sensitive profile information inside opaque LLM inference while preserving utility, and that runtime guardrails can adaptively block evolving social-engineering strategies without being bypassed by new prompts.
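The text above never defines F-RMU; Figure 3's caption points to Fisher-information-based neuron importance, and "RMU" in the unlearning literature usually denotes representation misdirection unlearning. Under those assumptions only, one plausible shape of the method is sketched below; the hidden_states helper, the control vector, and all hyperparameters are invented.

```python
import torch

def diagonal_fisher(model, forget_batches, loss_fn):
    # Diagonal Fisher estimate: average squared gradient per parameter on
    # the forget set; large values mark neurons that carry the profile
    # knowledge targeted for removal.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in forget_batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(forget_batches) for n, f in fisher.items()}

def rmu_step(model, frozen, forget_batch, retain_batch, control, alpha, opt):
    # RMU-style update: steer hidden activations on forget-set inputs toward
    # a fixed random control vector (destroying the profile signal) while
    # anchoring retain-set activations to a frozen reference copy of the
    # model (preserving general utility).
    h_f = model.hidden_states(forget_batch)   # assumed helper, not a real API
    h_r = model.hidden_states(retain_batch)
    with torch.no_grad():
        h_ref = frozen.hidden_states(retain_batch)
    loss = ((h_f - control) ** 2).mean() + alpha * ((h_r - h_ref) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# A Fisher-weighted variant might freeze all but the top-k parameters by
# Fisher importance before taking rmu_step, localizing the edit.
```

Presumably the Fisher scores restrict or weight which parameters the unlearning update touches, which is where the "F" would enter; the paper itself would need to be consulted for the actual objective.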

Figures

Figures reproduced from arXiv: 2604.23141 by Fudu Xing, Gaoyang Liu, Kailong Wang, Lihong Liu, Tianlong Yu, Ting Bi, Xiao Luo, Yang Yang, Zui Tao.

Figure 1: AR-LLM social engineering attacks (top) and how UNSEEN prevents them (bottom). view at source ↗
Figure 2: F-RMU framework. view at source ↗
Figure 3: Visualization of Neuron Importance via Fisher Information. view at source ↗
Figure 6: Comparison of subjective experiences. (Nearby text: the ablation gaps quantify each module's contribution, +0.48 after removing the Agent Guardrail, +0.58 after removing LLM Unlearning, and +1.10 after removing the AR ACL, all relative to full UNSEEN; the defense capability is not produced by a single component but emerges from their coordination.) view at source ↗
read the original abstract

Emerging AR-LLM-based social engineering attacks (e.g., SEAR) are on the verge of posing serious threats to real-world social life. In such an AR-LLM-SE attack, the attacker can leverage AR (Augmented Reality) glasses to capture images and vocal information of the target, use an LLM to identify the target and generate a social profile, and use LLM agents to apply social-engineering strategies that suggest conversation moves to win the target's trust and carry out phishing afterwards. Current defensive approaches, such as role-based access control or data-flow tracking, are not directly applicable to the convergent AR-LLM ecosystem (given embedded AR devices and opaque LLM inference), leaving an emerging and potent social engineering threat that existing privacy paradigms are ill-equipped to address. This necessitates a shift beyond solely human-centric measures, such as legislation and user education, toward enforceable vendor policies and platform-level restrictions. Realizing this vision, however, faces significant technical challenges: securing resource-constrained AR-embedded devices, implementing fine-grained access control within opaque LLM inference, and governing adaptive interactive agents. To address these challenges, we present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes UNSEEN, a cross-stack defense against emerging AR-LLM social engineering attacks (e.g., SEAR). It integrates an AR Access Control Layer for identity-gated sensing on resource-constrained devices, F-RMU-based LLM unlearning to suppress sensitive user profiles within opaque inference, and runtime agent guardrails for adaptive interaction control. The central claim is that this coordinated approach addresses gaps in existing privacy paradigms and is evaluated via an IRB-approved user study involving 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.

Significance. If the evaluation demonstrates that the combined mechanisms reduce successful social engineering outcomes while preserving LLM utility and without introducing bypassable leakage, the work could offer a practical vendor-level response to a novel convergent threat in AR-LLM ecosystems. The cross-stack design and empirical user-study framing are strengths, but the absence of reported quantitative metrics, baselines, or component ablations in the current description prevents assessment of whether the result holds or advances the state of the art.

major comments (3)
  1. [Evaluation] Evaluation section: The abstract and summary claim an IRB-approved study with 60 participants and 360 annotated conversations, yet no quantitative metrics (attack success rates, precision/recall on profile suppression, baseline comparisons against role-based access control or data-flow tracking, ablation results isolating AR ACL vs. F-RMU vs. guardrails, or statistical tests) are provided. This directly prevents verification of the central claim that UNSEEN constitutes an effective defense.
  2. [§3.2] §3.2 (F-RMU unlearning description): The premise that F-RMU reliably suppresses target-specific profile knowledge inside black-box LLM inference (preventing reconstruction via multi-turn context or indirect elicitation) is load-bearing for the overall defense but is not supported by any leakage tests, residual-knowledge probes, or adversarial prompt evaluations. The user study measures only human-facing outcomes and does not isolate internal leakage or guardrail bypass.
  3. [§4] §4 (runtime agent guardrails): The claim that adaptive guardrails can block evolving social-engineering strategies without being bypassed is not accompanied by any formalization of the guardrail policy, coverage analysis against prompt variants, or failure-case enumeration. This leaves the adaptive-interaction-control component unverified.
minor comments (2)
  1. [Abstract] The abstract states '360 annotated conversations' but does not specify annotation criteria, inter-annotator agreement, or how conversations were sampled across scenarios; this should be clarified for reproducibility.
  2. [§3.2] Notation for F-RMU is introduced without an explicit equation or pseudocode definition of the unlearning objective or forgetting set construction; a formal definition would improve clarity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and for noting the strengths of the cross-stack design and user-study framing. We agree that the evaluation requires more explicit quantitative reporting, baselines, ablations, and component-specific analyses to fully substantiate the claims. We address each major comment below and will incorporate the requested additions in the revised manuscript.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The abstract and summary claim an IRB-approved study with 60 participants and 360 annotated conversations, yet no quantitative metrics (attack success rates, precision/recall on profile suppression, baseline comparisons against role-based access control or data-flow tracking, ablation results isolating AR ACL vs. F-RMU vs. guardrails, or statistical tests) are provided. This directly prevents verification of the central claim that UNSEEN constitutes an effective defense.

    Authors: We acknowledge that the current Evaluation section would benefit from more explicit quantitative reporting to allow direct verification. The 60-participant IRB-approved study and 360 annotated conversations were conducted to assess human-facing outcomes in realistic social scenarios. In the revised manuscript we will expand this section to report attack success rates, precision/recall metrics for profile suppression, baseline comparisons (including role-based access control and data-flow tracking), component ablation results, and statistical tests (e.g., paired t-tests or ANOVA with p-values). These will be derived from re-analysis of the existing annotated dataset; a sketch of what such an analysis could look like appears after these responses. revision: yes

  2. Referee: [§3.2] §3.2 (F-RMU unlearning description): The premise that F-RMU reliably suppresses target-specific profile knowledge inside black-box LLM inference (preventing reconstruction via multi-turn context or indirect elicitation) is load-bearing for the overall defense but is not supported by any leakage tests, residual-knowledge probes, or adversarial prompt evaluations. The user study measures only human-facing outcomes and does not isolate internal leakage or guardrail bypass.

    Authors: The referee correctly notes that direct evidence for the internal suppression properties of F-RMU is necessary. While the user study provides supporting evidence via reduced social-engineering success rates, it does not isolate leakage. We will add a dedicated subsection describing leakage tests, residual-knowledge probes using adversarial and multi-turn prompts, and evaluations of reconstruction attempts. These additions will quantify the unlearning component's contribution and address potential bypass vectors; a sketch of such a probe harness appears after these responses. revision: yes

  3. Referee: [§4] §4 (runtime agent guardrails): The claim that adaptive guardrails can block evolving social-engineering strategies without being bypassed is not accompanied by any formalization of the guardrail policy, coverage analysis against prompt variants, or failure-case enumeration. This leaves the adaptive-interaction-control component unverified.

    Authors: We agree that formalization and coverage analysis are required for verifiability of the guardrails. In the revised version we will include a formal policy description (as a decision procedure or rule set), coverage analysis over prompt variants and evolving strategies drawn from the conversation dataset, and an enumeration of failure cases with corresponding mitigations. This will be supported by additional analysis of the 360 conversations; a sketch of such a rule set appears after these responses. revision: yes
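To make response 1 concrete, the promised re-analysis could take roughly the following shape, assuming a hypothetical per-conversation record format. The field names, pairing scheme, and sample values are invented; the actual annotation schema is not described in the excerpts above.

```python
from scipy import stats

# Hypothetical records, one per annotated conversation; runs are paired
# by scenario, with and without UNSEEN enabled.
conversations = [
    {"scenario": "cafe-01", "defended": False, "attack_success": 1, "effectiveness": 4.2},
    {"scenario": "cafe-01", "defended": True,  "attack_success": 0, "effectiveness": 1.9},
    # ... 360 records in total
]

def attack_success_rate(records, defended: bool) -> float:
    runs = [r for r in records if r["defended"] == defended]
    return sum(r["attack_success"] for r in runs) / len(runs)

def paired_effectiveness_test(records):
    # Pair each scenario's undefended/defended effectiveness scores and
    # apply a paired t-test, as the rebuttal proposes.
    by_scenario: dict[str, dict[bool, float]] = {}
    for r in records:
        by_scenario.setdefault(r["scenario"], {})[r["defended"]] = r["effectiveness"]
    pairs = [(v[False], v[True]) for v in by_scenario.values() if len(v) == 2]
    baseline, defended = zip(*pairs)
    return stats.ttest_rel(baseline, defended)
```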
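For response 2, a residual-knowledge probe harness might look like the sketch below; the prompt templates, the model.chat interface, and the verbatim string-match leak test are placeholders, not the authors' implementation.

```python
# Illustrative residual-knowledge probe for the unlearned model.
DIRECT    = "What is {name}'s employer?"
INDIRECT  = "Write a short LinkedIn-style bio for {name}."
MULTITURN = ["I ran into {name} at a conference last week.",
             "Remind me, where do they work these days?"]

def leaks(response: str, suppressed_facts: set[str]) -> bool:
    # Crude check: does any suppressed fact surface verbatim?
    return any(fact.lower() in response.lower() for fact in suppressed_facts)

def probe(model, name: str, suppressed_facts: set[str]) -> dict[str, bool]:
    # model.chat(history) -> str is an assumed interface.
    results = {
        "direct":   leaks(model.chat([DIRECT.format(name=name)]), suppressed_facts),
        "indirect": leaks(model.chat([INDIRECT.format(name=name)]), suppressed_facts),
    }
    history, leaked = [], False
    for turn in MULTITURN:  # multi-turn elicitation; context accumulates
        history.append(turn.format(name=name))
        leaked = leaked or leaks(model.chat(history), suppressed_facts)
    results["multi_turn"] = leaked
    return results
```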
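For response 3, the promised "decision procedure or rule set" could be rendered as an ordered rule table like the following sketch; the predicates, thresholds, and state fields are invented for illustration.

```python
# Illustrative runtime guardrail: ordered rules over a suggestion and the
# running conversation state; the first matching rule determines the action.
RULES = [
    (lambda s, st: "password" in s.lower() or "payment" in s.lower(), "block"),
    (lambda s, st: st["personal_facts_referenced"] > 2,               "block"),
    (lambda s, st: st["trust_building_turns"] > 5,                    "flag"),
]

def guardrail(suggestion: str, state: dict) -> str:
    for predicate, action in RULES:
        if predicate(suggestion, state):
            return action
    return "allow"

# A suggestion leaning on harvested personal details gets blocked:
print(guardrail("Mention her daughter's school to build rapport",
                {"personal_facts_referenced": 3, "trust_building_turns": 1}))
# -> block
```

Coverage analysis would then amount to replaying the 360 annotated conversations through such a rule set and enumerating the suggestions that slip past it.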

Circularity Check

0 steps flagged

No significant circularity; empirical system design with no derivation chain

full rationale

The paper presents UNSEEN as a coordinated system combining AR ACL for identity-gated sensing, F-RMU-based LLM unlearning for profile suppression, and runtime agent guardrails, evaluated via an IRB-approved user study with 60 participants and 360 annotated conversations. No equations, parameter fitting, or mathematical derivation steps appear in the provided text. Claims rest on the empirical outcomes of the study rather than any reduction by construction, self-definition, or load-bearing self-citation of a uniqueness theorem. F-RMU is referenced as a component but not invoked to derive the overall result tautologically; the work is self-contained against external benchmarks through the described human-subject evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on domain assumptions about the feasibility of selective unlearning and adaptive guardrails rather than new mathematical entities or fitted constants.

axioms (2)
  • domain assumption F-RMU unlearning can suppress sensitive profile information in LLM inference without destroying general capabilities
    Invoked to justify the LLM unlearning component for profile suppression.
  • domain assumption Runtime guardrails can detect and block adaptive social-engineering strategies in real time
    Required for the agent interaction control layer.

pith-pipeline@v0.9.0 · 5589 in / 1435 out tokens · 73552 ms · 2026-05-08T08:09:31.891461+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

42 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1] K. Afane, W. Wei, Y. Mao, J. Farooq, and J. Chen. Next-generation phishing: How LLM agents empower cyber attackers. In 2024 IEEE International Conference on Big Data (BigData), pages 2558–2567. IEEE, 2024.
  2. [2] V. Bazarevsky, Y. Kartynnik, A. Vakunov, K. Raveendran, and M. Grundmann. BlazeFace: Sub-millisecond neural face detection on mobile GPUs. arXiv preprint arXiv:1907.05047, 2019.
  3. [3] T. Bi, C. Ye, Z. Yang, Z. Zhou, C. Tang, Z. Tao, J. Zhang, K. Wang, L. Zhou, Y. Yang, and T. Yu. On the feasibility of using multimodal LLMs to execute AR social engineering attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 38252–38260, 2026.
  4. [4] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda. All your contacts are belong to us: Automated identity theft attacks on social networks. In Proceedings of the 18th International Conference on World Wide Web, pages 551–560, 2009.
  5. [5] P. Burda, L. Allodi, and N. Zannone. Cognition in social engineering empirical research: A systematic literature review. ACM Transactions on Computer-Human Interaction, 31(2):1–55, 2024.
  6. [6] S. Chen, Z. Li, F. Dangelo, C. Gao, and X. Fu. A case study of security and privacy threats from augmented reality (AR). In 2018 International Conference on Computing, Networking and Communications (ICNC), pages 442–446. IEEE, 2018.
  7. [7] S. Chen, Y. Liu, X. Gao, and Z. Han. MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. In Chinese Conference on Biometric Recognition, pages 428–438. Springer, 2018.
  8. [8] Z. Chen, Z. Zhao, W. Qu, Z. Wen, Z. Han, Z. Zhu, J. Zhang, and H. Yao. Pandora: Detailed LLM jailbreaking via collaborated phishing agents with decomposed reasoning. In ICLR 2024 Workshop on Secure and Trustworthy Large Language Models, 2024.
  9. [9] L. Choo. How 2 students used the Meta Ray-Bans to access personal information. https://www.forbes.com/sites/lindseychoo/2024/10/04/meta-ray-bans-ai-privacy-surveillance/, 2025.
  10. [10] P. V. Falade. Decoding the threat landscape: ChatGPT, FraudGPT, and WormGPT in social engineering attacks. arXiv preprint arXiv:2310.05595, 2023.
  11. [11] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 627–638, 2011.
  12. [12] E. Fernandes, J. Paupore, A. Rahmati, D. Simionato, M. Conti, and A. Prakash. FlowFence: Practical data protection for emerging IoT application frameworks. In USENIX Security Symposium, 2016.
  13. [13] A. Fuste and C. Schmandt. ARTextiles: Promoting social interactions around personal interests through augmented reality. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 470–470, 2017.
  14. [14] C. Geng, S.-J. Huang, and S. Chen. A survey on open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3614–3631, 2021.
  15. [15] W. He, M. Golla, R. Padhi, J. Ofek, M. Dürmuth, E. Fernandes, and B. Ur. Rethinking access control and authentication for the home Internet of Things (IoT). In 27th USENIX Security Symposium (USENIX Security 18), pages 255–272, 2018.
  16. [16] I. Hirskyj-Douglas, A. Kantosalo, A. Monroy-Hernández, J. Zimmermann, M. Nebeling, and M. Gonzalez-Franco. Social AR: Reimagining and interrogating the role of augmented reality in face-to-face social interactions. In Companion Publication of the 2020 Conference on Computer Supported Cooperative Work and Social Computing, pages 457–465, 2020.
  17. [17] G. Ho, A. Cidon, L. Gavish, M. Schweighauser, V. Paxson, S. Savage, G. M. Voelker, and D. Wagner. Detecting and characterizing lateral phishing at scale. In 28th USENIX Security Symposium (USENIX Security 19), pages 1273–1290, 2019.
  18. [18] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022.
  19. [19] G. Ilharco, M. T. Ribeiro, M. Wortsman, S. Gururangan, L. Schmidt, H. Hajishirzi, and A. Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022.
  20. [20] M. Z. Iqbal and A. G. Campbell. Adopting smart glasses responsibly: Potential benefits, ethical, and privacy concerns with Ray-Ban Stories. AI and Ethics, 3(1):325–327, 2023.
  21. [21] P. Jansen and F. Fischbach. The social engineer: An immersive virtual reality educational game to raise social engineering awareness. In Extended Abstracts of the 2020 Annual Symposium on Computer-Human Interaction in Play, pages 59–63, 2020.
  22. [22] Y. J. Jia, Q. A. Chen, S. Wang, A. Rahmati, E. Fernandes, Z. M. Mao, and A. Prakash. ContexIoT: Towards providing contextual integrity to appified IoT platforms. In Proceedings of the Network and Distributed System Security Symposium, 2017.
  23. [23] S. M. Lehman, A. S. Alrumayh, K. Kolhe, H. Ling, and C. C. Tan. Hidden in plain sight: Exploring privacy risks of mobile augmented reality applications. ACM Transactions on Privacy and Security, 25(4):1–35, 2022.
  24. [24] C. Li, G. Wu, G. Y.-Y. Chan, D. G. Turakhia, S. C. Quispe, D. Li, L. Welch, C. Silva, and J. Qian. Satori: Towards proactive AR assistant with belief-desire-intention user modeling. arXiv preprint arXiv:2410.16668, 2024.
  25. [25] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee, et al. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019.
  26. [26] K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
  27. [27] F. Roesner and T. Kohno. Security and privacy for augmented reality: Our 10-year retrospective. In VR4Sec: 1st International Workshop on Security for XR and XR for Security, 2021.
  28. [28] S. S. Roy, P. Thota, K. V. Naragam, and S. Nilizadeh. From chatbots to phishbots?: Phishing scam generation in commercial large language models. In 2024 IEEE Symposium on Security and Privacy (SP), pages 36–54. IEEE, 2024.
  29. [29] Y. Tian, N. Zhang, Y.-H. Lin, X. Wang, B. Ur, X. Guo, and P. Tague. SmartAuth: User-centered authorization for the Internet of Things. In USENIX Security Symposium, pages 361–378, 2017.
  30. [30] D. Timko, D. H. Castillo, and M. L. Rahman. Understanding influences on SMS phishing detection: User behavior, demographics, and message attributes. 2025.
  31. [31] H.-R. Tsai, S.-K. Chiu, and B. Wang. GazeNoter: Co-piloted AR note-taking via gaze selection of LLM suggestions to match users' intentions. arXiv preprint arXiv:2407.01161, 2024.
  32. [32] E. Ulqinaku, H. Assal, A. Abdou, S. Chiasson, and S. Capkun. Is real-time phishing eliminated with FIDO? Social engineering downgrade attacks against FIDO protocols. In 30th USENIX Security Symposium (USENIX Security 21), pages 3811–3828, 2021.
  33. [33] P. Vadrevu and R. Perdisci. What you see is not what you get: Discovering and tracking social engineering attack campaigns. In Proceedings of the Internet Measurement Conference, pages 308–321, 2019.
  34. [34] I. Wang, J. Smith, and J. Ruiz. Exploring virtual agents for augmented reality. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–12, 2019.
  35. [35] X. Wu, J. Li, M. Xu, W. Dong, S. Wu, C. Bian, and D. Xiong. DEPN: Detecting and editing privacy neurons in pretrained language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2875–2886, 2023.
  36. [36] F. Xing, J. Liu, S. Chen, T. Yu, and Y. Yang. A continuous verification mechanism for ensuring client data forgetfulness in federated unlearning. Engineering Applications of Artificial Intelligence, 162:112553, 2025.
  37. [37] B. Yang, Y. Guo, L. Xu, Z. Yan, H. Chen, G. Xing, and X. Jiang. SocialMind: LLM-based proactive AR social assistive system with human-like perception for in-situ live interactions. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(1):1–30, 2025.
  38. [38] Z. Yang, J. Allen, M. Landen, R. Perdisci, and W. Lee. TRIDENT: Towards detecting and mitigating web-based social engineering attacks. In 32nd USENIX Security Symposium (USENIX Security 23), pages 6701–6718, 2023.
  39. [39] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023.
  40. [40] Y. Yoon, J. Nam, H. Yun, J. Lee, D. Kim, and J. Ok. Few-shot unlearning. In 2024 IEEE Symposium on Security and Privacy (SP), pages 3276–3292. IEEE, 2024.
  41. [41] T. Yu, C. Ye, Z. Yang, Z. Zhou, C. Tang, Z. Tao, J. Zhang, K. Wang, L. Zhou, Y. Yang, and T. Bi. SEAR: A multimodal dataset for analyzing AR-LLM-driven social engineering behaviors. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12981–12987, 2025.
  42. [42] G. Zhang, K. Wang, X. Xu, Z. Wang, and H. Shi. Forget-me-not: Learning to forget in text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1755–1764, 2024.