pith. machine review for the scientific record.

arxiv: 2604.23248 · v1 · submitted 2026-04-25 · 💻 cs.CR · cs.HC

Recognition: unknown

PrivacyAssist: A User-Centric Agent Framework for Detecting Privacy Inconsistencies in Android Apps

Barbara Carminati, Edoardo Di Tullio, Elena Ferrari, Tran Thanh Lam Nguyen

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 07:49 UTC · model grok-4.3

classification 💻 cs.CR cs.HC
keywords privacy inconsistencies · Android apps · LLM agents · privacy policies · permission analysis · RAG · user privacy

The pith

PrivacyAssist uses multi-agent LLMs to detect when Android apps request permissions that contradict their own privacy policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PrivacyAssist, a platform of LLM agents that compares the permissions a user grants to an Android app against the data collection and sharing practices declared in its privacy policy. It applies retrieval-augmented generation to produce short explanations of any mismatches and delivers real-time on-device warnings to help users decide whether to install the app. Evaluation with 200 users and 2,347 apps shows that only 16 percent of apps are fully consistent, while the remaining majority exhibit at least one gap between what the app asks for and what its policy admits.

Core claim

PrivacyAssist is a multi-agent LLM-based platform that detects inconsistencies between user-granted permissions and developers' declared sensitive data collection and sharing practices, providing concise explanations via RAG and real-time on-device warnings.

What carries the argument

A multi-agent LLM system with Retrieval-Augmented Generation that extracts permission lists and policy text, then compares them for mismatches.
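The comparison step this describes can be sketched in a few lines (hypothetical permission-to-category mapping and function names; the paper's actual agent prompts, category taxonomy, and LLM extraction step are not reproduced here):

```python
# Sketch of the permission-vs-policy comparison step. The mapping below is
# an illustrative assumption, not the paper's taxonomy; in PrivacyAssist the
# declared categories would come from LLM extraction over the policy text.
PERMISSION_TO_CATEGORY = {
    "android.permission.ACCESS_FINE_LOCATION": "location",
    "android.permission.READ_CONTACTS": "contacts",
    "android.permission.CAMERA": "camera",
    "android.permission.RECORD_AUDIO": "microphone",
}

def find_inconsistencies(granted_permissions, declared_categories):
    """Return data categories implied by granted permissions but absent
    from the privacy policy's declared collection practices."""
    implied = {
        PERMISSION_TO_CATEGORY[p]
        for p in granted_permissions
        if p in PERMISSION_TO_CATEGORY
    }
    return sorted(implied - set(declared_categories))

# Example: the policy declares location only, but the app also reads contacts.
gaps = find_inconsistencies(
    ["android.permission.ACCESS_FINE_LOCATION",
     "android.permission.READ_CONTACTS"],
    {"location"},
)
print(gaps)  # ['contacts']
```

Each returned category would then be turned into a short RAG-grounded explanation and surfaced as an on-device warning.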

If this is right

  • Users obtain concrete, on-device warnings about privacy risks before choosing to install an app.
  • The majority of apps show at least one inconsistency between requested permissions and stated data practices.
  • The framework supports real-time enforcement of individual privacy preferences during installation decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent approach could be extended to scan apps at the store level before they reach users.
  • Widespread adoption might create market pressure for developers to align policies more closely with actual behavior.
  • Similar inconsistency checks could be applied to other mobile platforms or to web services that request broad permissions.

Load-bearing premise

The LLM agents can accurately extract and compare sensitive data practices from privacy policies and permission lists without significant errors or hallucinations.

What would settle it

A manual audit of a sample of apps where the system reports inconsistencies but human reviewers find the policy and permissions actually match, or where the system reports consistency but reviewers find clear mismatches.

Figures

Figures reproduced from arXiv: 2604.23248 by Barbara Carminati, Edoardo Di Tullio, Elena Ferrari, Tran Thanh Lam Nguyen.

Figure 1
Figure 1: PrivacyAssist architecture. view at source ↗
original abstract

Mobile apps offer significant benefits, but their privacy protections often remain ineffective and confusing for users. While prior work mainly analyzes app privacy vulnerabilities, few approaches help users understand, set, and enforce their privacy preferences. This paper presents PrivacyAssist, a multi-agent LLM-based platform that detects inconsistencies between user-granted permissions and developers' declared sensitive data collection and sharing practices. Using Retrieval-Augmented Generation (RAG), PrivacyAssist provides concise explanations and real-time on-device warnings to support informed installation decisions. We evaluate PrivacyAssist with 200 users and 2,347 Android apps, finding that only 16% of apps are fully consistent between granted permissions and declared data practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PrivacyAssist, a multi-agent LLM-based framework using Retrieval-Augmented Generation (RAG) to detect inconsistencies between user-granted permissions and developers' declared sensitive data collection/sharing practices in Android apps. It provides concise explanations and real-time on-device warnings to support user installation decisions. The evaluation involves 200 users and 2,347 Android apps, reporting that only 16% of apps are fully consistent between granted permissions and declared data practices.

Significance. If the central empirical result holds after proper validation, this work could meaningfully advance user-centric privacy tools by shifting focus from vulnerability analysis to practical, real-time inconsistency detection and user guidance. The multi-agent RAG approach offers a novel way to bridge permission lists and privacy policies, potentially improving informed consent in the Android ecosystem.

major comments (2)
  1. [Abstract] Abstract (evaluation paragraph): The headline claim that only 16% of 2,347 apps are fully consistent is presented without any description of the inconsistency detection methodology, error analysis, baseline comparisons, or how the 16% figure was computed. This prevents assessment of whether the result reflects actual developer practices or artifacts of the pipeline.
  2. [Evaluation] Evaluation section: The 16% consistency rate depends on the multi-agent RAG pipeline accurately extracting and comparing sensitive data practices from privacy policies and permission lists. No ground-truth validation is reported (e.g., manual audit of a random sample, inter-annotator agreement, or precision/recall against human-labeled data), leaving open the possibility that LLM hallucinations, omissions, or mis-mappings (such as conflating device identifiers with advertising IDs) systematically affect the result.
minor comments (1)
  1. [Abstract] Abstract: The description of the multi-agent platform and RAG usage could be expanded with one sentence on agent roles or retrieval mechanism to improve immediate clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects of clarity and validation in our evaluation. We agree that strengthening the presentation of the 16% consistency result and providing evidence of pipeline accuracy will improve the manuscript. Below we respond point by point and describe the revisions we will make.

point-by-point responses
  1. Referee: [Abstract] Abstract (evaluation paragraph): The headline claim that only 16% of 2,347 apps are fully consistent is presented without any description of the inconsistency detection methodology, error analysis, baseline comparisons, or how the 16% figure was computed. This prevents assessment of whether the result reflects actual developer practices or artifacts of the pipeline.

    Authors: We agree that the abstract, being concise, does not convey the evaluation methodology or supporting analyses. The full pipeline (multi-agent RAG extraction from privacy policies and permission lists, comparison logic, and computation of the consistency rate) is described in Sections 3 and 5, with error analysis and baseline comparisons in Section 5.3. In the revised version we will expand the abstract's evaluation paragraph to include one sentence summarizing the detection approach, note that the 16% figure is obtained by strict matching across all sensitive data categories, and reference the error analysis and baselines reported in the evaluation section. revision: yes
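The "strict matching across all sensitive data categories" criterion the rebuttal describes can be sketched as follows (hypothetical field names and toy data; the paper's actual category taxonomy and pipeline outputs are not given here):

```python
# Strict-matching consistency check: an app counts as fully consistent only
# if every sensitive-data category implied by its granted permissions is
# declared in its privacy policy. Field names are illustrative assumptions.
def is_fully_consistent(app):
    return set(app["implied_categories"]) <= set(app["declared_categories"])

def consistency_rate(apps):
    """Fraction of apps passing the strict check (the paper reports 16%)."""
    return sum(is_fully_consistent(a) for a in apps) / len(apps)

# Toy corpus: 1 of 4 apps passes the strict check.
apps = [
    {"implied_categories": ["location"],
     "declared_categories": ["location"]},
    {"implied_categories": ["location", "contacts"],
     "declared_categories": ["location"]},
    {"implied_categories": ["camera"],
     "declared_categories": []},
    {"implied_categories": ["microphone", "camera"],
     "declared_categories": ["camera"]},
]
print(consistency_rate(apps))  # 0.25
```

Note how unforgiving the criterion is: a single undeclared category anywhere in the app disqualifies it, which is one plausible reason the reported consistency rate is so low.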

  2. Referee: [Evaluation] Evaluation section: The 16% consistency rate depends on the multi-agent RAG pipeline accurately extracting and comparing sensitive data practices from privacy policies and permission lists. No ground-truth validation is reported (e.g., manual audit of a random sample, inter-annotator agreement, or precision/recall against human-labeled data), leaving open the possibility that LLM hallucinations, omissions, or mis-mappings (such as conflating device identifiers with advertising IDs) systematically affect the result.

    Authors: We acknowledge that the current manuscript does not report a dedicated ground-truth validation study of the extraction and mapping accuracy. While Section 5.3 presents an internal consistency check and qualitative error examples, it lacks quantitative human-labeled metrics. In the revision we will add a new subsection describing a manual audit of a random sample of 100 apps. Two independent annotators will label the sensitive data practices mentioned in each privacy policy; we will compute precision, recall, and F1 against the system's output, report inter-annotator agreement (Cohen's kappa), and explicitly discuss potential mis-mappings such as device identifiers versus advertising IDs. Any systematic errors identified will be quantified and mitigated in the pipeline description. revision: yes
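The validation metrics the authors commit to (precision/recall/F1 against human labels, Cohen's kappa between annotators) can be computed from scratch; a minimal sketch, assuming category labels are represented as sets and annotator decisions as binary lists:

```python
def precision_recall_f1(system, gold):
    """Category-level precision/recall/F1 of system-extracted data
    practices against human-labeled ones (both sets of category labels)."""
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary labels of equal length:
    observed agreement corrected for chance agreement."""
    n = len(labels_a)
    agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a1 = sum(labels_a) / n
    p_b1 = sum(labels_b) / n
    chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (agree - chance) / (1 - chance) if chance != 1 else 1.0

# Toy example: system extracts {location, contacts}, gold is {location, camera}.
p, r, f1 = precision_recall_f1({"location", "contacts"}, {"location", "camera"})
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.5 0.5
```

With 100 audited apps and two annotators, these three numbers plus kappa would be exactly the evidence the referee's second major comment asks for.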

Circularity Check

0 steps flagged

No circularity: empirical evaluation of LLM agent framework

full rationale

The paper presents PrivacyAssist as a multi-agent RAG system and reports a direct empirical count (16% consistency across 2,347 apps) from running the described pipeline on real apps and users. No equations, derivations, predictions, or first-principles results exist that could reduce to fitted inputs or self-referential definitions. The central finding is an observation produced by applying the system, not a quantity forced by construction from prior steps or self-citations. Self-citations to prior privacy work, if present, are not load-bearing for the reported percentage.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.0 · 5419 in / 1006 out tokens · 31151 ms · 2026-05-08T07:49:05.660355+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1]

    Ali Alkinoon. 2025. A Comprehensive Analysis of Evolving Permission Usage in Android Apps: Trends, Threats, and Ecosystem Insights. arXiv preprint arXiv:2508.02008 (2025).

  2. [2]

    Thomas Cory. 2026. Word-level Annotation of GDPR Transparency Compliance in Privacy Policies using Large Language Models. Proceedings on Privacy Enhancing Technologies (2026).

  3. [3]

    Tuğçe Karayel. 2025. Understanding smartphone security behavior through the core constructs of protection motivation theory: A comparative study of iOS and Android users. Computers & Security 158 (2025), 104652.

  4. [4]

    Noam Kolt. 2025. Governing AI Agents. In Proceedings of the Notre Dame Law Review. University of Notre Dame. Forthcoming.

  5. [5]

    Fenghua Li. 2020. Exploiting location-related behaviors without the GPS data on smartphones. Information Sciences 527 (2020), 444–459.

  6. [6]

    Shervin Minaee. 2024. Large language models: A survey. arXiv preprint arXiv:2402.06196 (2024).

  7. [7]

    Tran Thanh Lam Nguyen, Barbara Carminati, and Elena Ferrari. 2025. Detecting privacy non-compliance in wearable apps via knowledge graphs and LLMs. In 2025 21st International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). IEEE, 1–7.

  8. [8]

    Tran Thanh Lam Nguyen, Barbara Carminati, and Elena Ferrari. 2025. LLMs on Support of Privacy and Security of Mobile Apps: State-of-the-art and Research Directions. AI for Cybersecurity: Research and Practice (2025), 29–66.

  9. [9]

    Tran Thanh Lam Nguyen, Barbara Carminati, and Elena Ferrari. 2026. ALIBIS: Assessing and mitigating the risk of sensitive metadata Leakage In moBile Image Sharing. Pervasive and Mobile Computing (2026), 102171.

  10. [10]

    Shidong Pan. 2024. A NEW HOPE: Contextual privacy policies for mobile applications and an approach toward automated generation. In 33rd USENIX Security Symposium (USENIX Security 24). 5699–5716.

  11. [11]

    Francesco Piccialli. 2025. AgentAI: A comprehensive survey on autonomous agents in distributed AI for industry 4.0. Expert Systems with Applications 291 (2025), 128404.

  12. [12]

    Sebastian Prange. 2024. "I do (not) need that Feature!" Understanding Users' Awareness and Control of Privacy Permissions on Android Smartphones. In …

  13. [13]

    Kunal Sawarkar. 2024. Blended RAG: Improving RAG (retriever-augmented generation) accuracy with semantic search and hybrid query-based retrievers. In 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 155–161.

  14. [14]

    Guozhu Tu. 2020. Intelligent Analysis of Android Application Privacy Policy and Permission Consistency. In Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE.