hub

Mobile-agent-e: Self-evolving mobile assistant for complex tasks

· 2025 · arXiv 2501.11733

19 Pith papers cite this work. Polarity classification is still indexing.

19 Pith papers citing it

read on arXiv browse 19 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SkeMex distills agent trajectories into value-aware skills organized in general/task/action branches and evolves them via a closed-loop Read-Write-Assess-Govern process, outperforming prior memory agents on clinical tasks.

Rethinking Side-Channel Analysis: Automated Discovery and Analysis of Side-Channel Leakage with LLM-Assisted Agents

cs.CR · 2026-05-17 · unverdicted · novelty 7.0

SCAgent automates side-channel leakage discovery via LLM agents for target identification and few-shot foundation models for scalable analysis on iOS.

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.

ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices

cs.AI · 2026-02-25 · conditional · novelty 7.0

ProactiveMobile is a new benchmark for proactive mobile agents that tests latent intent inference from context and executable API generation, where a fine-tuned 7B model reaches 19.15% success versus 15.71% for o1 and 7.39% for GPT-5.

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

cs.AI · 2025-09-08 · conditional · novelty 7.0

MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.

CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

CAPED reduces incidental visual privacy leakage in mobile GUI agents from 0.766 to 0.268 on seeded AndroidWorld tasks by selectively exposing only task-relevant screen content.

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

EpiEvolve achieves 0.629 accuracy in streaming COVID-19 forecasting by using episodic memory, reflection on delayed labels, and regime-aware retrieval, outperforming static LLMs (0.561) and CDC ensembles (0.325) while halving recovery lag after regime shifts.

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

ToolCUA introduces a trajectory scaling pipeline and staged RL to optimize GUI-tool switching, reaching 46.85% accuracy on OSWorld-MCP for a 66% relative gain over baseline.

LDMDroid: Leveraging LLMs for Detecting Data Manipulation Errors in Android Apps

cs.SE · 2026-04-01 · conditional · novelty 6.0

LDMDroid applies LLMs in a state-aware process to trigger data manipulation functions and uses visual cues to detect errors, finding 17 bugs across 24 Android apps with 14 developer confirmations.

RISK: A Framework for GUI Agents in E-commerce Risk Management

cs.AI · 2025-09-26 · unverdicted · novelty 6.0

RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline parameters.

Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

cs.AI · 2025-06-25 · unverdicted · novelty 6.0

Mobile-R1 introduces a hierarchical three-stage curriculum that combines format alignment, verifiable action feedback, and multi-turn environment training to improve exploration and self-correction in VLM-based mobile agents, plus a new Chinese GUI dataset and benchmark.

Exploring the Secondary Risks of Large Language Models

cs.LG · 2025-06-14 · unverdicted · novelty 6.0

Introduces secondary risks as a new class of LLM failures from benign prompts, defines two primitives, proposes SecLens search framework, and releases SecRiskBench showing risks are widespread across 16 models.

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.

Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Mobile-Aptus uses supervised fine-tuning followed by semantic similarity retrieval and direct preference optimization to calibrate confidence scores in mobile agents, yielding over 17% average task success improvement on four benchmarks.

InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning

cs.AI · 2025-08-27 · unverdicted · novelty 5.0

InquireMobile applies two-stage reinforcement fine-tuning and pre-action reasoning to VLM mobile agents, raising inquiry success rate by 46.8% on the introduced InquireBench benchmark.

How Mobile World Model Guides GUI Agents?

cs.AI · 2026-05-11 · unverdicted · novelty 4.0 · 2 refs

World models trained on delta text, full text, diffusion images, and renderable code achieve SoTA on two benchmarks and improve downstream GUI agent performance on three mobile datasets with modality-specific strengths.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

What Memory Do GUI Agents Really Need? From Passive Records to Active Task-Driving States

cs.CV · 2026-06-30

MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents

cs.AI · 2025-12-14

citing papers explorer

Showing 19 of 19 citing papers.

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory cs.AI · 2026-06-08 · unverdicted · none · ref 68
SkeMex distills agent trajectories into value-aware skills organized in general/task/action branches and evolves them via a closed-loop Read-Write-Assess-Govern process, outperforming prior memory agents on clinical tasks.
Rethinking Side-Channel Analysis: Automated Discovery and Analysis of Side-Channel Leakage with LLM-Assisted Agents cs.CR · 2026-05-17 · unverdicted · none · ref 46
SCAgent automates side-channel leakage discovery via LLM agents for target identification and few-shot foundation models for scalable analysis on iOS.
RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management cs.AI · 2026-04-15 · unverdicted · none · ref 49
RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.
ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices cs.AI · 2026-02-25 · conditional · none · ref 39
ProactiveMobile is a new benchmark for proactive mobile agents that tests latent intent inference from context and executable API generation, where a fine-tuned 7B model reaches 19.15% success versus 15.71% for o1 and 7.39% for GPT-5.
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents cs.AI · 2025-09-08 · conditional · none · ref 33
MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.
CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents cs.CR · 2026-06-10 · unverdicted · none · ref 33
CAPED reduces incidental visual privacy leakage in mobile GUI agents from 0.766 to 0.268 on seeded AndroidWorld tasks by selectively exposing only task-relevant screen content.
EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts cs.AI · 2026-06-03 · unverdicted · none · ref 22
EpiEvolve achieves 0.629 accuracy in streaming COVID-19 forecasting by using episodic memory, reflection on delayed labels, and regime-aware retrieval, outperforming static LLMs (0.561) and CDC ensembles (0.325) while halving recovery lag after regime shifts.
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents cs.AI · 2026-05-12 · unverdicted · none · ref 43
ToolCUA introduces a trajectory scaling pipeline and staged RL to optimize GUI-tool switching, reaching 46.85% accuracy on OSWorld-MCP for a 66% relative gain over baseline.
LDMDroid: Leveraging LLMs for Detecting Data Manipulation Errors in Android Apps cs.SE · 2026-04-01 · conditional · none · ref 49
LDMDroid applies LLMs in a state-aware process to trigger data manipulation functions and uses visual cues to detect errors, finding 17 bugs across 24 Android apps with 14 developer confirmations.
RISK: A Framework for GUI Agents in E-commerce Risk Management cs.AI · 2025-09-26 · unverdicted · none · ref 21
RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline parameters.
Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training cs.AI · 2025-06-25 · unverdicted · none · ref 19
Mobile-R1 introduces a hierarchical three-stage curriculum that combines format alignment, verifiable action feedback, and multi-turn environment training to improve exploration and self-correction in VLM-based mobile agents, plus a new Chinese GUI dataset and benchmark.
Exploring the Secondary Risks of Large Language Models cs.LG · 2025-06-14 · unverdicted · none · ref 46
Introduces secondary risks as a new class of LLM failures from benign prompts, defines two primitives, proposes SecLens search framework, and releases SecRiskBench showing risks are widespread across 16 models.
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces cs.CL · 2026-06-05 · unverdicted · none · ref 112
Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.
Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents cs.CL · 2026-05-27 · unverdicted · none · ref 26
Mobile-Aptus uses supervised fine-tuning followed by semantic similarity retrieval and direct preference optimization to calibrate confidence scores in mobile agents, yielding over 17% average task success improvement on four benchmarks.
InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning cs.AI · 2025-08-27 · unverdicted · none · ref 20
InquireMobile applies two-stage reinforcement fine-tuning and pre-action reasoning to VLM mobile agents, raising inquiry success rate by 46.8% on the introduced InquireBench benchmark.
How Mobile World Model Guides GUI Agents? cs.AI · 2026-05-11 · unverdicted · none · ref 32 · 2 links
World models trained on delta text, full text, diffusion images, and renderable code achieve SoTA on two benchmarks and improve downstream GUI agent performance on three mobile datasets with modality-specific strengths.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 166
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
What Memory Do GUI Agents Really Need? From Passive Records to Active Task-Driving States cs.CV · 2026-06-30 · unreviewed · ref 75
MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents cs.AI · 2025-12-14 · unreviewed · ref 35

Mobile-agent-e: Self-evolving mobile assistant for complex tasks

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer