PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
representative citing papers
Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
citing papers explorer
-
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
-
Mobile GUI Agents under Real-world Threats: Are We There Yet?
Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
- MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents