Joint Agent Memory and Exploration Learning via Novelty Signals

Dawei Yin; Guohong Liu; Jiacheng Liu; Rui Kong; Shizuo Tian; Ting Cao; Xiaohong Weng; Yuanchun Li; Yuchen Li; Yuebing Song

arxiv: 2606.01528 · v1 · pith:X2EWQMQOnew · submitted 2026-06-01 · 💻 cs.AI


Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen, Guohong Liu, Yuebing Song, Jiacheng Liu, Yuchen Li, Dawei Yin, Ting Cao, Yunxin Liu, Yuanchun Li

Pith reviewed 2026-06-28 15:01 UTC · model grok-4.3

classification 💻 cs.AI

keywords joint agent memory · exploration learning · novelty signals · language model agents · open-ended environments · code coverage · GUI agents


The pith

JAMEL jointly trains an agent's memory module and exploration policy using novelty signals from interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that in open-ended environments, effective exploration requires memory so the agent does not repeat behaviors it has already exhausted, but keeping raw interaction histories is too costly over long trajectories. JAMEL sets up a loop in which exploration generates novelty signals that train the memory, and the trained memory in turn supports better exploration. It uses deterministic signals such as code coverage in GUI tasks as annotation-free supervision. The reported result is agents that explore unseen environments better than open-weight baselines and nearly as deeply as a closed-source model, while consuming fewer tokens.

Core claim

By training memory and exploration together through novelty-driven interaction, where deterministic novelty signals provide annotation-free supervision, the JAMEL framework enables agents to generalize to unseen environments, achieving superior exploration to open-weight baselines and comparable depth to closed-source models with reduced token consumption.

What carries the argument

The mutually dependent loop between memory and exploration, where sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, and novelty-seeking provides supervision for memory.
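The dependency can be illustrated with a toy sketch. This is not JAMEL's training procedure: the memory here is an explicit set of tried actions rather than learned latent tokens, and the `BUTTONS` map, `run_session`, and both policies are hypothetical names introduced only for illustration. The sketch shows why an agent that cannot tell exhausted behaviors from unseen ones stops earning novelty.

```python
# Toy app: each button deterministically covers a fixed set of code lines.
# (Hypothetical data; not taken from the paper.)
BUTTONS = {
    "home": {1, 2, 3, 4},
    "settings": {3, 4, 5},
    "profile": {6, 7},
}


def memoryless_policy(tried):
    """Ignores history: always picks the button with the largest static coverage."""
    return max(BUTTONS, key=lambda b: (len(BUTTONS[b]), b))


def memory_policy(tried):
    """Uses memory: prefers a button that has not been tried yet."""
    untried = sorted(b for b in BUTTONS if b not in tried)
    return untried[0] if untried else memoryless_policy(tried)


def run_session(policy, steps=3):
    """Return the per-step novelty reward (newly covered lines) for a policy."""
    covered, tried, rewards = set(), set(), []
    for _ in range(steps):
        button = policy(tried)
        new_lines = BUTTONS[button] - covered
        rewards.append(len(new_lines))
        covered |= new_lines
        tried.add(button)
    return rewards


print(run_session(memoryless_policy))  # repeats "home" every step
print(run_session(memory_policy))      # visits each button once
```

In this toy setting the memoryless policy earns novelty only on its first step, while the memory-using policy keeps finding new lines until every button has been tried.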

If this is right

Memory compression becomes feasible without losing exploration utility.
Agents can handle longer trajectories in open-ended settings.
Exploration policies improve from the trained memory module.
Generalization occurs without environment-specific annotations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could apply to other environments with measurable progress metrics beyond GUI code coverage.
Reducing token consumption may allow deployment on resource-limited systems.
The approach might inspire similar joint training in other agent components like planning.
Testing in non-GUI domains would show if the novelty signal dependency holds broadly.

Load-bearing premise

That persistent novelty signals reliably indicate useful new behaviors for training the memory without external labels.
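To make the premise concrete, here is a minimal sketch of a coverage-style novelty signal, assuming coverage is reported as a collection of executed line identifiers after each action. The `CoverageNovelty` class and its method names are hypothetical; the paper's actual instrumentation is not described in the text above. The signal is deterministic (the same inputs give the same reward) and persistent (a line that has been counted never pays again within the session).

```python
from typing import Hashable, Iterable, Set


class CoverageNovelty:
    """Counts coverage items seen for the first time in the current session."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def reward(self, executed: Iterable[Hashable]) -> int:
        """Return how many items in `executed` are new, then remember them."""
        new_items = set(executed) - self._seen
        self._seen |= new_items
        return len(new_items)

    @property
    def total_covered(self) -> int:
        """Number of distinct items covered so far."""
        return len(self._seen)


signal = CoverageNovelty()
first = signal.reward(["a.py:1", "a.py:2"])    # two new lines
repeat = signal.reward(["a.py:1", "a.py:2"])   # same behavior again
partial = signal.reward(["a.py:2", "b.py:9"])  # one new line only
```

Whether such a count tracks behaviors that are actually useful is exactly the assumption stated above; the sketch only shows that the signal itself needs no labels.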

What would settle it

Observing that in a new environment the code coverage signal does not lead to improved exploration performance over time would falsify the utility of the joint learning loop.

Figures

Figures reproduced from arXiv: 2606.01528 by Dawei Yin, Guohong Liu, Jiacheng Liu, Rui Kong, Shizuo Tian, Ting Cao, Xiaohong Weng, Yuanchun Li, Yuchen Li, Yuebing Song, Yunxin Liu, Yuxuan Chen.

**Figure 1.** Architecture of JAMEL.

3 Methodology, 3.1 Exploration Problem: We model the exploration problem as a finite-horizon partially observable Markov decision process, P = (S, A, O, P, Ω, ρ_0, H). At step t, the environment has hidden state s_t ∈ S, emits an observation o_t ∼ Ω(· | s_t), receives an action a_t ∈ A, and transitions according to s_{t+1} ∼ P(· | s_t, a_t). The agent observes the current observation and the pr…
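The finite-horizon POMDP described in the text captured with the Figure 1 caption can be sketched as a rollout loop. This is a minimal illustration under assumptions: the toy dynamics, the `ToyPOMDP` class, and the random placeholder policy are hypothetical stand-ins and not the paper's GUI environment. Only the structure (hidden state, observation emitted from it, action, transition, horizon H) follows the definition.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyPOMDP:
    """Tiny finite-horizon POMDP: the hidden state is an integer position,
    but the agent only observes its parity (partial observability)."""
    horizon: int = 5  # H

    def reset(self, rng: random.Random) -> int:
        return rng.randint(0, 3)            # s_0 ~ rho_0

    def observe(self, state: int) -> int:
        return state % 2                    # o_t ~ Omega(. | s_t), deterministic here

    def step(self, state: int, action: int, rng: random.Random) -> int:
        noise = rng.choice([0, 0, 1])       # s_{t+1} ~ P(. | s_t, a_t)
        return state + action + noise


def rollout(env: ToyPOMDP, seed: int = 0):
    """Run one episode and return the (observation, action) history."""
    rng = random.Random(seed)
    state = env.reset(rng)
    history = []
    for _ in range(env.horizon):
        obs = env.observe(state)
        action = rng.choice([-1, 1])        # placeholder policy; ignores history
        history.append((obs, action))
        state = env.step(state, action, rng)
    return history


trajectory = rollout(ToyPOMDP(horizon=5))
```

Because the agent never sees `state` directly, any policy better than the placeholder has to rely on the accumulated history, which is where a memory module would enter.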

**Figure 2.** Reward accumulation on test apps. Average cumulative coverage reward across 10 test apps over a 50-step session. Shaded bands denote standard error across apps.

… by retaining the complete explicit interaction history without any pruning. JAMEL aligns with this comprehensive retention strategy by avoiding context truncation. Instead, JAMEL compresses all historical information into latent memory tokens. This…

**Figure 3.** Per-app reward accumulation. Cumulative coverage reward trajectories evaluated on individual unseen applications.
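The Figure 2 caption reports an average cumulative coverage reward with standard-error bands across apps. A minimal sketch of that aggregation follows, assuming each app contributes one list of per-step rewards of equal length. The function names and the sample numbers are hypothetical and are not the paper's data.

```python
from itertools import accumulate
from math import sqrt
from statistics import mean, stdev


def cumulative(rewards):
    """Running sum of per-step rewards for one app."""
    return list(accumulate(rewards))


def mean_and_sem(per_app_rewards):
    """Per-step mean and standard error of cumulative reward across apps.

    Requires at least two apps, because the sample standard deviation
    is undefined for a single value.
    """
    curves = [cumulative(r) for r in per_app_rewards]
    n = len(curves)
    means, sems = [], []
    for step_values in zip(*curves):
        means.append(mean(step_values))
        sems.append(stdev(step_values) / sqrt(n))
    return means, sems


# Two hypothetical apps, three steps each.
per_app = [[1, 0, 2], [0, 1, 1]]
means, sems = mean_and_sem(per_app)
```

The per-app curves from `cumulative` correspond to what a plot like Figure 3 would show, and `means` with `sems` to the averaged curve and its band.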

Abstract

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce **J**oint **A**gent **M**emory and **E**xploration **L**earning (**JAMEL**), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
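The cost argument in the abstract can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a simplified accounting in which every observation costs the same number of tokens, a raw-history agent re-reads all observations so far at each step, and a latent-memory agent reads a fixed-size memory plus the current observation. The token counts are hypothetical and are not results from the paper; action tokens, output tokens, and the cost of updating the memory are ignored.

```python
def raw_history_tokens(steps: int, obs_tokens: int) -> int:
    """Total prompt tokens when step t re-reads all t observations so far."""
    return obs_tokens * steps * (steps + 1) // 2


def latent_memory_tokens(steps: int, obs_tokens: int, memory_tokens: int) -> int:
    """Total prompt tokens when each step reads a fixed-size memory plus
    the current observation."""
    return steps * (memory_tokens + obs_tokens)


STEPS = 50          # matches the 50-step sessions in the Figure 2 caption
OBS_TOKENS = 1000   # hypothetical tokens per observation
MEMORY_TOKENS = 64  # hypothetical latent memory size

raw = raw_history_tokens(STEPS, OBS_TOKENS)
compressed = latent_memory_tokens(STEPS, OBS_TOKENS, MEMORY_TOKENS)
print(raw, compressed)
```

Under these assumptions the raw-history total grows quadratically with session length while the fixed-memory total grows linearly, which is the gap that compressing history into memory tokens is meant to close.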

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

JAMEL couples memory and exploration via novelty signals in a joint loop, but the approach stays tied to domains with clean deterministic signals like GUI code coverage.


The paper's main contribution is JAMEL, a framework that trains agent memory compression and exploration policy together by feeding novelty signals back into both. The authors note the mutual dependency: memory needs ongoing interaction to become useful, while exploration needs memory to avoid repeating exhausted paths. They supply supervision through persistent signals such as code coverage, which requires no extra labels.

What stands out is the explicit joint training loop rather than treating memory and intrinsic motivation as separate modules. Open-sourcing the code and model is a clear positive; anyone working on agent trajectories can inspect the implementation directly.

The limitation is straightforward. The supervisory signal is described as deterministic and persistent in the GUI domain. In environments without an equivalent computable metric, the loop has no obvious source of training signal, so the generalization claim to unseen open-ended settings rests on an assumption that may not travel. The abstract gives no experiment details, baselines, or controls, which leaves the reported gains in exploration depth and token use hard to evaluate. Prior work on latent memory and curiosity-driven exploration already exists; without the full comparisons it is unclear how much the joint formulation adds.

This is for people building practical LLM agents that need to sustain exploration over long horizons. A reader who wants to test the open code on similar GUI tasks could find it useful. The work shows clear thinking about the dependency problem and honest engagement with the practical bottleneck, so it deserves a serious referee to examine the experiments and check whether the results hold beyond the signal-rich setting.

Referee Report

3 major / 1 minor

Summary. The paper introduces JAMEL, a framework that jointly trains an agent's memory module and exploration policy via novelty-driven interactions. It exploits a mutually dependent loop in which memory distinguishes exhausted from novel behaviors while novelty-seeking interactions supply supervision for memory; deterministic signals such as code coverage in GUI domains are used as annotation-free supervision. The central empirical claim is that the resulting agents generalize to unseen environments, outperform open-weight baselines in exploration depth, rival a closed-source model, and reduce token consumption.

Significance. If the results hold and the novelty signal generalizes, the approach would supply a scalable, annotation-free route to training persistent memory for long-horizon LLM agents, addressing a recognized bottleneck in open-ended exploration. The open release of code and model would further strengthen reproducibility.

major comments (3)

[Abstract] Abstract: the claim of successful generalization and performance gains is stated without any reported metrics, baselines, environment counts, or controls, so the support for the central empirical claim cannot be evaluated from the provided text.
[Method] Method / novelty-signal description: the mutual-dependency loop is presented as an observation that the framework exploits, yet no quantitative result is shown demonstrating that the memory module receives independent supervision rather than a quantity defined by the same exploration loop; this leaves the annotation-free claim vulnerable to circularity.
[Experiments] Experiments / domain discussion: the supervisory signal is explicitly tied to code coverage in the GUI domain; the manuscript supplies no evidence or argument that equivalent deterministic, persistent signals exist in arbitrary open-ended environments, so the extrapolation to general settings rests on an untested assumption.

minor comments (1)

[Abstract] Abstract: the bolding of individual letters in JAMEL is typographically inconsistent and should be rendered uniformly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of JAMEL. We respond to each major comment below and note planned revisions.


Referee: [Abstract] Abstract: the claim of successful generalization and performance gains is stated without any reported metrics, baselines, environment counts, or controls, so the support for the central empirical claim cannot be evaluated from the provided text.

Authors: We agree that the abstract would benefit from quantitative support. In the revised manuscript we will add concise metrics drawn from the experimental section, including the number of unseen environments evaluated, specific baseline comparisons, and reported token reductions, subject to length limits. revision: yes

Referee: [Method] Method / novelty-signal description: the mutual-dependency loop is presented as an observation that the framework exploits, yet no quantitative result is shown demonstrating that the memory module receives independent supervision rather than a quantity defined by the same exploration loop; this leaves the annotation-free claim vulnerable to circularity.

Authors: The supervisory signal is code coverage, an external deterministic quantity computed by the GUI environment independently of the agent's policy or memory module. This breaks potential circularity. We will add an explicit quantitative analysis (e.g., an ablation comparing memory training with versus without the external coverage signal) to the revised method section. revision: partial

Referee: [Experiments] Experiments / domain discussion: the supervisory signal is explicitly tied to code coverage in the GUI domain; the manuscript supplies no evidence or argument that equivalent deterministic, persistent signals exist in arbitrary open-ended environments, so the extrapolation to general settings rests on an untested assumption.

Authors: The current evaluation centers on the GUI domain, yet the framework is formulated to accept any deterministic, persistent novelty signal supplied by an environment. We will expand the discussion section with concrete examples of analogous signals in other domains and will qualify the generalization claims accordingly. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with external supervision signal


The paper introduces JAMEL as a joint training framework that exploits an observed mutual dependency between memory and exploration, using an external deterministic signal (code coverage) for supervision. No equations, fitted parameters renamed as predictions, or self-citation chains are present that reduce any claimed result to its own inputs by construction. Generalization and performance claims rest on empirical evaluations rather than a closed derivation loop. The domain-specific nature of the signal is an assumption about applicability, not a circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; full paper may contain additional parameters or assumptions not visible here.

axioms (2)

domain assumption: Memory and exploration form a mutually dependent loop where each enables the other.
Stated explicitly as an observation that motivates the joint training approach.
domain assumption: Deterministic novelty signals such as code coverage supply reliable supervision for memory without annotations.
Central premise enabling the annotation-free training described in the abstract.

pith-pipeline@v0.9.1-grok · 5771 in / 1242 out tokens · 29002 ms · 2026-06-28T15:01:55.437562+00:00 · methodology

