TClone: Low-Latency Forking of Live GUI Environments for Computer-Use Agents

arxiv: 2605.17320 · v1 · pith:DBFULCHJ · submitted 2026-05-17 · 💻 cs.OS · cs.AI

Yutong Huang, Vikranth Srivatsa, Alex Asch, Hansin Tushar Patwa, Yiying Zhang

Pith reviewed 2026-05-19 22:51 UTC · model grok-4.3

classification 💻 cs.OS cs.AI

keywords TClone · GUI workspace forking · computer-use agents · low-latency checkpointing · workspace versioning · agent isolation · copy-on-write sharing · speculative execution


The pith

TClone forks live GUI workspaces at low latency by separating fast branch creation from durable checkpointing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Computer-use agents work inside live personal workspaces where their actions can change files, applications, and authenticated sessions. This creates a need for both isolation to prevent damage and fast branching to allow speculative execution and parallel search. TClone meets this need by letting a live GUI workspace be snapshotted, forked into isolated branches, rolled back, and selectively committed or merged. The design relies on sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing. End-to-end agent-loop tests show the system reduces total task latency by 1.9 times compared with KVM and 1.5 times compared with CRIU.

Core claim

TClone enables a live GUI workspace to be snapshotted, forked into isolated branches, rolled back, and selectively committed or merged. Its design separates fast branch creation from durable checkpointing using sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing. In end-to-end agent-loop measurements this yields total task latency reductions of 1.9x over KVM and 1.5x over CRIU, turning workspace versioning into a first-class systems primitive for safer and higher-quality agent execution over real personal computing environments.
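The four operations named in the core claim (snapshot, fork, rollback, selective commit) can be illustrated with a toy model. This is a minimal sketch that assumes a workspace can be reduced to a flat dictionary of string values; the `Workspace` class and its method names are hypothetical and are not TClone's API, which operates on containers, memory, and filesystems rather than Python objects.

```python
from typing import Dict, Iterable, Optional


class Workspace:
    """Toy stand-in for a forkable workspace (hypothetical, not TClone's API)."""

    def __init__(self, state: Optional[Dict[str, str]] = None,
                 parent: Optional["Workspace"] = None) -> None:
        self.state: Dict[str, str] = dict(state or {})
        self.parent = parent
        self._snapshots: Dict[str, Dict[str, str]] = {}

    def snapshot(self, label: str) -> None:
        # Record a point-in-time copy that rollback() can return to.
        self._snapshots[label] = dict(self.state)

    def fork(self) -> "Workspace":
        # The branch starts from the parent's current state but owns its copy,
        # so later writes on either side stay invisible to the other.
        return Workspace(self.state, parent=self)

    def rollback(self, label: str) -> None:
        # Discard everything written since the labeled snapshot.
        self.state = dict(self._snapshots[label])

    def commit(self, keys: Optional[Iterable[str]] = None) -> None:
        # Selectively copy entries back into the parent; None means all keys.
        if self.parent is None:
            raise ValueError("root workspace has no parent to commit into")
        selected = list(self.state) if keys is None else list(keys)
        for key in selected:
            self.parent.state[key] = self.state[key]
```

In this model an agent would call `snapshot` before acting, `fork` once per speculative branch, `commit` with only the keys it wants to keep, and `rollback` on the parent if the merged result turns out to be wrong.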

What carries the argument

Separation of fast branch creation from durable checkpointing via sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing.
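The separation of a fast branch path from a slow durable path can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's mechanism: workspace state is a small dictionary, an in-memory copy stands in for copy-on-write sharing, and a background thread writing JSON stands in for asynchronous checkpointing. The function names and the checkpoint path are hypothetical.

```python
import json
import os
import tempfile
import threading
from typing import Dict, Tuple


def _dump(state: Dict[str, str], path: str) -> None:
    """Slow path: persist the frozen snapshot, then publish it atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def fork_with_async_checkpoint(
    state: Dict[str, str], path: str
) -> Tuple[Dict[str, str], threading.Thread]:
    """Fast path: return a usable branch at once; the dump runs in the background."""
    frozen = dict(state)   # point-in-time copy reserved for the checkpoint
    branch = dict(state)   # independent copy for the caller (stand-in for CoW)
    writer = threading.Thread(target=_dump, args=(frozen, path))
    writer.start()
    return branch, writer
```

The caller can mutate the returned branch immediately and only needs to `join()` the writer thread when it requires the checkpoint to be durable, which is the property the design attributes to asynchronous checkpointing.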

If this is right

Agents gain the ability to run speculative actions in parallel isolated branches without risking the main workspace state.
Rollback becomes a low-cost operation that lets agents recover quickly from mistaken actions on files or sessions.
Selective commit and merge allow successful exploratory paths to be integrated back into the persistent workspace.
Overall agent task loops complete faster because forking overhead no longer dominates the execution time.
Workspace versioning can be treated as a routine primitive rather than an expensive external operation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation of fast forking from durable saves could be applied to non-GUI interactive environments such as terminal sessions or browser tabs.
Combining TClone-style branching with existing container orchestration tools could let agent frameworks scale speculative search across many machines.
Long-running authenticated sessions inside branches may still require careful handling of network state that the current design treats as local.
Measuring how often agents actually benefit from more than a handful of concurrent branches would test whether the latency gains translate to higher task success rates.

Load-bearing premise

The separation of fast branch creation from durable checkpointing using sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing delivers both low latency and adequate isolation for live interactive GUI workspaces in practice.

What would settle it

An end-to-end agent-loop run in which TClone fails to show at least a 1.5 times latency reduction versus CRIU, or a case where a forked branch corrupts state visible in the parent workspace.
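The latency half of this test is simple arithmetic. The sketch below assumes that "a 1.5 times latency reduction" means the ratio of baseline latency to TClone latency; that reading, the function names, and the sample latencies in the usage note are assumptions for illustration, not figures from the paper.

```python
def speedup(baseline_s: float, system_s: float) -> float:
    """Ratio of baseline latency to system latency, both in seconds."""
    if baseline_s <= 0 or system_s <= 0:
        raise ValueError("latencies must be positive")
    return baseline_s / system_s


def claim_holds(baseline_s: float, system_s: float, threshold: float = 1.5) -> bool:
    """True when the measured speedup meets or exceeds the claimed threshold."""
    return speedup(baseline_s, system_s) >= threshold
```

For example, a hypothetical CRIU run of 300 s against a TClone run of 200 s gives a ratio of exactly 1.5 and would meet the claim, while 280 s against 200 s gives 1.4 and would not.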

Figures

Figures reproduced from arXiv: 2605.17320 by Alex Asch, Hansin Tushar Patwa, Vikranth Srivatsa, Yiying Zhang, Yutong Huang.

**Figure 1.** OSWorld Task System Call and File Accesses. Baseline: entire syscall and file system sets. Agent: running Agent S3. Human: human operating the same task. […] screens, plan actions, and complete long-horizon computer tasks [3, 5, 19, 22, 24, 27, 34]. Agent quality is increasingly tied to test-time computation. For language tasks, methods such as best-of-N sampling, beam search, self-consistency, and tree searc…

**Figure 2.** CUA Execution Time Breakdown. […]ory or disk blocks: it includes browser tabs, cookies, GUI windows, display buffers, clipboard contents, filesystem mutations, terminal sessions, local services, network connections, and application caches. Existing mechanisms expose low-level snapshot and restore operations, but not a first-class abstraction for agent trajectories, rollback points, parallel branches, branch-…

**Figure 3.** Overview of TClone Personal Workspace Versioning. […] handles the efficient snapshotting, cloning, rollback, and merging of containers

**Figure 4.** TClone Workspace Fork Procedure. TClone parallelizes the snapshot, clone, and memory-state persistence steps. […] process has the original process as its parent, so individually forking processes A, B, and C yields the inconsistent tree on the left of […]

**Figure 5.** Linux Process Fork vs. TClone Process-Tree Fork. Left: native fork() duplicates one process as a child of the source process. Right: TClone creates a sibling workspace container and reconstructs the whole process tree inside an independent namespace. […] branches of one source are supported with no additional mechanism, structurally equivalent to invoking fork() N times against the same frozen parent. Not eve…

**Figure 6.** Lazy CoW File-Cache Versioning. TClone forking address space 2 and 3 from the original address space 1. […] layer chain before falling back to storage, mapping any layer-resident page read-only into the branch. A write to a shared cached page materializes a private copy in the writing branch and leaves the layer untouched for the others; a further fork derives a new layer so siblings can share post-write state…

**Figure 7.** CDF of end-to-end task latency. Left: AgentLoop running the GTA benchmark. Right: Agent S3 running the OSWorld benchmark. Extracted panel labels from an adjacent plot: (a) Latency by category, in seconds; (b) Memory footprint by category, in MB; task categories Browser, Office, Creative/Comms, and Multi-app; systems TClone, CRIU, and KVM.

**Figure 8.** OSWorld task categories. End-to-end latency and memory footprint for OSWorld tasks grouped by task type. […] profiles, application files, and multi-step trajectories. Branching overhead remains visible in end-to-end runtime even here, and TClone completes tasks earlier with shorter tails than KVM and CRIU. The gains are largest in this setting because the desktop workspace is heavier: TClone is up to 1.9× fa…

**Figure 10.** Memory footprint vs. number of concurrent clone. Extracted legend from an adjacent plot with a latency axis in seconds: Sync dump + Serial clone; Sync dump + Parallel clone; Sync dump + memcpy clone; Sync dump + CoW clone; TClone = async dump + CoW clone.

**Figure 12.** File Operation Latency vs. OverlayFS Layers.
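The text captured alongside Figure 6 describes a read path that walks a layer chain before falling back to storage, writes that materialize a private copy, and forks that derive a new shared layer. The sketch below models that behavior with plain dictionaries. It is an illustration only: the `Layer` and `BranchCache` names are hypothetical, pages are immutable `bytes` values, and nothing here reflects how TClone maps pages in the kernel.

```python
from typing import Dict, Optional


class Layer:
    """Immutable set of cached pages shared by every branch that references it."""

    def __init__(self, pages: Dict[int, bytes], parent: Optional["Layer"] = None) -> None:
        self.pages = dict(pages)
        self.parent = parent


class BranchCache:
    """Per-branch view: private pages first, then the layer chain, then storage."""

    def __init__(self, layer: Optional[Layer], storage: Dict[int, bytes]) -> None:
        self.layer = layer
        self.private: Dict[int, bytes] = {}
        self.storage = storage

    def read(self, page_no: int) -> bytes:
        if page_no in self.private:
            return self.private[page_no]
        layer = self.layer
        while layer is not None:          # walk the chain, newest layer first
            if page_no in layer.pages:
                return layer.pages[page_no]
            layer = layer.parent
        return self.storage[page_no]      # cache miss: fall back to storage

    def write(self, page_no: int, data: bytes) -> None:
        # Copy-on-write: the new contents live only in this branch.
        self.private[page_no] = data

    def fork(self) -> "BranchCache":
        # Freeze this branch's private pages into a new layer so that the
        # branch and its new sibling both share the post-write state.
        self.layer = Layer(self.private, parent=self.layer)
        self.private = {}
        return BranchCache(self.layer, self.storage)
```

Because a layer is never modified after it is created, a write in one branch cannot change what any sibling reads, which is the isolation property the excerpt attributes to the design.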

Original abstract

Computer-use agents increasingly operate inside live personal workspaces, where their actions can modify files, applications, GUI state, credentials, and authenticated sessions. This creates a tension between safety and quality: agents need isolation and rollback to avoid damaging user state, but also need fast branching to support speculative execution and parallel search. Existing VMs, containers, and checkpoint/restore systems can isolate or recover workloads, but they do not provide low-latency versioning of a full interactive workspace. We present TClone, a forkable personal workspace system for computer-use agents. TClone enables a live GUI workspace to be snapshotted, forked into isolated branches, rolled back, and selectively committed or merged. Its design separates fast branch creation from durable checkpointing, using sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing. In our end-to-end agent-loop measurement, TClone reduces total task latency by 1.9x and 1.5x over KVM and CRIU. By making workspace versioning a first-class systems primitive, TClone supports safer and higher-quality agent execution over real personal computing environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TClone gives a practical container-based way to fork live GUI workspaces for agents with reported speedups, but thin evaluation and GUI isolation details are the main gaps.

Hi, TClone's main idea is a system that snapshots and forks a running GUI desktop into isolated branches for computer-use agents, letting them try actions, roll back, or merge changes without wrecking the original workspace. It relies on sibling containers, copy-on-write memory sharing, filesystem versioning, and GUI-local execution to keep branch creation fast while pushing durable checkpoints to the background. The end-to-end agent-loop numbers show 1.9x lower latency than KVM and 1.5x lower than CRIU, which is the kind of concrete gain that matters for speculative execution in real user environments.

That separation of fast forking from async saves is a reasonable engineering choice and seems to deliver on the safety-versus-speed tension the abstract describes. The work is aimed squarely at people building agents that operate inside personal desktops rather than clean server sandboxes, and the architecture feels like a natural extension of existing container techniques to interactive GUI state.

On the soft side, the measurements are presented without workload descriptions, run counts, or variance numbers, so it is hard to judge how stable the gains are across different applications or agent behaviors. The bigger open question is whether sibling containers actually isolate display-server sockets, shared-memory mappings, and session tokens without extra handling; if those resources stay shared or get lazily copied, a branch could still see or affect parent state and the isolation premise would not fully hold.

This is worth a serious referee because the core primitive is timely and the implementation choices are specific enough to review in detail, even if the experiments and GUI edge cases need more work.

Referee Report

2 major / 1 minor

Summary. The manuscript presents TClone, a system for low-latency forking of live GUI environments for computer-use agents. It enables snapshotting a live GUI workspace, forking into isolated branches, rollback, and selective commit or merge. The design separates fast branch creation (sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution) from durable checkpointing via asynchronous mechanisms. End-to-end agent-loop measurements claim 1.9x and 1.5x reductions in total task latency over KVM and CRIU.

Significance. If substantiated, TClone would provide a valuable first-class primitive for workspace versioning in interactive GUI settings, helping resolve the safety-quality tension for agents by supporting speculative execution and rollback without damaging user state. The separation of fast forking from checkpointing is a promising systems approach for real personal computing environments.

major comments (2)

[Abstract and Evaluation] Abstract and §Evaluation: the abstract reports concrete latency reductions of 1.9x over KVM and 1.5x over CRIU from end-to-end agent-loop measurements, yet provides no workload descriptions, controls, error bars, or run counts. This absence is load-bearing for assessing the central performance claim.
[Design] Design section: the approach relies on sibling containers, CoW memory sharing, and GUI-local execution for isolation during forking. It is unclear how open sockets to the display server (X11/Wayland compositor) and in-memory session state or file descriptors are duplicated or namespaced, which risks violating the isolation premise that supports both safety and the reported latency gains.

minor comments (1)

A table or figure summarizing the latency results with statistical details would improve clarity of the performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our performance results and the isolation mechanisms. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Referee: [Abstract and Evaluation] Abstract and §Evaluation: the abstract reports concrete latency reductions of 1.9x over KVM and 1.5x over CRIU from end-to-end agent-loop measurements, yet provides no workload descriptions, controls, error bars, or run counts. This absence is load-bearing for assessing the central performance claim.

Authors: We agree that the abstract would be strengthened by additional context for the reported latency numbers. In the revised manuscript we will update the abstract to include a concise description of the workloads (representative computer-use agent tasks involving GUI interactions and state changes) and will explicitly direct readers to the full details in §Evaluation. The evaluation section already documents the experimental controls, error bars computed over repeated runs, and the number of trials; we will ensure these elements are cross-referenced more prominently from the abstract and introduction. revision: yes
Referee: [Design] Design section: the approach relies on sibling containers, CoW memory sharing, and GUI-local execution for isolation during forking. It is unclear how open sockets to the display server (X11/Wayland compositor) and in-memory session state or file descriptors are duplicated or namespaced, which risks violating the isolation premise that supports both safety and the reported latency gains.

Authors: We thank the referee for highlighting this aspect of the isolation design. The current Design section emphasizes sibling containers and GUI-local execution, but we acknowledge that the handling of display-server sockets, session state, and file descriptors merits explicit description. In the revision we will add a dedicated paragraph (or short subsection) explaining that (1) display connections are isolated by instantiating per-branch virtual displays or proxies within the container namespace, (2) mutable in-memory session state is copied on write while immutable portions remain shared, and (3) file descriptors and sockets are duplicated through standard Linux namespace mechanisms (PID, network, and IPC) at fork time. These additions will make the isolation guarantees and their contribution to both safety and low latency fully transparent. revision: yes

Circularity Check

0 steps flagged

No circularity: latency results are direct measurements from described implementation

The paper presents TClone as a systems design using sibling containers, copy-on-write memory, filesystem versioning, GUI-local execution, and asynchronous checkpointing. End-to-end latency reductions (1.9x over KVM, 1.5x over CRIU) are reported as empirical measurements from agent-loop experiments rather than quantities derived from equations, fitted parameters, or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the provided text that would collapse the central claims back to their inputs by construction. The derivation chain is self-contained as an engineering artifact evaluated externally.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are identified; the description remains at the level of high-level design techniques and measured outcomes.

pith-pipeline@v0.9.0 · 5752 in / 1151 out tokens · 46968 ms · 2026-05-19T22:51:31.712347+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Its design separates fast branch creation from durable checkpointing, using sibling containers, copy-on-write memory sharing, filesystem versioning, GUI-local execution, and asynchronous checkpointing.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

TClone reconstructs the tree rather than forking it in place... recreates each task with its recorded namespace-local PID
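
The passage quoted above says each task is recreated with its recorded namespace-local PID instead of being forked in place. One consequence is that parents must be recreated before their children so that every recorded parent link can be honored. The sketch below computes such an order from a recorded `pid -> ppid` map. The function name and the record format are assumptions made for illustration; the paper's actual restore procedure is not shown in this page.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def restore_order(tasks: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (pid, ppid) pairs so that every parent precedes its children.

    `tasks` maps each recorded namespace-local PID to its parent's PID.
    A task whose parent is not in the map is treated as a tree root.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    roots: List[int] = []
    for pid, ppid in tasks.items():
        if ppid in tasks:
            children[ppid].append(pid)
        else:
            roots.append(pid)

    order: List[Tuple[int, int]] = []
    queue = deque(sorted(roots))
    while queue:                       # breadth-first walk from the roots
        pid = queue.popleft()
        order.append((pid, tasks[pid]))
        queue.extend(sorted(children[pid]))

    if len(order) != len(tasks):       # unreachable tasks imply a parent cycle
        raise ValueError("recorded parent links do not form a tree")
    return order
```

Recreating tasks in this order keeps the original parent links intact, in contrast to forking processes A, B, and C individually, which the Figure 4 text says makes the original process the parent of each copy.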

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1] Alexandru Agache, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight virtualization for serverless applications. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 419–434, Santa Clara, CA, 2020. USENIX Association.

[2] Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Eric Wang. Agent S: An open agentic framework that uses computers like a human. In International Conference on Learning Representations, 2025.

[3] Anthropic. Claude Cowork by anthropic. https://www.anthropic.com/product/claude-cowork,

[5] Anthropic. Computer use tool. https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool, 2026. Claude API documentation; accessed: 2026-05-15.

[6] Anthropic. Computer use tool. https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool, 2026. Claude API documentation; accessed: 2026-05-14.

[7] Anthropic. Let claude use your computer in cowork. https://support.claude.com/en/articles/14128542-let-claude-use-your-computer-in-cowork, April 2026. Claude Help Center; accessed: 2026-05-14.

[8] Broadcom. Overview of virtual machine snapshots in vSphere. https://knowledge.broadcom.com/external/article/342618/overview-of-virtual-machine-snapshots-in.html, 2025. Accessed: 2026-05-15.

[9] Checkpoint/Restore Project. CRIU: Checkpoint/restore in userspace. https://criu.org/Main_Page, 2025. Accessed: 2026-05-15.

[10] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In 2nd Symposium on Networked Systems Design & Implementation (NSDI 05), Boston, MA, May

[11] Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield. Remus: High availability via asynchronous virtual machine replication. In 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI 08), San Francisco, CA, April 2008. USENIX Association.

[12] Docker Inc. docker checkpoint. https://docs.docker.com/reference/cli/docker/checkpoint/, 2026. Docker CLI documentation; accessed: 2026-05-15.

[13] Google. gVisor: The container security platform. https://gvisor.dev/, 2026. Accessed: 2026-05-15.

[14] Kristian Hoegsberg. The Wayland Protocol. Wayland Project, 2012. Accessed: 2026-05-15.

[15] Kata Containers. Kata Containers: Open source container runtime software. https://katacontainers.io/, 2026. Accessed: 2026-05-15.

[16] H. Andrés Lagar-Cavilla, Joseph A. Whitney, Adin Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, and M. Satyanarayanan. SnowFlock: Rapid virtual machine cloning for cloud computing. In Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys ’09, pages 1–12. ACM, 2009.

[17] LinuxServer.io. Webtop: Linux in a web browser. https://github.com/linuxserver/docker-webtop, 2026. Accessed: 2026-05-15.

[18] Dutch T. Meyer, Gitika Aggarwal, Brendan Cully, Geoffrey Lefebvre, Michael J. Feeley, Norman C. Hutchinson, and Andrew Warfield. Parallax: Virtual disks for virtual machines. In Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems, EuroSys ’08, pages 41–54. ACM, 2008.

[19] Michael Nelson, Beng-Hong Lim, and Greg Hutchins. Fast transparent migration for virtual machines. In 2005 USENIX Annual Technical Conference (USENIX ATC 05), Anaheim, CA, April 2005. USENIX Association.

[20] Nous Research. Hermes Agent: Computer use (macos). https://hermes-agent.nousresearch.com/docs/user-guide/features/computer-use, 2026. Accessed: 2026-05-14.

[21] OpenAI. Codex sandbox. https://developers.openai.com/codex/concepts/sandboxing, 2026. OpenAI developer documentation; accessed: 2026-05-15.

[22] OpenAI. Codex Web: Delegate to codex in the cloud. https://developers.openai.com/codex/cloud, 2026. OpenAI developer documentation; accessed: 2026-05-14.

[23] OpenAI. Codex Web: Delegate to codex in the cloud. https://developers.openai.com/codex/cloud, 2026. OpenAI developer documentation; accessed: 2026-05-14.

[24] OpenAI. Sandbox agents. https://developers.openai.com/api/docs/guides/agents/sandboxes, 2026. OpenAI API documentation; accessed: 2026-05-15.

[25] OpenClaw Contributors. OpenClaw: Personal ai assistant. https://github.com/openclaw/openclaw,

[26] Accessed: 2026-05-14

[27] Robert W. Scheifler and Jim Gettys. The X window system. ACM Transactions on Graphics, 5(2):79–109, April 1986.

[28] Selkies Project. Selkies: Linux WebRTC HTML5 remote desktop streaming platform. https://github.com/selkies-project/selkies, 2024. Accessed: 2026-05-15.

[29] Simular AI. Agent S3: Approaching human-level computer use with wide scaling. https://www.simular.ai/articles/agent-s3, October 2025. Accessed: 2026-05-14.

[30] Sizhe Tang, Rongqian Chen, and Tian Lan. Agent alpha: Tree search unifying generation, exploration and evaluation for computer-use agents. arXiv preprint arXiv:2602.02995, 2026.

[31] Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Vandekieft, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Scalability, fidelity, and containment in the Potemkin virtual honeyfarm. In Proceedings of the 20th ACM Symposium on Operating Systems Principles, SOSP ’05, pages 148–162. ACM, 2005.

[32] Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, and Xinyi Le. GTA: A benchmark for general tool agents. In Advances in Neural Information Processing Systems, pages 75749–75790, 2024.

[33] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations, 2023.

[34] Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments. arXiv preprint arXiv:2404.07972, 2024.

[35] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems, 2024.

[36] Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Caiming Xiong, and Junnan Li. GTA1: GUI test-time scaling agent. In International Conference on Learning Representations, 2026.

[37] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems, 2023.

[38] King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, and Wangchunshu Zhou. Scaling test-time compute for LLM agents. arXiv preprint arXiv:2506.12928, 2025.

[1] [1]

Firecracker: Lightweight virtualization for serverless applications

Alexandru Agache, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight virtualization for serverless applications. In17th USENIX Symposium on Networked Systems De- sign and Implementation (NSDI 20), pages 419–434, Santa Clara, CA, 2020. USENIX Association

work page 2020

[2] [2]

Agent S: An open agen- tic framework that uses computers like a human

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Eric Wang. Agent S: An open agen- tic framework that uses computers like a human. In International Conference on Learning Representations, 2025

work page 2025

[3] [3]

Claude Cowork by anthropic

Anthropic. Claude Cowork by anthropic. https: //www.anthropic.com/product/claude-cowork,

work page

[4] [5]

Computer use tool

Anthropic. Computer use tool. https://platform. claude.com/docs/en/agents-and-tools/ tool-use/computer-use-tool , 2026. Claude API documentation; accessed: 2026-05-15

work page 2026

[5] [6]

Computer use tool

Anthropic. Computer use tool. https://platform. claude.com/docs/en/agents-and-tools/ tool-use/computer-use-tool , 2026. Claude API documentation; accessed: 2026-05-14

work page 2026

[6] [7]

Let claude use your computer in cowork

Anthropic. Let claude use your computer in cowork. https://support.claude.com/en/articles/ 14128542-let-claude-use-your-computer-in-cowork , April 2026. Claude Help Center; accessed: 2026-05-14

work page 2026

[7] [8]

Overview of virtual machine snapshots in vSphere

Broadcom. Overview of virtual machine snapshots in vSphere. https://knowledge. broadcom.com/external/article/342618/ overview-of-virtual-machine-snapshots-in. html, 2025. Accessed: 2026-05-15

work page 2025

[8] [9]

CRIU: Checkpoint/restore in userspace

Checkpoint/Restore Project. CRIU: Checkpoint/restore in userspace. https://criu.org/Main_Page, 2025. Accessed: 2026-05-15

work page 2025

[9] [10]

Live migration of virtual machines

Christopher Clark, Keir Fraser, Steven Hand, Ja- cob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In2nd Symposium on Networked Systems Design & Implementation (NSDI 05), Boston, MA, May

work page

[10] [11]

Re- mus: High availability via asynchronous virtual machine replication

Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield. Re- mus: High availability via asynchronous virtual machine replication. In5th USENIX Symposium on Networked Systems Design and Implementation (NSDI 08), San Francisco, CA, April 2008. USENIX Association

work page 2008

[11] [12]

docker checkpoint

Docker Inc. docker checkpoint. https: //docs.docker.com/reference/cli/docker/ checkpoint/, 2026. Docker CLI documentation; accessed: 2026-05-15

work page 2026

[12] [13]

gVisor: The container security platform

Google. gVisor: The container security platform. https://gvisor.dev/, 2026. Accessed: 2026-05-15

work page 2026

[13] [14]

Wayland Project, 2012

Kristian Hoegsberg.The Wayland Protocol. Wayland Project, 2012. Accessed: 2026-05-15

work page 2012

[14] [15]

Kata Containers: Open source con- tainer runtime software

Kata Containers. Kata Containers: Open source con- tainer runtime software. https://katacontainers. io/, 2026. Accessed: 2026-05-15

work page 2026

[15] [16]

SnowFlock: Rapid virtual machine cloning for cloud computing

H. Andrés Lagar-Cavilla, Joseph A. Whitney, Adin Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, and M. Satyanarayanan. SnowFlock: Rapid virtual machine cloning for cloud computing. In Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys ’09, pages 1–12. ACM, 2009

work page 2009

[16] [17]

Webtop: Linux in a web browser

LinuxServer.io. Webtop: Linux in a web browser. https://github.com/linuxserver/docker-webtop, 2026. Accessed: 2026-05-15

work page 2026

[17] [18]

Parallax: Virtual disks for virtual machines

Dutch T. Meyer, Gitika Aggarwal, Brendan Cully, Geoffrey Lefebvre, Michael J. Feeley, Norman C. Hutchinson, and Andrew Warfield. Parallax: Virtual disks for virtual machines. In Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems, EuroSys ’08, pages 41–54. ACM, 2008

work page 2008

[18] [19]

Fast transparent migration for virtual machines

Michael Nelson, Beng-Hong Lim, and Greg Hutchins. Fast transparent migration for virtual machines. In 2005 USENIX Annual Technical Conference (USENIX ATC 05), Anaheim, CA, April 2005. USENIX Association

work page 2005

[19] [20]

Hermes Agent: Computer use (macos)

Nous Research. Hermes Agent: Computer use (macos). https://hermes-agent.nousresearch.com/docs/user-guide/features/computer-use, 2026. Accessed: 2026-05-14

work page 2026

[20] [21]

Codex sandbox

OpenAI. Codex sandbox. https://developers.openai.com/codex/concepts/sandboxing, 2026. OpenAI developer documentation; accessed: 2026-05-15

work page 2026

[21] [22]

Codex Web: Delegate to codex in the cloud

OpenAI. Codex Web: Delegate to codex in the cloud. https://developers.openai.com/codex/cloud, 2026. OpenAI developer documentation; accessed: 2026-05-14

work page 2026

[22] [23]

Codex Web: Delegate to codex in the cloud

OpenAI. Codex Web: Delegate to codex in the cloud. https://developers.openai.com/codex/cloud, 2026. OpenAI developer documentation; accessed: 2026-05-14

work page 2026

[23] [24]

Sandbox agents

OpenAI. Sandbox agents. https://developers.openai.com/api/docs/guides/agents/sandboxes, 2026. OpenAI API documentation; accessed: 2026-05-15

work page 2026

[24] [25]

OpenClaw: Personal ai assistant

OpenClaw Contributors. OpenClaw: Personal ai assistant. https://github.com/openclaw/openclaw,

work page

[25] [26]

Accessed: 2026-05-14

work page 2026

[26] [27]

The X window system

Robert W. Scheifler and Jim Gettys. The X window system. ACM Transactions on Graphics, 5(2):79–109, April 1986

work page 1986

[27] [28]

Selkies: Linux WebRTC HTML5 remote desktop streaming platform

Selkies Project. Selkies: Linux WebRTC HTML5 remote desktop streaming platform. https://github.com/selkies-project/selkies, 2024. Accessed: 2026-05-15

work page 2024

[28] [29]

Agent S3: Approaching human-level computer use with wide scaling

Simular AI. Agent S3: Approaching human-level computer use with wide scaling. https://www.simular.ai/articles/agent-s3, October 2025. Accessed: 2026-05-14

work page 2025

[29] [30]

Agent alpha: Tree search unifying generation, exploration and evaluation for computer-use agents

Sizhe Tang, Rongqian Chen, and Tian Lan. Agent alpha: Tree search unifying generation, exploration and evaluation for computer-use agents. arXiv preprint arXiv:2602.02995, 2026

work page arXiv 2026

[30] [31]

Scalability, fidelity, and containment in the Potemkin virtual honeyfarm

Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Vandekieft, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Scalability, fidelity, and containment in the Potemkin virtual honeyfarm. In Proceedings of the 20th ACM Symposium on Operating Systems Principles, SOSP ’05, pages 148–162. ACM, 2005

work page 2005

[31] [32]

GTA: A benchmark for general tool agents

Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, and Xinyi Le. GTA: A benchmark for general tool agents. In Advances in Neural Information Processing Systems, pages 75749–75790, 2024

work page 2024

[32] [33]

Self-consistency improves chain of thought reasoning in language models

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations, 2023

work page 2023

[33] [34]

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments. arXiv preprint arXiv:2404.07972, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [35]

SWE-agent: Agent-computer interfaces enable automated software engineering

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems, 2024

work page 2024

[35] [36]

GTA1: GUI test-time scaling agent

Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Caiming Xiong, and Junnan Li. GTA1: GUI test-time scaling agent. In International Conference on Learning Representations, 2026

work page 2026

[36] [37]

Tree of thoughts: Deliberate problem solving with large language models

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems, 2023

work page 2023

[37] [38]

Scaling test-time compute for LLM agents

King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, and Wangchunshu Zhou. Scaling test-time compute for LLM agents. arXiv preprint arXiv:2506.12928, 2025.

work page arXiv 2025