Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns
Pith reviewed 2026-06-27 01:09 UTC · model grok-4.3
The pith
SkillMigrator reuses web skills across sites by matching layout structures rather than instructions or domains, reducing the LLM-action count on successful trajectories by 8-10% at matched success rate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SkillMigrator induces skills from trajectories and stores each as a Transferable Interaction Pattern (TIP): the skill plus a structural sketch of the page at induction time. At test time the system retrieves TIPs whose sketches match the current page layout and grounds the skill's references on the live snapshot. The approach keeps accessibility snapshots and primitive tool calling fixed. On both WebArena and Mind2Web it reduces the LLM-action count on successful trajectories by 8-10% at matched success rate.
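The retrieval step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a sketch is a multiset of role paths (such as `/main/form/button`) and uses multiset Jaccard overlap as the similarity measure. The function names, the example library, and the threshold are all hypothetical.

```python
from collections import Counter


def layout_similarity(a: Counter, b: Counter) -> float:
    """Multiset Jaccard overlap between two role-path sketches (assumed measure)."""
    union = sum((a | b).values())  # per-path maximum counts
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union  # per-path minimum counts


def retrieve(library: dict, live_sketch: Counter, threshold: float = 0.4, top_k: int = 3):
    """Return up to top_k (score, skill_name) pairs whose score clears the threshold."""
    scored = [(layout_similarity(sketch, live_sketch), name)
              for name, sketch in library.items()]
    hits = [(score, name) for score, name in scored if score >= threshold]
    return sorted(hits, reverse=True)[:top_k]


# Hypothetical two-entry library of stored sketches, keyed by skill name.
library = {
    "search_form": Counter({"/main/form/textbox": 1, "/main/form/button": 1}),
    "paginated_table": Counter({"/main/table/row": 10, "/main/nav/link": 3}),
}

# Hypothetical sketch of the live page: a search form plus two navigation links.
live = Counter({"/main/form/textbox": 1, "/main/form/button": 1, "/main/nav/link": 2})

print(retrieve(library, live))
```

Note that neither the task instruction nor the site name enters the score; only the page structure does, which is the property the core claim depends on.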
What carries the argument
Transferable Interaction Pattern (TIP): a skill stored together with a structural sketch of the page layout at induction time, enabling layout-similarity retrieval and live-page reference grounding.
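To make the definition concrete, here is one possible way to represent a TIP in code. It is a sketch under stated assumptions: the `Node`, `TIP`, and `structural_sketch` names are hypothetical, and reducing a page to a multiset of root-to-node role paths is an illustrative choice, not necessarily the sketch format the paper uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One element of an accessibility snapshot (hypothetical minimal form)."""
    role: str
    ref: str = ""          # page-specific element reference
    name: str = ""         # visible text or accessible name
    children: tuple = ()


def structural_sketch(node: Node, prefix: str = "") -> Counter:
    """Multiset of root-to-node role paths; refs and names are deliberately ignored."""
    path = f"{prefix}/{node.role}"
    sketch = Counter([path])
    for child in node.children:
        sketch += structural_sketch(child, path)
    return sketch


@dataclass
class TIP:
    """A skill stored together with the sketch of the page it was induced on."""
    skill_name: str
    steps: list            # e.g. [("fill", "/main/form/textbox"), ("click", "/main/form/button")]
    sketch: Counter


# Two hypothetical pages with different refs and text but the same layout.
page_a = Node("main", "e1", "", (
    Node("form", "e2", "Search products", (
        Node("textbox", "e3", "Query"),
        Node("button", "e4", "Go"),
    )),
))
page_b = Node("main", "n10", "", (
    Node("form", "n11", "Find flights", (
        Node("textbox", "n12", "Destination"),
        Node("button", "n13", "Search"),
    )),
))

tip = TIP(
    skill_name="submit_search",
    steps=[("fill", "/main/form/textbox"), ("click", "/main/form/button")],
    sketch=structural_sketch(page_a),
)

# The stored sketch matches the second page even though no ref or label is shared.
print(tip.sketch == structural_sketch(page_b))
```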
If this is right
- Skills induced on one set of sites become usable on held-out sites without retraining or domain metadata.
- Fewer LLM actions per successful trajectory directly reduces the number of model completions required.
- Layout matching works even when element IDs and visible text differ between induction and test pages.
- The method integrates with existing accessibility-based observations and fixed tool sets without changing the agent loop.
- Maintaining success rate while shortening trajectories implies lower average horizon length on the benchmarks.
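The headline metric behind these implications, average LLM-action count on successful trajectories, can be computed as in the sketch below. The record layout (`success`, `llm_actions`) and the numbers are illustrative assumptions, not data from the paper.

```python
def mean_actions_on_success(trajectories):
    """Average LLM-action count over successful trajectories only."""
    counts = [t["llm_actions"] for t in trajectories if t["success"]]
    return sum(counts) / len(counts) if counts else float("nan")


def relative_reduction(baseline, method):
    """Fractional drop in mean action count on successful trajectories."""
    base = mean_actions_on_success(baseline)
    ours = mean_actions_on_success(method)
    return (base - ours) / base


# Made-up runs with the same success rate (2 of 3) for both systems.
baseline_runs = [
    {"success": True, "llm_actions": 10},
    {"success": True, "llm_actions": 12},
    {"success": False, "llm_actions": 30},
]
method_runs = [
    {"success": True, "llm_actions": 9},
    {"success": True, "llm_actions": 11},
    {"success": False, "llm_actions": 25},
]

print(f"{relative_reduction(baseline_runs, method_runs):.1%}")
```

Failed trajectories are excluded from the average, so the comparison is only meaningful when the two systems succeed at comparable rates, which is why the claim is stated at matched success rate.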
Where Pith is reading between the lines
- If sketches remain stable under minor UI changes, the same TIP library could support ongoing reuse within evolving sites over time.
- Layout-based indexing could apply to other structured interfaces that expose hierarchical or spatial layouts beyond the web.
- Combining layout retrieval with instruction similarity might raise reuse rates above what either signal achieves alone.
- The result suggests structural invariance can be a stronger transfer signal than semantic similarity for many web tasks.
Load-bearing premise
Structural sketches of page layouts at skill induction time stay similar enough across different sites to support accurate retrieval and correct reference grounding even when content and element identities change.
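The grounding half of this premise can be illustrated with a small sketch. It assumes, hypothetically, that each skill step names its target by role path plus ordinal position, and that grounding means looking up whichever live element currently occupies that structural slot. The snapshot format and function names are illustrative only.

```python
def index_by_path(node, prefix="", index=None):
    """Map each role path to the live refs found there, in document order."""
    if index is None:
        index = {}
    path = f"{prefix}/{node['role']}"
    index.setdefault(path, []).append(node["ref"])
    for child in node.get("children", []):
        index_by_path(child, path, index)
    return index


def ground(steps, live_page):
    """Resolve (action, role_path, ordinal) steps to live refs.

    Returns None when any slot is missing, so the caller can fall back to primitives.
    """
    index = index_by_path(live_page)
    grounded = []
    for action, path, ordinal in steps:
        refs = index.get(path, [])
        if ordinal >= len(refs):
            return None
        grounded.append((action, refs[ordinal]))
    return grounded


# Hypothetical skill steps recorded at induction time, free of site-specific refs.
steps = [("fill", "/main/form/textbox", 0), ("click", "/main/form/button", 0)]

# Hypothetical live snapshot from a different site.
live_page = {"role": "main", "ref": "n10", "children": [
    {"role": "form", "ref": "n11", "children": [
        {"role": "textbox", "ref": "n12"},
        {"role": "button", "ref": "n13"},
    ]},
]}

# A page whose form has no button: grounding fails rather than guessing.
other_page = {"role": "main", "ref": "m1", "children": [
    {"role": "form", "ref": "m2", "children": [{"role": "textbox", "ref": "m3"}]},
]}

print(ground(steps, live_page))
print(ground(steps, other_page))
```

The second call shows the failure mode the premise has to rule out: when the live layout diverges from the stored sketch, a structural slot has no counterpart and the skill cannot be applied.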
What would settle it
A collection of held-out sites where equivalent skills produce structurally dissimilar sketches, resulting in retrieval failures or grounding errors and no net reduction in action count.
Original abstract
Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such as Mind2Web and WebArena. Recent systems therefore wrap repeated interaction fragments as web skills: callable tools built from successful trajectories or induced programs, so one call can replace several primitives. However, prior skill libraries are still triggered mainly by instruction similarity or coarse site metadata, which yields low skill reuse on held-out sites and leaves much of the potential step and token reduction on the table. We present SkillMigrator, an agent that learns reusable web skills and transfers them across sites by matching layout structure rather than specific element references. Each induced skill is stored as a transferable interaction pattern (TIP): the skill paired with a structural sketch of the snapshot at induction time. At test time, SkillMigrator retrieves TIPs by layout similarity and grounds their references on the live page. The rest of the stack is standard: accessibility-snapshot observations with stable references, and fixed tool calling over primitives plus skill invocations. Compared with the state-of-the-art approaches, SkillMigrator reduces the average LLM-action count on successful trajectories by 8-10% across both WebArena and Mind2Web at matched success rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SkillMigrator, which induces web skills as Transferable Interaction Patterns (TIPs) stored with structural sketches of page layouts at induction time. At test time, TIPs are retrieved by layout similarity and grounded on the live page using accessibility snapshots and standard tool calling; the central claim is an 8-10% reduction in average LLM-action count on successful trajectories at matched success rate on WebArena and Mind2Web relative to prior skill libraries triggered by instruction similarity or site metadata.
Significance. If the reduction is robust and attributable to the layout-based mechanism, the work would offer a concrete improvement in efficiency for LLM web agents by enabling higher skill reuse across held-out sites, directly addressing the low reuse rates noted for existing approaches.
major comments (2)
- [Abstract] Abstract: the 8-10% action-count reduction at matched success rate is stated without experimental details, baselines, variance, or description of how success-rate matching was performed, so the central empirical result cannot be assessed from the provided text.
- [Method (TIP retrieval)] Method section on TIP retrieval and grounding: the claim that gains arise specifically from layout-similarity retrieval (rather than simply having more skills or different prompting) rests on the untested assumption that structural sketches are invariant and discriminative across content and element-ID changes; no retrieval-precision metrics, false-positive rates, or ablation removing the layout matcher are supplied to support this.
minor comments (1)
- [Abstract] Abstract: 'state-of-the-art approaches' is referenced but the specific baselines are not named.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve the presentation of results and strengthen the supporting evidence for the core mechanism.
- Referee: [Abstract] Abstract: the 8-10% action-count reduction at matched success rate is stated without experimental details, baselines, variance, or description of how success-rate matching was performed, so the central empirical result cannot be assessed from the provided text.
  Authors: We agree that the abstract would benefit from additional context. In the revised manuscript we will expand the abstract to briefly specify the baselines (prior skill libraries using instruction similarity or site metadata), the two benchmarks (WebArena and Mind2Web), the success-rate matching procedure (reporting average action counts on successful trajectories at comparable overall success rates), and note that variance is reported in the main results tables.
  Revision: yes
- Referee: [Method (TIP retrieval)] Method section on TIP retrieval and grounding: the claim that gains arise specifically from layout-similarity retrieval (rather than simply having more skills or different prompting) rests on the untested assumption that structural sketches are invariant and discriminative across content and element-ID changes; no retrieval-precision metrics, false-positive rates, or ablation removing the layout matcher are supplied to support this.
  Authors: The reported gains are obtained by comparing SkillMigrator against prior skill libraries that trigger on instruction similarity or site metadata; the performance difference is therefore attributable to the change in retrieval mechanism. We nevertheless recognize that explicit diagnostics would strengthen the argument. In the revision we will add retrieval-precision and false-positive metrics for the layout matcher together with an ablation that disables layout similarity while keeping the rest of the skill library and prompting unchanged.
  Revision: yes
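The retrieval diagnostics requested here could be computed along the following lines. This is a hedged sketch: the event format (whether a TIP was retrieved, and whether it was actually applicable according to some ground-truth label) and the counts are hypothetical, and the paper does not describe how applicability would be labeled.

```python
def retrieval_diagnostics(events):
    """Precision and false-positive rate from (retrieved, applicable) pairs."""
    events = list(events)
    tp = sum(1 for retrieved, applicable in events if retrieved and applicable)
    fp = sum(1 for retrieved, applicable in events if retrieved and not applicable)
    tn = sum(1 for retrieved, applicable in events if not retrieved and not applicable)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    false_positive_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return {"precision": precision, "false_positive_rate": false_positive_rate}


# Made-up outcome log: 6 correct retrievals, 2 spurious ones,
# 8 correct abstentions, 4 missed opportunities.
events = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, False)] * 8
    + [(False, True)] * 4
)

print(retrieval_diagnostics(events))
```

Running the same computation with the layout matcher disabled would give the ablation baseline the referee asks for, provided the skill library and prompting are held fixed.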
Circularity Check
No circularity; empirical benchmark result with no derivation chain
The paper reports an empirical performance gain (8-10% reduction in LLM-action count at matched success rate on WebArena and Mind2Web) from a system that retrieves TIPs via layout similarity. No equations, fitted parameters, or mathematical derivations appear in the abstract or described claims. The central result is presented as a benchmark outcome rather than a quantity derived from inputs by construction. No load-bearing self-citations, ansatz smuggling, or renaming of known results are evident. The result therefore stands on its own as an engineering evaluation.