An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
citation dossier
arXiv preprint arXiv:2305.17144 (2023)
why this work matters in Pith
Pith has found this work cited in 16 reviewed papers. Its strongest current cluster is cs.AI (5 papers). The largest review-status bucket among citing papers is UNVERDICTED (13 papers). Because this work is highly cited, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
AIDA is the first end-to-end autonomous agent that combines a domain-specific language with Pareto-guided reinforcement learning to discover insights from complex business data.
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
2.5-D decomposition lets LLMs achieve 94.6% structural accuracy on a building benchmark by handling only horizontal planning while a symbolic system manages vertical placements from occupancy.
ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.
SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds while the stateless baseline drops to 44%.
RPA-Check is a new multi-stage framework using dimension definition, boolean checklist augmentation, semantic filtering, and LLM-as-judge verification to assess role-playing agents, with tests on a legal training game showing smaller instruction-tuned models can be more consistent than larger ones.
MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
The Echo framework enables experience transfer for multimodal LLM agents in Minecraft by decomposing knowledge into structure, attribute, process, function, and interaction dimensions and applying in-context analogy learning, achieving a 1.3x-1.7x speedup on object-unlocking tasks.
InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
citing papers explorer
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent
  AIDA is the first end-to-end autonomous agent that combines a domain-specific language with Pareto-guided reinforcement learning to discover insights from complex business data.
- OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
  OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
- PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
  PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
- 2.5-D Decomposition for LLM-Based Spatial Construction
  2.5-D decomposition lets LLMs achieve 94.6% structural accuracy on a building benchmark by handling only horizontal planning while a symbolic system manages vertical placements from occupancy.
- Long-Term Memory for VLA-based Agents in Open-World Task Execution
  ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.
- SkillDroid: Compile Once, Reuse Forever
  SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching an 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds while the stateless baseline drops to 44%.
- RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents
  RPA-Check is a new multi-stage framework using dimension definition, boolean checklist augmentation, semantic filtering, and LLM-as-judge verification to assess role-playing agents, with tests on a legal training game showing smaller instruction-tuned models can be more consistent than larger ones.
- MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
  MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
- Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game
  Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
- Experience Transfer for Multimodal LLM Agents in Minecraft Game
  The Echo framework enables experience transfer for multimodal LLM agents in Minecraft by decomposing knowledge into structure, attribute, process, function, and interaction dimensions and applying in-context analogy learning, achieving a 1.3x-1.7x speedup on object-unlocking tasks.
- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
  InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
  The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
- Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning
  SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.