An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
citation dossier
arXiv preprint arXiv:2305.17144 (2023)
why this work matters in Pith
Pith has found this work cited in 16 reviewed papers. Its strongest current cluster is cs.AI (5 papers). The largest review-status bucket among citing papers is UNVERDICTED (13 papers). Because this work is highly cited, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
AIDA is the first end-to-end autonomous agent that combines a domain-specific language with Pareto-guided reinforcement learning to discover insights from complex business data.
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
2.5-D decomposition lets LLMs achieve 94.6% structural accuracy on a building benchmark by handling only horizontal planning while a symbolic system manages vertical placements from occupancy.
ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.
SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds while the stateless baseline drops to 44%.
RPA-Check is a new multi-stage framework using dimension definition, boolean checklist augmentation, semantic filtering, and LLM-as-judge verification to assess role-playing agents, with tests on a legal training game showing smaller instruction-tuned models can be more consistent than larger ones.
MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
The Echo framework enables experience transfer for multimodal LLM agents in Minecraft by decomposing knowledge into structure, attribute, process, function, and interaction dimensions and applying in-context analogy learning, achieving a 1.3x-1.7x speedup on object-unlocking tasks.
InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
citing papers explorer
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent
  AIDA is the first end-to-end autonomous agent that combines a domain-specific language with Pareto-guided reinforcement learning to discover insights from complex business data.
- OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
  OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
- PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
  PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
- 2.5-D Decomposition for LLM-Based Spatial Construction
  2.5-D decomposition lets LLMs achieve 94.6% structural accuracy on a building benchmark by handling only horizontal planning while a symbolic system manages vertical placements from occupancy.
- Long-Term Memory for VLA-based Agents in Open-World Task Execution
  ChemBot adds dual-layer memory and future-state asynchronous inference to VLA models, enabling better long-horizon success in chemical lab automation on collaborative robots.
- SkillDroid: Compile Once, Reuse Forever
  SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching an 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds while the stateless baseline drops to 44%.
- RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents
  RPA-Check is a new multi-stage framework using dimension definition, boolean checklist augmentation, semantic filtering, and LLM-as-judge verification to assess role-playing agents, with tests on a legal training game showing smaller instruction-tuned models can be more consistent than larger ones.
- MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
  MIMIC-Py provides a modular Python framework that turns personality-driven LLM agents into an extensible system for automated game testing via configurable traits, decoupled components, and multiple interaction methods.
- Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game
  Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
- Experience Transfer for Multimodal LLM Agents in Minecraft Game
  The Echo framework enables experience transfer for multimodal LLM agents in Minecraft by decomposing knowledge into structure, attribute, process, function, and interaction dimensions and applying in-context analogy learning, achieving a 1.3x-1.7x speedup on object-unlocking tasks.
- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
  InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
  The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
- Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning
  SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.