Self-Reflection in Large Language Model Agents: Effects on Problem-Solving Performance
5 Pith papers cite this work. Polarity classification is still indexing.
5 representative citing papers (2026)
Citing papers explorer
- Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design
  Metacognitive self- and co-regulation loops improve LLM agent performance in engineering design by mitigating fixation and enabling better exploration of design options.
- Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition
  Adversarial competition between attacker and defender teams generates diverse multi-turn conversational data that improves LLM performance on secure code generation benchmarks by 18-29%.
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents
  IntentScore learns intent-conditioned action scores from offline GUI trajectories and raises task success by 6.9 points on an unseen agent and environment.
- LLM4C2Rust: Large Language Models for Automated Memory-Safe Code Transpilation
  A RAG-enhanced LLM pipeline with segmentation improves C-to-Rust transpilation correctness and eliminates raw pointer dereferences and unsafe type casts in several Coreutils programs.
- SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
  SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.