LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei · 2024 · arXiv 2404.01230

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

cs.CL · 2026-06-17 · unverdicted · novelty 7.0

MAFP applies fictitious play to LLM multi-agent systems to resolve stance entanglement in competitive decision-making, outperforming single-round and multi-round baselines on tournament strength and robustness.

Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

CEO-Bench evaluates LLMs on CEO-level strategic resource reallocation via multi-role agent simulations, showing high structural validity but sharp divergence on strategic calibration across five frontier models on 13 scenarios.

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

cs.AI · 2025-06-04 · unverdicted · novelty 7.0

Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

Retrieval-Augmented Generation for Natural Language Processing: A Survey

cs.CL · 2024-07-18 · accept · novelty 6.0

The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.

Scaling Synthetic Data Creation with 1,000,000,000 Personas

cs.CL · 2024-06-28 · unverdicted · novelty 6.0

A curated set of one billion personas enables scalable, diverse synthetic data generation for LLM training across reasoning, instructions, knowledge, NPCs, and tools.

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.

StaRPO: Stability-Augmented Reinforcement Policy Optimization

cs.AI · 2026-04-10 · unverdicted · novelty 5.0

StaRPO improves LLM reasoning by adding autocorrelation function and path efficiency stability metrics to RL policy optimization, yielding higher accuracy and fewer logic errors on reasoning benchmarks.

Train the Trainers -- An Agentic AI Framework for Peer-Based Mental Health Support in Battlefield Environments

cs.HC · 2026-03-31 · unverdicted · novelty 5.0

The paper introduces an agentic AI platform to train and support recovered soldiers as peer facilitators providing mental health triage and interventions in austere battlefield environments.

SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model

cs.AI · 2026-05-08 · unverdicted · novelty 4.0

SOM uses a Structural Causal Model to create an explicit graph of opponent observation-to-action links, allowing LLMs to reason along those paths for more accurate and stable predictions in multi-agent settings.

Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains

cs.AI · 2026-04-07 · unverdicted · novelty 3.0

Flowr is an agentic AI framework that decomposes retail supply chain workflows into coordinated LLM-based agents with human-in-the-loop oversight to automate operations in large supermarket chains.

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

cs.CL · 2026-03-18 · unverdicted · novelty 3.0

Zero-shot prompting reaches 59% accuracy at moderate temperatures while chain-of-thought prompting excels at temperature extremes on Olympiad-level math problems, with extended reasoning gains scaling to 14.3x at high temperature.

HR-Agents: Using Multiple LLM-based Agents to Improve Q&A about Brazilian Labor Legislation

cs.IR · 2026-03-13 · unverdicted · novelty 3.0

A multi-agent LLM system using CrewAI and RAG improves response coherence and correctness over a single-LLM RAG baseline for Brazilian labor law Q&A.

Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy

cs.AI · 2025-04-18 · unverdicted · novelty 3.0

A parallel compliance architecture using multi-stage LLM retrieval improves correctness and reasoning quality over a baseline for OT cybersecurity compliance queries in a railway case study.

From System 1 to System 2: A Survey of Reasoning Large Language Models

cs.AI · 2025-02-24 · accept · novelty 3.0

The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.

When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems

cs.MA · 2026-01-15

citing papers explorer

Showing 17 of 17 citing papers.

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play · cs.CL · 2026-06-17 · unverdicted · none · ref 62
Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation · cs.AI · 2026-06-16 · unverdicted · none · ref 24
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games · cs.AI · 2025-06-04 · unverdicted · none · ref 92
Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training · cs.CL · 2026-06-04 · unverdicted · none · ref 11
Understanding the Mechanism of Altruism in Large Language Models · econ.GN · 2026-04-21 · unverdicted · none · ref 252
Retrieval-Augmented Generation for Natural Language Processing: A Survey · cs.CL · 2024-07-18 · accept · none · ref 198
Scaling Synthetic Data Creation with 1,000,000,000 Personas · cs.CL · 2024-06-28 · unverdicted · none · ref 28
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces · cs.CL · 2026-06-05 · unverdicted · none · ref 121
StaRPO: Stability-Augmented Reinforcement Policy Optimization · cs.AI · 2026-04-10 · unverdicted · none · ref 40
Train the Trainers -- An Agentic AI Framework for Peer-Based Mental Health Support in Battlefield Environments · cs.HC · 2026-03-31 · unverdicted · none · ref 17
SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model · cs.AI · 2026-05-08 · unverdicted · none · ref 44
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains · cs.AI · 2026-04-07 · unverdicted · none · ref 18
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models · cs.CL · 2026-03-18 · unverdicted · none · ref 5
HR-Agents: Using Multiple LLM-based Agents to Improve Q&A about Brazilian Labor Legislation · cs.IR · 2026-03-13 · unverdicted · none · ref 29
Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy · cs.AI · 2025-04-18 · unverdicted · none · ref 20
From System 1 to System 2: A Survey of Reasoning Large Language Models · cs.AI · 2025-02-24 · accept · none · ref 58
When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems · cs.MA · 2026-01-15 · unreviewed · ref 59
