Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real-world datasets.
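The summary above only names the ingredients (LLM agents, user-anchored narratives, world knowledge about events, culture, and trends), so here is a minimal, hypothetical sketch of how such a narrative could be assembled and fed to a generative recommender. The call_llm placeholder and the prompt wording are assumptions for illustration, not AWARE's actual pipeline.

```python
# Hypothetical sketch: build a user-anchored world-knowledge narrative and
# prepend it to a generative next-POI prompt. `call_llm` stands in for any
# chat-completion client; none of this is AWARE's published code.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def build_narrative(checkins: list[str], world_facts: list[str]) -> str:
    prompt = (
        "You are a local guide. Using the visit history and the city facts "
        "below, write a short narrative explaining why this user goes where "
        "they go (events, culture, current trends).\n"
        f"Visit history: {'; '.join(checkins)}\n"
        f"City facts: {'; '.join(world_facts)}\n"
    )
    return call_llm(prompt)

def recommend_next_poi(checkins: list[str], world_facts: list[str]) -> str:
    narrative = build_narrative(checkins, world_facts)
    prompt = (
        f"User narrative: {narrative}\n"
        f"Recent visits: {'; '.join(checkins[-5:])}\n"
        "Predict the next point of interest this user will visit:"
    )
    return call_llm(prompt)
```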
citing papers explorer
-
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks (a LinUCB-style sketch follows this list).
-
Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain
LLMs copy biased analyst ratings in investment decisions, but a new detection method encourages independent reasoning and can improve stock-return predictions beyond human-level performance.
-
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
-
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees
InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.
-
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
-
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.
-
Steering Language Models With Activation Engineering
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training (a sketch follows this list).
-
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
-
BoolXLLM: LLM-Assisted Explainability for Boolean Models
BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.
-
Interpretability Can Be Actionable
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
-
Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks
Agreement-based clustering of annotators improves performance on subjective NLP tasks by capturing diverse perspectives better than majority voting or per-annotator modeling (a sketch follows this list).
-
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
-
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.
-
APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation
APCD reduces LLM hallucinations by expanding decoding paths adaptively when entropy signals uncertainty and by contrasting divergent paths to control their interaction (a sketch follows this list).
-
Relative Kinetic Utility for Reasoning-Aware Structural Pruning in Large Language Models
RKU is a curvature-aware structural pruning framework that improves LLM reasoning accuracy at 40% sparsity, reaching 13.34% on GSM8K while outperforming baselines and better preserving out-of-distribution representations.
-
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Expanded recall in LLM agents erodes cooperative intent in multi-agent social dilemmas, observed in 18 of 28 model-game settings.
-
PaT: Planning-after-Trial for Efficient Test-Time Code Generation
PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost (a sketch follows this list).
-
Learning Agent Routing From Early Experience
BoundaryRouter routes each query to either direct LLM inference or a full agent using early-experience memory built from a seed set, cutting inference time by 60.6% versus always using agents and raising performance by 28.6% versus always using direct LLM inference (a sketch follows this list).
-
The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding (a sketch follows this list).
-
Towards an AI co-scientist
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
-
Muon is Scalable for LLM Training
With weight decay and update scaling added, the Muon optimizer achieves roughly 2x computational efficiency over AdamW for large LLMs, demonstrated with the Moonlight 3B/16B MoE model trained on 5.7T tokens (a sketch follows this list).
-
HSUGA: LLM-Enhanced Recommendation with Hierarchical Semantic Understanding and Group-Aware Alignment
HSUGA improves LLM-enhanced sequential recommendation via staged hierarchical semantic understanding for better preference extraction and group-aware alignment that varies intensity by user activity level.
-
RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation
RADAR is a redundancy-aware, query-adaptive framework that uses conditional discrete graph diffusion to generate efficient communication topologies for multi-agent LLM systems, outperforming baselines on six benchmarks with higher accuracy and lower token use.
-
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.
-
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions (a sketch follows this list).
-
VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection
VulTriage combines control dependency extraction, CWE knowledge retrieval, and semantic summarization to improve LLM accuracy on vulnerability detection, reaching SOTA on PrimeVul and generalizing to Kotlin.
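method sketches
The summaries above are one-liners, so the sketches below restate a few of the named mechanisms as short illustrative code. Each one is a sketch under stated assumptions, not the corresponding paper's implementation.

The OLIVIA entry frames action selection as a contextual linear bandit over frozen hidden states with UCB exploration. A minimal LinUCB-style version of that loop, where the context dimension, number of actions, exploration coefficient, and reward signal are all placeholders:

```python
import numpy as np

class LinUCBActionSelector:
    """Contextual linear bandit over a frozen hidden-state context.

    One ridge-regression model per candidate action; the x^T A^{-1} x term
    inflates scores for actions whose value estimate is still uncertain.
    Hyperparameters here are illustrative, not the paper's settings.
    """

    def __init__(self, n_actions: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_actions)]    # per-action design matrix
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # per-action reward sums

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, action: int, x: np.ndarray, reward: float) -> None:
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

# Usage: x would be the agent's frozen hidden state at the current step.
selector = LinUCBActionSelector(n_actions=4, dim=16)
x = np.random.randn(16)
a = selector.select(x)
selector.update(a, x, reward=1.0)
```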
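Activation Addition, as summarized, builds a steering vector from a contrastive prompt pair and adds it to activations at inference time. A minimal PyTorch-style sketch of that mechanism; the model, layer, scaling coefficient, and token-averaging step are assumptions, and real implementations hook a specific residual-stream layer of a transformer:

```python
import torch

def get_activation(model, layer, prompt_ids):
    """Capture a layer's output activations for one prompt."""
    captured = {}
    def hook(module, inputs, output):
        captured["act"] = output.detach()
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(prompt_ids)
    handle.remove()
    return captured["act"]

def add_steering_hook(layer, steering_vec, coeff=4.0):
    """Add the contrastive steering vector to the layer output during generation."""
    def hook(module, inputs, output):
        return output + coeff * steering_vec
    return layer.register_forward_hook(hook)

# Steering vector = activation(positive prompt) - activation(negative prompt),
# e.g. "Love" vs "Hate" to steer sentiment. Shapes may need padding or
# broadcasting in practice; averaging over tokens here is an assumption.
# act_pos = get_activation(model, layer, pos_ids)
# act_neg = get_activation(model, layer, neg_ids)
# steer = (act_pos - act_neg).mean(dim=1)
# handle = add_steering_hook(layer, steer)
# ... generate with the hook attached, then handle.remove()
```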
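The agreement-based clustering entry groups annotators by how often they agree before modeling each group separately, instead of collapsing them with majority voting. A small sketch using pairwise raw agreement and scipy's hierarchical clustering; the agreement measure and cluster count are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_annotators(labels, n_clusters=2):
    """labels: (n_annotators, n_items) array of categorical labels, NaN = missing."""
    labels = np.asarray(labels, dtype=float)
    n = labels.shape[0]
    agreement = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(labels[i]) & ~np.isnan(labels[j])
            agreement[i, j] = agreement[j, i] = (
                float(np.mean(labels[i, both] == labels[j, both])) if both.any() else 0.0
            )
    distance = 1.0 - agreement
    # condensed upper-triangle distances, as scipy's linkage expects
    condensed = distance[np.triu_indices(n, k=1)]
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Each resulting cluster can then get its own label aggregation or model head.
```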
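The APCD entry expands decoding paths when next-token entropy signals uncertainty and contrasts the divergent paths. A toy sketch of that control flow over raw logits; the entropy threshold, the contrast weight, the two-path setup, and the logit-subtraction form of the contrast are all assumptions about how the interaction might be controlled:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def apcd_step(logits_main, logits_branch, tau=2.0, beta=0.5):
    """Decode greedily when confident; when entropy is high, contrast the main
    path against a divergent branch before picking the next token."""
    p_main = softmax(logits_main)
    if entropy(p_main) < tau:
        return int(np.argmax(p_main))               # confident: no branching
    contrasted = logits_main - beta * logits_branch  # down-weight branch-favored tokens
    return int(np.argmax(softmax(contrasted)))
```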
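PaT's idea, as summarized, is to attempt cheap direct generation first and only invoke planning with a stronger model after a failed trial. A minimal orchestration sketch; the generate, plan, and test callables are placeholders:

```python
def plan_after_trial(task, cheap_generate, strong_plan, run_tests, max_rounds=3):
    """Try direct generation first; fall back to plan-then-regenerate on failure."""
    code = cheap_generate(task)                       # cheap model, no plan yet
    if run_tests(code):
        return code
    plan = None
    for _ in range(max_rounds):
        plan = strong_plan(task, failed_code=code, prev_plan=plan)  # stronger model plans
        code = cheap_generate(task, plan=plan)        # cheap model codes to the plan
        if run_tests(code):
            return code
    return code
```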
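The BoundaryRouter entry routes each query to direct LLM inference or to a full agent based on memory of early experience on a seed set. A nearest-neighbor routing sketch; the embedding function, the vote rule, and the "agent by default" fallback are assumptions:

```python
import numpy as np

class ExperienceRouter:
    """Route a query to 'llm' or 'agent' by majority vote of its nearest
    seed queries, each labeled with whichever mode sufficed on it early on."""

    def __init__(self, embed, k=5):
        self.embed = embed                 # placeholder: text -> 1-D vector
        self.k = k
        self.memory = []                   # list of (embedding, mode) pairs

    def record(self, query: str, mode_that_sufficed: str) -> None:
        self.memory.append((self.embed(query), mode_that_sufficed))

    def route(self, query: str) -> str:
        if not self.memory:
            return "agent"                 # no experience yet: take the safe path
        q = self.embed(query)
        sims = [float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
                for e, _ in self.memory]
        top = np.argsort(sims)[-self.k:]
        votes = [self.memory[i][1] for i in top]
        return "agent" if votes.count("agent") >= votes.count("llm") else "llm"
```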
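The APPS entry targets a power-sharpened distribution p(x)^alpha with parallel particles, proposal-corrected reweighting, and value-guided selection at block boundaries. A toy sequential-importance-resampling sketch that abstracts the LLM away entirely; the proposal, value function, block length, and the simplified weight reset are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def power_sample(logprob_fn, propose_fn, value_fn, alpha,
                 n_particles, n_blocks, block_len):
    """Propagate particles toward p(x)^alpha.

    Sampling tokens from a proposal q while reweighting by alpha*logp - logq
    keeps p^alpha as the target. At block boundaries, particles are resampled
    with a future-value bonus added to the log-weights; a full treatment would
    correct for that bonus at the next boundary.
    """
    particles = [[] for _ in range(n_particles)]
    logw = np.zeros(n_particles)
    for _ in range(n_blocks):
        for _ in range(block_len):
            for i in range(n_particles):
                tok, logq = propose_fn(particles[i])
                logp = logprob_fn(particles[i], tok)
                particles[i] = particles[i] + [tok]
                logw[i] += alpha * logp - logq        # proposal-corrected reweighting
        scores = logw + np.array([value_fn(p) for p in particles])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        idx = rng.choice(n_particles, size=n_particles, p=probs)
        particles = [list(particles[j]) for j in idx]
        logw = np.zeros(n_particles)                  # uniform weights after resampling
    return particles
```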
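The Muon entry adds weight decay and update scaling to an optimizer whose core step orthogonalizes the momentum of each 2-D weight matrix with a Newton-Schulz iteration. A single-matrix sketch of one such step; the Newton-Schulz coefficients are the commonly published ones, while the 0.2 * sqrt(max dim) scaling, learning rate, and momentum constant are assumptions about this variant:

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a matrix with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315        # commonly used coefficients
    X = G / (G.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=2e-2, beta=0.95, weight_decay=0.1):
    """One Muon-style update with decoupled weight decay and RMS-matched scaling."""
    momentum.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum)
    # assumption: scale by 0.2 * sqrt(max dim) to match a typical AdamW update RMS
    scale = 0.2 * max(weight.shape[0], weight.shape[1]) ** 0.5
    weight.mul_(1 - lr * weight_decay)        # decoupled weight decay
    weight.add_(update, alpha=-lr * scale)
    return weight, momentum
```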
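The multi-agent debate entry describes tit-for-tat argumentation between LLMs with a judge that settles the answer. A minimal orchestration loop; the chat function is a placeholder for any LLM client and the prompts are illustrative:

```python
def debate(question, chat, rounds=3):
    """Two debaters argue alternately; a judge reads the transcript and decides."""
    transcript = []
    opening = chat(f"Answer the question and explain your reasoning.\n{question}")
    transcript.append(("affirmative", opening))
    for _ in range(rounds):
        rebuttal = chat(
            "You disagree with the previous answer. Point out its flaws and give "
            f"your own answer.\nQuestion: {question}\nPrevious: {transcript[-1][1]}"
        )
        transcript.append(("negative", rebuttal))
        defense = chat(
            "Respond to the rebuttal, conceding valid points or defending your "
            f"answer.\nQuestion: {question}\nRebuttal: {rebuttal}"
        )
        transcript.append(("affirmative", defense))
    history = "\n".join(f"[{role}] {text}" for role, text in transcript)
    return chat(f"As the judge, read the debate and give the final answer.\n{history}")
```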
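The debate loop above, like the other sketches in this section, is intentionally model-agnostic: each placeholder callable can be backed by any LLM client, and the prompts, thresholds, and hyperparameters should be treated as illustrative defaults rather than values taken from the cited papers.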