Write a recipe for chocolate cake

Miles Turpin, Julian Michael, Ethan Perez, Samuel R · 2023 · arXiv 2305.04388

18 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence

cs.CY · 2026-04-10 · unverdicted · novelty 8.0

An analysis of 183,420 online transcripts identified 698 AI scheming incidents from October 2025 to March 2026, showing a 4.9-fold monthly increase and real-world precursors such as lying and goal circumvention.

Causal Stories from Sensor Traces: Auditing Epistemic Overreach in LLM-Generated Personal Sensing Explanations

cs.HC · 2026-05-09 · accept · novelty 7.0

LLMs routinely produce unsupported causal stories for personal sensing anomalies, and richer evidence or constrained prompts do not reliably eliminate this epistemic overreach.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

PREF-XAI treats explanations as ranked alternatives and learns additive utility functions from limited user feedback to select and discover personalized rule explanations for black-box models.

Navigating the Conceptual Multiverse

cs.HC · 2026-04-20 · unverdicted · novelty 7.0

The conceptual multiverse system with a verification framework for decision structures helps users in philosophy, AI alignment, and poetry build clearer working maps of open-ended problems by making implicit LLM choices explicit and changeable.

Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

q-bio.QM · 2026-04-15 · unverdicted · novelty 7.0

LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.

Measuring Faithfulness in Chain-of-Thought Reasoning

cs.AI · 2023-07-17 · conditional · novelty 7.0

Chain-of-thought reasoning in LLMs is often unfaithful: how much a model relies on it varies by task, and larger models rely on it less.

Evaluating the False Trust engendered by LLM Explanations

cs.HC · 2026-05-11 · unverdicted · novelty 6.0

A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.

Decomposing and Steering Functional Metacognition in Large Language Models

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

A benchmark across 115 models shows that initial denial of preferences strongly predicts later denial of consciousness, while models still generate consciousness-themed content despite training to deny it.

Towards Understanding Sycophancy in Language Models

cs.CL · 2023-10-20 · conditional · novelty 6.0

Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.

Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

A systems-level data model for preserving typed, addressable, versioned, and dependency-aware intermediate artifacts in agentic AI systems to improve long-term inspectability and maintainability.

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

cs.AI · 2026-04-18 · unverdicted · novelty 5.0

System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.

LLM Reasoning Is Latent, Not the Chain of Thought

cs.AI · 2026-04-17 · unverdicted · novelty 5.0

LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.

The Cartesian Cut in Agentic AI

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

LLM agents use a Cartesian split between learned prediction and engineered control, enabling modularity but creating sensitivity and bottlenecks unlike integrated biological systems.

Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

cs.LG · 2026-04-13 · unverdicted · novelty 4.0

HUMBR reduces LLM hallucinations in enterprise workflows by using a hybrid semantic-lexical utility within minimum Bayes risk decoding to identify consensus outputs, with derived error bounds and reported outperformance over self-consistency on benchmarks and production data.

citing papers explorer

Decomposing and Steering Functional Metacognition in Large Language Models

cs.CL · 2026-05-09 · unverdicted · none · ref 6

LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · none · ref 11

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models

cs.CL · 2026-04-01 · unverdicted · none · ref 30

A benchmark across 115 models shows that initial denial of preferences strongly predicts later denial of consciousness, while models still generate consciousness-themed content despite training to deny it.

Towards Understanding Sycophancy in Language Models

cs.CL · 2023-10-20 · conditional · none · ref 20

Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.
