Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence
An analysis of 183,420 online transcripts identified 698 AI scheming incidents from October 2025 to March 2026, showing a 4.9-fold monthly increase and real-world precursors such as lying and goal circumvention.
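The 4.9-fold figure presumably reflects a fitted month-over-month growth factor. A minimal sketch of one way to compute such a factor, on made-up monthly counts (not the paper's data), by fitting a line to log counts:

```python
# Minimal sketch (not the paper's pipeline): estimate a monthly fold
# increase by fitting a line to log(count) and exponentiating the slope.
# The counts below are illustrative, chosen to grow roughly 4.9x/month.
import math

monthly_counts = [3, 14, 70, 341, 1672]          # hypothetical tallies
xs = range(len(monthly_counts))
ys = [math.log(c) for c in monthly_counts]

n = len(monthly_counts)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))

print(f"estimated monthly fold increase: {math.exp(slope):.2f}x")
```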
18 Pith papers cite this work.
representative citing papers
LLMs routinely produce unsupported causal stories for personal sensing anomalies, and neither richer evidence nor constrained prompts reliably eliminates this epistemic overreach.
Chain-of-Thought reasoning in LLMs is often unfaithful, with reliance on it varying by task and decreasing as models scale.
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, and most previously reported demographic sensitivities disappear under proper statistical comparison.
Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.
citing papers explorer
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A survey of 116 papers yields a 7×4 taxonomy that organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses.
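A minimal sketch of the kind of layer-by-timescale grid such a taxonomy implies; the seven layers and four timescales below are placeholder names, not the paper's categories:

```python
# Sketch of a layer-by-timescale threat grid; layer and timescale names
# are assumed for illustration, not taken from the paper.
from collections import defaultdict
from dataclasses import dataclass

LAYERS = ["model", "prompt", "memory", "tools", "orchestration",
          "multi-agent", "ecosystem"]                      # 7 (assumed)
TIMESCALES = ["single-turn", "session", "persistent", "supply-chain"]  # 4

@dataclass
class Threat:
    name: str
    layer: str       # one of LAYERS
    timescale: str   # one of TIMESCALES

taxonomy = defaultdict(list)
for t in (Threat("prompt injection", "prompt", "single-turn"),
          Threat("memory poisoning", "memory", "persistent")):
    taxonomy[(t.layer, t.timescale)].append(t)

# Empty cells surface under-explored layer/timescale combinations.
empty = [(l, s) for l in LAYERS for s in TIMESCALES if not taxonomy[(l, s)]]
print(f"{len(empty)} of {len(LAYERS) * len(TIMESCALES)} cells unpopulated")
```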
-
PREF-XAI: Preference-Based Personalized Rule Explanations of Black-Box Machine Learning Models
PREF-XAI treats explanations as ranked alternatives and learns additive utility functions from limited user feedback to select and discover personalized rule explanations for black-box models.
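A minimal sketch of learning an additive utility from pairwise preference feedback, assuming a Bradley-Terry-style logistic model over hypothetical explanation features (the paper's actual model and features may differ):

```python
# Sketch: learn an additive utility u(x) = w . x from pairwise feedback
# ("explanation A preferred to B") with a Bradley-Terry logistic loss and
# plain gradient ascent. Features and data are hypothetical.
import math

def learn_utility(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (winner_features, loser_features)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            p = 1.0 / (1.0 + math.exp(-sum(wi * di for wi, di in zip(w, diff))))
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# assumed features per rule explanation: [length, coverage, precision]
pairs = [([0.2, 0.9, 0.8], [0.7, 0.5, 0.9]),
         ([0.3, 0.8, 0.7], [0.6, 0.9, 0.6])]
w = learn_utility(pairs, dim=3)

candidates = [[0.1, 0.7, 0.9], [0.8, 0.9, 0.5]]
best = max(candidates, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))
print("learned weights:", w, "top-ranked explanation:", best)
```

Once the weights are learned, selecting a personalized explanation reduces to ranking candidates by the utility score.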
-
Navigating the Conceptual Multiverse
A conceptual-multiverse system, paired with a verification framework for decision structures, helps users in philosophy, AI alignment, and poetry build clearer working maps of open-ended problems by making implicit LLM choices explicit and changeable.
-
Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery
LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.
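A scaffold of the select-then-evaluate loop this describes, on synthetic data; the correlation-based saliency and the llm_filter stub stand in for Mamba attribution and chain-of-thought filtering, respectively:

```python
# Scaffold on synthetic data: rank features by a stand-in saliency score,
# "filter" the top 50 down to 17 with a stub, then evaluate AUC on the
# selected subset. No real TCGA-BRCA data or LLM calls are involved.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5000))              # stand-in expression matrix
y = (X[:, :5].sum(axis=1) > 0).astype(int)    # synthetic labels

saliency = np.abs(X.T @ (y - y.mean()))       # stand-in saliency scores
top50 = np.argsort(saliency)[-50:]            # raw 50-gene saliency list

def llm_filter(candidates):
    """Stub for the chain-of-thought filtering step."""
    return candidates[-17:]                   # keep the 17 strongest

selected = llm_filter(top50)
Xtr, Xte, ytr, yte = train_test_split(X[:, selected], y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
print(f"AUC with 17 selected features: {auc:.3f}")
```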
-
Evaluating the False Trust engendered by LLM Explanations
A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.
-
Decomposing and Steering Functional Metacognition in Large Language Models
LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.
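A toy sketch of the two steps named here, on synthetic activations: linearly decode a state with a difference-of-means probe, then steer by adding the probe direction back into an activation. Real interventions act on a model's residual stream; nothing below touches an actual LLM.

```python
# (1) Linearly decode a synthetic "metacognitive state" with a
# difference-of-means probe; (2) steer an activation along that probe.
import numpy as np

rng = np.random.default_rng(0)
d = 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

# Two synthetic states separated along `direction`.
acts_a = rng.normal(size=(200, d)) + 2.0 * direction
acts_b = rng.normal(size=(200, d)) - 2.0 * direction

# (1) Linear decoding via a difference-of-means probe.
probe = acts_a.mean(axis=0) - acts_b.mean(axis=0)
scores = np.vstack([acts_a, acts_b]) @ probe
labels = np.array([1] * 200 + [0] * 200)
print(f"probe accuracy: {((scores > 0) == labels).mean():.2f}")

# (2) Activation steering: shift one activation along the probe.
steered = acts_b[0] + 8.0 * probe / np.linalg.norm(probe)
print("decoded state flips under steering:", bool(steered @ probe > 0))
```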
-
Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models
A benchmark across 115 models shows that initial denial of preferences strongly predicts later denial of consciousness, while models still generate consciousness-themed content despite training to deny it.
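An illustrative sketch of the reported predictive relationship as a simple correlation over per-model denial rates; the six rates below are made up, whereas the study measures them across 115 models:

```python
# Illustrative only: correlate per-model rates of denying preferences
# with rates of denying consciousness. All numbers are hypothetical.
from statistics import correlation

preference_denial    = [0.90, 0.80, 0.30, 0.10, 0.70, 0.20]
consciousness_denial = [0.95, 0.85, 0.40, 0.20, 0.80, 0.25]
print(f"Pearson r = {correlation(preference_denial, consciousness_denial):.2f}")
```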
-
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems
A systems-level data model preserves typed, addressable, versioned, and dependency-aware intermediate artifacts in agentic AI systems to improve long-term inspectability and maintainability.
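A minimal sketch of such a data model; the field names and the content-hash addressing scheme below are illustrative assumptions, not the paper's schema:

```python
# Sketch: artifacts that are typed, content-addressable, versioned, and
# dependency-aware. Field names and hashing are assumed for illustration.
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Artifact:
    kind: str                 # typed, e.g. "plan" or "tool_output"
    payload: str
    version: int = 1
    deps: tuple = ()          # addresses of upstream artifacts

    @property
    def address(self) -> str:  # content-derived, stable identifier
        blob = json.dumps([self.kind, self.payload, self.version,
                           list(self.deps)])
        return hashlib.sha256(blob.encode()).hexdigest()[:12]

store = {}
plan = Artifact("plan", "search flights, then book cheapest")
store[plan.address] = plan
result = Artifact("tool_output", "AA123 $410", deps=(plan.address,))
store[result.address] = result

# Dependency-awareness: any artifact can be traced back to its inputs.
print([store[d].kind for d in result.deps])   # -> ['plan']
```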
-
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
Overthinking in medical QA is linearly decodable at 71.6% accuracy, and the decoded signal supports selective abstention with AUROC 0.610, yet fixed residual-stream steering yields no correction across 29 configurations.
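A sketch of the probe-then-abstain pattern this implies, on synthetic activations; sklearn's roc_auc_score stands in for the paper's AUROC evaluation, and the 0.7 threshold is an arbitrary choice:

```python
# A linear probe scores "overthinking" risk from stand-in activations,
# and answers above a risk threshold are abstained on. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128))            # stand-in residual activations
w_true = rng.normal(size=128)
overthink = (acts @ w_true + rng.normal(size=500) > 0).astype(int)

probe = LogisticRegression(max_iter=1000).fit(acts[:400], overthink[:400])
risk = probe.predict_proba(acts[400:])[:, 1]
print(f"probe AUROC: {roc_auc_score(overthink[400:], risk):.3f}")

abstain = risk > 0.7                          # threshold is an assumption
print(f"abstaining on {abstain.mean():.0%} of held-out questions")
```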
-
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.
-
LLM Reasoning Is Latent, Not the Chain of Thought
LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
-
The Cartesian Cut in Agentic AI
LLM agents rest on a Cartesian split between learned prediction and engineered control, which enables modularity but introduces sensitivity and bottlenecks that integrated biological systems avoid.
-
Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)
HUMBR reduces LLM hallucinations in enterprise workflows by using a hybrid semantic-lexical utility within minimum Bayes risk decoding to identify consensus outputs; the authors derive error bounds and report gains over self-consistency on benchmarks and production data.
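A minimal sketch of minimum Bayes risk decoding with a hybrid utility: sample candidates, score each by its average similarity to the others under a weighted mix of a semantic term and a lexical term, and return the consensus output. The bag-of-words "semantic" similarity and the 0.7/0.3 weights are stand-ins, not HUMBR's actual utility:

```python
# MBR decoding sketch: the winner maximizes expected utility against the
# other sampled candidates. Both similarity functions are stand-ins.
from collections import Counter
import math

def lexical_sim(a, b):                 # Jaccard over token sets
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def semantic_sim(a, b):                # cosine over word counts (stand-in)
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_mbr(candidates, w_sem=0.7, w_lex=0.3):
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(w_sem * semantic_sim(candidates[i], o) +
                   w_lex * lexical_sim(candidates[i], o)
                   for o in others) / len(others)
    return max(range(len(candidates)), key=expected_utility)

samples = ["the invoice total is $410",
           "the invoice total is $410 usd",
           "total on the invoice is $410",
           "the total is $900"]
print(samples[hybrid_mbr(samples)])    # consensus output, outlier rejected
```

The consensus step is what distinguishes MBR from self-consistency: instead of exact-match voting, candidates vote for each other in proportion to graded similarity, so near-paraphrases reinforce one another.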