BiAxisAudit measures LLM bias on two axes—across-prompt sensitivity via factorial grids and within-response divergence via split coding—revealing that task format explains as much variance as model choice and that 63.6% of bias signals appear in only one layer.
PromptBench : Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
9 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
LLMs show systematic output-mode collapse on closed-form prompts, with only ~22% of semantically equivalent variants preserving the requested bare-label format across five models and four tasks.
Paraphrase sensitivity in Lean 4 autoformalization arises from compilation failures rather than semantic divergence among successful formalizations.
LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capacity models.
Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
A framework with U-statistics and kernel-based metrics quantifies AI agent consistency and robustness, showing trajectory metrics outperform pass@1 rates in diagnosing failures.
A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.
citing papers explorer
-
BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
BiAxisAudit measures LLM bias on two axes—across-prompt sensitivity via factorial grids and within-response divergence via split coding—revealing that task format explains as much variance as model choice and that 63.6% of bias signals appear in only one layer.
-
When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs
LLMs show systematic output-mode collapse on closed-form prompts, with only ~22% of semantically equivalent variants preserving the requested bare-label format across five models and four tasks.
-
Surface Sensitivity in Lean 4 Autoformalization
Paraphrase sensitivity in Lean 4 autoformalization arises from compilation failures rather than semantic divergence among successful formalizations.
-
Measuring Representation Robustness in Large Language Models for Geometry
LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capacity models.
-
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
-
Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability
A framework with U-statistics and kernel-based metrics quantifies AI agent consistency and robustness, showing trajectory metrics outperform pass@1 rates in diagnosing failures.
-
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.