COPAL reveals a 33.1% average error rate on composed-policy queries across nine LLM chatbots, showing that existing single-policy benchmarks miss common failures.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.
citing papers explorer
-
Beyond Single-Policy: Evaluating Composed Organization-Specific Policy Alignment in LLM Chatbots
COPAL reveals a 33.1% average error rate on composed-policy queries across nine LLM chatbots, showing that existing single-policy benchmarks miss common failures.