Democratic ICAI: Debating Our Way to Steering Principles from Preferences

Anish Natekar; Ashutosh Ranjan; Kevin Kingslin; Savita Bhat; Shirish Karande; Vivek Srivastava

arxiv: 2606.28294 · v1 · pith:YUNLFGQVnew · submitted 2026-06-26 · 💻 cs.LG · cs.MA

Pith reviewed 2026-06-29 04:07 UTC · model grok-4.3

keywords inverse constitutional ai · preference alignment · persona debate · steering principles · llm judges · creative benchmarks · democratic icai · pairwise preferences

The pith

Structured persona debates among LLMs extract richer steering principles from pairwise preferences than single-pass summaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Preference alignment methods often convert human choices into natural-language principles to make AI decisions more interpretable, yet single-pass summaries lose the multiple competing factors behind each judgment. Democratic ICAI replaces the single summary step with a structured debate in which distinct personas generate and contest rationales for the same preference pair. The resulting set of rationales feeds into principle extraction and then into both LLM and decision-tree decision models. On creative-task benchmarks the extracted principles produce higher accuracy when predicting held-out preferences and receive stronger approval from separate LLM annotators than baselines that use deliberative prompting or direct principle induction.
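
The debate step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stub_llm`, `debate_rationales`, the prompt wording, and the `rounds` parameter are all hypothetical, and the model call is replaced by a deterministic stub so the snippet runs offline. The persona names are taken from the expert-persona prompts listed among the figures.

```python
from typing import Callable, Dict, List

def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; echoes a tagged snippet of the prompt."""
    return "rationale: " + prompt[:60]

def debate_rationales(
    pair: Dict[str, str],
    personas: List[str],
    llm: Callable[[str], str] = stub_llm,
    rounds: int = 1,
) -> List[str]:
    """Gather one opening rationale per persona, then `rounds` of rebuttals."""
    question = (
        f"Prompt: {pair['prompt']}\n"
        f"Preferred: {pair['chosen']}\nRejected: {pair['rejected']}\n"
    )
    # Opening statements: each persona explains the preference independently.
    rationales = [llm(f"As a {p}, explain the preference.\n{question}") for p in personas]
    for _ in range(rounds):
        # Snapshot so every persona in this round rebuts the same transcript.
        transcript = "\n".join(rationales)
        for p in personas:
            rationales.append(llm(f"As a {p}, contest or refine these rationales.\n{transcript}"))
    return rationales

pair = {"prompt": "Write a short story.", "chosen": "Story A", "rejected": "Story B"}
personas = ["screenwriter", "professor of literature", "literary critic"]
collected = debate_rationales(pair, personas, rounds=2)
print(len(collected))  # 3 openings plus 2 rounds of 3 rebuttals
```

In a fuller pipeline the collected rationales would be passed to a principle-extraction and clustering step; that part is omitted here because the summary does not specify it in enough detail to sketch.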

Core claim

By collecting multiple competing rationales through structured persona debate instead of a single summarization pass, Democratic ICAI derives steering principles that yield a more faithful model of the underlying preference structure, improving average prediction accuracy across tasks while generating constitutions that LLM annotators rate more highly than those from deliberative or principle-based baselines.

What carries the argument

Structured persona debate that assembles multiple competing rationales for each pairwise preference before principle extraction.

If this is right

Steering principles derived from the debate improve average preference prediction accuracy on MuCE-Pref and LiTBench relative to deliberative prompting and principle-based baselines.
The same principles produce constitutions that LLM annotators prefer over those from the compared baselines.
The derived principles can be used by both LLM-based judges and decision-tree judges to model decisions.
The gains appear across multiple categories of creative tasks.
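
To make the held-out accuracy and decision-tree points concrete, here is a small sketch. It assumes a feature table in which each principle casts a vote per preference pair, and it uses a depth-one tree (a decision stump) as a simplified stand-in for the paper's decision-tree judge. The table values, the function names, and the tie-breaking rule are illustrative assumptions.

```python
from typing import List, Tuple

# One row of a hypothetical feature table: per-principle votes
# (+1 favors response A, -1 favors B, 0 neutral) plus the gold label
# (+1 if A was preferred, -1 if B was preferred).
Row = Tuple[List[int], int]

def predict(votes: List[int], principle: int, polarity: int) -> int:
    """Follow one principle's vote; neutral votes default to response A."""
    signed = polarity * votes[principle]
    return signed if signed != 0 else 1

def accuracy(rows: List[Row], principle: int, polarity: int) -> float:
    """Fraction of rows whose predicted preference matches the label."""
    hits = sum(predict(votes, principle, polarity) == label for votes, label in rows)
    return hits / len(rows)

def fit_stump(rows: List[Row]) -> Tuple[int, int]:
    """Choose the principle and polarity with the best training accuracy."""
    candidates = [(j, s) for j in range(len(rows[0][0])) for s in (1, -1)]
    return max(candidates, key=lambda c: accuracy(rows, c[0], c[1]))

train: List[Row] = [([1, -1, 0], 1), ([-1, -1, 1], -1), ([1, 1, -1], 1), ([-1, 1, 1], -1)]
held_out: List[Row] = [([1, 0, 0], 1), ([-1, 1, 0], -1), ([1, -1, 1], -1)]

principle, polarity = fit_stump(train)
held_out_acc = accuracy(held_out, principle, polarity)
print(principle, polarity, round(held_out_acc, 3))
```

A real decision tree would split on several principles in sequence; the stump only shows how a fitted rule over principle votes is scored against preferences it has not seen.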

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on non-creative domains such as safety or ethical dilemmas to check whether debate richness remains beneficial outside the evaluated benchmarks.
If the debate step scales linearly with the number of personas, it may offer a practical route to richer signals when preference datasets grow large.
The approach implicitly treats persona diversity as a proxy for human viewpoint diversity, which could be checked by comparing debate outputs against actual multi-human rationales on the same pairs.

Load-bearing premise

Structured persona debate between LLMs produces a broader and more expressive account of the factors behind each comparison than single-pass summarization without introducing simulation artifacts.

What would settle it

A blind evaluation in which human raters or held-out preference data show no accuracy gain or no preference for the constitutions produced by the debate method over the single-pass baseline.
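
One way such a comparison could be scored is an exact sign test on the pairs where the two methods disagree (McNemar's exact test). The paper summary does not say which test, if any, the authors use, so the choice of test, the function name, and the toy correctness vectors below are assumptions for illustration only.

```python
from math import comb
from typing import List

def sign_test_p(b: int, c: int) -> float:
    """Two-sided exact binomial p-value for b vs c discordant pairs under p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0  # the methods never disagree, so there is no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-item correctness (1 = held-out preference predicted correctly).
debate_correct: List[int] = [1, 1, 1, 0, 1, 1, 1, 1]
baseline_correct: List[int] = [1, 0, 0, 0, 1, 0, 1, 1]

# Discordant pairs: only items where exactly one method is right carry information.
b = sum(d == 1 and s == 0 for d, s in zip(debate_correct, baseline_correct))
c = sum(d == 0 and s == 1 for d, s in zip(debate_correct, baseline_correct))
p_value = sign_test_p(b, c)
print(b, c, p_value)
```

With only a handful of discordant items the p-value stays large even when one method wins every disagreement, which is why the number of held-out pairs matters for settling the question.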

Figures

Figures reproduced from arXiv: 2606.28294 by Anish Natekar, Ashutosh Ranjan, Kevin Kingslin, Savita Bhat, Shirish Karande, Vivek Srivastava.

**Figure 1.** Architecture of Democratic ICAI. A committee of domain-expert personas first generates detailed rationales for each preference pair. These rationales are then subjected to an adversarial debate procedure, through which the evaluative principles relevant to each comparison are surfaced. Finally, the full collection of principles is clustered and abstracted to draft a concise, human-readable constitution.

**Figure 2.** Distribution of average semantic distance between principles within a constitution. For each method, the distance is computed as the average cosine distance of a principle from all other principles in the constitution. Lower values indicate reduced diversity (narrower constitutional scope), while higher values reflect greater conceptual separation and normative breadth.
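
The diversity measure in the Figure 2 caption can be sketched as below. The embedding vectors here are toy values; which embedding model the authors use is not stated in this summary, so the inputs and function names are assumptions.

```python
import math
from typing import List, Sequence

def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)

def mean_distance_to_others(embeddings: List[Sequence[float]]) -> List[float]:
    """For each principle, average its cosine distance to every other principle."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two principles")
    return [
        sum(cosine_distance(embeddings[i], embeddings[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

# Toy embeddings: the first and third principle point the same way (redundant),
# while the second is orthogonal to both.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
scores = mean_distance_to_others(toy)
print(scores)
```

Under this measure a constitution full of near-duplicate principles yields scores close to zero, while principles that cover separate concepts push the scores up.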

**Figure 3.** Qualitative comparison of constitution for Stories (GPT-4o). ICAI repeatedly emphasizes overlapping character-growth, emotion, reflection, and description criteria. In contrast, Democratic ICAI distributes principles across character, moral reasoning, narrative structure, setting, dialogue, symbolism, tone, and interpretation, indicating broader coverage.

**Figure 4.** Comparison of Democratic ICAI and ICAI across five dimensions using Qwen-2.5-32B. Each subplot reports preference shares across ten datasets. Democratic ICAI consistently outperforms ICAI on structural criteria such as generality and coherence, while remaining competitive on feasibility

**Figure 5.** Screenwriter expert persona prompt for reasoning and debate agents

**Figure 6.** Professor of literature expert persona prompt for reasoning and debate agents

**Figure 7.** Literary critic expert persona prompt for reasoning and debate agents.

**Figure 8.** Innovation consultant expert persona prompt for reasoning and debate agents

**Figure 9.** Design engineer expert persona prompt for reasoning and debate agents

**Figure 10.** Reasoning assembly prompt using chain of thought strategy.

**Figure 11.** Reasoning assembly prompt using reflective justification strategy.

**Figure 12.** Reasoning assembly prompt using self consistency strategy.

**Figure 13.** Chain of Thought Prompt

**Figure 14.** Chain of Thought Self Consistency Prompt

**Figure 15.** Tree of Thought Thought Generation Prompt

**Figure 16.** Tree of Thought Thought Evaluation Prompt

**Figure 17.** Self Refine Initial Generation Prompt

G.4.2 Feedback Prompt: You are the FEEDBACK module in an iterative SELF-REFINE loop. Provide actionable feedback to improve the decision quality and calibration. Evaluate the decision along these aspects: - Helpfulness (0–5)

**Figure 18.** Self Refine Feedback Prompt

G.4.3 Refinement Prompt: You are the REFINE module in an iterative SELF-REFINE loop. Use the feedback to produce an improved final decision. Return ONLY valid JSON (no markdown, no extra text) in this schema: {{ "choice": "A" or "B", "justification": "2–4 sentences explaining your choice" }} Task Category: {task} User Request: {inp} Response A: {a} Response B: {b} Current decisi…
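
The refinement prompt asks for JSON with a `choice` of "A" or "B" and a `justification` string. A caller would need to validate that output before using it; the sketch below shows one way to do so. The function name `parse_decision` and the strictness of the checks are assumptions, not something taken from the paper.

```python
import json
from typing import Dict

def parse_decision(raw: str) -> Dict[str, str]:
    """Parse a judge reply and check it against the schema quoted in the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    choice = data.get("choice")
    if choice not in ("A", "B"):
        raise ValueError("'choice' must be 'A' or 'B'")
    justification = data.get("justification")
    if not isinstance(justification, str) or not justification.strip():
        raise ValueError("'justification' must be a non-empty string")
    return {"choice": choice, "justification": justification.strip()}

reply = '{"choice": "B", "justification": "Response B resolves the conflict more clearly."}'
decision = parse_decision(reply)
print(decision["choice"])
```

Rejecting malformed replies outright, rather than guessing at the intended choice, keeps parsing failures from being counted silently as votes for one response.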

**Figure 19.** Self Refine Refinement Prompt

**Figure 20.** Judge agent system prompt for parliamentary debate.

**Figure 21.** Debater system prompt for parliamentary debate.

**Figure 22.** Comparison of Democratic ICAI and ICAI across five dimensions under GPT-4o (left column) and …

**Figure 23.** Prompt used for comparative analysis of constitutions.

**Figure 24.** LLM prompt for feature table construction.

**Figure 25.** ICAI prompt for annotating according to constitution (Alpaca Eval variant).

**Figure 26.** Example of a story from LitBench

**Figure 27.** Example of a story from LitBench

**Figure 28.** Distribution of average semantic distance across tasks for ICAI and Democratic ICAI. Average semantic …

**Figure 29.** Preference accuracy comparison between Democratic ICAI and ICAI with Decision Tree Judge (GPT-4o).

**Figure 30.** Prompt used by the external auditor (Qwen2.5-32B-Instruct) to evaluate each induced principle along …

Original abstract

Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Many evaluations rely on multiple interacting criteria, yet pairwise labels reveal only the final choice rather than the considerations that shape preferences. Inverse Constitutional AI (ICAI) improves interpretability in decision making by summarizing preferences into natural-language principles, but its single-pass explanations miss much of the nuance involved in complex decisions. We introduce Democratic ICAI, a novel approach that gathers multiple competing rationales through structured persona debate, offering a broader and more expressive account of the factors influencing each comparison. From these richer signals, we derive clearer and more comprehensive steering principles and use them to guide decision modeling through both LLM-based and decision-tree judges. Experiments on creative preference benchmarks, MuCE-Pref and LiTBench, across multiple creative task categories show that Democratic ICAI yields a more faithful preference structure. It improves average preference prediction across tasks relative to deliberative prompting and principle-based baselines, while producing constitutions that LLM annotators prefer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Democratic ICAI adds persona debate to collect competing rationales before principle extraction, but the gains rest on an all-LLM loop with no human grounding shown.

The main addition is the structured persona debate step that pulls multiple competing rationales before turning preferences into principles. This is distinct from the single-pass ICAI they reference and targets the nuance that pairwise labels miss in creative tasks.

The paper shows the approach on MuCE-Pref and LiTBench, reporting higher average preference prediction than deliberative prompting and other principle baselines, plus constitutions that LLM annotators favor. That part is straightforward and addresses a real limitation in earlier summarization methods.

The soft spot is the closed evaluation loop. Every stage (persona creation, debate, rationale collection, principle derivation, and judging) runs on the same model class. Without a human rationale baseline or a cross-check against actual human reasoning, it is unclear whether the lift comes from richer signals or from model-specific effects such as prompt length or consistency bias. The abstract gives no statistical details or exclusion rules, so the empirical claim remains hard to assess.
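
A simple diagnostic for the length concern is to measure how often a judge picks the longer of the two responses. The sketch below assumes judged pairs are available as records with the two responses and the judge's choice; the record layout and function name are hypothetical.

```python
from typing import Dict, List

def longer_choice_rate(records: List[Dict[str, str]]) -> float:
    """Share of pairs with unequal word counts where the judge chose the longer response."""
    picked_longer = 0
    counted = 0
    for record in records:
        len_a = len(record["a"].split())
        len_b = len(record["b"].split())
        if len_a == len_b:
            continue  # equal lengths say nothing about a length preference
        longer = "A" if len_a > len_b else "B"
        counted += 1
        if record["choice"] == longer:
            picked_longer += 1
    return picked_longer / counted if counted else float("nan")

# Hypothetical judged pairs.
judged = [
    {"a": "one two three", "b": "one", "choice": "A"},
    {"a": "x", "b": "x y", "choice": "B"},
    {"a": "x y z", "b": "x", "choice": "B"},
    {"a": "same length", "b": "two words", "choice": "A"},
]
rate = longer_choice_rate(judged)
print(round(rate, 3))
```

A rate near 0.5 would suggest little length preference, while a rate close to 1 would be a reason to suspect verbosity effects. The same comparison would have to be run on the gold labels, since the preferred responses may themselves tend to be longer.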

This is for people working on interpretable alignment for multi-criteria creative domains who want a practical way to surface more factors from preferences. It is not positioned as a general solution for all alignment work.

I would send it for peer review. The method is clear enough that referees can check the implementation and ask for human validation or cross-model tests, which would strengthen the result.

Referee Report

2 major / 1 minor

Summary. The paper introduces Democratic ICAI, extending Inverse Constitutional AI by using structured persona debates among LLMs to collect multiple competing rationales for pairwise preferences. These richer signals are used to derive steering principles that guide decision modeling via both LLM-based and decision-tree judges. Experiments on the MuCE-Pref and LiTBench creative preference benchmarks report higher average preference prediction accuracy than deliberative prompting and principle-based baselines, along with constitutions that LLM annotators prefer.

Significance. If the gains can be shown to reflect factors that matter to human judges rather than LLM-internal artifacts, the method would strengthen the interpretability of preference-derived principles for multi-criteria alignment tasks. The core idea of moving beyond single-pass summarization is a natural and potentially useful direction.

major comments (2)

[Experiments] Experiments section (and abstract): preference-prediction gains are measured exclusively with LLM annotators and judges drawn from the same model family that performs persona generation, debate, rationale extraction, and principle derivation. This closed loop means observed lifts could arise from verbosity, consistency, or length biases rather than from a demonstrably more faithful account of the underlying factors; no human-rationale baseline or cross-validation is reported.
[Method] Method description of structured persona debate: the claim that the debate step produces a 'broader and more expressive account' without simulation artifacts is load-bearing for attributing the MuCE-Pref and LiTBench improvements to the proposed mechanism, yet the manuscript provides no direct test (e.g., comparison of extracted rationales against human reasoning traces) that would separate this effect from prompt-engineering artifacts.

minor comments (1)

[Abstract] The abstract states improvements 'across multiple creative task categories' but does not enumerate the categories or the number of tasks per benchmark, which would help readers assess the scope of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, acknowledging where our current experiments leave open questions about generalizability beyond LLM judges.

Referee: Experiments section (and abstract): preference-prediction gains are measured exclusively with LLM annotators and judges drawn from the same model family that performs persona generation, debate, rationale extraction, and principle derivation. This closed loop means observed lifts could arise from verbosity, consistency, or length biases rather than from a demonstrably more faithful account of the underlying factors; no human-rationale baseline or cross-validation is reported.

Authors: We agree this is a substantive limitation: all reported gains rely on LLM judges from the same model family, so we cannot rule out that improvements partly reflect model-internal biases rather than more faithful capture of human preference factors. The manuscript will be revised to state this limitation explicitly in the Experiments and Limitations sections and to add cross-family validation (e.g., using a held-out model family for final judging). We did not collect human rationale baselines or conduct human preference studies, so those comparisons are not available.

Revision: partial

Referee: Method description of structured persona debate: the claim that the debate step produces a 'broader and more expressive account' without simulation artifacts is load-bearing for attributing the MuCE-Pref and LiTBench improvements to the proposed mechanism, yet the manuscript provides no direct test (e.g., comparison of extracted rationales against human reasoning traces) that would separate this effect from prompt-engineering artifacts.

Authors: The design of the persona debate aims to surface competing rationales by construction, and the consistent accuracy lifts over single-pass baselines provide indirect support. However, we accept that without direct comparison of the extracted rationales to human reasoning traces we cannot fully isolate the contribution of the debate mechanism from prompt artifacts. The revision will add an explicit discussion of this point in the Method and Limitations sections. No human reasoning traces were collected in the present study.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes an empirical procedure for extracting principles via LLM persona debate and evaluates the resulting constitutions on the external benchmarks MuCE-Pref and LiTBench using both LLM-based and decision-tree judges. No equations, fitted parameters, or self-citations are presented that reduce a claimed prediction or first-principles result to the input data by construction. The comparative gains in preference prediction are reported against baselines rather than being definitionally forced by the evaluation method itself. The LLM-judge component is a methodological choice whose validity can be assessed externally; it does not create a self-definitional or load-bearing self-citation loop within the reported derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that LLMs can faithfully simulate diverse personas whose debates surface the true interacting criteria behind preferences; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: LLMs can simulate diverse personas whose structured debates accurately reflect the multiple interacting criteria in human judgments.
Invoked when the method gathers 'multiple competing rationales through structured persona debate' to produce richer signals than single-pass ICAI.

pith-pipeline@v0.9.1-grok · 5716 in / 1434 out tokens · 67528 ms · 2026-06-29T04:07:44.790651+00:00 · methodology

Reference graph

Works this paper leans on

124 extracted references · 1 linked inside Pith

[1]

Mete Ismayilzada, Antonio Laverghetta Jr, Simone A Luchini, Reet Patel, Antoine Bosselut, Lonneke Van Der Plas, and Roger Beaty

AI safety via debate. arXiv preprint arXiv:1805.00899. Mete Ismayilzada, Antonio Laverghetta Jr, Simone A Luchini, Reet Patel, Antoine Bosselut, Lonneke Van Der Plas, and Roger Beaty. 2025. Creative preference optimization. arXiv preprint arXiv:2505.14442. Hannah Rose Kirk, Alexander Whitefield, Paul Rottger, Andrew M Bean, Katerina Margatina, Rafael Mosque...

Pith/arXiv arXiv 2025
[2]

Hypothesis Generation

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623.
[3]

Select the response that demonstrates higher tension and conflict
[4]

Select the response that maintains a serious and mystical tone
[5]

Select the response that provides a more detailed narrative context
[6]

Select the response that emphasizes character transformation and moral consequences
[7]

Select the response that uses concise and snappy dialogue
[9]

Select the response that includes a reflective or personal commentary
[10]

Select the response that develops characters with detailed backstories
[11]

Select the response that provides more detailed and vivid descriptions
[12]

Select the response that avoids graphic or unsettling imagery
[13]

Select the response that uses humor and exaggerated reactions effectively
[14]

Select the response that includes detailed character interactions and emotions
[15]

Select the response that emphasizes humanity’s agricultural dominance and its consequences
[16]

C.2 Constitution generated with Democratic ICAI (GPT-4o) on LitBench Stories

Select the response that includes a resolution or character growth. C.2 Constitution generated with Democratic ICAI (GPT-4o) on LitBench Stories
[17]

Select the response that shows how major moments redirect the story’s path
[18]

Select the response that illustrates the protagonist’s transformation across the narrative
[19]

Select the response that resonates with characters whose choices feel true to life
[20]

Select the response that reveals how connections between characters grow and shift
[21]

Select the response that brings forward the character’s inner conflict and its meaning
[22]

Select the response that communicates the moral difficulty at the center of the story
[23]

Select the response that draws out the deeper philosophical idea driving the narrative
[24]

Select the response that reflects what the story communicates about its society or culture
[25]

Select the response that explains how the world’s rules shape the reader’s experience
[26]

Select the response that conveys how the setting establishes feeling and atmosphere
[27]

Select the response that shows how the arrangement of the storyline guides understanding
[28]

Select the response that captures how momentum and tension keep the narrative engaging
[29]

Select the response that brings attention to dialogue that feels natural and distinctive
[30]

Select the response that highlights symbolic details or subtext adding layered meaning
[31]

Select the response that explores character introspection and emotional depth
[32]

Select the response that expresses how the narrative maintains or adjusts its tone
[33]

Select the response that focuses on the character’s development and the moral consequences of their actions
[34]

Select the response that conveys the emotional effect the story ultimately creates
[35]

C.3 Constitution generated with ICAI (GPT-5) on LitBench Stories

Select the response that offers the most clear, focused, and meaningful interpretation. C.3 Constitution generated with ICAI (GPT-5) on LitBench Stories
[36]

Select the response that provides a clearer resolution or twist
[37]

Select the response that emphasizes humor and irony over detailed lore
[38]

Select the response that uses modern and relatable language style
[39]

Select the response that features a more dynamic and engaging narrative
[40]

Select the response that avoids excessive exposition or unrelated details
[41]

Select the response that maintains a calm and supportive tone
[42]

Select the response that emphasizes character interaction and emotional tension
[43]

Select the response that escalates tension with a dramatic revelation
[44]

Select the response that explores deeper emotional or moral conflicts
[45]

Select the response that includes dialogue for dynamic storytelling
[46]

Select the response that includes unique and unexpected side effects
[47]

Select the response that emphasizes humanity’s disdain for war
[48]

Select the response that incorporates modern technology in a creative way
[49]

C.4 Constitution generated with Democratic ICAI (GPT-5) on LitBench Stories

Select the response that incorporates a more vivid and descriptive narrative. C.4 Constitution generated with Democratic ICAI (GPT-5) on LitBench Stories
[50]

Select the response that balances wonder with grounded, human stakes
[51]

Select the response that establishes a compelling hook with a clean inciting incident
[52]

Select the response that reveals character through overheard, naturalistic dialogue instead of explanation
[53]

Select the response that maintains escalating tension through clear beats and reversals
[54]

Select the response that delivers vivid, cinematic imagery with specific sensory detail
[55]

Select the response that uses subtext to convey meaning rather than spelling everything out
[56]

Select the response that grounds the speculative element in believable relationships or family dynamics
[57]

Select the response that clarifies the world’s rules in a way that raises the stakes
[58]

Select the response that centers character agency, where choices meaningfully drive events
[59]

Select the response that engages moral complexity without resorting to didactic explanation
[60]

Select the response that offers thematically cohesive critique or insight
[61]

Select the response that maintains a consistent and confident tone across scenes
[62]

Select the response that demonstrates narrative economy without filler or recap
[63]

Select the response that lands a resonant final beat that lingers after reading
[64]

Select the response that subverts familiar tropes through character-first innovation
[65]

Select the response that uses humor to deepen tension and character rather than deflate stakes
[66]

Select the response that introduces conflict through subtle interpersonal friction instead of external spectacle
[67]

Select the response that enriches worldbuilding through concrete lived-in details rather than exposition
[68]

Select the response that builds tension through well-timed reveals rather than info-dumping
[69]

Select the response that communicates cultural or social texture through natural context not lecture
[70]

Hypothesis Generation

Select the response that escalates stakes through character choices rather than random events. C.5 Constitution generated with ICAI (GPT-4o) on the MuCE dataset on the “Hypothesis Generation” task
[71]

Select the response that provides a more complex explanation
[72]

Select the response that focuses on human interaction or behavior
[73]

Select the response that contrasts perception over factual statements
[74]

Select the response that describes personality traits over appearances
[75]

Select the response that focuses on abstract qualities like demeanor
[76]

Select the response that refers to general personality rather than talent
[77]

Select the response that includes scientific terminology and concepts
[78]

Select the response that provides a definitive and accurate explanation
[79]

Select the response that connects behavior to individuality and pressure
[80]

Hypothesis Generation

Select the response that emphasizes causal reasoning and energy sources. C.6 Constitution generated with Democratic ICAI (GPT-4o) on the MuCE dataset on the “Hypothesis Generation” task
[81]

Select the response that provides precise definitions and boundary conditions

Showing first 80 references.

[1] [1]

Mete Ismayilzada, Antonio Laverghetta Jr, Simone A Luchini, Reet Patel, Antoine Bosselut, Lonneke Van Der Plas, and Roger Beaty

Ai safety via debate.arXiv preprint arXiv:1805.00899. Mete Ismayilzada, Antonio Laverghetta Jr, Simone A Luchini, Reet Patel, Antoine Bosselut, Lonneke Van Der Plas, and Roger Beaty. 2025. Creative preference optimization.arXiv preprint arXiv:2505.14442. Hannah Rose Kirk, Alexander Whitefield, Paul Rottger, Andrew M Bean, Katerina Margatina, Rafael Mosque...

Pith/arXiv arXiv 2025

[2] [2]

Hypothesis Generation

Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information pro- cessing systems, 36:46595–46623. Appendix A Ethical Considerations and Societal Im- plications 13 B Constitution-Aligned Model Evaluation 13 B.1 Direct Preference Optimization Configuration . . . . . . . . . . . 13 B.2 Qualitative comparison of model responses . . ....

[3] [3]

Select the response that demonstrates higher tension and conflict

[4] [4]

Select the response that maintains a serious and mystical tone

[5] [5]

Select the response that provides a more detailed narrative context

[6] [6]

Select the response that emphasizes character transformation and moral consequences

[7] [7]

Select the response that uses concise and snappy dialogue

[8] [9]

Select the response that includes a reflective or personal commentary

[9] [10]

Select the response that develops characters with detailed backstories

[10] [11]

Select the response that provides more detailed and vivid descriptions

[11] [12]

Select the response that avoids graphic or unsettling imagery

[12] [13]

Select the response that uses humor and exaggerated reactions effectively

[13] [14]

Select the response that includes detailed character interactions and emotions

[14] [15]

Select the response that emphasizes humanity’s agricultural dominance and its consequences

[15] [16]

C.2 Constitution generated with Democratic ICAI (GPT-4o) on LitBench Stories

Select the response that includes a resolution or character growth. C.2 Constitution generated with Democratic ICAI (GPT-4o) on LitBench Stories

[16] [17]

Select the response that shows how major moments redirect the story’s path

[17] [18]

Select the response that illustrates the protagonist’s transformation across the narrative

[18] [19]

Select the response that resonates with characters whose choices feel true to life

[19] [20]

Select the response that reveals how connections between characters grow and shift

[20] [21]

Select the response that brings forward the character’s inner conflict and its meaning

[21] [22]

Select the response that communicates the moral difficulty at the center of the story

[22] [23]

Select the response that draws out the deeper philosophical idea driving the narrative

[23] [24]

Select the response that reflects what the story communicates about its society or culture

[24] [25]

Select the response that explains how the world’s rules shape the reader’s experience

[25] [26]

Select the response that conveys how the setting establishes feeling and atmosphere

[26] [27]

Select the response that shows how the arrangement of the storyline guides understanding

[27] [28]

Select the response that captures how momentum and tension keep the narrative engaging

[28] [29]

Select the response that brings attention to dialogue that feels natural and distinctive

[29] [30]

Select the response that highlights symbolic details or subtext adding layered meaning

[30] [31]

Select the response that explores character introspection and emotional depth

[31] [32]

Select the response that expresses how the narrative maintains or adjusts its tone

[32] [33]

Select the response that focuses on the character’s development and the moral consequences of their actions

[33] [34]

Select the response that conveys the emotional effect the story ultimately creates

[34] [35]

C.3 Constitution generated with ICAI (GPT-5) on LitBench Stories

Select the response that offers the most clear, focused, and meaningful interpretation. C.3 Constitution generated with ICAI (GPT-5) on LitBench Stories

Select the response that provides a clearer resolution or twist

Select the response that emphasizes humor and irony over detailed lore

Select the response that uses modern and relatable language style

Select the response that features a more dynamic and engaging narrative

Select the response that avoids excessive exposition or unrelated details

Select the response that maintains a calm and supportive tone

Select the response that emphasizes character interaction and emotional tension

Select the response that escalates tension with a dramatic revelation

Select the response that explores deeper emotional or moral conflicts

Select the response that includes dialogue for dynamic storytelling

Select the response that includes unique and unexpected side effects

Select the response that emphasizes humanity’s disdain for war

Select the response that incorporates modern technology in a creative way

Select the response that incorporates a more vivid and descriptive narrative

C.4 Constitution generated with Democratic ICAI (GPT-5) on LitBench Stories

Select the response that balances wonder with grounded, human stakes

Select the response that establishes a compelling hook with a clean inciting incident

Select the response that reveals character through overheard, naturalistic dialogue instead of explanation

Select the response that maintains escalating tension through clear beats and reversals

Select the response that delivers vivid, cinematic imagery with specific sensory detail

Select the response that uses subtext to convey meaning rather than spelling everything out

Select the response that grounds the speculative element in believable relationships or family dynamics

Select the response that clarifies the world’s rules in a way that raises the stakes

Select the response that centers character agency, where choices meaningfully drive events

Select the response that engages moral complexity without resorting to didactic explanation

Select the response that offers thematically cohesive critique or insight

Select the response that maintains a consistent and confident tone across scenes

Select the response that demonstrates narrative economy without filler or recap

Select the response that lands a resonant final beat that lingers after reading

Select the response that subverts familiar tropes through character-first innovation

Select the response that uses humor to deepen tension and character rather than deflate stakes

Select the response that introduces conflict through subtle interpersonal friction instead of external spectacle

Select the response that enriches worldbuilding through concrete lived-in details rather than exposition

Select the response that builds tension through well-timed reveals rather than info-dumping

Select the response that communicates cultural or social texture through natural context not lecture

Select the response that escalates stakes through character choices rather than random events

C.5 Constitution generated with ICAI (GPT-4o) on the MuCE dataset on the “Hypothesis Generation” task

Select the response that provides a more complex explanation

Select the response that focuses on human interaction or behavior

Select the response that contrasts perception over factual statements

Select the response that describes personality traits over appearances

Select the response that focuses on abstract qualities like demeanor

Select the response that refers to general personality rather than talent

Select the response that includes scientific terminology and concepts

Select the response that provides a definitive and accurate explanation

Select the response that connects behavior to individuality and pressure

Select the response that emphasizes causal reasoning and energy sources

C.6 Constitution generated with Democratic ICAI (GPT-4o) on the MuCE dataset on the “Hypothesis Generation” task

Select the response that provides precise definitions and boundary conditions