Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Singla, Somanshu, Wang, Zhen, Liu, Tianyang, Ashfaq, Abdullah, Hu, Zhiting, Xing, Eric P · 2024 · DOI 10.18653/v1/2024.emnlp-main.1220

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.

iPOE: Interpretable Prompt Optimization via Explanations

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

iPOE generates and optimizes annotation guidelines from explanations to produce interpretable prompts, reporting up to 39% gains over baselines on four datasets with LLM explanations substituting for human ones.

citing papers explorer

Showing 2 of 2 citing papers.

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG · cs.CL · 2026-06-30 · unverdicted · none · ref 56
iPOE: Interpretable Prompt Optimization via Explanations · cs.CL · 2026-05-18 · unverdicted · none · ref 44
