A comprehensive survey of llm alignment techniques: Rlhf, rlaif, ppo, dpo and more

Zhichao Wang, Bin Bi, Shiva Kumar Pentyala, Kiran Ramnath, Sougata Chaudhuri, Shubham Mehrotra, Zixu, Zhu, Xiang-Bo Mao, Sitaram Asur, Na, Cheng · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

cs.SE · 2026-02-02 · unverdicted · novelty 7.0

RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

A Multi-Dimensional Audit of Politically Aligned Large Language Models

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

A multi-dimensional audit framework for politically aligned LLMs finds consistent trade-offs: larger models are more effective and truthful but less fair with higher bias, while fine-tuned models reduce bias but increase hallucinations and reasoning decline, and all tested models show deficiencies.

citing papers explorer

Showing 2 of 2 citing papers.

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing cs.SE · 2026-02-02 · unverdicted · none · ref 50
RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.
A Multi-Dimensional Audit of Politically Aligned Large Language Models cs.CL · 2026-04-27 · unverdicted · none · ref 13
A multi-dimensional audit framework for politically aligned LLMs finds consistent trade-offs: larger models are more effective and truthful but less fair with higher bias, while fine-tuned models reduce bias but increase hallucinations and reasoning decline, and all tested models show deficiencies.

A comprehensive survey of llm alignment techniques: Rlhf, rlaif, ppo, dpo and more

fields

years

verdicts

representative citing papers

citing papers explorer