
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

2 Pith papers cite this work. Polarity classification is still indexing.


fields: cs.CL (1), cs.SE (1)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

A Multi-Dimensional Audit of Politically Aligned Large Language Models

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

A multi-dimensional audit framework for politically aligned LLMs finds consistent trade-offs: larger models are more effective and truthful but less fair with higher bias, while fine-tuned models reduce bias but increase hallucinations and reasoning decline, and all tested models show deficiencies.

citing papers explorer

Showing 2 of 2 citing papers.

  • RACC: Representation-Aware Coverage Criteria for LLM Safety Testing · cs.SE · 2026-02-02 · unverdicted · none · ref 50

    RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

  • A Multi-Dimensional Audit of Politically Aligned Large Language Models · cs.CL · 2026-04-27 · unverdicted · none · ref 13

    A multi-dimensional audit framework for politically aligned LLMs finds consistent trade-offs: larger models are more effective and truthful but less fair with higher bias, while fine-tuned models reduce bias but increase hallucinations and reasoning decline, and all tested models show deficiencies.