LLMs deviate from human moral preferences in kidney allocation scenarios and rarely express indecision, though low-rank fine-tuning with few examples can improve both consistency and uncertainty calibration.
Dickerson, and Vincent Conitzer
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.
citing papers explorer
-
Who Gets the Kidney? Human-AI Alignment, Indecision, and Moral Values
LLMs deviate from human moral preferences in kidney allocation scenarios and rarely express indecision, though low-rank fine-tuning with few examples can improve both consistency and uncertainty calibration.
-
Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.