Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

Ganguli, Deep, Hernandez, Danny, Lovitt, Liane, Askell, Amanda, Bai, Yuntao, Chen, Anna · 2022 · arXiv 1146.353322

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · extension: 1

citation-polarity summary

background: 2 · extend: 1

representative citing papers

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · novelty 7.0

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management

cs.CY · 2026-05-17 · unverdicted · novelty 5.0

Sensitivity analyses of NYC heat emergency indices show that reasonable variations in input variables and spatial scale lead to substantially different risk scores affecting downstream government decisions.

The Paradox of Prioritization in Public Sector Algorithms

cs.HC · 2026-04-03 · unverdicted · novelty 5.0

Prioritization algorithms in public services generate relative disparities among intersectional groups as resources become scarce, intensifying perceptions of inequality.

Privacy, Prediction, and Allocation

cs.CR · 2026-04-17

citing papers explorer

Showing 6 of 6 citing papers.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · none · ref 159

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · none · ref 28

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · none · ref 72

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management

cs.CY · 2026-05-17 · unverdicted · none · ref 70

Sensitivity analyses of NYC heat emergency indices show that reasonable variations in input variables and spatial scale lead to substantially different risk scores affecting downstream government decisions.

The Paradox of Prioritization in Public Sector Algorithms

cs.HC · 2026-04-03 · unverdicted · none · ref 22

Prioritization algorithms in public services generate relative disparities among intersectional groups as resources become scarce, intensifying perceptions of inequality.

Privacy, Prediction, and Allocation

cs.CR · 2026-04-17 · unreviewed · ref 33
