Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
hub
Gender bias in coreference resolution
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.
A unified multi-component causal tracing method that uses soft interventions and a metric transformation to efficiently select critical LLM components for a target performance metric.
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.
SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.
Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
-
DataComp-LM: In search of the next generation of training sets for language models
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
-
Mitigating Extrinsic Gender Bias for Bangla Classification Tasks
Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.