Gender bias in coreference resolution

Gender bias in coreference resolution · 2018 · DOI 10.18653/v1/n18-2002

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

cs.CY · 2026-05-11 · accept · novelty 7.0 · 2 refs

StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

BBQ: A Hand-Built Bias Benchmark for Question Answering

cs.CL · 2021-10-15 · accept · novelty 7.0

BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.

Multi-component Causal Tracing in Large Language Models

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

A unified multi-component causal tracing method that uses soft interventions and a metric transformation to efficiently select critical LLM components for a target performance metric.

DataComp-LM: In search of the next generation of training sets for language models

cs.LG · 2024-06-17 · unverdicted · novelty 6.0

DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

Gemini: A Family of Highly Capable Multimodal Models

cs.CL · 2023-12-19 · conditional · novelty 6.0

Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis

cs.CL · 2019-06-24 · unverdicted · novelty 6.0

Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

cs.CL · 2019-05-02 · accept · novelty 6.0

SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

cs.CL · 2024-11-16 · unverdicted · novelty 5.0

Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

citing papers explorer

Showing 2 of 2 citing papers after filters.

DataComp-LM: In search of the next generation of training sets for language models

cs.LG · 2024-06-17 · unverdicted · none · ref 156

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

cs.CL · 2024-11-16 · unverdicted · none · ref 24
