Smith, and Luke Zettlemoyer
5 Pith papers cite this work.
fields: cs.CL (5)
representative citing papers
- MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
  MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
- SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models
  SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.
- Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
  Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.
- Low-Resource Languages Jailbreak GPT-4
  Translating unsafe inputs to low-resource languages jailbreaks GPT-4 at rates on par with or exceeding state-of-the-art attacks.
- PaLM: Scaling Language Modeling with Pathways
  PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.