GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
The woman worked as a babysitter: On biases in language generation
7 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
Explicit demographic statements trigger higher refusal rates and lower semantic similarity in LLMs than implicit dialect cues, which reduce refusals but also reduce content sanitization.
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
Relative density ratio optimization stabilizes direct density ratio estimation for language model alignment while preserving statistical consistency without assuming a Bradley-Terry preference model.
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
citing papers explorer
-
Language Models are Few-Shot Learners
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
-
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
-
Query-efficient model evaluation using cached responses
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
-
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles
Explicit demographic statements trigger higher refusal rates and lower semantic similarity in LLMs than implicit dialect cues, which reduce refusals but also reduce content sanitization.
-
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
-
Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment
Relative density ratio optimization stabilizes direct density ratio estimation for language model alignment while preserving statistical consistency without assuming a Bradley-Terry preference model.
-
Galactica: A Large Language Model for Science
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.