Training language models to follow instructions with human feedback
Pith reviewed 2026-05-10 16:43 UTC · model grok-4.3
The pith
Fine-tuning GPT-3 on human demonstrations and output rankings produces InstructGPT models that humans prefer over the original 175B GPT-3 even at 1.3B parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors collect labeler demonstrations of desired behavior on a mix of written prompts and API-submitted prompts, use them for supervised fine-tuning of GPT-3, then gather rankings of model outputs and apply reinforcement learning from human feedback to obtain InstructGPT. In human evaluations on their prompt distribution, the 1.3B InstructGPT is preferred to the 175B GPT-3, with gains in truthfulness, reductions in toxic generation, and minimal regressions on public NLP datasets.
What carries the argument
Two-stage fine-tuning that begins with supervised learning on human demonstrations of desired outputs and continues with reinforcement learning from human rankings of model responses.
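As a hedged reconstruction rather than a quotation, the second stage maximizes a KL-penalized reward of the standard RLHF form (matching the paper's PPO-ptx objective up to notation), where $r_\theta$ is the reward model fit to the human rankings, $\pi^{\mathrm{SFT}}$ is the stage-one supervised policy, $\beta$ is the KL coefficient, and $\gamma$ weights an optional pretraining-mix term:

$$\mathrm{objective}(\phi) = \mathbb{E}_{(x,y)\sim D_{\pi_\phi^{\mathrm{RL}}}}\Big[\, r_\theta(x,y) - \beta \log \frac{\pi_\phi^{\mathrm{RL}}(y\mid x)}{\pi^{\mathrm{SFT}}(y\mid x)} \Big] + \gamma\, \mathbb{E}_{x\sim D_{\mathrm{pretrain}}}\big[\log \pi_\phi^{\mathrm{RL}}(x)\big]$$

The KL term anchors the RL policy to the supervised model, limiting over-optimization against the learned reward.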
If this is right
- Smaller models aligned this way can outperform much larger unaligned models on human preference judgments.
- The resulting models generate more truthful content and fewer toxic outputs.
- Standard public NLP benchmarks show only minimal performance regressions after the alignment steps.
- Fine-tuning with human feedback offers a practical route to making language models follow user instructions more reliably.
Where Pith is reading between the lines
- The same collection and ranking process could be applied to other base models to test whether the preference gains hold beyond the GPT-3 family.
- If human feedback can be gathered at scale for more complex or domain-specific prompts, the method might reduce reliance on raw parameter count for capability gains.
- Extending the ranking step to capture longer-term user satisfaction rather than single-turn preferences could further tighten alignment.
Load-bearing premise
The preferences expressed by the human labelers on the prompts they saw accurately capture what a wide range of future users will want in real applications.
What would settle it
A new human evaluation on a fresh set of prompts drawn from actual user interactions, in which InstructGPT outputs are not rated above those of the base GPT-3.
original abstract
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces InstructGPT models obtained by first performing supervised fine-tuning of GPT-3 on a dataset of human-written demonstrations of desired behavior, then further training via reinforcement learning from human feedback (RLHF) using a reward model trained on human preference rankings of model outputs. On a held-out set of prompts drawn from the same distribution (labeler-written and API-submitted), human evaluators prefer outputs from the 1.3B InstructGPT over those from the 175B GPT-3; the aligned models also exhibit higher truthfulness and lower toxicity with only small regressions on public NLP benchmarks.
Significance. If the reported human-preference results hold, the work supplies direct empirical evidence that RLHF can produce substantial alignment gains on instruction-following tasks, including the striking result that a 100x smaller model can be preferred to its much larger base model. The approach is grounded in independent human evaluations rather than circular derivations, and the public benchmarks provide a useful check against capability regression. This strengthens the case for human feedback as a practical alignment technique beyond pure scaling.
major comments (2)
- [§4] §4 (Human evaluations): The central preference comparison (1.3B InstructGPT preferred to 175B GPT-3) is reported without confidence intervals, sample sizes per comparison, or inter-rater agreement statistics. Because the main claim rests entirely on these human judgments, the absence of uncertainty quantification leaves open the possibility that the observed win rates are sensitive to sampling variability or labeler idiosyncrasies.
- [§3.3] §3.3 (RLHF stage): The reward model and PPO training both involve multiple free hyperparameters (learning rates, KL coefficient, etc.). While the paper lists the chosen values, it provides no ablation or sensitivity analysis showing that the reported preference gains are robust to reasonable changes in these choices; this weakens the case that the gains are attributable to the RLHF procedure itself rather than a narrow hyperparameter sweet spot.
minor comments (2)
- [Table 2] Table 2 and Figure 3: the public-benchmark regressions are described as “minimal,” but the absolute deltas (e.g., on MMLU or TruthfulQA) should be stated numerically in the text for quick assessment.
- [§2.2] §2.2: the prompt distribution is described only at a high level (“labeler-written and API-submitted”); a short appendix table characterizing prompt length, topic diversity, or task type would aid readers in judging external validity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the work. We address each major comment below, proposing revisions where they strengthen the manuscript without requiring new large-scale experiments.
point-by-point responses
-
Referee: [§4] §4 (Human evaluations): The central preference comparison (1.3B InstructGPT preferred to 175B GPT-3) is reported without confidence intervals, sample sizes per comparison, or inter-rater agreement statistics. Because the main claim rests entirely on these human judgments, the absence of uncertainty quantification leaves open the possibility that the observed win rates are sensitive to sampling variability or labeler idiosyncrasies.
Authors: We agree that uncertainty quantification would improve the reporting of the human preference results. The evaluations were performed on a held-out set of prompts with multiple labelers, and we have the underlying data to compute bootstrap confidence intervals, exact sample sizes (prompts and pairwise comparisons), and inter-rater agreement (e.g., Fleiss' kappa). We will add these statistics to Section 4 and the appendix in the revised manuscript (a minimal sketch of these computations appears after these responses). revision: yes
-
Referee: [§3.3] §3.3 (RLHF stage): The reward model and PPO training both involve multiple free hyperparameters (learning rates, KL coefficient, etc.). While the paper lists the chosen values, it provides no ablation or sensitivity analysis showing that the reported preference gains are robust to reasonable changes in these choices; this weakens the case that the gains are attributable to the RLHF procedure itself rather than a narrow hyperparameter sweet spot.
Authors: The manuscript does not contain ablations on the RLHF hyperparameters; values were chosen via small-scale preliminary tuning informed by prior RLHF literature. We cannot conduct full sensitivity analyses without substantial new compute and human data collection. In revision we will expand Section 3.3 to better motivate the selected values, note the limitation, and point out that preference gains were observed consistently across model scales (1.3B, 6B, and 175B InstructGPT). revision: partial
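To make the statistics proposed in the first response concrete, here is a minimal sketch in Python, assuming preference outcomes arrive as a flat 0/1 array and agreement data as an items-by-categories count matrix; the function names, shapes, and toy data are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def bootstrap_win_rate_ci(wins, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a pairwise win rate.

    wins: 1-D array of 0/1 outcomes, one per comparison
          (1 = InstructGPT output preferred over the GPT-3 output).
    """
    rng = np.random.default_rng(seed)
    wins = np.asarray(wins, dtype=float)
    n = len(wins)
    # Resample comparisons with replacement and recompute the win rate.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = wins[idx].mean(axis=1)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return wins.mean(), (lo, hi)

def fleiss_kappa(counts):
    """Fleiss' kappa from an (items x categories) count matrix;
    counts[i, j] = number of raters who put item i in category j,
    with the same number of raters for every item."""
    counts = np.asarray(counts, dtype=float)
    n_raters = counts.sum(axis=1)[0]
    # Per-item observed agreement: fraction of rater pairs that agree.
    p_i = (counts * (counts - 1)).sum(axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    p_j = counts.sum(axis=0) / counts.sum()  # category marginals
    p_e = (p_j ** 2).sum()                   # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Toy data, fabricated for illustration: 200 comparisons at a ~70% win rate.
wins = (np.random.default_rng(1).random(200) < 0.7).astype(int)
rate, (lo, hi) = bootstrap_win_rate_ci(wins)
print(f"win rate {rate:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```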
Circularity Check
No significant circularity in the empirical results or method
full rationale
The paper presents an empirical pipeline—collecting labeler demonstrations for supervised fine-tuning of GPT-3, followed by collecting output rankings for reinforcement learning from human feedback—whose final performance claims rest on separate human preference evaluations conducted on held-out prompts from the authors' distribution. These evaluations directly compare the resulting 1.3B InstructGPT model against the 175B GPT-3 baseline and are not derived from or equivalent to the training objective itself. No equations, fitted parameters, or self-citations are invoked in a manner that reduces the reported preference gains, truthfulness improvements, or toxicity reductions to the input data by construction. The central result is therefore an independent measurement rather than a renaming or tautological restatement of the training process.
Axiom & Free-Parameter Ledger
free parameters (2)
- reward model training hyperparameters
- PPO hyperparameters
axioms (1)
- domain assumption: Human preferences over text outputs can be accurately represented by a scalar reward function trained on pairwise rankings
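The domain assumption above is typically operationalized with a Bradley-Terry style pairwise loss, -log sigmoid(r(x, y_w) - r(x, y_l)), which is essentially how the paper's reward model is trained on ranked output pairs. Below is a minimal PyTorch sketch under that reading; RewardModel is a hypothetical stand-in for a scalar head on a pretrained LM, not the paper's 6B reward model.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical stand-in: a scalar reward head over a pooled text
    representation. In practice the head sits on top of a pretrained LM."""
    def __init__(self, hidden_dim=768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled):                 # (batch, hidden_dim)
        return self.score(pooled).squeeze(-1)  # (batch,) scalar rewards

def pairwise_ranking_loss(r_chosen, r_rejected):
    """Bradley-Terry style loss: -log sigmoid(r_w - r_l), pushing the
    reward of the human-preferred output above the rejected one."""
    return -nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random features stand in for pooled LM hidden states.
rm = RewardModel()
chosen, rejected = torch.randn(4, 768), torch.randn(4, 768)
loss = pairwise_ranking_loss(rm(chosen), rm(rejected))
loss.backward()   # gradients flow into the reward head
print(float(loss))
```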
Lean theorems connected to this paper
-
LawOfExistence defect_zero_iff_one · echoes “we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback... outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation”
Forward citations
Cited by 60 Pith papers
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
-
Generative Agents: Interactive Simulacra of Human Behavior
Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.
-
Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics
LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.
-
Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets
Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends...
-
Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reaso...
-
ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
ContextualJailbreak uses evolutionary search over simulated primed dialogues with novel mutations to reach 90-100% attack success on open LLMs and transfers to some closed frontier models at 15-90% rates.
-
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
-
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor
Political bias audits of LLMs largely capture sycophantic accommodation to the inferred political identity of the asker rather than any fixed model ideology.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Latent Space Probing for Adult Content Detection in Video Generative Models
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
-
Rates of forgetting for the sequentially Markov coalescent
SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.
-
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
R2IF improves LLM function-calling accuracy by up to 34.62% on BFCL using a composite reward system with CER and SMV components optimized via GRPO, while increasing interpretability through positive CoT effectiveness.
-
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs
HiPO improves LLM reasoning performance by optimizing preferences separately on response segments rather than entire outputs.
-
S-GRPO: Unified Post-Training for Large Vision-Language Models
S-GRPO unifies SFT and RL for LVLMs via conditional ground-truth injection that supplies a maximal-reward anchor when group exploration fails completely.
-
Reinforcement Learning via Value Gradient Flow
VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
-
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.
-
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
-
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation
SPASM introduces a stability-first framework with Egocentric Context Projection to maintain consistent personas and eliminate echoing in multi-turn LLM agent dialogues.
-
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security
MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
-
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception
Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent...
-
STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning
STEER represents videos as time-ordered event schemas and uses Pareto-Frontier guided Advantage Balancing in RL to train a 4B model that matches 7B baselines on video tasks with half the frames.
-
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
-
Let's Verify Step by Step
Process supervision significantly outperforms outcome supervision for training models on the MATH dataset, achieving 78% accuracy on a representative test subset with active learning and a released 800k step-label dataset.
-
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Visual ChatGPT integrates visual foundation models with ChatGPT via prompts to enable multi-step image understanding, generation, and editing in conversational interactions.
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
-
OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
-
Early Data Exposure Improves Robustness to Subsequent Fine-Tuning
Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.
-
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
-
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
A new RL objective adapts trust-region and off-policy handling automatically via normalized effective sample size of batch policy ratios, matching tuned baselines without new hyperparameters.
-
Evaluating the False Trust engendered by LLM Explanations
A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.
-
Rotation-Preserving Supervised Fine-Tuning
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
-
Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models
Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing ...
-
Why Does Agentic Safety Fail to Generalize Across Tasks?
Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstr...
-
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
-
Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.
-
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
RLearner-LLM achieves up to 6x gains in NLI entailment over standard fine-tuning by using an automated hybrid DPO pipeline that balances logic and fluency across multiple model sizes and domains.
-
RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
RLearner-LLM's Hybrid-DPO fuses DeBERTa NLI and LLM verifier scores to deliver up to 6x higher NLI entailment than standard SFT while preserving answer coverage across academic domains.
-
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
-
Iterative Finetuning is Mostly Idempotent
Iterative self-finetuning of LLMs mostly fails to amplify seeded behavioral traits, with amplification limited to specific DPO setups and often harming coherence.
-
AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
AEM lifts entropy analysis to the response level and uses a derived uncertainty proxy to rescale advantages, enabling better exploration-exploitation balance and consistent gains over RL baselines on agent benchmarks.
-
Diversity in Large Language Models under Supervised Fine-Tuning
TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
-
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
-
What Did They Mean? How LLMs Resolve Ambiguous Social Situations across Perspectives and Roles
LLMs produce interpretive closure in 87.5% of ambiguous social scenarios through narrative alignment, reversal, or normative advice, with first-person perspectives increasing alignment tendencies.
-
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
Transient Turn Injection is a new attack that evades LLM moderation by spreading harmful intent over multiple isolated turns using automated agents.
-
Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework
A new five-principle framework applied to 34 practitioner AI governance prompts finds 37% lack key structural elements such as data classification and rubrics.
-
Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback
VPL learns individualized vibrotactile preferences efficiently via uncertainty-aware Gaussian process models and active query selection in a 13-participant user study on an Xbox controller.
-
Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks
Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.
-
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
ClawGuard enforces deterministic, user-derived access constraints at tool boundaries to block indirect prompt injection without changing the underlying LLM.
-
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
ClawGuard enforces user-derived access constraints at tool-call boundaries to block indirect prompt injection in tool-augmented LLM agents across web, MCP, and skill injection channels.
-
Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Relax is a new RL training engine with omni-native design and async execution that delivers up to 2x speedups over baselines like veRL while converging to equivalent reward levels on Qwen3 models.
-
Pioneer Agent: Continual Improvement of Small Language Models in Production
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on ...
-
C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic--Vehicle Coordination
C2T learns an LLM-derived common-sense reward function to improve cooperative multi-intersection traffic control policies, outperforming standard MARL baselines on efficiency, safety, and energy proxies while allowing...
-
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs
DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and...
-
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense
TrajGuard detects jailbreaks by tracking how hidden-state trajectories move toward high-risk regions during decoding, achieving 95% defense rate with 5.2 ms/token latency across tested attacks.
-
IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
AI models exhibit identity-contingent withholding, providing better clinical guidance on benzodiazepine tapering to physicians than laypeople in identical scenarios, with a measured decoupling gap of +0.38 and 13.1 pe...
-
SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics
SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.
-
Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
GCAN cuts LLM hallucination rates by 27.8% and raises factual accuracy by 16.4% on TruthfulQA and HotpotQA by using causal token graphs and a new Causal Contribution Score.