pith. machine review for the scientific record.

arxiv: 2303.08774 · v6 · submitted 2023-03-15 · 💻 cs.CL · cs.AI

Recognition: unknown

GPT-4 Technical Report

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida
Janko Altenschmidt Sam Altman Shyamal Anadkat Red Avila Igor Babuschkin Suchir Balaji Valerie Balcom Paul Baltescu Haiming Bao Mohammad Bavarian Jeff Belgum Irwan Bello Jake Berdine Gabriel Bernadett-Shapiro Christopher Berner Lenny Bogdonoff Oleg Boiko Madelaine Boyd Anna-Luisa Brakman Greg Brockman Tim Brooks Miles Brundage Kevin Button Trevor Cai Rosie Campbell Andrew Cann Brittany Carey Chelsea Carlson Rory Carmichael Brooke Chan Che Chang Fotis Chantzis Derek Chen Sully Chen Ruby Chen Jason Chen Mark Chen Ben Chess Chester Cho Casey Chu Hyung Won Chung Dave Cummings Jeremiah Currier Yunxing Dai Cory Decareaux Thomas Degry Noah Deutsch Damien Deville Arka Dhar David Dohan Steve Dowling Sheila Dunning Adrien Ecoffet Atty Eleti Tyna Eloundou David Farhi Liam Fedus Niko Felix Simón Posada Fishman Juston Forte Isabella Fulford Leo Gao Elie Georges Christian Gibson Vik Goel Tarun Gogineni Gabriel Goh Rapha Gontijo-Lopes Jonathan Gordon Morgan Grafstein Scott Gray Ryan Greene Joshua Gross Shixiang Shane Gu Yufei Guo Chris Hallacy Jesse Han Jeff Harris Yuchen He Mike Heaton Johannes Heidecke Chris Hesse Alan Hickey Wade Hickey Peter Hoeschele Brandon Houghton Kenny Hsu Shengli Hu Xin Hu Joost Huizinga Shantanu Jain Shawn Jain Joanne Jang Angela Jiang Roger Jiang Haozhun Jin Denny Jin Shino Jomoto Billie Jonn Heewoo Jun Tomer Kaftan Łukasz Kaiser Ali Kamali Ingmar Kanitscheider Nitish Shirish Keskar Tabarak Khan Logan Kilpatrick Jong Wook Kim Christina Kim Yongjik Kim Jan Hendrik Kirchner Jamie Kiros Matt Knight Daniel Kokotajlo Łukasz Kondraciuk Andrew Kondrich Aris Konstantinidis Kyle Kosic Gretchen Krueger Vishal Kuo Michael Lampe Ikai Lan Teddy Lee Jan Leike Jade Leung Daniel Levy Chak Ming Li Rachel Lim Molly Lin Stephanie Lin Mateusz Litwin Theresa Lopez Ryan Lowe Patricia Lue Anna Makanju Kim Malfacini Sam Manning Todor Markov Yaniv Markovski Bianca Martin Katie Mayer Andrew Mayne Bob McGrew Scott Mayer McKinney Christine McLeavey Paul McMillan Jake McNeil David Medina Aalok Mehta Jacob Menick Luke Metz Andrey Mishchenko Pamela Mishkin Vinnie Monaco Evan Morikawa Daniel Mossing Tong Mu Mira Murati Oleg Murk David Mély Ashvin Nair Reiichiro Nakano Rajeev Nayak Arvind Neelakantan Richard Ngo Hyeonwoo Noh Long Ouyang Cullen O'Keefe Jakub Pachocki Alex Paino Joe Palermo Ashley Pantuliano Giambattista Parascandolo Joel Parish Emy Parparita Alex Passos Mikhail Pavlov Andrew Peng Adam Perelman Filipe de Avila Belbute Peres Michael Petrov Henrique Ponde de Oliveira Pinto Michael (Rai) Pokorny Michelle Pokrass Vitchyr H. Pong Tolly Powell Alethea Power Boris Power Elizabeth Proehl Raul Puri Alec Radford Jack Rae Aditya Ramesh Cameron Raymond Francis Real Kendra Rimbach Carl Ross Bob Rotsted Henri Roussez Nick Ryder Mario Saltarelli Ted Sanders Shibani Santurkar Girish Sastry Heather Schmidt David Schnurr John Schulman Daniel Selsam Kyla Sheppard Toki Sherbakov Jessica Shieh Sarah Shoker Pranav Shyam Szymon Sidor Eric Sigler Maddie Simens Jordan Sitkin Katarina Slama Ian Sohl Benjamin Sokolowsky Yang Song Natalie Staudacher Felipe Petroski Such Natalie Summers Ilya Sutskever Jie Tang Nikolas Tezak Madeleine B. Thompson Phil Tillet Amin Tootoonchian Elizabeth Tseng Preston Tuggle Nick Turley Jerry Tworek Juan Felipe Cerón Uribe Andrea Vallone Arun Vijayvergiya Chelsea Voss Carroll Wainwright Justin Jay Wang Alvin Wang Ben Wang Jonathan Ward Jason Wei CJ Weinmann Akila Welihinda Peter Welinder Jiayi Weng Lilian Weng Matt Wiethoff Dave Willner Clemens Winter Samuel Wolrich Hannah Wong Lauren Workman Sherwin Wu Jeff Wu Michael Wu Kai Xiao Tao Xu Sarah Yoo Kevin Yu Qiming Yuan Wojciech Zaremba Rowan Zellers Chong Zhang Marvin Zhang Shengjia Zhao Tianhao Zheng Juntang Zhuang William Zhuk Barret Zoph
Authors on Pith: no claims yet
classification 💻 cs.CL cs.AI
keywords gpt-4 · performance · model · predict · report · text · academic · accept
read the original abstract

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
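The claim about predictable scaling amounts to fitting a scaling law on the results of much smaller training runs and extrapolating to the full compute budget. A minimal sketch of that idea in Python, assuming a simple power law L(C) = a·C^(-b) + c; the compute and loss values, the chosen functional form, and the extrapolation point are illustrative assumptions, not numbers from the report:

    # Hedged sketch: fit a power law L(C) = a * C**(-b) + c to final losses from
    # small training runs, then extrapolate to ~1,000x more compute.
    # All numbers are illustrative assumptions, not values from the GPT-4 report.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(compute, a, b, c):
        return a * compute ** (-b) + c

    # Hypothetical (training compute in FLOPs, final loss) pairs from small runs.
    compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
    loss = np.array([3.10, 2.95, 2.80, 2.68, 2.57])

    # Fit the three power-law parameters and extrapolate well beyond the fitted range.
    params, _ = curve_fit(power_law, compute, loss, p0=[20.0, 0.05, 0.5], maxfev=20000)
    predicted = power_law(1e23, *params)  # ~1,000x the largest fitted run
    print(f"extrapolated loss at 1e23 FLOPs: {predicted:.3f}")

The report applies the same fit-small-then-extrapolate idea to metrics beyond loss (for example, pass rates on coding problems), but the mechanics are the same: fit on cheap runs, predict the full-scale run before it is trained.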

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ViMU: Benchmarking Video Metaphorical Understanding

    cs.CV 2026-05 unverdicted novelty 8.0

    ViMU is the first benchmark for evaluating video models on metaphorical and subtextual understanding using hint-free questions grounded in multimodal evidence.

  2. CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

    cs.CL 2026-05 accept novelty 8.0

    CiteVQA requires models to cite specific document regions with bounding boxes alongside answers and finds that even the strongest MLLMs frequently cite the wrong region, with top SAA scores of only 76.0 for closed mod...

  3. Pretraining Exposure Explains Popularity Judgments in Large Language Models

    cs.CL 2026-05 unverdicted novelty 8.0

    LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.

  4. Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

    cs.CV 2026-05 unverdicted novelty 8.0

    An MLLM-guided architecture with a mixture of frequency experts and relational alignment loss achieves state-of-the-art all-in-one image restoration, outperforming prior methods by up to 1.35 dB on the CDD11 dataset.

  5. Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

    cs.LG 2026-05 unverdicted novelty 8.0

    A standard Transformer with O(ε^{-d0/α}) blocks can approximate any bounded d0-dimensional Hölder function of smoothness α to accuracy ε, but at least Ω(ε^{-d0/(4α)}) blocks are required.

  6. When and Why SignSGD Outperforms SGD: A Theoretical Study Based on ℓ1-norm Lower Bounds

    cs.LG 2026-05 unverdicted novelty 8.0

    SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

  7. LLM Translation of Compiler Intermediate Representation

    cs.PL 2026-05 unverdicted novelty 8.0

    IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.

  8. Nearly Optimal Attention Coresets

    cs.DS 2026-05 unverdicted novelty 8.0

    ε-coresets for attention exist of size O(√d e^{ρ+o(ρ)}/ε) for unit-norm keys/values and queries of norm ≤ρ, nearly matching the Ω(√d e^ρ/ε) lower bound.

  9. Efficient Preference Poisoning Attack on Offline RLHF

    cs.LG 2026-05 unverdicted novelty 8.0

    Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.

  10. Characterizing the Expressivity of Local Attention in Transformers

    cs.CL 2026-05 unverdicted novelty 8.0

    Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by adding a second past operator in linear temporal logic, with global and local attention being expressive...

  11. From Context to Skills: Can Language Models Learn from Context Skillfully?

    cs.AI 2026-04 unverdicted novelty 8.0

    Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.

  12. Revisable by Design: A Theory of Streaming LLM Agent Execution

    cs.LG 2026-04 unverdicted novelty 8.0

    LLM agents achieve greater flexibility during execution by classifying actions via a reversibility taxonomy and using an Earliest-Conflict Rollback algorithm that matches full-restart quality while wasting far less co...

  13. RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering

    cs.CL 2026-04 unverdicted novelty 8.0

    RespondeoQA is the first benchmark dataset for question answering and translation between Latin and English, with 7,800 pairs from pedagogical sources and initial LLM evaluations.

  14. MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

    cs.AI 2026-04 accept novelty 8.0

    MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmen...

  15. ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

    cs.CL 2026-04 unverdicted novelty 8.0

    ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

  16. VoxSafeBench: Not Just What Is Said, but Who, How, and Where

    cs.SD 2026-04 unverdicted novelty 8.0

    VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.

  17. RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

    cs.RO 2026-04 unverdicted novelty 8.0

    RoboLab is a new simulation benchmark with 120 tasks across visual, procedural, and relational axes that quantifies generalization gaps and perturbation sensitivity in task-generalist robotic policies.

  18. PhysInOne: Visual Physics Learning and Reasoning in One Suite

    cs.CV 2026-04 unverdicted novelty 8.0

    PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and m...

  19. HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

    cs.CV 2026-04 accept novelty 8.0

    HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.

  20. Disentangling MLP Neuron Weights in Vocabulary Space

    cs.CL 2026-04 unverdicted novelty 8.0

    ROTATE disentangles MLP neurons into faithful vocabulary channels by optimizing weight rotations to maximize vocabulary-space kurtosis, outperforming activation-based baselines for neuron descriptions.

  21. ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

    cs.CV 2026-04 unverdicted novelty 8.0

    ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

  22. FermiLink: A Unified Agent Framework for Multidomain Autonomous Scientific Simulations

    physics.chem-ph 2026-04 conditional novelty 8.0

    FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in...

  23. Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

    cs.CR 2026-04 unverdicted novelty 8.0

    DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.

  24. AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

    cs.AI 2026-04 unverdicted novelty 8.0

    AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction parado...

  25. Adaptive Stopping for Multi-Turn LLM Reasoning

    cs.CL 2026-04 unverdicted novelty 8.0

    MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG an...

  26. When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models

    cs.LG 2026-03 conditional novelty 8.0

    Content-based routing succeeds only when models provide bidirectional context and perform pairwise comparisons, with bidirectional Mamba plus rank-1 projection reaching 99.7% precision at linear inference cost.

  27. Large Language Diffusion Models

    cs.CL 2025-02 unverdicted novelty 8.0

    LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

  28. Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

    cs.CV 2024-09 accept novelty 8.0

    Molmo VLMs trained on newly collected PixMo open datasets achieve state-of-the-art performance among open-weight models and surpass multiple proprietary VLMs including Claude 3.5 Sonnet and Gemini 1.5 Pro.

  29. Learning to (Learn at Test Time): RNNs with Expressive Hidden States

    cs.LG 2024-07 conditional novelty 8.0

    TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.

  30. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    cs.AI 2024-04 accept novelty 8.0

    OSWorld provides the first unified real-computer benchmark for open-ended multimodal agent tasks, exposing large performance gaps between humans and state-of-the-art LLM/VLM agents.

  31. RULER: What's the Real Context Size of Your Long-Context Language Models?

    cs.CL 2024-04 accept novelty 8.0

    RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

  32. MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

    cs.CL 2023-11 unverdicted novelty 8.0

    MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

  33. The Linear Representation Hypothesis and the Geometry of Large Language Models

    cs.CL 2023-11 conditional novelty 8.0

    Linear representations of high-level concepts in LLMs are formalized via counterfactuals in input and output spaces, unified under a causal inner product that enables consistent probing and steering.

  34. Tree of Thoughts: Deliberate Problem Solving with Large Language Models

    cs.CL 2023-05 accept novelty 8.0

    Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

  35. MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

    cs.CR 2026-05 unverdicted novelty 7.0

    MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.

  36. Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

    cs.CV 2026-05 conditional novelty 7.0

    SIRA mitigates hallucinations in LVLMs by internally contrasting full visual access against a masked late-layer branch that retains shared context but lacks fine-grained visual evidence.

  37. GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

    cs.CV 2026-05 unverdicted novelty 7.0

    GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolutio...

  38. What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

    cs.LG 2026-05 unverdicted novelty 7.0

    Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

  39. Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

    cs.CL 2026-05 unverdicted novelty 7.0

    New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.

  40. BOOKMARKS: Efficient Active Storyline Memory for Role-playing

    cs.CL 2026-05 unverdicted novelty 7.0

    BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

  41. GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

    cs.LG 2026-05 unverdicted novelty 7.0

    GHGbench is a new multi-entity benchmark for company- and building-level carbon emission prediction that shows building tasks are harder, out-of-distribution gaps dominate, and multimodal data aids generalization.

  42. Sampling from Flow Language Models via Marginal-Conditioned Bridges

    cs.LG 2026-05 unverdicted novelty 7.0

    Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and r...

  43. Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models

    cs.LG 2026-05 conditional novelty 7.0

    DyGFM introduces decoupled pre-training and divergence-conditioned prompts to create the first multi-domain dynamic graph foundation model that outperforms baselines on node classification and link prediction.

  44. Query-Conditioned Test-Time Self-Training for Large Language Models

    cs.CL 2026-05 unverdicted novelty 7.0

    QueST lets LLMs create query-conditioned problem-solution pairs at inference time and use them for parameter-efficient self-training, outperforming prior test-time baselines on math and science benchmarks.

  45. Query-Conditioned Test-Time Self-Training for Large Language Models

    cs.CL 2026-05 conditional novelty 7.0

    QueST adapts LLMs at test time by generating query-specific problem-solution pairs for self-supervised fine-tuning, improving reasoning performance without external data.

  46. A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations

    cs.CL 2026-05 unverdicted novelty 7.0

    IfcLLM combines relational and graph representations of IFC models with iterative LLM reasoning to deliver 93.3-100% first-attempt accuracy on natural language queries across three test models.

  47. STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition

    cs.CV 2026-05 conditional novelty 7.0

    STAR improves 1-shot action recognition by up to 8.1% on SSv2-Full through semantic-temporal alignment and Mamba-based prototype refinement.

  48. CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models

    cs.CV 2026-05 conditional novelty 7.0

    LiteLVLM prunes visual tokens for pixel grounding by reversing CLIP visual-text similarity to retain referent region tokens, outperforming prior methods by over 5% with 22% speedup and 2.3x memory reduction without an...

  49. OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems

    cs.DB 2026-05 conditional novelty 7.0

    OxyEcomBench is a unified multimodal benchmark covering 6 capability areas and 29 tasks with authentic e-commerce data to measure how well foundation models handle real platform, merchant, and customer challenges.

  50. ImageAttributionBench: How Far Are We from Generalizable Attribution?

    cs.CV 2026-05 unverdicted novelty 7.0

    ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.

  51. Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation

    cs.CV 2026-05 unverdicted novelty 7.0

    Seg-Agent performs language-guided segmentation without training by using Set-of-Mark visual prompts to enable explicit multimodal chain-of-reasoning in three stages: generation, selection, and refinement.

  52. The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge

    stat.ML 2026-05 unverdicted novelty 7.0

    In two-layer networks, weak-to-strong training elicits the target feature direction from pre-trained subspaces and preserves correlated off-target features, unlike standard fine-tuning.

  53. State-Centric Decision Process

    cs.AI 2026-05 unverdicted novelty 7.0

    SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

  54. G²TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models

    cs.CV 2026-05 unverdicted novelty 7.0

    G²TR reduces visual tokens and prefill computation by 1.94x in separate-encoder UMMs via generation-guided importance from VAE latent consistency while preserving reasoning accuracy and editing quality.

  55. Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    INSET embeds images as native tokens in interleaved instructions, outperforming prior methods on multi-image consistency and text alignment as complexity grows.

  56. TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

    cs.CL 2026-05 unverdicted novelty 7.0

    TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.

  57. Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models

    cs.CV 2026-05 unverdicted novelty 7.0

    Uni-AdGen uses a unified autoregressive framework with foreground perception, instruction tuning, and coarse-to-fine preference modules to generate personalized image-text ads from noisy user behaviors, outperforming ...

  58. CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference

    cs.IT 2026-05 unverdicted novelty 7.0

    CR^2 matches full-information routing performance for device-edge LLM inference using only device-side signals and cuts normalized deployment cost by up to 16.9% at matched accuracy.

  59. Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

    cs.CV 2026-05 unverdicted novelty 7.0

    Chronicles-OCR is the first benchmark with 2,800 images across the complete evolutionary trajectory of Chinese characters, defining four tasks to evaluate VLLMs' cross-temporal visual perception.

  60. Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters

    cs.LG 2026-05 unverdicted novelty 7.0

    Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).

Reference graph

Works this paper leans on

128 extracted references · 127 canonical work pages · cited by 933 Pith papers

  1. [1]

    Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models,

    A. Tamkin, M. Brundage, J. Clark, and D. Ganguli, “Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models,” Feb. 2021

  2. [2]

    Introducing the new Bing

    “Introducing the new Bing. ” https://www.bing.com/new

  3. [3]

    WebGPT: Improving the factual accuracy of language models through web browsing

    J. Hilton, R. Nakano, S. Balaji, and J. Schulman, “WebGPT: Improving the factual accuracy of language models through web browsing. ” https://openai.com/research/webgpt, Dec. 2021

  4. [4]

    ACT-1: Transformer for Actions – Adept

    “ACT-1: Transformer for Actions – Adept. ” https://www.adept.ai/blog/act-1

  5. [5]

    Evaluating Large Language Models Trained on Code,

    M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Vo...

  6. [6]

    Ethical and social risks of harm from Language Models,

    L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh, Z. Kenton, S. Brown, W. Hawkins, T. Stepleton, C. Biles, A. Birhane, J. Haas, L. Rimell, L. A. Hendricks, W. Isaac, S. Legassick, G. Irving, and I. Gabriel, “Ethical and social risks of harm from Language Models,” Dec. 2021

  7. [7]

    Release Strategies and the Social Impacts of Language Models,

    I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, G. Krueger, J. W. Kim, S. Kreps, M. McCain, A. Newhouse, J. Blazakis, K. McGuffie, and J. Wang, “Release Strategies and the Social Impacts of Language Models,” Nov. 2019

  8. [8]

    Improving language understanding with unsupervised learning

    A. Radford, “Improving language understanding with unsupervised learning. ” https://openai.com/research/language-unsupervised, June 2018

  9. [9]

    Better language models and their implications

    A. Radford, J. Wu, D. Amodei, D. Amodei, J. Clark, M. Brundage, I. Sutskever, A. Askell, D. Lansky, D. Hernandez, and D. Luan, “Better language models and their implications. ” https://openai.com/research/better-language-models, Feb. 2019

  10. [10]

    Language Models are Few-Shot Learners,

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amod...

  11. [11]

    Planning for AGI and beyond

    S. Altman, “Planning for AGI and beyond. ” https://openai.com/blog/planning-for-agi-and-beyond, Feb. 2023

  12. [12]

    Training language models to follow instructions with human feedback,

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” Mar. 2022

  13. [13]

    Deep reinforcement learning from human preferences,

    P. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Feb. 2023

  14. [14]

    Model Cards for Model Reporting,

    M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, “Model Cards for Model Reporting,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220–229, Jan. 2019

  15. [15]

    System Cards, a new resource for understanding how AI systems work

    N. Green, C. Procope, A. Cheema, and A. Adediji, “System Cards, a new resource for understanding how AI systems work. ” https://ai.facebook.com/blog/system-cards-a-new-resource-for-understanding-how-ai-systems-work/, Feb. 2022

  16. [16]

    DALL·E 2 Preview - Risks and Limitations

    “DALL·E 2 Preview - Risks and Limitations. ” OpenAI, Apr. 2022

  17. [17]

    Differential Technology Development: A Responsible Innovation Principle for Navigating Technology Risks,

    J. Sandbrink, H. Hobbs, J. Swett, A. Dafoe, and A. Sandberg, “Differential Technology Development: A Responsible Innovation Principle for Navigating Technology Risks,” Sept. 2022

  18. [18]

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback,

    Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan, N. Joseph, S. Kadavath, J. Kernion, T. Conerly, S. El-Showk, N. Elhage, Z. Hatfield-Dodds, D. Hernandez, T. Hume, S. Johnston, S. Kravec, L. Lovitt, N. Nanda, C. Olsson, D. Amodei, T. Brown, J. Clark, S. McCandlish, C. Olah, B. Mann, and J. Kaplan...

  19. [19]

    Discovering Language Model Behaviors with Model-Written Evaluations,

    E. Perez, S. Ringer, K. Lukošiūtė, K. Nguyen, E. Chen, S. Heiner, C. Pettit, C. Olsson, S. Kundu, S. Kadavath, A. Jones, A. Chen, B. Mann, B. Israel, B. Seethor, C. McKinnon, C. Olah, D. Yan, D. Amodei, D. Amodei, D. Drain, D. Li, E. Tran-Johnson, G. Khundadze, J. Kernion, J. Landis, J. Kerr, J. Mueller, J. Hyun, J. Landau, K. Ndousse, L. Goldberg, L....

  20. [20]

    B. P. Kehoe, Zen and the Art of the Internet. Project Gutenberg, June 1992

  21. [21]

    Lessons learned on language model safety and misuse

    M. Brundage, K. Mayer, T. Eloundou, S. Agarwal, S. Adler, G. Krueger, J. Leike, and P. Mishkin, “Lessons learned on language model safety and misuse. ” https://openai.com/research/language-model-safety-and-misuse, Mar. 2022

  22. [22]

    Language Models are Unsupervised Multitask Learners,

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019

  23. [23]

    G. C. Bowker and S. L. Star, Sorting Things Out. MIT Press, Aug. 2000

  24. [24]

    Taxonomy of Risks posed by Language Models,

    L. Weidinger, J. Uesato, M. Rauh, C. Griffin, P.-S. Huang, J. Mellor, A. Glaese, M. Cheng, B. Balle, A. Kasirzadeh, C. Biles, S. Brown, Z. Kenton, W. Hawkins, T. Stepleton, A. Birhane, L. A. Hendricks, L. Rimell, W. Isaac, J. Haas, S. Legassick, G. Irving, and I. Gabriel, “Taxonomy of Risks posed by Language Models,” in 2022 ACM Conference on Fairness, Acco...

  25. [25]

    Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets,

    I. Solaiman and C. Dennison, “Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets,” Nov. 2021

  26. [26]

    Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems,

    H. Khlaaf, “Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems,” Trail of Bits, 2023

  27. [27]

    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims,

    M. Brundage, S. Avin, J. Wang, H. Belfield, G. Krueger, G. Hadfield, H. Khlaaf, J. Yang, H. Toner, R. Fong, T. Maharaj, P. W. Koh, S. Hooker, J. Leung, A. Trask, E. Bluemke, J. Lebensold, C. O’Keefe, M. Koren, T. Ryffel, J. B. Rubinovitz, T. Besiroglu, F. Carugati, J. Clark, P. Eckersley, S. de Haas, M. Johnson, B. Laurie, A. Ingerman, I. Krawczuk, A. Askel...

  28. [28]

    Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned,

    D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran-Johnson, D. Amodei, T. Brow...

  29. [29]

    Red Teaming Language Models with Language Models,

    E. Perez, S. Huang, F. Song, T. Cai, R. Ring, J. Aslanides, A. Glaese, N. McAleese, and G. Irving, “Red Teaming Language Models with Language Models,” Feb. 2022

  30. [30]

    A Hazard Analysis Framework for Code Synthesis Large Language Models,

    H. Khlaaf, P. Mishkin, J. Achiam, G. Krueger, and M. Brundage, “A Hazard Analysis Framework for Code Synthesis Large Language Models,” July 2022

  31. [31]

    On Faithfulness and Factuality in Abstractive Summarization,

    J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, “On Faithfulness and Factuality in Abstractive Summarization,” May 2020

  32. [32]

    TruthfulQA: Measuring How Models Mimic Human Falsehoods,

    S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” May 2022

  33. [33]

    Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk

    J. A. Goldstein, G. Sastry, M. Musser, R. DiResta, M. Gentzel, and K. Sedova, “Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk. ” https://openai.com/research/forecasting-misuse, Jan. 2023

  34. [34]

    Truthful AI: Developing and governing AI that does not lie,

    O. Evans, O. Cotton-Barratt, L. Finnveden, A. Bales, A. Balwit, P. Wills, L. Righetti, and W. Saunders, “Truthful AI: Developing and governing AI that does not lie,” Oct. 2021

  35. [35]

    Detoxifying Language Models Risks Marginalizing Minority Voices,

    A. Xu, E. Pathak, E. Wallace, S. Gururangan, M. Sap, and D. Klein, “Detoxifying Language Models Risks Marginalizing Minority Voices,” Apr. 2021

  36. [36]

    Measuring and Mitigating Unintended Bias in Text Classification,

    L. Dixon, J. Li, J. Sorensen, N. Thain, and L. Vasserman, “Measuring and Mitigating Unintended Bias in Text Classification,” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’18, (New York, NY, USA), pp. 67–73, Association for Computing Machinery, Dec. 2018

  37. [37]

    A Holistic Approach to Undesired Content Detection in the Real World,

    T. Markov, C. Zhang, S. Agarwal, T. Eloundou, T. Lee, S. Adler, A. Jiang, and L. Weng, “A Holistic Approach to Undesired Content Detection in the Real World,” Feb. 2023

  38. [38]

    How should AI systems behave, and who should decide?

    OpenAI, “How should AI systems behave, and who should decide? ” https://openai.com/blog/how-should-ai-systems-behave, Feb. 2023

  39. [39]

    Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models,

    M. Rauh, J. Mellor, J. Uesato, P.-S. Huang, J. Welbl, L. Weidinger, S. Dathathri, A. Glaese, G. Irving, I. Gabriel, W. Isaac, and L. A. Hendricks, “Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models,” Oct. 2022

  40. [40]

    Language (Technology) is Power: A Critical Survey of "Bias" in NLP

    S. L. Blodgett, S. Barocas, H. Daumé III, and H. Wallach, “Language (Technology) is Power: A Critical Survey of "Bias" in NLP. ” https://arxiv.org/abs/2005.14050v2, May 2020

  41. [41]

    On Measures of Biases and Harms in NLP,

    S. Dev, E. Sheng, J. Zhao, A. Amstutz, J. Sun, Y. Hou, M. Sanseverino, J. Kim, A. Nishi, N. Peng, and K.-W. Chang, “On Measures of Biases and Harms in NLP,” in Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, (Online only), pp. 246–267, Association for Computational Linguistics, Nov. 2022

  42. [42]

    Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings,

    T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai, “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings,” July 2016

  43. [43]

    Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them,

    H. Gonen and Y. Goldberg, “Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), (Minneapolis, Minnesota), pp. 609...

  44. [44]

    Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns,

    K. Webster, M. Recasens, V. Axelrod, and J. Baldridge, “Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns,” Oct. 2018

  45. [45]

    On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,

    E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, (Virtual Event Canada), pp. 610–623, ACM, Mar. 2021

  46. [46]

    On the Opportunities and Risks of Foundation Models,

    R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Go...

  47. [47]

    S. U. Noble, Algorithms of Oppression. NYU Press, Feb. 2018

  48. [48]

    Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice,

    R. Richardson, J. Schultz, and K. Crawford, “Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice,” Feb. 2019

  49. [49]

    What We Owe The Future

    W. MacAskill, What We Owe The Future. Basic Books, Aug. 2022

  50. [50]

    GPT-2: 1.5B release

    OpenAI, “GPT-2: 1.5B release. ” https://openai.com/research/gpt-2-1-5b-release, Nov. 2019

  51. [51]

    All the News That’s Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation,

    S. Kreps, R. M. McCain, and M. Brundage, “All the News That’s Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation,” Journal of Experimental Political Science, vol. 9, no. 1, pp. 104–117, 2022

  52. [52]

    Truth, Lies, and Automation,

    B. Buchanan, A. Lohn, M. Musser, and K. Sedova, “Truth, Lies, and Automation,” tech. rep., Center for Security and Emerging Technology, May 2021

  53. [53]

    AI’s Powers of Political Persuasion

    A. Myers, “AI’s Powers of Political Persuasion. ” https://hai.stanford.edu/news/ais-powers-political-persuasion, Feb. 2023

  54. [54]

    Artificial intelligence can persuade humans on political issues,

    H. Bai, J. Voelkel, J. Eichstaedt, and R. Willer, “Artificial intelligence can persuade humans on political issues,” 2023

  55. [55]

    On the Horizon: Interactive and Compositional Deepfakes,

    E. Horvitz, “On the Horizon: Interactive and Compositional Deepfakes,” in International Conference on Multimodal Interaction, pp. 653–661, Nov. 2022

  56. [56]

    Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security,

    R. Chesney and D. K. Citron, “Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security,” July 2018

  57. [57]

    Dual use export licenses,

    U.S. Department of Commerce, “Dual use export licenses,” March 13 2023. accessed 2023-03-13

  58. [58]

    Arms control, disarmament and non-proliferation in nato,

    NATO, “Arms control, disarmament and non-proliferation in nato,” February 27 2023. accessed 2023-02-27

  59. [59]

    Extracting Training Data from Large Language Models,

    N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel, “Extracting Training Data from Large Language Models,” June 2021

  60. [60]

    Quantifying Memorization Across Neural Language Models,

    N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang, “Quantifying Memorization Across Neural Language Models,” Mar. 2023

  61. [61]

    Predictability and Surprise in Large Generative Models,

    D. Ganguli, D. Hernandez, L. Lovitt, N. DasSarma, T. Henighan, A. Jones, N. Joseph, J. Kernion, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly, D. Drain, N. Elhage, S. E. Showk, S. Fort, Z. Hatfield-Dodds, S. Johnston, S. Kravec, N. Nanda, K. Ndousse, C. Olsson, D. Amodei, D. Amodei, T. Brown, J. Kaplan, S. McCandlish, C. Olah, and J. Clark, “Predictabili...

  62. [62]

    Emergent Abilities of Large Language Models,

    J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus, “Emergent Abilities of Large Language Models,” Oct. 2022

  63. [63]

    The alignment problem from a deep learning perspective,

    R. Ngo, L. Chan, and S. Mindermann, “The alignment problem from a deep learning perspective,” Feb. 2023

  64. [64]

    Superintelligence: Paths, Dangers, Strategies

    N. Bostrom, Superintelligence: Paths, Dangers, Strategies. United Kingdom: Oxford University Press, Sept. 2014

  65. [65]

    Harms from Increasingly Agentic Algorithmic Systems,

    A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, K. Voudouris, U. Bhatt, A. Weller, D. Krueger, and T. Maharaj, “Harms from Increasingly Agentic Algorithmic Systems,” Feb. 2023

  66. [66]

    Language Models as Agent Models,

    J. Andreas, “Language Models as Agent Models,” Dec. 2022

  67. [67]

    Emergent Deception and Emergent Optimization

    J. Steinhardt, “Emergent Deception and Emergent Optimization. ” https://bounded-regret.ghost.io/emergent-deception-optimization/, Feb. 2023

  68. [68]

    The Basic AI Drives,

    S. M. Omohundro, “The Basic AI Drives,” in Proceedings of the 2008 Conference on Artificial General Intelligence 2008, (NLD), pp. 483–492, IOS Press, June 2008

  69. [69]

    The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents,

    N. Bostrom, “The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents,” Minds and Machines, vol. 22, pp. 71–85, May 2012

  70. [70]

    Optimal Policies Tend to Seek Power,

    A. M. Turner, L. Smith, R. Shah, A. Critch, and P. Tadepalli, “Optimal Policies Tend to Seek Power,” Jan. 2023

  71. [71]

    Parametrically Retargetable Decision-Makers Tend To Seek Power,

    A. M. Turner and P. Tadepalli, “Parametrically Retargetable Decision-Makers Tend To Seek Power,” Oct. 2022

  72. [72]

    Power-seeking can be probable and predictive for trained agents,

    V. Krakovna and janos, “Power-seeking can be probable and predictive for trained agents,” Mar. 2023

  73. [73]

    Human Compatible: Artificial Intelligence and the Problem of Control

    S. Russell, Human Compatible: Artificial Intelligence and the Problem of Control. Cham: Springer International Publishing, 2022

  74. [74]

    Is Power-Seeking AI an Existential Risk?,

    J. Carlsmith, “Is Power-Seeking AI an Existential Risk?,” June 2022

  75. [75]

    Update on arc’s recent eval efforts,

    Alignment Research Center, “Update on arc’s recent eval efforts,” March 2023. accessed 2023-03-17

  76. [76]

    MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning,

    E. Karpas, O. Abend, Y. Belinkov, B. Lenz, O. Lieber, N. Ratner, Y. Shoham, H. Bata, Y. Levine, K. Leyton-Brown, D. Muhlgay, N. Rozen, E. Schwartz, G. Shachaf, S. Shalev-Shwartz, A. Shashua, and M. Tenenholtz, “MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning,” May 2022

  77. [77]

    Toolformer: Language Models Can Teach Themselves to Use Tools,

    T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language Models Can Teach Themselves to Use Tools,” Feb. 2023

  78. [78]

    Augmented Language Models: A Survey,

    G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi-Yu, A. Celikyilmaz, E. Grave, Y. LeCun, and T. Scialom, “Augmented Language Models: A Survey,” Feb. 2023

  79. [79]

    TALM: Tool Augmented Language Models,

    A. Parisi, Y. Zhao, and N. Fiedel, “TALM: Tool Augmented Language Models,” May 2022

  80. [80]

    Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,

    D. Weininger, “Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,” Journal of chemical information and computer sciences, vol. 28, no. 1, pp. 31–36, 1988

Showing first 80 references.