Sparks of Artificial General Intelligence: Early experiments with GPT-4

Canonical reference. 73% of citing Pith papers cite this work as background.

136 Pith papers citing it
Background 73% of classified citations
abstract

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

citation-role summary

background: 35 · method: 4 · baseline: 1 · dataset: 1

representative citing papers

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

cs.SE · 2024-03-25 · conditional · novelty 8.0

RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.

ROSE: Retrieval-Oriented Segmentation Enhancement

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

ROSE is a retrieval-augmented plug-in that improves MLLM segmentation on novel and emerging entities by fetching web text and images and deciding when to use them.

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

cs.CL · 2025-11-02 · unverdicted · novelty 7.0

TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

CodeMind: Evaluating Large Language Models for Code Reasoning

cs.SE · 2024-02-15 · unverdicted · novelty 7.0

CodeMind evaluates ten LLMs on four benchmarks using three new code reasoning tasks, finding performance varies by model size and drops with complexity while showing no correlation with bug repair ability.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Generative Agents: Interactive Simulacra of Human Behavior cs.HC · 2023-04-07 · accept · none · ref 22 · internal anchor

    Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.

  • Making Abstraction Concrete: A Design Space and Interaction Model of Abstraction in Interactive Systems cs.HC · 2026-05-11 · unverdicted · none · ref 25 · internal anchor

    A survey of 457 papers yields a six-dimensional design space for abstraction in interactive systems that reframes gulfs of execution and evaluation while articulating cognitive and design processes for bridging abstraction gaps.

  • Large Language Models have Chain-of-Affect cs.HC · 2025-12-13 · unverdicted · none · ref 16 · internal anchor

    LLMs exhibit structured chain-of-affect dynamics with stable family fingerprints, convergence to accumulation-overload-numbing under negative exposure, and downstream effects on generation, human interaction, and group polarization.

  • Regimes of Scale in AI Meteorology cs.HC · 2026-04-07 · unverdicted · none · ref 8 · internal anchor

    AI/ML weather tools face integration challenges from mismatched 'regimes of scale' in how data and models are organized compared to traditional meteorology practices.

  • Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale cs.HC · 2026-03-09 · unverdicted · none · ref 2 · internal anchor

    Sandpiper orchestrates AI and human efforts for scalable, privacy-preserving annotation of educational discourse with built-in benchmarking against human labels.