A Generalist Agent

Canonical reference. 94% of citing Pith papers cite this work as background.

62 Pith papers citing it
Background: 94% of classified citations

abstract

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.

citation-role summary

background 16 · baseline 1

representative citing papers

Aero-World: Action-Conditioned Aerial Video Generation from Inertial Controls

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Aero-World adapts a pretrained latent diffusion transformer for action-conditioned aerial video generation by injecting inertial action tokens and using a frozen latent-space Physics Probe for inertial consistency supervision during LoRA finetuning, with a new AeroBench benchmark showing improved AA

TokaMind for Power Grid: Cross-Domain Transfer from Fusion Plasma

physics.plasm-ph · 2026-05-10 · unverdicted · novelty 7.0

TokaMind, pre-trained on MAST tokamak data, transfers to power grid PMU data for severe event classification with F1 0.837, where difficulty depends on grid topology and CSD indicators boost early-warning performance over CNN baselines.

Automatic Generation of High-Performance RL Environments

cs.LG · 2026-03-12 · conditional · novelty 7.0

Closed-loop prompt-based translation with hierarchical verification and iterative repair produces equivalent high-performance RL environments across five cases including new TCGJax.

Any-point Trajectory Modeling for Policy Learning

cs.RO · 2023-12-28 · conditional · novelty 7.0

ATM pre-trains models to predict trajectories of any points in videos, then uses those predictions to learn strong visuomotor policies from minimal action labels, beating baselines by 80% on 130+ tasks.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

RELO: Reinforcement Learning to Localize for Visual Object Tracking

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 57.5% AUC on LaSOT_ext without template updates.
