GPC: Large-Scale Generative Pretraining for Transferable Motor Control
Pith reviewed 2026-06-30 08:01 UTC · model grok-4.3
The pith
A GPT-style transformer on learned motion tokens produces reusable controllers for physics-based characters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim a two-stage recipe. First, end-to-end reinforcement learning jointly optimizes a motion vocabulary, modeled with Finite Scalar Quantization (FSQ), and a control policy that maps the discrete codes to physics-based controls. Second, a GPT-style autoregressive transformer is trained on the resulting discrete codes. The result is a generative controller that drives a physically simulated character by next-token prediction and, according to the authors, yields robust general-purpose controllers for downstream applications.
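To make the "motion vocabulary" concrete, here is a minimal sketch of FSQ-style quantization in Python with NumPy. It is not the authors' implementation: the level configuration `LEVELS` and the function names are assumptions chosen for illustration, only odd level counts are handled, and the straight-through gradient used during training is omitted.

```python
import numpy as np

# Hypothetical configuration (not from the paper): four latent dimensions,
# each with an odd number of quantization levels.
LEVELS = np.array([7, 5, 5, 5])

def fsq_quantize(z, levels=LEVELS):
    """Bound each latent dimension with tanh, then round to an integer level."""
    half = (levels - 1) / 2.0        # e.g. 7 levels -> integers in [-3, 3]
    bounded = np.tanh(z) * half      # squashed into (-half, half)
    return np.round(bounded).astype(int)

def codes_to_index(codes, levels=LEVELS):
    """Map per-dimension levels to one token id using a mixed-radix encoding."""
    half = (levels - 1) // 2
    digits = codes + half            # shift each digit into [0, levels - 1]
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return int(np.sum(digits * basis))

def index_to_codes(index, levels=LEVELS):
    """Invert codes_to_index: recover the per-dimension levels from a token id."""
    half = (levels - 1) // 2
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    digits = (index // basis) % levels
    return digits - half

latent = np.array([0.0, 10.0, -10.0, 0.3])   # made-up encoder output
token = codes_to_index(fsq_quantize(latent))
vocab_size = int(np.prod(LEVELS))            # product of the level counts
```

Under this scheme the vocabulary size is the product of the per-dimension level counts, so no learned codebook table is needed; the token id is just an index into the implicit grid.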
What carries the argument
The autoregressive transformer that performs next-token prediction over the learned discrete motion vocabulary to output control sequences.
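The generation loop this describes can be sketched as follows. The example is only a toy under stated assumptions: `toy_logits` is a hypothetical stand-in for the trained transformer, and the step that decodes each token into physics-based controls through the learned policy is left out.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def toy_logits(context, vocab_size):
    """Hypothetical stand-in for the transformer: favors the id after the last token."""
    logits = np.zeros(vocab_size)
    if context:
        logits[(context[-1] + 1) % vocab_size] = 5.0
    return logits

def generate(logits_fn, prompt, steps, vocab_size, temperature=1.0, seed=0):
    """Autoregressive sampling: each new token is drawn conditioned on all previous ones."""
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(logits_fn(tokens, vocab_size) / temperature)
        tokens.append(int(rng.choice(vocab_size, p=probs)))
    return tokens

sequence = generate(toy_logits, prompt=[0], steps=10, vocab_size=8)
```

In a controller of the kind the paper describes, each sampled token would be passed to the control policy and the simulator before the next token is predicted, so the context reflects what the character actually did.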
Load-bearing premise
That the discrete codes capture structure an autoregressive model can predict to generate controls that transfer beyond the original training motions.
What would settle it
Measuring whether an adapted controller achieves high success on a novel task whose required motions do not appear in the pretraining dataset.
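One way such a measurement could be reported is a per-split success rate with a confidence interval, so that in-distribution and held-out results are not conflated. The sketch below uses the Wilson score interval; the function names and the split labels are assumptions, and the outcomes in the usage lines are made-up placeholders, not results from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)

def success_report(outcomes):
    """outcomes maps a split name to a list of booleans, one per rollout."""
    report = {}
    for split, results in outcomes.items():
        n, k = len(results), sum(results)
        report[split] = (k / n, wilson_interval(k, n))
    return report

# Placeholder outcomes for illustration only.
report = success_report({
    "pretraining_motions": [True] * 48 + [False] * 2,
    "held_out_task": [True] * 30 + [False] * 20,
})
```

The Wilson interval is used here because it stays inside [0, 1] and behaves sensibly when the success rate is close to 100%, which is the regime the paper reports.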
Original abstract
Developing controllers capable of completing a wide range of tasks in a natural and life-like manner is a key challenge in enabling practical applications of physics-based character animation. In this work, we introduce Generative Pretrained Controllers (GPC), which leverage tokenization and next-token modeling to create general-purpose, reusable generative controllers from large-scale motion datasets. Our framework utilizes end-to-end reinforcement learning to jointly optimize a "motion vocabulary", modeled via Finite Scalar Quantization (FSQ), along with a corresponding control policy that can map the discrete codes to physics-based controls. After the "codebook" has been learned, the underlying structure of this large vocabulary is modeled by training a GPT-style autoregressive transformer, leading to a powerful generative controller that generates controls for a physically simulated character by performing next-token prediction. Once the generative controller has been trained, we propose a suite of adaptation techniques for finetuning the controller for new downstream tasks. Our proposed framework greatly simplifies the training process compared to previous tokenized methods, and achieves a 99.98% success rate in reproducing a vast corpus of motion clips. The generative controller exhibits a variety of natural emergent behaviors, such as responsive behaviors to perturbations and recovery behaviors after falling. This results in highly robust general purpose controllers for a variety of downstream applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Generative Pretrained Controllers (GPC) that jointly optimize a motion vocabulary via Finite Scalar Quantization (FSQ) and a control policy through end-to-end reinforcement learning on large-scale motion datasets. After codebook learning, a GPT-style autoregressive transformer models the vocabulary structure to generate physics-based controls via next-token prediction; the resulting controller is adapted to downstream tasks and reportedly achieves 99.98% success in reproducing the motion corpus along with emergent behaviors such as perturbation response and recovery.
Significance. If the discrete codes prove to support genuine extrapolation beyond the training distribution, the approach would offer a scalable pretraining route for reusable controllers in physics-based animation, potentially simplifying pipelines relative to prior tokenized methods while enabling robust transfer.
Major comments (2)
- [Abstract] The 99.98% reproduction success rate is presented without any description of held-out motion clips, train/test splits, error bars, or baseline comparisons, which directly undermines the central claim that the learned codes support transferable, general-purpose controllers rather than in-distribution reconstruction.
- [Framework description] Framework (joint FSQ+RL optimization): no analysis or ablation is supplied to show that the codes learned under the RL objective form sequences whose statistics are sufficiently regular and hierarchically structured for an autoregressive transformer to extrapolate to unseen motion statistics; the reported reproduction rate is consistent with codes optimized only for task fidelity.
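The kind of sequence-statistics analysis the second comment asks for could start from something as simple as comparing unigram entropy with bigram conditional entropy over the token sequences. The sketch below is one hedged illustration, not an analysis from the paper; the function names are assumptions and the example sequence is synthetic.

```python
import math
from collections import Counter

def unigram_entropy(sequences):
    """Entropy in bits of the marginal token distribution."""
    counts = Counter(tok for seq in sequences for tok in seq)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def bigram_conditional_entropy(sequences):
    """H(next | previous) in bits, estimated from adjacent token pairs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    total = sum(pair_counts.values())
    return -sum(
        c / total * math.log2(c / prev_counts[prev])
        for (prev, _), c in pair_counts.items()
    )

# Synthetic, perfectly periodic sequence used only to exercise the functions.
cyclic = [[0, 1, 2, 3] * 5]
marginal_bits = unigram_entropy(cyclic)
conditional_bits = bigram_conditional_entropy(cyclic)
```

A large gap between the two quantities indicates that the previous token carries real information about the next one, which is a necessary (though not sufficient) condition for an autoregressive model to find predictable structure in the codes.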
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our work. We address each major comment below and will revise the manuscript to improve the presentation of results and add supporting analysis.
Point-by-point responses
- Referee: [Abstract] The 99.98% reproduction success rate is presented without any description of held-out motion clips, train/test splits, error bars, or baseline comparisons, which directly undermines the central claim that the learned codes support transferable, general-purpose controllers rather than in-distribution reconstruction.
  Authors: We agree that the abstract would benefit from greater clarity on the evaluation. The 99.98% figure measures reproduction success on the motion corpus used during pretraining. The manuscript demonstrates transfer through adaptation to downstream tasks and emergent robustness properties. We will revise the abstract to explicitly note the in-distribution nature of the reproduction metric, reference the train/test protocol and error bars from the experiments section, and highlight baseline comparisons already present in the full paper. This will better contextualize the result without overstating generalization from reproduction alone.
  Revision: yes
- Referee: [Framework description] Framework (joint FSQ+RL optimization): no analysis or ablation is supplied to show that the codes learned under the RL objective form sequences whose statistics are sufficiently regular and hierarchically structured for an autoregressive transformer to extrapolate to unseen motion statistics; the reported reproduction rate is consistent with codes optimized only for task fidelity.
  Authors: The end-to-end RL objective is intended to yield codes that are both task-effective and statistically regular enough for autoregressive modeling, which is evidenced by the successful training of the GPT-style transformer and the emergence of perturbation response and recovery behaviors. We acknowledge, however, that dedicated ablations examining code sequence statistics, hierarchical structure, and extrapolation to unseen motion distributions would strengthen the argument that the codes are not merely task-optimized. We will incorporate such analysis and ablations in the revised manuscript.
  Revision: yes
Circularity Check
No significant circularity; derivation chain is self-contained
The described pipeline (end-to-end RL joint optimization of FSQ vocabulary + policy, followed by separate GPT-style transformer training on extracted token sequences, then downstream adaptation) does not reduce any load-bearing claim to its inputs by construction. The 99.98% reproduction metric is an in-distribution evaluation on the training corpus but is not presented as a 'prediction' that is definitionally identical to the fit; the transferability claims rest on the autoregressive modeling step and adaptation techniques, which are independent modeling choices rather than tautological. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked in the provided text to justify core steps. The framework is a standard tokenization + autoregressive pretraining approach and remains externally falsifiable on held-out tasks.