Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Pith reviewed 2026-05-18 06:33 UTC · model grok-4.3
The pith
Scaling inference compute with advanced strategies can outperform scaling model size for language models on math problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance. For example, the Llemma-7B model, when paired with the authors' novel tree search algorithm, consistently outperforms the Llemma-34B model across all tested inference strategies on the MATH benchmark.
What carries the argument
Empirical cost-performance curves comparing inference strategies (greedy search, majority voting, best-of-n, weighted voting, and two tree search algorithms) across model sizes and total token budgets on the MATH benchmark.
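To make the sampling-based strategies named above concrete, here is a minimal sketch of how majority voting, best-of-n, and weighted voting might pick a final answer from the same pool of sampled solutions. The function names, the sample answers, and the scores are hypothetical; in particular, the per-sample scores are assumed to come from some external scorer (for example a reward model), which this page does not describe. This is not the paper's implementation.

```python
from collections import defaultdict

def majority_vote(answers):
    """Return the answer that appears most often among the samples."""
    counts = defaultdict(int)
    for answer in answers:
        counts[answer] += 1
    return max(counts, key=counts.get)

def best_of_n(answers, scores):
    """Return the single answer with the highest score."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

def weighted_vote(answers, scores):
    """Sum scores per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical final answers from six sampled solutions, with made-up scores.
answers = ["12", "15", "12", "15", "15", "7"]
scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.95]

print(majority_vote(answers))          # most frequent answer
print(best_of_n(answers, scores))      # highest single score
print(weighted_vote(answers, scores))  # highest summed score
```

On this toy input the three rules disagree, which illustrates why they can trace different cost-performance curves even when they consume the same number of generated tokens.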
If this is right
- For a fixed compute budget, allocating more operations to inference steps on a smaller model produces higher accuracy than using those operations to run a larger model.
- Tree search algorithms create better cost-performance frontiers than voting or greedy methods across the tested range.
- There exist model-plus-strategy pairs that dominate others in the accuracy-versus-compute plane on MATH.
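The notion of one model-plus-strategy pair dominating another can be sketched as a Pareto filter over (cost, accuracy) points. The configuration labels and numbers below are invented purely for illustration and are not results from the paper; `pareto_frontier` is a hypothetical helper.

```python
def pareto_frontier(points):
    """Return labels of configurations that no other configuration dominates.

    points: list of (label, cost, accuracy) tuples.
    A point is dominated if another point has cost <= and accuracy >=,
    with at least one of the two inequalities strict.
    """
    frontier = []
    for label, cost, acc in points:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in points
        )
        if not dominated:
            frontier.append(label)
    return frontier

# Made-up (label, relative compute cost, accuracy) values, for illustration only.
configs = [
    ("small/greedy", 1.0, 0.30),
    ("small/voting", 8.0, 0.40),
    ("small/tree", 8.0, 0.45),
    ("large/greedy", 5.0, 0.35),
    ("large/voting", 40.0, 0.44),
]

print(pareto_frontier(configs))
```

In this toy data, "small/tree" dominates both "small/voting" (same cost, higher accuracy) and "large/voting" (lower cost, higher accuracy), which is the shape of claim the bullets above describe.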
Where Pith is reading between the lines
- Model developers might gain more by designing architectures that support efficient long-horizon search than by maximizing parameter count alone.
- The same inference-scaling pattern could appear on other reasoning benchmarks if the underlying error-correction mechanism is not MATH-specific.
- Hardware systems optimized for variable-length tree search rather than fixed batch inference could unlock further efficiency.
Load-bearing premise
The measured cost and accuracy differences arise mainly from model size and inference strategy rather than from unmeasured details of prompts, formatting, or benchmark artifacts.
What would settle it
Re-running the same model sizes and strategies on MATH with total floating-point operations equalized: if the 34B model with basic inference then matches or exceeds the 7B model with tree search, the core claim fails.
Original abstract
While the scaling laws of large language models (LLMs) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. As a first step towards understanding and designing compute-optimal inference methods, we studied cost-performance trade-offs for inference strategies such as greedy search, majority voting, best-of-$n$, weighted voting, and two different tree search algorithms, using different model sizes and compute budgets. Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Additionally, smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance. For example, the Llemma-7B model, when paired with our novel tree search algorithm, consistently outperforms the Llemma-34B model across all tested inference strategies on the MATH benchmark. We hope these insights contribute to a deeper understanding of inference scaling laws (test-time scaling laws) for LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically studies inference scaling laws for LLMs on mathematical problem-solving, comparing inference strategies (greedy search, majority voting, best-of-n, weighted voting, and two tree search algorithms) across model sizes and compute budgets on the MATH benchmark. It claims that scaling inference compute via advanced strategies is more efficient than scaling model parameters, with smaller models like Llemma-7B plus a novel tree search algorithm offering Pareto-superior cost-performance trade-offs over larger models like Llemma-34B.
Significance. If the empirical comparisons hold under fair compute accounting, the results would indicate that inference-time optimization can substitute for larger model sizes in some settings, providing practical guidance for efficient LLM deployment and highlighting the value of test-time scaling laws as a complement to training scaling laws.
major comments (2)
- The central claim that Llemma-7B with the novel tree search outperforms Llemma-34B (and offers better efficiency) depends on equivalent total compute across conditions. Tree search requires multiple forward passes, branching, and backtracking; if cost is measured only in tokens or wall-clock time without explicit FLOPs or model-call normalization that holds the budget constant, the reported Pareto dominance may be an artifact of unequal effective compute rather than strategy superiority.
- The abstract and reported comparisons do not specify an explicit FLOPs or model-call budget held constant across model sizes and strategies. Without this, it is unclear whether the measured performance differences arise from inference strategy efficiency or from unaccounted differences in total computation.
minor comments (2)
- Add details on statistical controls, variance across runs, and exact compute accounting (including how tree search calls are tallied) to strengthen the support for the efficiency claims.
- Clarify the precise definition and implementation of the novel tree search algorithm, including any hyperparameters that affect compute usage.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback emphasizing the need for transparent and equivalent compute accounting across model sizes and inference strategies. We agree that this is critical for interpreting the efficiency claims. Below we respond to each major comment and outline the revisions we will make to strengthen the presentation.
-
Referee: The central claim that Llemma-7B with the novel tree search outperforms Llemma-34B (and offers better efficiency) depends on equivalent total compute across conditions. Tree search requires multiple forward passes, branching, and backtracking; if cost is measured only in tokens or wall-clock time without explicit FLOPs or model-call normalization that holds the budget constant, the reported Pareto dominance may be an artifact of unequal effective compute rather than strategy superiority.
Authors: We agree that fair and explicit compute normalization is necessary to support the efficiency comparisons. In the experiments, we held the inference compute budget constant by fixing the total number of tokens generated (or equivalently the number of model forward passes) for each strategy under each budget level, with tree search explicitly counting all tokens from branching and backtracking. Because the primary comparisons for Pareto dominance are performed within the same model size before contrasting across sizes, the token-based budget provides a consistent measure. That said, we acknowledge that an explicit statement of this normalization (including its relation to FLOPs) would remove any ambiguity. We will add a dedicated paragraph in the methods section and update the figure captions to detail the exact model-call counting procedure.
Revision: yes
-
Referee: The abstract and reported comparisons do not specify an explicit FLOPs or model-call budget held constant across model sizes and strategies. Without this, it is unclear whether the measured performance differences arise from inference strategy efficiency or from unaccounted differences in total computation.
Authors: The manuscript states that experiments were conducted across different model sizes and compute budgets, but we accept that neither the abstract nor the main text currently provides an explicit definition of the budget in FLOPs or normalized model calls. We will revise the abstract to include a concise statement that all comparisons are performed under matched total inference compute (measured in tokens generated / model calls) and add a short subsection describing the normalization, confirming that tree-search costs are fully included and that cross-model comparisons respect the differing per-token FLOPs of each model size.
Revision: yes
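The point about differing per-token FLOPs can be illustrated with a rough sketch. It assumes the common rule of thumb that a dense transformer's forward pass costs about 2 FLOPs per parameter per token; this ignores attention's dependence on context length, and nothing on this page confirms that the paper uses this formula. The parameter counts are the nominal sizes implied by the model names, and both function names are hypothetical.

```python
def inference_flops(n_params, n_tokens):
    """Approximate forward-pass FLOPs: about 2 FLOPs per parameter per token (assumed)."""
    return 2 * n_params * n_tokens

def tokens_for_budget(n_params, flops_budget):
    """Number of whole tokens a model can process within a FLOPs budget."""
    return flops_budget // (2 * n_params)

# Nominal parameter counts (assumption: taken from the "7B" / "34B" model names).
PARAMS_7B = 7_000_000_000
PARAMS_34B = 34_000_000_000

# Budget: the approximate cost of the larger model generating 1,000 tokens.
budget = inference_flops(PARAMS_34B, 1_000)

# Under the same budget, the smaller model can spend several times as many tokens,
# for example on extra samples or on tree-search branches that are later pruned.
print(tokens_for_budget(PARAMS_7B, budget))
```

Under this approximation, a matched-token comparison across model sizes is not a matched-FLOPs comparison: equal token counts give the larger model roughly 34/7 times the compute.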
Circularity Check
Purely empirical study with no derivation chain or self-referential reductions
The paper conducts an empirical comparison of inference strategies (greedy search, majority voting, best-of-n, weighted voting, and tree search) across model sizes and compute budgets on the MATH benchmark. No mathematical derivations, first-principles predictions, or equations are presented that could reduce to fitted inputs or self-citations. Claims rest on observed performance and measured costs rather than any self-definitional or load-bearing self-referential steps. The analysis is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.HierarchyEmergence, hierarchy_emergence_forces_phi (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Additionally, smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 25 Pith papers
-
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
Agentic program search over frozen embedding APIs yields a parameter-free inference algebra—a softmax-weighted centroid of top-K documents interpolated with the query—that lifts nDCG@10 across seven model families on ...
-
Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
A softmax-weighted centroid of the local top-K documents interpolated with the query improves nDCG@10 for frozen embedding models across seven families on held-out BEIR data.
-
Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
BLADE uses Bayesian list-wise alignment with dynamic estimation to create a self-evolving target that overcomes limitations of static references in LLM-based recommendation, yielding sustained gains in ranking and com...
-
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
-
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
-
ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.
-
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
LCPO trains L1 reasoning models to adhere to prompt-specified CoT lengths, supporting accuracy-compute trade-offs and yielding short reasoning models that outperform larger baselines at matched lengths.
-
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
-
OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation
OpenDeepThink improves LLM reasoning by ranking parallel candidate traces via Bradley-Terry aggregation of LLM pairwise judgments, achieving a +405 Codeforces Elo gain on Gemini 3.1 Pro after eight rounds.
-
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
-
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.
-
When Less is Enough: Efficient Inference via Collaborative Reasoning
A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.
-
Evaluation-driven Scaling for Scientific Discovery
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster ...
-
LACE: Lattice Attention for Cross-thread Exploration
LACE enables parallel reasoning paths in LLMs to communicate via lattice attention and error-correct using synthetic training data, improving accuracy by over 7 points over standard parallel search.
-
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
UCAS refines RLVR advantage signals with a logit-space self-confidence proxy for response-level modulation and asymmetric token-level penalties based on raw logit certainty to boost exploration and reduce entropy collapse.
-
Entropy After </Think> for reasoning model early exiting
Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.
-
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
DeepSearch embeds MCTS into RLVR training with global frontier selection, entropy guidance, and adaptive replay to achieve 62.95% average accuracy on math reasoning benchmarks while using 5.7x fewer GPU hours than ext...
-
Muon is Scalable for LLM Training
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
-
Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling
Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while sel...
-
Physical Foundation Models: Fixed hardware implementations of large-scale neural networks
Physical Foundation Models are fixed physical hardware realizations of foundation-scale neural networks that compute via inherent material dynamics, potentially delivering orders-of-magnitude gains in energy efficienc...
-
Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification
Domain-specialized LLM agents for hardware verification close 95-99% coverage using 4-13x fewer tokens and 2-4x faster convergence than general-purpose agents by reallocating tokens toward coverage-directed reasoning.
-
LACE: Lattice Attention for Cross-thread Exploration
LACE adds lattice attention to let parallel LLM reasoning threads interact and correct errors, raising accuracy over 7 points versus standard independent sampling.
-
LACE: Lattice Attention for Cross-thread Exploration
LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.
-
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
GenCluster scales test-time compute via large-scale generation, behavioral clustering, ranking, and round-robin submission to achieve IOI gold medal performance with the open-weight gpt-oss-120b model.
-
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.