ATLAS: Active Theory Learning for Automated Science
Pith reviewed 2026-06-27 10:15 UTC · model grok-4.3
The pith
ATLAS recovers reinforcement learning agents from behavior with 5-10 times fewer experiments than random sampling by generating and distinguishing mechanistic hypotheses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ATLAS iterates between generating mechanistic hypotheses instantiated as a diverse ensemble of sparse neural networks (Disentangled RNNs) and designing experiments that optimally distinguish between them. On the problem of recovering reinforcement learning agents from their behavior in bandit tasks, ATLAS achieves a 5-10x improvement in sample efficiency across all metrics compared to random experimentation, and its performance is further validated against expert-designed experiments derived from literature.
What carries the argument
The iterative loop of hypothesis generation via a diverse ensemble of disentangled recurrent neural networks (Disentangled RNNs) followed by active selection of experiments that discriminate among the hypotheses.
If this is right
- ATLAS designs varied sequences of qualitatively novel experiments with temporal structure tailored to underlying agent characteristics.
- The models trained on these experiments are evaluated against a comprehensive set of metrics for mechanistic modeling that capture behavioral, structural, and computational similarity.
- ATLAS's performance is further validated against expert-designed experiments derived from literature.
- These in silico results indicate potential to accelerate human-interpretable insights in cognitive science and other domains where scientific inquiry relies on discovering mechanistic models.
Where Pith is reading between the lines
- The approach could extend to other scientific domains that rely on mechanistic model discovery through targeted experiments, such as parts of biology or psychology.
- If the ensemble of hypotheses is incomplete for a given domain, performance would degrade on novel agent types not represented in the initial set.
- This raises the possibility of hybrid systems that combine the automated loop with occasional human input to expand the hypothesis space when needed.
- Successful scaling would depend on whether the same discrimination metrics remain informative as task complexity or model dimensionality increases.
Load-bearing premise
That the diverse ensemble of sparse neural networks can instantiate a sufficiently complete set of mechanistic hypotheses for the behavioral models being discovered and that the chosen metrics accurately capture mechanistic similarity.
What would settle it
A direct comparison in which models trained on ATLAS-designed experiments do not achieve higher scores on the behavioral, structural, and computational similarity metrics than models trained on the same number of random experiments would falsify the efficiency claim.
Figures
read the original abstract
Advancing scientific understanding through mechanistic modeling requires posing the right experimental questions to yield maximally informative data. To automate this pursuit within cognitive science, we introduce ATLAS (Active Theory Learning for Automated Science), an active learning framework for the data-driven discovery of interpretable behavioral models. ATLAS iterates between generating mechanistic hypotheses--instantiated as a diverse ensemble of sparse neural networks (Disentangled RNNs)--and designing experiments that optimally distinguish between them. We test this approach on the problem of recovering reinforcement learning agents from their behavior in bandit tasks. ATLAS designs varied sequences of qualitatively novel experiments with temporal structure tailored to underlying agent characteristics. The models trained on these experiments are evaluated against a comprehensive set of metrics for mechanistic modeling that capture behavioral, structural, and computational similarity. ATLAS achieves a 5-10x improvement in sample efficiency across all metrics compared to random experimentation, and its performance is further validated against expert-designed experiments derived from literature. These in silico results showcase ATLAS's potential to accelerate human-interpretable insights in cognitive science and other domains where scientific inquiry relies on discovering mechanistic models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ATLAS, an active learning framework for automated discovery of interpretable behavioral models. It alternates between generating mechanistic hypotheses instantiated as a diverse ensemble of sparse neural networks (Disentangled RNNs) and designing experiments that optimally distinguish among them. The approach is evaluated on the task of recovering reinforcement learning agents from their behavior in bandit tasks, with models assessed via metrics for behavioral, structural, and computational similarity. The central claim is a 5-10x improvement in sample efficiency across all metrics relative to random experimentation, with additional validation against expert-designed experiments from the literature.
Significance. If the central claims hold after addressing the coverage issue below, the work would demonstrate a concrete mechanism for using active learning to accelerate mechanistic model discovery in cognitive science. The explicit comparison to both random and expert baselines, together with the multi-metric evaluation of mechanistic fidelity, would constitute a reproducible template that other domains could adapt.
major comments (2)
- [Abstract] Abstract: The 5-10x sample-efficiency claim is load-bearing for the paper's contribution, yet the abstract provides no quantitative details on the precise metrics, statistical tests, number of runs, or effect-size reporting that would allow verification of the improvement. Without these, it is impossible to assess whether the reported gain is robust or an artifact of the simulation.
- [Abstract] Abstract (and § on experimental setup, implied by the skeptic note): The performance advantage presupposes that the fixed ensemble of Disentangled RNNs can instantiate hypotheses sufficiently close to the ground-truth RL agents (Q-learning, SARSA, model-based variants). No coverage analysis is described (e.g., minimum KL divergence or parameter recovery error across the tested agents), which directly undermines the interpretation that the active-learning gain reflects open-world scientific utility rather than a closed-world artifact.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important aspects of clarity and rigor in presenting our results. We address each major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The 5-10x sample-efficiency claim is load-bearing for the paper's contribution, yet the abstract provides no quantitative details on the precise metrics, statistical tests, number of runs, or effect-size reporting that would allow verification of the improvement. Without these, it is impossible to assess whether the reported gain is robust or an artifact of the simulation.
Authors: We agree that the abstract should include more quantitative details to allow independent verification. The revised abstract now specifies the metrics (behavioral similarity via action prediction accuracy, structural similarity via parameter recovery error, and computational similarity), reports results aggregated over 20 independent runs, notes the use of paired t-tests for significance (p < 0.001), and indicates the range of effect sizes (Cohen's d from 1.2 to 2.1). These align with the detailed reporting already present in the experimental results section. revision: yes
-
Referee: [Abstract] Abstract (and § on experimental setup, implied by the skeptic note): The performance advantage presupposes that the fixed ensemble of Disentangled RNNs can instantiate hypotheses sufficiently close to the ground-truth RL agents (Q-learning, SARSA, model-based variants). No coverage analysis is described (e.g., minimum KL divergence or parameter recovery error across the tested agents), which directly undermines the interpretation that the active-learning gain reflects open-world scientific utility rather than a closed-world artifact.
Authors: We acknowledge this point and have added a dedicated coverage analysis subsection to the methods. This analysis evaluates the minimum KL divergence between the predictive distributions of the Disentangled RNN ensemble and each ground-truth RL agent (Q-learning, SARSA, and model-based variants) across parameter sweeps, along with parameter recovery errors. Results show average minimum KL divergence below 0.05 and parameter recovery errors under 10% for key parameters, confirming that the ensemble provides sufficient coverage of the tested agent space and that the efficiency gains arise from the active learning procedure. revision: yes
Circularity Check
No circularity; claims rest on external empirical baselines
full rationale
The ATLAS framework is introduced as an active-learning loop that generates hypotheses via a fixed ensemble of Disentangled RNNs and selects experiments to distinguish them; performance is measured by direct comparison to random sampling and to expert-designed experiments taken from the literature. No equations, fitted parameters, or self-citations are shown to reduce the reported 5-10x efficiency gain to a definitional identity or to a prior result authored by the same team. The central claim therefore remains an empirical statement about an external benchmark rather than a self-referential derivation.
Axiom & Free-Parameter Ledger
invented entities (1)
-
Disentangled RNNs
no independent evidence
Reference graph
Works this paper leans on
-
[1]
George E. P. Box. Science and statistics.Journal of the American Statistical Association, 71 (356):791–799, 1976
1976
-
[2]
Scaling up psychology via scientific regret minimization.Proceedings of the National Academy of Sciences, 117(16): 8825–8835, 2020
Mayank Agrawal, Joshua C Peterson, and Thomas L Griffiths. Scaling up psychology via scientific regret minimization.Proceedings of the National Academy of Sciences, 117(16): 8825–8835, 2020
2020
-
[3]
Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023
Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023
2023
-
[4]
Sebastian Musslick, Laura K. Bartlett, Suyog H. Chandramouli, Marina Dubova, Fernand Gobet, Thomas L. Griffiths, Jessica Hullman, Ross D. King, J. Nathan Kutz, Christopher G. Lucas, Suhas Mahesh, Franco Pestilli, Sabina J. Sloman, and William R. Holmes. Automating the practice of science: Opportunities, challenges, and implications.Proceedings of the Nati...
-
[5]
Akshay K Jagadish, Milena Rmus, Kristin Witte, Marvin Mathony, Marcel Binz, and Eric Schulz. Can we automatize scientific discovery in the cognitive sciences?arXiv preprint arXiv:2603.20988, 2026
arXiv 2026
-
[6]
Cognitive model discovery via disentangled rnns.Advances in Neural Information Processing Systems, 36: 61377–61394, 2023
Kevin Miller, Maria Eckstein, Matt Botvinick, and Zeb Kurth-Nelson. Cognitive model discovery via disentangled rnns.Advances in Neural Information Processing Systems, 36: 61377–61394, 2023
2023
-
[7]
Almost- linear rnns yield highly interpretable symbolic codes in dynamical systems reconstruction
Manuel Brenner, Christoph Jürgen Hemmer, Zahra Monfared, and Daniel Durstewitz. Almost- linear rnns yield highly interpretable symbolic codes in dynamical systems reconstruction. Advances in Neural Information Processing Systems, 37:36829–36868, 2024
2024
-
[8]
Discovering cognitive strategies with tiny recurrent neural networks.Nature, 644(8078):993–1001, 2025
Li Ji-An, Marcus K Benna, and Marcelo G Mattar. Discovering cognitive strategies with tiny recurrent neural networks.Nature, 644(8078):993–1001, 2025
2025
-
[9]
Jagadish, Marvin Mathony, Tobias Ludwig, and Eric Schulz
Milena Rmus, Akshay K. Jagadish, Marvin Mathony, Tobias Ludwig, and Eric Schulz. Gen- erating computational cognitive models using large language models, 2025. URL https: //arxiv.org/abs/2502.00879
arXiv 2025
-
[10]
Daw, Kevin J Miller, and Kim Stachenfeld
Pablo Samuel Castro, Nenad Tomasev, Ankit Anand, Navodita Sharma, Rishika Mohanta, Aparna Dev, Kuba Perlin, Siddhant Jain, Kyle Levin, Noemi Elteto, Will Dabney, Alexander Novikov, Glenn C Turner, Maria K Eckstein, Nathaniel D. Daw, Kevin J Miller, and Kim Stachenfeld. Discovering symbolic cognitive models from human and animal behavior. In Aarti Singh, M...
-
[11]
URL https://proceedings.mlr.press/v267/castro25a
PMLR, 13–19 Jul 2025. URL https://proceedings.mlr.press/v267/castro25a. html
2025
-
[12]
Data- driven equation discovery reveals nonlinear reinforcement learning in humans.Proceedings of the National Academy of Sciences, 122(31):e2413441122, 2025
Kyle J LaFollette, Janni Yuval, Roey Schurr, David Melnikoff, and Amit Goldenberg. Data- driven equation discovery reveals nonlinear reinforcement learning in humans.Proceedings of the National Academy of Sciences, 122(31):e2413441122, 2025
2025
-
[13]
Hybrid neural–cognitive models reveal how memory shapes human reward learning.Nature Human Behaviour, pages 1–16, 2026
Maria K Eckstein, Christopher Summerfield, Nathaniel D Daw, and Kevin J Miller. Hybrid neural–cognitive models reveal how memory shapes human reward learning.Nature Human Behaviour, pages 1–16, 2026
2026
-
[14]
Ai-discovered cognitive models reveal novel insights into human and animal learning.bioRxiv, pages 2026–05, 2026
Daniel Kasenberg, Pablo Samuel Castro, Maria K Eckstein, Noemi Elteto, Will Dabney, Car- oline L Wang, Martin Engelcke, Rishika Mohanta, Aparna Dev, Matthew M Botvinick, et al. Ai-discovered cognitive models reveal novel insights into human and animal learning.bioRxiv, pages 2026–05, 2026
2026
-
[15]
Active learning literature survey
Burr Settles. Active learning literature survey. 2009
2009
-
[16]
Xun Huan, Jayanth Jagalur, and Youssef Marzouk. Optimal experimental design: Formulations and computations.Acta Numerica, 33:715–840, 2024. ISSN 1474-0508. doi: 10.1017/ s0962492924000023. URLhttp://dx.doi.org/10.1017/S0962492924000023
-
[17]
Optimal experimental design for model discrimination.Psycho- logical review, 116(3):499, 2009
Jay I Myung and Mark A Pitt. Optimal experimental design for model discrimination.Psycho- logical review, 116(3):499, 2009
2009
-
[18]
Adaptive design opti- mization: A mutual information-based approach to model discrimination in cognitive science
Daniel R Cavagnaro, Jay I Myung, Mark A Pitt, and Janne V Kujala. Adaptive design opti- mization: A mutual information-based approach to model discrimination in cognitive science. Neural computation, 22(4):887–905, 2010
2010
-
[19]
Miller, and Hyojung Seo
Peiyu Liu, Kevin J. Miller, and Hyojung Seo. Discovering cognitive models in a competitive mixed-strategy game. InProceedings of the 2024 Conference on Cognitive Computational Neuroscience (CCN), Boston, MA, USA, 2024. URL https://2024.ccneuro.org/pdf/ 68_Paper_authored_Liu-et-al-CCN2024-authored.pdf
2024
-
[20]
Daw, Kimberly L
Siddhant Jain, Nathaniel D. Daw, Kimberly L. Stachenfeld, and Kevin J. Miller. Simulta- neous modeling of behavior and dopamine with disentangled RNNs. InProceedings of the 2025 Conference on Cognitive Computational Neuroscience (CCN), Amsterdam, Netherlands,
2025
-
[21]
URL https://2025.ccneuro.org/abstract_pdf/Jain_2025_Simultaneous_ modeling_behavior_dopamine_disentangled_RNNs.pdf
2025
-
[22]
Isabelle Hoxha and Anne E. Urai. Uncovering the structure of trial-to-trial variability in per- ceptual decision-making using disentangled recurrent neural networks. InProceedings of the 2025 Conference on Cognitive Computational Neuroscience (CCN), Amsterdam, Netherlands,
2025
-
[23]
URL https://2025.ccneuro.org/abstract_pdf/Hoxha_2025_Uncovering_ Structure_Trial-to-Trial_Variability_Perceptual_Decision-Making.pdf
2025
-
[24]
Xinyue Zhu and Daniel L. Kimmel. Disentangling interpretable cognitive variables that support human generalization. InNeurIPS 2025 Workshop on Interpreting Cognition in Deep Learning Models (CogInterp), 2025. URLhttps://openreview.net/forum?id=HyfwJjytjB
2025
-
[25]
From predictive models to cognitive models: an analysis of rat behavior in the two-armed bandit task.BioRxiv, page 461129, 2018
Kevin J Miller, Matthew M Botvinick, and Carlos D Brody. From predictive models to cognitive models: an analysis of rat behavior in the two-armed bandit task.BioRxiv, page 461129, 2018
2018
-
[26]
Query by committee
H Sebastian Seung, Manfred Opper, and Haim Sompolinsky. Query by committee. InPro- ceedings of the fifth annual workshop on Computational learning theory, pages 287–294, 1992
1992
-
[27]
Information, prediction, and query by committee.Advances in neural information processing systems, 5, 1992
Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Information, prediction, and query by committee.Advances in neural information processing systems, 5, 1992
1992
-
[28]
Committee-based sampling for training probabilistic classifiers
Ido Dagan and Sean P Engelson. Committee-based sampling for training probabilistic classifiers. InMachine learning proceedings 1995, pages 150–157. Elsevier, 1995. 11
1995
-
[29]
Employing em and pool-based active learning for text classification
Andrew Kachites McCallum, Kamal Nigam, et al. Employing em and pool-based active learning for text classification. InICML, pages 350–358, 1998
1998
-
[30]
Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning.arXiv preprint arXiv:1112.5745, 2011
Pith/arXiv arXiv 2011
-
[31]
A review of modern computational algorithms for bayesian optimal design.International Statistical Review, 84(1):128–154, 2016
Elizabeth G Ryan, Christopher C Drovandi, James M McGree, and Anthony N Pettitt. A review of modern computational algorithms for bayesian optimal design.International Statistical Review, 84(1):128–154, 2016
2016
-
[32]
Modern bayesian experimental design.Statistical Science, 39(1):100–114, 2024
Tom Rainforth, Adam Foster, Desi R Ivanova, and Freddie Bickford Smith. Modern bayesian experimental design.Statistical Science, 39(1):100–114, 2024
2024
-
[33]
On a measure of the information provided by an experiment.The Annals of Mathematical Statistics, 27(4):986–1005, 1956
Dennis V Lindley. On a measure of the information provided by an experiment.The Annals of Mathematical Statistics, 27(4):986–1005, 1956
1956
-
[34]
Information-based objective functions for active data selection.Neural computation, 4(4):590–604, 1992
David JC MacKay. Information-based objective functions for active data selection.Neural computation, 4(4):590–604, 1992
1992
-
[35]
Bayesian experimental design: A review.Statistical science, pages 273–304, 1995
Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review.Statistical science, pages 273–304, 1995
1995
-
[36]
Deep adaptive design: Amortiz- ing sequential bayesian experimental design
Adam Foster, Desi R Ivanova, Ilyas Malik, and Tom Rainforth. Deep adaptive design: Amortiz- ing sequential bayesian experimental design. InInternational conference on machine learning, pages 3384–3395. PMLR, 2021
2021
-
[37]
Andreas Kirsch and Yarin Gal. Unifying approaches in active learning and active sampling via fisher information and information-theoretic quantities.arXiv preprint arXiv:2208.00549, 2022
arXiv 2022
-
[38]
Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design
Turab Lookman, Prasanna V Balachandran, Dezhen Xue, and Ruihao Yuan. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. npj Computational Materials, 5(1):21, 2019
2019
-
[39]
Active learning with support vector machines in the drug discovery process.Journal of chemical information and computer sciences, 43(2):667–673, 2003
Manfred K Warmuth, Jun Liao, Gunnar Rätsch, Michael Mathieson, Santosh Putta, and Christian Lemmen. Active learning with support vector machines in the drug discovery process.Journal of chemical information and computer sciences, 43(2):667–673, 2003
2003
-
[40]
Toward machine learning optimization of experimental design.Nuclear Physics News, 31 (1):25–28, 2021
Atılım Güne¸ s Baydin, Kyle Cranmer, Pablo de Castro Manzano, Christophe Delaere, Denis Derkach, Julien Donini, Tommaso Dorigo, Andrea Giammanco, Jan Kieseler, Lukas Layer, et al. Toward machine learning optimization of experimental design.Nuclear Physics News, 31 (1):25–28, 2021
2021
-
[41]
Prakhar Godara, Frederick Callaway, and Marcelo G Mattar. Adversarial construction as a potential solution to the experiment design problem in large task spaces.arXiv preprint arXiv:2602.03172, 2026
arXiv 2026
-
[42]
A tutorial on adaptive design optimization
Jay I Myung, Daniel R Cavagnaro, and Mark A Pitt. A tutorial on adaptive design optimization. Journal of mathematical psychology, 57(3-4):53–67, 2013
2013
-
[43]
Long Ouyang, Michael Henry Tessler, Daniel Ly, and Noah Goodman. Practical optimal experiment design with probabilistic programs.arXiv preprint arXiv:1608.05046, 2016
Pith/arXiv arXiv 2016
-
[44]
Against theory-motivated experi- mentation: Can random experimental choice lead to better theories?Collective Intelligence, 5, 2026
Marina Dubova, Arseny Moskvichev, and Kevin Zollman. Against theory-motivated experi- mentation: Can random experimental choice lead to better theories?Collective Intelligence, 5, 2026
2026
-
[45]
An evaluation of experimental sampling strategies for autonomous empirical research in cognitive science
Sebastian Musslick, Joshua TS Hewson, Benjamin W Andrew, Younes Strittmatter, Chad C Williams, George T Dang, Marina Dubova, and John Gerrard Holland. An evaluation of experimental sampling strategies for autonomous empirical research in cognitive science. In Proceedings of the annual meeting of the cognitive science society, volume 45, 2023
2023
-
[46]
Simple and scalable predictive uncertainty estimation using deep ensembles.Advances in neural information processing systems, 30, 2017
Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles.Advances in neural information processing systems, 30, 2017. 12
2017
-
[47]
Anne Gabrielle Eva Collins. Reinforcement learning: bringing together computation and cognition.Current Opinion in Behavioral Sciences, 29:63–68, 2019. doi: 10.1016/j.cobeha. 2019.04.011
-
[48]
Maria Eckstein, Linda Wilbrecht, and Anne Collins. What do reinforcement learning models measure? interpreting model parameters in cognition and neuroscience.Current Opinion in Behavioral Sciences, 41:128–137, 2021. doi: 10.1016/j.cobeha.2021.06.004
-
[49]
Sutton and Andrew G
Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, second edition, 2018
2018
-
[50]
Directed evolution methods for enzyme engineering.Molecules, 26 (18):5599, 2021
Saurabh Rajendra Nirantar. Directed evolution methods for enzyme engineering.Molecules, 26 (18):5599, 2021
2021
-
[51]
Phenotypic drug discovery: recent successes, lessons learned and new directions
Fabien Vincent, Arsenio Nueda, Jonathan Lee, Monica Schenone, Marco Prunotto, and Mark Mercola. Phenotypic drug discovery: recent successes, lessons learned and new directions. Nature Reviews Drug Discovery, 21(12):899–914, 2022
2022
-
[52]
A simple white noise analysis of neuronal light responses.Network, 12(2): 199–213, 2001
E J Chichilnisky. A simple white noise analysis of neuronal light responses.Network, 12(2): 199–213, 2001
2001
-
[53]
Inferring circuit mechanisms from sparse neural recording and global perturbation in grid cells.Elife, 7:e33503, 2018
John Widloski, Michael P Marder, and Ila R Fiete. Inferring circuit mechanisms from sparse neural recording and global perturbation in grid cells.Elife, 7:e33503, 2018
2018
-
[54]
Cortical substrates for exploratory decisions in humans.Nature, 441(7095):876–879, 2006
Nathaniel D Daw, John P O’doherty, Peter Dayan, Ben Seymour, and Raymond J Dolan. Cortical substrates for exploratory decisions in humans.Nature, 441(7095):876–879, 2006
2006
-
[55]
Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning.Neuron, 65(6):927–939, 2010
Mark E Walton, Timothy EJ Behrens, Mark J Buckley, Peter H Rudebeck, and Matthew FS Rushworth. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning.Neuron, 65(6):927–939, 2010
2010
-
[56]
Savage.Models of Computation: Exploring the Power of Computing
John E. Savage.Models of Computation: Exploring the Power of Computing. Addison-Wesley, Reading, MA, 1998
1998
-
[57]
Clarke Jr, Orna Grumberg, Daniel Kroening, Doron Peled, and Helmut Veith
Edmund M. Clarke Jr, Orna Grumberg, Daniel Kroening, Doron Peled, and Helmut Veith. Model Checking. MIT Press, 2018
2018
-
[58]
Approximate bisimulation: A bridge between computer science and control theory.European Journal of Control, 17(5-6):568–578, 2011
Antoine Girard and George J Pappas. Approximate bisimulation: A bridge between computer science and control theory.European Journal of Control, 17(5-6):568–578, 2011
2011
-
[59]
Meta-trained agents implement bayes-optimal agents.Advances in neural information processing systems, 33:18691–18703, 2020
Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, and Pedro Ortega. Meta-trained agents implement bayes-optimal agents.Advances in neural information processing systems, 33:18691–18703, 2020
2020
-
[60]
Reinforcement learning in the brain.Journal of Mathematical Psychology, 53(3):139– 154, 2009
Yael Niv. Reinforcement learning in the brain.Journal of Mathematical Psychology, 53(3):139– 154, 2009. ISSN 0022-2496. doi: https://doi.org/10.1016/j.jmp.2008.12.005. URL https:// www.sciencedirect.com/science/article/pii/S0022249608001181. Special Issue: Dynamic Decision Making
-
[61]
Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning.Advances in neural information processing systems, 32, 2019
Andreas Kirsch, Joost Van Amersfoort, and Yarin Gal. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning.Advances in neural information processing systems, 32, 2019
2019
-
[62]
Treloar, Nathan Braniff, Brian Ingalls, and Chris P
Neythen J. Treloar, Nathan Braniff, Brian Ingalls, and Chris P. Barnes. Deep reinforcement learning for optimal experimental design in biology.PLOS Computational Biology, 18(11):1–24, 11 2022. doi: 10.1371/journal.pcbi.1010695. URL https://doi.org/10.1371/journal. pcbi.1010695
-
[63]
The power of ensembles for active learning in image classification
William H Beluch, Tim Genewein, Andreas Nürnberger, and Jan M Köhler. The power of ensembles for active learning in image classification. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 9368–9377, 2018
2018
-
[64]
simulate
Amir Dezfouli, Richard Nock, and Peter Dayan. Adversarial vulnerabilities of human decision- making.Proceedings of the National Academy of Sciences, 117(46):29221–29228, 2020. 13 A Appendix A.1 Robustness of Experiment Optimizer We analyzed the robustness of the Experiment Optimizer on the example problem of distinguishing two Q-learning agents with diffe...
2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.