Boundedly Rational Meta-Learning in Sequential Consumer Choice

Hema Yoganarasimhan; Max Kleiman-Weiner; Mehrzad Khosravi

arxiv: 2605.16532 · v1 · pith:6EZHKHKEnew · submitted 2026-05-15 · 💻 cs.LG · econ.GN· q-fin.EC

Boundedly Rational Meta-Learning in Sequential Consumer Choice

Mehrzad Khosravi , Max Kleiman-Weiner , Hema Yoganarasimhan This is my paper

Pith reviewed 2026-05-20 19:48 UTC · model grok-4.3

classification 💻 cs.LG econ.GNq-fin.EC

keywords meta-learningbounded rationalityconsumer choicesequential decisionsknowledge transferBayesian learningdynamic programminglaboratory experiment

0 comments

The pith

Consumers transfer experience across related choices using coarse low-dimensional approximations of uncertainty rather than full Bayesian integration or starting over.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how people carry prior learning into new but similar decision contexts, such as choosing providers across different situations. In a lab task with repeated airline selections on varying routes and noisy feedback, participants show faster improvement in later routes, indicating cross-context transfer. Model comparisons reveal that a boundedly rational policy using only one approximate draw from past uncertainty matches the sequence of human choices more closely than either a no-transfer model or a model that fully integrates all prior information. This points to consumers maintaining simple representations of brand-level patterns when moving between contexts.

Core claim

Trial-by-trial likelihood comparisons in the hierarchical airline choice task show that low-D boundedly rational meta-learning policies, especially BRMDP(1), fit participant behavior better than both a no-transfer benchmark and a fully integrated Bayesian meta-learning benchmark, indicating that consumers transfer regularities across contexts through coarse representations of prior uncertainty.

What carries the argument

BRMDP(D), a boundedly rational meta dynamic programming policy that approximates full Bayesian integration by drawing a limited number D of samples from the hyper-posterior over context parameters.

If this is right

Participants choose better options earlier in later routes and reduce pseudo-regret across contexts.
Consumer learning models must incorporate approximate rather than complete cross-context transfer.
Managerial counterfactuals that assume either no transfer or full integration will produce misleading predictions.
Low-dimensional approximations of prior uncertainty provide a better account of observed choice sequences than the two extreme benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In markets with many overlapping contexts, firms may benefit from designing recommendations that assume consumers use only a few representative prior samples rather than exhaustive updating.
The same coarse-transfer pattern could appear in other sequential domains such as repeated product trials or service selections where underlying regularities exist across categories.
Testing whether the fit of low-D policies improves or worsens as context similarity increases would clarify the boundary conditions of the mechanism.

Load-bearing premise

The laboratory task of repeated airline choices across routes with noisy binary outcomes adequately represents real-world cross-context knowledge transfer in sequential consumer decisions.

What would settle it

A new experiment that varies the relatedness between contexts and tests whether the likelihood advantage of BRMDP(1) over full integration disappears when contexts share no underlying structure.

Figures

Figures reproduced from arXiv: 2605.16532 by Hema Yoganarasimhan, Max Kleiman-Weiner, Mehrzad Khosravi.

**Figure 1.** Figure 1: An example of a consultant in a single-route environment. She has some experience with Ascend and Summit Airways [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗

**Figure 2.** Figure 2: An example of a multi-route environment. The consultant learns about airline performance on the Seattle–Dallas route and [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Graphical representation of the hierarchical Bayesian environment. Each airline [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Airline choice environment used in the experiments. [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Mean participants’ best-airline selection over flights for different routes. Each line represents a distinct route, and earlier [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: Mean participants’ pseudo-regret over flights for different routes. Each line represents a distinct route, and earlier routes are [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Mean best-airline selection over flights for different routes in the Far Means–Low Var. condition. Each line represents a [PITH_FULL_IMAGE:figures/full_fig_p029_7.png] view at source ↗

**Figure 8.** Figure 8: Mean pseudo-regret over flights for different routes in the Far Means–Low Var. condition. Each line represents a distinct [PITH_FULL_IMAGE:figures/full_fig_p029_8.png] view at source ↗

**Figure 9.** Figure 9: Log-likelihood comparison across policies for [PITH_FULL_IMAGE:figures/full_fig_p033_9.png] view at source ↗

read the original abstract

Many consumer decisions are repeated choices under uncertainty. Standard models capture these decisions using Bayesian learning and dynamic programming: consumers update beliefs from feedback and use those beliefs to guide future choices. In many markets, however, learning does not restart when consumers enter a new context: prior experience with a brand, product, or provider can shape beliefs in later, related decisions. We study this cross-context knowledge transfer, or meta-learning, in sequential choice. We design a hierarchical laboratory task in which participants repeatedly choose among airlines across routes and observe noisy binary outcomes. Reduced-form evidence shows that participants improve not only within routes, but also across routes: they choose better airlines earlier in later routes and reduce pseudo-regret. To identify the mechanism behind this transfer, we compare human choices to a no-transfer benchmark and a fully integrated Bayesian meta-learning benchmark. In particular, we introduce a class of boundedly rational meta dynamic programming policies, BRMDP(D), that approximate full integration using a limited number of hyper-posterior draws, denoted by D. Trial-by-trial likelihood comparisons show that low-D boundedly rational meta-learning, especially BRMDP(1), fits participant behavior better than both no transfer and fully integrated Bayesian transfer. Consumers, therefore, transfer brand-level regularities across contexts, but through coarse representations of prior uncertainty. The findings imply that models of consumer learning should allow for approximate cross-context transfer, and that managerial counterfactuals based on either no-transfer or fully integrated learning can be misleading.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript examines cross-context knowledge transfer (meta-learning) in sequential consumer choice under uncertainty. Participants complete a hierarchical laboratory task involving repeated choices among airlines across multiple routes, receiving noisy binary feedback on outcomes. Reduced-form analyses show within-route learning as well as cross-route improvement, with better early-route choices and lower pseudo-regret in later routes. The authors introduce a family of boundedly rational meta-dynamic-programming policies, BRMDP(D), that approximate full Bayesian integration over a hyper-posterior by drawing only D samples. Trial-by-trial likelihood comparisons on human data indicate that low-D policies, particularly BRMDP(1), outperform both a no-transfer benchmark and a fully integrated Bayesian meta-learning model, supporting the claim that consumers transfer brand-level regularities via coarse representations of prior uncertainty.

Significance. If the empirical comparisons hold, the paper offers a useful middle ground between no-transfer and fully rational meta-learning models, with direct implications for consumer-behavior modeling and managerial counterfactuals. The BRMDP(D) construction supplies a computationally tractable approximation whose free parameter D is directly interpretable as the granularity of uncertainty representation. The work also supplies falsifiable predictions via likelihood rankings and reduced-form cross-route metrics, which are strengths for a field that often relies on purely qualitative claims about transfer.

major comments (2)

[§4] §4 (Results), likelihood comparisons: the reported superiority of BRMDP(1) over the fully integrated benchmark and the no-transfer model is central to the main claim, yet the manuscript provides no standard errors, bootstrap intervals, or formal statistical tests on the likelihood differences; without these, it is difficult to judge whether the ranking is robust to sampling variation across participants.
[Experimental design] Experimental design section: the description of the hierarchical task does not state the exact number of routes, trials per route, or total participant count; these quantities are load-bearing for interpreting both the reduced-form cross-route improvement and the power of the model-comparison results.

minor comments (3)

[§3.2] §3.2, definition of BRMDP(D): the precise sampling procedure for the D hyper-posterior draws and how the resulting policy is computed should be written as a short algorithm or pseudocode for reproducibility.
[Figure 2] Figure 2 (or equivalent likelihood plot): adding participant-level variability bands or reporting the number of observations per route would improve interpretability of the visual comparison.
[§3] Notation: the symbol for the hyper-posterior is introduced without an explicit equation reference; adding a numbered display equation would reduce ambiguity when readers compare BRMDP(D) to the full Bayesian benchmark.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us strengthen the statistical robustness and clarity of the manuscript. We address each major point below and have incorporated revisions accordingly.

read point-by-point responses

Referee: [§4] §4 (Results), likelihood comparisons: the reported superiority of BRMDP(1) over the fully integrated benchmark and the no-transfer model is central to the main claim, yet the manuscript provides no standard errors, bootstrap intervals, or formal statistical tests on the likelihood differences; without these, it is difficult to judge whether the ranking is robust to sampling variation across participants.

Authors: We agree that measures of uncertainty are necessary to evaluate the robustness of the model ranking. In the revised manuscript we now include bootstrap 95% confidence intervals for the per-participant log-likelihood differences, obtained by resampling participants 10,000 times. These intervals exclude zero for BRMDP(1) versus both the no-transfer and full Bayesian benchmarks. We have also added a paired Wilcoxon signed-rank test on the individual-level likelihoods (p < 0.01 for the primary comparisons) and a new panel in Figure 4 displaying the distribution of differences. These additions directly address the concern while preserving the original likelihood values. revision: yes
Referee: [Experimental design] Experimental design section: the description of the hierarchical task does not state the exact number of routes, trials per route, or total participant count; these quantities are load-bearing for interpreting both the reduced-form cross-route improvement and the power of the model-comparison results.

Authors: We appreciate the referee noting this omission. The revised Experimental Design section now states that participants completed 4 routes with 20 trials each, for a total of 96 participants (after excluding 4 who failed attention checks). We have inserted a new Table 1 that summarizes all task parameters, including the number of airlines per route (3), the binary feedback noise level, and the route-specific outcome probabilities. These explicit quantities should now allow readers to assess both the reduced-form cross-route effects and the statistical power of the likelihood comparisons. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper defines BRMDP(D) explicitly as an approximation to full Bayesian meta-learning via a finite number of hyper-posterior draws and then performs direct likelihood comparisons of this family, a no-transfer benchmark, and the full-integration model against observed human choices in the airline-route task. These comparisons are statistical fits to external data rather than any quantity being recovered by construction from its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked to justify the central claim that low-D variants provide a better account of transfer; the derivation from task design through reduced-form cross-route improvement to model ranking therefore remains independent of the reported result.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 1 invented entities

The work rests on standard Bayesian updating assumptions and introduces BRMDP(D) as an approximation mechanism; D controls the coarseness of the representation and is central to the bounded-rationality claim.

free parameters (1)

D (number of hyper-posterior draws)
Controls the degree of approximation in BRMDP(D) policies; low values like D=1 are shown to fit data best.

axioms (2)

domain assumption Consumers perform Bayesian belief updating from noisy feedback
Invoked as the baseline for both full-integration and bounded-rationality models.
domain assumption Cross-route knowledge transfer occurs via meta-learning over brand-level regularities
Core premise of the hierarchical task design.

invented entities (1)

BRMDP(D) policies no independent evidence
purpose: Approximate full Bayesian meta-learning using limited hyper-posterior draws
Newly introduced class to capture bounded rationality; no independent evidence provided outside the model fits.

pith-pipeline@v0.9.0 · 5810 in / 1385 out tokens · 47890 ms · 2026-05-20T19:48:32.230079+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

192 extracted references · 192 canonical work pages

[1]

P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47 0 (2-3): 0 235--256, 2002

work page 2002
[2]

A. V. Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107 0 (3): 0 797--817, 1992

work page 1992
[3]

S. Basu, B. Kveton, M. Zaheer, and C. Szepesv \'a ri. No regrets for learning the prior in bandits. Advances in neural information processing systems, 34: 0 28029--28041, 2021

work page 2021
[4]

R. Bellman. A problem in the sequential design of experiments. Sankhy \=a : The Indian Journal of Statistics (1933-1960) , 16 0 (3/4): 0 221--229, 1956

work page 1933
[5]

R. Bellman. Dynamic programming. science, 153 0 (3731): 0 34--37, 1966

work page 1966
[6]

R. Bhui, L. Lai, and S. J. Gershman. Resource-rational decision making. Current Opinion in Behavioral Sciences, 41: 0 15--21, 2021

work page 2021
[7]

M. Binz, I. Dasgupta, A. K. Jagadish, M. Botvinick, J. X. Wang, and E. Schulz. Meta-learned models of cognition. Behavioral and Brain Sciences, 47: 0 e147, 2024

work page 2024
[8]

J. A. Bohren and D. N. Hauser. Misspecified models in learning and games. Annual Review of Economics, 17, 2025

work page 2025
[9]

T. Bondi. Alone, together: A model of social (mis) learning from consumer reviews. Marketing Science, 44 0 (6): 0 1258--1277, 2025

work page 2025
[10]

Bubeck, N

S. Bubeck, N. Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5 0 (1): 0 1--122, 2012

work page 2012
[11]

Callaway, B

F. Callaway, B. Van Opheusden, S. Gul, P. Das, P. M. Krueger, T. L. Griffiths, and F. Lieder. Rational use of cognitive resources in human planning. Nature human behaviour, 6 0 (8): 0 1112--1125, 2022

work page 2022
[12]

C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model of games. The quarterly journal of economics, 119 0 (3): 0 861--898, 2004

work page 2004
[13]

Chater, J.-Q

N. Chater, J.-Q. Zhu, J. Spicer, J. Sundh, P. Le \'o n-Villagr \'a , and A. Sanborn. Probabilistic biases meet the bayesian brain. Current Directions in Psychological Science, 29 0 (5): 0 506--512, 2020

work page 2020
[14]

H. Che, T. Erdem, and T. S. \"O nc \"u . Consumer learning and evolution of consumer brand preferences. Quantitative Marketing and Economics, 13: 0 173--202, 2015

work page 2015
[15]

A. T. Ching, T. Erdem, and M. P. Keane. Learning models: An assessment of progress, challenges, and new developments. Marketing Science, 32 0 (6): 0 913--938, 2013

work page 2013
[16]

A. T. Ching, T. Erdem, and M. P. Keane. Empirical models of learning dynamics: A survey of recent developments. Handbook of marketing decision models, pages 223--257, 2017

work page 2017
[17]

Coscelli and M

A. Coscelli and M. Shum. An empirical model of learning and patient spillovers in new drug entry. Journal of econometrics, 122 0 (2): 0 213--246, 2004

work page 2004
[18]

G. S. Crawford and M. Shum. Uncertainty and learning in pharmaceutical demand. econometrica, 73 0 (4): 0 1137--1173, 2005

work page 2005
[19]

N. D. Daw et al. Trial-by-trial data analysis using computational models. Decision making, affect, and learning: Attention and performance XXIII, 23 0 (1): 0 3--38, 2011

work page 2011
[20]

M. H. DeGroot. Optimal statistical decisions. John Wiley & Sons, 2005

work page 2005
[21]

Eluchans, G

M. Eluchans, G. L. Lancia, A. Maselli, M. D’Alessandro, J. R. Gordon, and G. Pezzulo. Adaptive planning depth in human problem-solving. Royal Society Open Science, 12 0 (4), 2025

work page 2025
[22]

T. Erdem. An empirical analysis of umbrella branding. Journal of Marketing Research, 35 0 (3): 0 339--351, 1998

work page 1998
[23]

Erdem and M

T. Erdem and M. P. Keane. Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing science, 15 0 (1): 0 1--20, 1996

work page 1996
[24]

Gabaix, D

X. Gabaix, D. Laibson, G. Moloche, and S. Weinberg. Costly information acquisition: Experimental analysis of a boundedly rational model. American Economic Review, 96 0 (4): 0 1043--1068, 2006

work page 2006
[25]

Gelman, J

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman and Hall/CRC, 1995

work page 1995
[26]

S. J. Gershman. Deconstructing the human algorithms for exploration. Cognition, 173: 0 34--42, 2018

work page 2018
[27]

Goldfarb, T.-H

A. Goldfarb, T.-H. Ho, W. Amaldoss, A. L. Brown, Y. Chen, T. H. Cui, A. Galasso, T. Hossain, M. Hsu, N. Lim, et al. Behavioral models of managerial decision-making. Marketing Letters, 23 0 (2): 0 405--421, 2012

work page 2012
[28]

T. L. Griffiths, F. Callaway, M. B. Chang, E. Grant, P. M. Krueger, and F. Lieder. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29: 0 24--30, 2019

work page 2019
[29]

T. L. Griffiths, N. Chater, and J. B. Tenenbaum. Bayesian models of cognition: Reverse engineering the mind. MIT Press, 2024

work page 2024
[30]

Guan and H

J. Guan and H. Xiong. Improved bayes regret bounds for multi-task hierarchical bayesian bandit algorithms. Advances in Neural Information Processing Systems, 37: 0 72964--72999, 2024

work page 2024
[31]

M. K. Ho, J. D. Cohen, and T. L. Griffiths. Rational simplification and rigidity in human planning. Psychological Science, 34 0 (11): 0 1281--1292, 2023

work page 2023
[32]

T. H. Ho, N. Lim, and C. F. Camerer. Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of marketing Research, 43 0 (3): 0 307--331, 2006

work page 2006
[33]

C. F. Hofacker, H. N. Nguyen, and M. Fina. Bayesian inference and consumer behavioral theory. Psychology & Marketing, 41 0 (12): 0 3144--3156, 2024

work page 2024
[34]

J. Hong, B. Kveton, M. Zaheer, and M. Ghavamzadeh. Hierarchical bayesian bandits. In International Conference on Artificial Intelligence and Statistics, pages 7724--7741. PMLR, 2022

work page 2022
[35]

J. W. Hutchinson and E. M. Eisenstein. Consumer learning and expertise. Handbook of consumer psychology, 4: 0 103--132, 2008

work page 2008
[36]

J. W. Hutchinson and R. J. Meyer. Dynamic decision making: Optimal policies and actual behavior in sequential choice problems. Marketing Letters, 5: 0 369--382, 1994

work page 1994
[37]

R. E. Kass and A. E. Raftery. Bayes factors. Journal of the american statistical association, 90 0 (430): 0 773--795, 1995

work page 1995
[38]

C. Kemp, A. Perfors, and J. B. Tenenbaum. Learning overhypotheses with hierarchical bayesian models. Developmental science, 10 0 (3): 0 307--321, 2007

work page 2007
[39]

Koller and N

D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009

work page 2009
[40]

Kveton, M

B. Kveton, M. Konobeev, M. Zaheer, C.-w. Hsu, M. Mladenov, C. Boutilier, and C. Szepesvari. Meta-thompson sampling. In International Conference on Machine Learning, pages 5884--5893. PMLR, 2021

work page 2021
[41]

Lai and S

L. Lai and S. J. Gershman. Human decision making balances reward maximization and policy compression. PLOS Computational Biology, 20 0 (4): 0 e1012057, 2024

work page 2024
[42]

T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6 0 (1): 0 4--22, 1985

work page 1985
[43]

Lattimore and C

T. Lattimore and C. Szepesv \'a ri. Bandit algorithms. Cambridge University Press, 2020

work page 2020
[44]

Lieder and T

F. Lieder and T. L. Griffiths. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and brain sciences, 43: 0 e1, 2020

work page 2020
[45]

S. Lin, J. Zhang, and J. R. Hauser. Learning from experience, simply. Marketing Science, 34 0 (1): 0 1--19, 2015

work page 2015
[46]

Liu and A

J. Liu and A. Ansari. Understanding consumer dynamic decision making under competing loyalty programs. Journal of Marketing Research, 57 0 (3): 0 422--444, 2020

work page 2020
[48]

McCoy, R

J. McCoy, R. Ciulli, and E. Bradlow. Two-for-one conjoint: Bayesian cross-category learning for shared-attribute categories. Available at SSRN 4136593, 2022

work page 2022
[49]

R. J. Meyer and J. W. Hutchinson. (when) are we dynamically optimal? a psychological field guide for marketing modelers. Journal of Marketing, 80 0 (5): 0 20--33, 2016

work page 2016
[50]

R. J. Meyer and Y. Shi. Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Management science, 41 0 (5): 0 817--834, 1995

work page 1995
[51]

C. A. Montgomery and B. Wernerfelt. Risk reduction and umbrella branding. Journal of Business, pages 31--50, 1992

work page 1992
[52]

S. Nabi, H. Nassif, J. Hong, H. Mamani, and G. Imbens. Bayesian meta-prior learning using empirical bayes. Management Science, 68 0 (3): 0 1737--1755, 2022

work page 2022
[53]

J. Rust. Optimal replacement of gmc bus engines: An empirical model of harold zurcher. Econometrica: Journal of the Econometric Society, pages 999--1033, 1987

work page 1987
[54]

Salakhutdinov, J

R. Salakhutdinov, J. B. Tenenbaum, and A. Torralba. Learning with hierarchical-deep models. IEEE transactions on pattern analysis and machine intelligence, 35 0 (8): 0 1958--1971, 2012

work page 1958
[55]

Schulz and S

E. Schulz and S. J. Gershman. The algorithmic architecture of exploration in the human brain. Current opinion in neurobiology, 55: 0 7--14, 2019

work page 2019
[56]

Schulz, N

E. Schulz, N. T. Franklin, and S. J. Gershman. Finding structure in multi-armed bandits. Cognitive psychology, 119: 0 101261, 2020

work page 2020
[57]

D. R. Shanks, R. J. Tunney, and J. D. McCarthy. A re-examination of probability matching and rational choice. Journal of Behavioral Decision Making, 15 0 (3): 0 233--250, 2002

work page 2002
[58]

H. A. Simon. A behavioral model of rational choice. The quarterly journal of economics, pages 99--118, 1955

work page 1955
[59]

Sridhar, R

K. Sridhar, R. Bezawada, and M. Trivedi. Investigating the drivers of consumer cross-category learning for new products using multiple data sets. Marketing Science, 31 0 (4): 0 668--688, 2012

work page 2012
[60]

R. S. Sutton, A. G. Barto, et al. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

work page 1998
[61]

S. S. Tehrani and A. T. Ching. A heuristic approach to explore: The value of perfect information. Management Science, 70 0 (5): 0 3200--3224, 2024

work page 2024
[62]

J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. science, 331 0 (6022): 0 1279--1285, 2011

work page 2011
[63]

Vul and H

E. Vul and H. Pashler. Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, 19 0 (7): 0 645--647, 2008

work page 2008
[64]

E. Vul, N. Goodman, T. L. Griffiths, and J. B. Tenenbaum. One and done? optimal decisions from very few samples. Cognitive science, 38 0 (4): 0 599--637, 2014

work page 2014
[65]

Wernerfelt

B. Wernerfelt. Umbrella branding as a signal of new product quality: An example of signalling by posting a bond. The RAND Journal of Economics, pages 458--466, 1988

work page 1988
[66]

C. M. Wu, E. Schulz, M. Speekenbrink, J. D. Nelson, and B. Meder. Generalization guides human exploration in vast decision spaces. Nature human behaviour, 2 0 (12): 0 915--924, 2018

work page 2018
[67]

Xu and J

F. Xu and J. B. Tenenbaum. Word learning as bayesian inference. Psychological review, 114 0 (2): 0 245, 2007

work page 2007
[68]

L. Yang, O. Toubia, and M. G. De Jong. A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research, 52 0 (2): 0 166--183, 2015

work page 2015
[69]

Nature Human Behaviour, 1, 0017 , author=

Coherency maximizing exploration in the supermarket. Nature Human Behaviour, 1, 0017 , author=

work page
[70]

International Conference on Machine Learning , pages=

Meta-thompson sampling , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[71]

Journal of marketing Research , volume=

Modeling the psychology of consumer and firm behavior with behavioral economics , author=. Journal of marketing Research , volume=. 2006 , publisher=

work page 2006
[72]

Marketing Letters , volume=

Behavioral models of managerial decision-making , author=. Marketing Letters , volume=. 2012 , publisher=

work page 2012
[73]

Royal Society Open Science , volume=

Adaptive planning depth in human problem-solving , author=. Royal Society Open Science , volume=. 2025 , publisher=

work page 2025
[74]

The quarterly journal of economics , volume=

A simple model of herd behavior , author=. The quarterly journal of economics , volume=. 1992 , publisher=

work page 1992
[75]

Psychological Science , volume=

Rational simplification and rigidity in human planning , author=. Psychological Science , volume=. 2023 , publisher=

work page 2023
[76]

Nature human behaviour , volume=

Rational use of cognitive resources in human planning , author=. Nature human behaviour , volume=. 2022 , publisher=

work page 2022
[77]

PLOS Computational Biology , volume=

Human decision making balances reward maximization and policy compression , author=. PLOS Computational Biology , volume=. 2024 , publisher=

work page 2024
[78]

Current Directions in Psychological Science , volume=

Probabilistic biases meet the Bayesian brain , author=. Current Directions in Psychological Science , volume=. 2020 , publisher=

work page 2020
[79]

Behavioral and Brain Sciences , volume=

Meta-learned models of cognition , author=. Behavioral and Brain Sciences , volume=. 2024 , publisher=

work page 2024
[80]

Annual Review of Economics , volume=

Misspecified models in learning and games , author=. Annual Review of Economics , volume=. 2025 , publisher=

work page 2025
[81]

Psychology & Marketing , volume=

Bayesian inference and consumer behavioral theory , author=. Psychology & Marketing , volume=. 2024 , publisher=

work page 2024

Showing first 80 references.

[1] [1]

P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47 0 (2-3): 0 235--256, 2002

work page 2002

[2] [2]

A. V. Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107 0 (3): 0 797--817, 1992

work page 1992

[3] [3]

S. Basu, B. Kveton, M. Zaheer, and C. Szepesv \'a ri. No regrets for learning the prior in bandits. Advances in neural information processing systems, 34: 0 28029--28041, 2021

work page 2021

[4] [4]

R. Bellman. A problem in the sequential design of experiments. Sankhy \=a : The Indian Journal of Statistics (1933-1960) , 16 0 (3/4): 0 221--229, 1956

work page 1933

[5] [5]

R. Bellman. Dynamic programming. science, 153 0 (3731): 0 34--37, 1966

work page 1966

[6] [6]

R. Bhui, L. Lai, and S. J. Gershman. Resource-rational decision making. Current Opinion in Behavioral Sciences, 41: 0 15--21, 2021

work page 2021

[7] [7]

M. Binz, I. Dasgupta, A. K. Jagadish, M. Botvinick, J. X. Wang, and E. Schulz. Meta-learned models of cognition. Behavioral and Brain Sciences, 47: 0 e147, 2024

work page 2024

[8] [8]

J. A. Bohren and D. N. Hauser. Misspecified models in learning and games. Annual Review of Economics, 17, 2025

work page 2025

[9] [9]

T. Bondi. Alone, together: A model of social (mis) learning from consumer reviews. Marketing Science, 44 0 (6): 0 1258--1277, 2025

work page 2025

[10] [10]

Bubeck, N

S. Bubeck, N. Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5 0 (1): 0 1--122, 2012

work page 2012

[11] [11]

Callaway, B

F. Callaway, B. Van Opheusden, S. Gul, P. Das, P. M. Krueger, T. L. Griffiths, and F. Lieder. Rational use of cognitive resources in human planning. Nature human behaviour, 6 0 (8): 0 1112--1125, 2022

work page 2022

[12] [12]

C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model of games. The quarterly journal of economics, 119 0 (3): 0 861--898, 2004

work page 2004

[13] [13]

Chater, J.-Q

N. Chater, J.-Q. Zhu, J. Spicer, J. Sundh, P. Le \'o n-Villagr \'a , and A. Sanborn. Probabilistic biases meet the bayesian brain. Current Directions in Psychological Science, 29 0 (5): 0 506--512, 2020

work page 2020

[14] [14]

H. Che, T. Erdem, and T. S. \"O nc \"u . Consumer learning and evolution of consumer brand preferences. Quantitative Marketing and Economics, 13: 0 173--202, 2015

work page 2015

[15] [15]

A. T. Ching, T. Erdem, and M. P. Keane. Learning models: An assessment of progress, challenges, and new developments. Marketing Science, 32 0 (6): 0 913--938, 2013

work page 2013

[16] [16]

A. T. Ching, T. Erdem, and M. P. Keane. Empirical models of learning dynamics: A survey of recent developments. Handbook of marketing decision models, pages 223--257, 2017

work page 2017

[17] [17]

Coscelli and M

A. Coscelli and M. Shum. An empirical model of learning and patient spillovers in new drug entry. Journal of econometrics, 122 0 (2): 0 213--246, 2004

work page 2004

[18] [18]

G. S. Crawford and M. Shum. Uncertainty and learning in pharmaceutical demand. econometrica, 73 0 (4): 0 1137--1173, 2005

work page 2005

[19] [19]

N. D. Daw et al. Trial-by-trial data analysis using computational models. Decision making, affect, and learning: Attention and performance XXIII, 23 0 (1): 0 3--38, 2011

work page 2011

[20] [20]

M. H. DeGroot. Optimal statistical decisions. John Wiley & Sons, 2005

work page 2005

[21] [21]

Eluchans, G

M. Eluchans, G. L. Lancia, A. Maselli, M. D’Alessandro, J. R. Gordon, and G. Pezzulo. Adaptive planning depth in human problem-solving. Royal Society Open Science, 12 0 (4), 2025

work page 2025

[22] [22]

T. Erdem. An empirical analysis of umbrella branding. Journal of Marketing Research, 35 0 (3): 0 339--351, 1998

work page 1998

[23] [23]

Erdem and M

T. Erdem and M. P. Keane. Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing science, 15 0 (1): 0 1--20, 1996

work page 1996

[24] [24]

Gabaix, D

X. Gabaix, D. Laibson, G. Moloche, and S. Weinberg. Costly information acquisition: Experimental analysis of a boundedly rational model. American Economic Review, 96 0 (4): 0 1043--1068, 2006

work page 2006

[25] [25]

Gelman, J

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman and Hall/CRC, 1995

work page 1995

[26] [26]

S. J. Gershman. Deconstructing the human algorithms for exploration. Cognition, 173: 0 34--42, 2018

work page 2018

[27] [27]

Goldfarb, T.-H

A. Goldfarb, T.-H. Ho, W. Amaldoss, A. L. Brown, Y. Chen, T. H. Cui, A. Galasso, T. Hossain, M. Hsu, N. Lim, et al. Behavioral models of managerial decision-making. Marketing Letters, 23 0 (2): 0 405--421, 2012

work page 2012

[28] [28]

T. L. Griffiths, F. Callaway, M. B. Chang, E. Grant, P. M. Krueger, and F. Lieder. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29: 0 24--30, 2019

work page 2019

[29] [29]

T. L. Griffiths, N. Chater, and J. B. Tenenbaum. Bayesian models of cognition: Reverse engineering the mind. MIT Press, 2024

work page 2024

[30] [30]

Guan and H

J. Guan and H. Xiong. Improved bayes regret bounds for multi-task hierarchical bayesian bandit algorithms. Advances in Neural Information Processing Systems, 37: 0 72964--72999, 2024

work page 2024

[31] [31]

M. K. Ho, J. D. Cohen, and T. L. Griffiths. Rational simplification and rigidity in human planning. Psychological Science, 34 0 (11): 0 1281--1292, 2023

work page 2023

[32] [32]

T. H. Ho, N. Lim, and C. F. Camerer. Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of marketing Research, 43 0 (3): 0 307--331, 2006

work page 2006

[33] [33]

C. F. Hofacker, H. N. Nguyen, and M. Fina. Bayesian inference and consumer behavioral theory. Psychology & Marketing, 41 0 (12): 0 3144--3156, 2024

work page 2024

[34] [34]

J. Hong, B. Kveton, M. Zaheer, and M. Ghavamzadeh. Hierarchical bayesian bandits. In International Conference on Artificial Intelligence and Statistics, pages 7724--7741. PMLR, 2022

work page 2022

[35] [35]

J. W. Hutchinson and E. M. Eisenstein. Consumer learning and expertise. Handbook of consumer psychology, 4: 0 103--132, 2008

work page 2008

[36] [36]

J. W. Hutchinson and R. J. Meyer. Dynamic decision making: Optimal policies and actual behavior in sequential choice problems. Marketing Letters, 5: 0 369--382, 1994

work page 1994

[37] [37]

R. E. Kass and A. E. Raftery. Bayes factors. Journal of the american statistical association, 90 0 (430): 0 773--795, 1995

work page 1995

[38] [38]

C. Kemp, A. Perfors, and J. B. Tenenbaum. Learning overhypotheses with hierarchical bayesian models. Developmental science, 10 0 (3): 0 307--321, 2007

work page 2007

[39] [39]

Koller and N

D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009

work page 2009

[40] [40]

Kveton, M

B. Kveton, M. Konobeev, M. Zaheer, C.-w. Hsu, M. Mladenov, C. Boutilier, and C. Szepesvari. Meta-thompson sampling. In International Conference on Machine Learning, pages 5884--5893. PMLR, 2021

work page 2021

[41] [41]

Lai and S

L. Lai and S. J. Gershman. Human decision making balances reward maximization and policy compression. PLOS Computational Biology, 20 0 (4): 0 e1012057, 2024

work page 2024

[42] [42]

T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6 0 (1): 0 4--22, 1985

work page 1985

[43] [43]

Lattimore and C

T. Lattimore and C. Szepesv \'a ri. Bandit algorithms. Cambridge University Press, 2020

work page 2020

[44] [44]

Lieder and T

F. Lieder and T. L. Griffiths. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and brain sciences, 43: 0 e1, 2020

work page 2020

[45] [45]

S. Lin, J. Zhang, and J. R. Hauser. Learning from experience, simply. Marketing Science, 34 0 (1): 0 1--19, 2015

work page 2015

[46] [46]

Liu and A

J. Liu and A. Ansari. Understanding consumer dynamic decision making under competing loyalty programs. Journal of Marketing Research, 57 0 (3): 0 422--444, 2020

work page 2020

[47] [48]

McCoy, R

J. McCoy, R. Ciulli, and E. Bradlow. Two-for-one conjoint: Bayesian cross-category learning for shared-attribute categories. Available at SSRN 4136593, 2022

work page 2022

[48] [49]

R. J. Meyer and J. W. Hutchinson. (when) are we dynamically optimal? a psychological field guide for marketing modelers. Journal of Marketing, 80 0 (5): 0 20--33, 2016

work page 2016

[49] [50]

R. J. Meyer and Y. Shi. Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Management science, 41 0 (5): 0 817--834, 1995

work page 1995

[50] [51]

C. A. Montgomery and B. Wernerfelt. Risk reduction and umbrella branding. Journal of Business, pages 31--50, 1992

work page 1992

[51] [52]

S. Nabi, H. Nassif, J. Hong, H. Mamani, and G. Imbens. Bayesian meta-prior learning using empirical bayes. Management Science, 68 0 (3): 0 1737--1755, 2022

work page 2022

[52] [53]

J. Rust. Optimal replacement of gmc bus engines: An empirical model of harold zurcher. Econometrica: Journal of the Econometric Society, pages 999--1033, 1987

work page 1987

[53] [54]

Salakhutdinov, J

R. Salakhutdinov, J. B. Tenenbaum, and A. Torralba. Learning with hierarchical-deep models. IEEE transactions on pattern analysis and machine intelligence, 35 0 (8): 0 1958--1971, 2012

work page 1958

[54] [55]

Schulz and S

E. Schulz and S. J. Gershman. The algorithmic architecture of exploration in the human brain. Current opinion in neurobiology, 55: 0 7--14, 2019

work page 2019

[55] [56]

Schulz, N

E. Schulz, N. T. Franklin, and S. J. Gershman. Finding structure in multi-armed bandits. Cognitive psychology, 119: 0 101261, 2020

work page 2020

[56] [57]

D. R. Shanks, R. J. Tunney, and J. D. McCarthy. A re-examination of probability matching and rational choice. Journal of Behavioral Decision Making, 15 0 (3): 0 233--250, 2002

work page 2002

[57] [58]

H. A. Simon. A behavioral model of rational choice. The quarterly journal of economics, pages 99--118, 1955

work page 1955

[58] [59]

Sridhar, R

K. Sridhar, R. Bezawada, and M. Trivedi. Investigating the drivers of consumer cross-category learning for new products using multiple data sets. Marketing Science, 31 0 (4): 0 668--688, 2012

work page 2012

[59] [60]

R. S. Sutton, A. G. Barto, et al. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

work page 1998

[60] [61]

S. S. Tehrani and A. T. Ching. A heuristic approach to explore: The value of perfect information. Management Science, 70 0 (5): 0 3200--3224, 2024

work page 2024

[61] [62]

J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. science, 331 0 (6022): 0 1279--1285, 2011

work page 2011

[62] [63]

Vul and H

E. Vul and H. Pashler. Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, 19 0 (7): 0 645--647, 2008

work page 2008

[63] [64]

E. Vul, N. Goodman, T. L. Griffiths, and J. B. Tenenbaum. One and done? optimal decisions from very few samples. Cognitive science, 38 0 (4): 0 599--637, 2014

work page 2014

[64] [65]

Wernerfelt

B. Wernerfelt. Umbrella branding as a signal of new product quality: An example of signalling by posting a bond. The RAND Journal of Economics, pages 458--466, 1988

work page 1988

[65] [66]

C. M. Wu, E. Schulz, M. Speekenbrink, J. D. Nelson, and B. Meder. Generalization guides human exploration in vast decision spaces. Nature human behaviour, 2 0 (12): 0 915--924, 2018

work page 2018

[66] [67]

Xu and J

F. Xu and J. B. Tenenbaum. Word learning as bayesian inference. Psychological review, 114 0 (2): 0 245, 2007

work page 2007

[67] [68]

L. Yang, O. Toubia, and M. G. De Jong. A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research, 52 0 (2): 0 166--183, 2015

work page 2015

[68] [69]

Nature Human Behaviour, 1, 0017 , author=

Coherency maximizing exploration in the supermarket. Nature Human Behaviour, 1, 0017 , author=

work page

[69] [70]

International Conference on Machine Learning , pages=

Meta-thompson sampling , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021

[70] [71]

Journal of marketing Research , volume=

Modeling the psychology of consumer and firm behavior with behavioral economics , author=. Journal of marketing Research , volume=. 2006 , publisher=

work page 2006

[71] [72]

Marketing Letters , volume=

Behavioral models of managerial decision-making , author=. Marketing Letters , volume=. 2012 , publisher=

work page 2012

[72] [73]

Royal Society Open Science , volume=

Adaptive planning depth in human problem-solving , author=. Royal Society Open Science , volume=. 2025 , publisher=

work page 2025

[73] [74]

The quarterly journal of economics , volume=

A simple model of herd behavior , author=. The quarterly journal of economics , volume=. 1992 , publisher=

work page 1992

[74] [75]

Psychological Science , volume=

Rational simplification and rigidity in human planning , author=. Psychological Science , volume=. 2023 , publisher=

work page 2023

[75] [76]

Nature human behaviour , volume=

Rational use of cognitive resources in human planning , author=. Nature human behaviour , volume=. 2022 , publisher=

work page 2022

[76] [77]

PLOS Computational Biology , volume=

Human decision making balances reward maximization and policy compression , author=. PLOS Computational Biology , volume=. 2024 , publisher=

work page 2024

[77] [78]

Current Directions in Psychological Science , volume=

Probabilistic biases meet the Bayesian brain , author=. Current Directions in Psychological Science , volume=. 2020 , publisher=

work page 2020

[78] [79]

Behavioral and Brain Sciences , volume=

Meta-learned models of cognition , author=. Behavioral and Brain Sciences , volume=. 2024 , publisher=

work page 2024

[79] [80]

Annual Review of Economics , volume=

Misspecified models in learning and games , author=. Annual Review of Economics , volume=. 2025 , publisher=

work page 2025

[80] [81]

Psychology & Marketing , volume=

Bayesian inference and consumer behavioral theory , author=. Psychology & Marketing , volume=. 2024 , publisher=

work page 2024