Boundedly Rational Meta-Learning in Sequential Consumer Choice
Pith reviewed 2026-05-20 19:48 UTC · model grok-4.3
The pith
Consumers transfer experience across related choices using coarse low-dimensional approximations of uncertainty rather than full Bayesian integration or starting over.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Trial-by-trial likelihood comparisons in the hierarchical airline choice task show that low-D boundedly rational meta-learning policies, especially BRMDP(1), fit participant behavior better than both a no-transfer benchmark and a fully integrated Bayesian meta-learning benchmark, indicating that consumers transfer regularities across contexts through coarse representations of prior uncertainty.
What carries the argument
BRMDP(D), a boundedly rational meta dynamic programming policy that approximates full Bayesian integration by drawing a limited number D of samples from the hyper-posterior over context parameters.
If this is right
- Participants choose better options earlier in later routes and reduce pseudo-regret across contexts.
- Consumer learning models must incorporate approximate rather than complete cross-context transfer.
- Managerial counterfactuals that assume either no transfer or full integration will produce misleading predictions.
- Low-dimensional approximations of prior uncertainty provide a better account of observed choice sequences than the two extreme benchmarks.
Where Pith is reading between the lines
- In markets with many overlapping contexts, firms may benefit from designing recommendations that assume consumers use only a few representative prior samples rather than exhaustive updating.
- The same coarse-transfer pattern could appear in other sequential domains such as repeated product trials or service selections where underlying regularities exist across categories.
- Testing whether the fit of low-D policies improves or worsens as context similarity increases would clarify the boundary conditions of the mechanism.
Load-bearing premise
The laboratory task of repeated airline choices across routes with noisy binary outcomes adequately represents real-world cross-context knowledge transfer in sequential consumer decisions.
What would settle it
A new experiment that varies the relatedness between contexts and tests whether the likelihood advantage of BRMDP(1) over full integration disappears when contexts share no underlying structure.
Figures
read the original abstract
Many consumer decisions are repeated choices under uncertainty. Standard models capture these decisions using Bayesian learning and dynamic programming: consumers update beliefs from feedback and use those beliefs to guide future choices. In many markets, however, learning does not restart when consumers enter a new context: prior experience with a brand, product, or provider can shape beliefs in later, related decisions. We study this cross-context knowledge transfer, or meta-learning, in sequential choice. We design a hierarchical laboratory task in which participants repeatedly choose among airlines across routes and observe noisy binary outcomes. Reduced-form evidence shows that participants improve not only within routes, but also across routes: they choose better airlines earlier in later routes and reduce pseudo-regret. To identify the mechanism behind this transfer, we compare human choices to a no-transfer benchmark and a fully integrated Bayesian meta-learning benchmark. In particular, we introduce a class of boundedly rational meta dynamic programming policies, BRMDP(D), that approximate full integration using a limited number of hyper-posterior draws, denoted by D. Trial-by-trial likelihood comparisons show that low-D boundedly rational meta-learning, especially BRMDP(1), fits participant behavior better than both no transfer and fully integrated Bayesian transfer. Consumers, therefore, transfer brand-level regularities across contexts, but through coarse representations of prior uncertainty. The findings imply that models of consumer learning should allow for approximate cross-context transfer, and that managerial counterfactuals based on either no-transfer or fully integrated learning can be misleading.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines cross-context knowledge transfer (meta-learning) in sequential consumer choice under uncertainty. Participants complete a hierarchical laboratory task involving repeated choices among airlines across multiple routes, receiving noisy binary feedback on outcomes. Reduced-form analyses show within-route learning as well as cross-route improvement, with better early-route choices and lower pseudo-regret in later routes. The authors introduce a family of boundedly rational meta-dynamic-programming policies, BRMDP(D), that approximate full Bayesian integration over a hyper-posterior by drawing only D samples. Trial-by-trial likelihood comparisons on human data indicate that low-D policies, particularly BRMDP(1), outperform both a no-transfer benchmark and a fully integrated Bayesian meta-learning model, supporting the claim that consumers transfer brand-level regularities via coarse representations of prior uncertainty.
Significance. If the empirical comparisons hold, the paper offers a useful middle ground between no-transfer and fully rational meta-learning models, with direct implications for consumer-behavior modeling and managerial counterfactuals. The BRMDP(D) construction supplies a computationally tractable approximation whose free parameter D is directly interpretable as the granularity of uncertainty representation. The work also supplies falsifiable predictions via likelihood rankings and reduced-form cross-route metrics, which are strengths for a field that often relies on purely qualitative claims about transfer.
major comments (2)
- [§4] §4 (Results), likelihood comparisons: the reported superiority of BRMDP(1) over the fully integrated benchmark and the no-transfer model is central to the main claim, yet the manuscript provides no standard errors, bootstrap intervals, or formal statistical tests on the likelihood differences; without these, it is difficult to judge whether the ranking is robust to sampling variation across participants.
- [Experimental design] Experimental design section: the description of the hierarchical task does not state the exact number of routes, trials per route, or total participant count; these quantities are load-bearing for interpreting both the reduced-form cross-route improvement and the power of the model-comparison results.
minor comments (3)
- [§3.2] §3.2, definition of BRMDP(D): the precise sampling procedure for the D hyper-posterior draws and how the resulting policy is computed should be written as a short algorithm or pseudocode for reproducibility.
- [Figure 2] Figure 2 (or equivalent likelihood plot): adding participant-level variability bands or reporting the number of observations per route would improve interpretability of the visual comparison.
- [§3] Notation: the symbol for the hyper-posterior is introduced without an explicit equation reference; adding a numbered display equation would reduce ambiguity when readers compare BRMDP(D) to the full Bayesian benchmark.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us strengthen the statistical robustness and clarity of the manuscript. We address each major point below and have incorporated revisions accordingly.
read point-by-point responses
-
Referee: [§4] §4 (Results), likelihood comparisons: the reported superiority of BRMDP(1) over the fully integrated benchmark and the no-transfer model is central to the main claim, yet the manuscript provides no standard errors, bootstrap intervals, or formal statistical tests on the likelihood differences; without these, it is difficult to judge whether the ranking is robust to sampling variation across participants.
Authors: We agree that measures of uncertainty are necessary to evaluate the robustness of the model ranking. In the revised manuscript we now include bootstrap 95% confidence intervals for the per-participant log-likelihood differences, obtained by resampling participants 10,000 times. These intervals exclude zero for BRMDP(1) versus both the no-transfer and full Bayesian benchmarks. We have also added a paired Wilcoxon signed-rank test on the individual-level likelihoods (p < 0.01 for the primary comparisons) and a new panel in Figure 4 displaying the distribution of differences. These additions directly address the concern while preserving the original likelihood values. revision: yes
-
Referee: [Experimental design] Experimental design section: the description of the hierarchical task does not state the exact number of routes, trials per route, or total participant count; these quantities are load-bearing for interpreting both the reduced-form cross-route improvement and the power of the model-comparison results.
Authors: We appreciate the referee noting this omission. The revised Experimental Design section now states that participants completed 4 routes with 20 trials each, for a total of 96 participants (after excluding 4 who failed attention checks). We have inserted a new Table 1 that summarizes all task parameters, including the number of airlines per route (3), the binary feedback noise level, and the route-specific outcome probabilities. These explicit quantities should now allow readers to assess both the reduced-form cross-route effects and the statistical power of the likelihood comparisons. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper defines BRMDP(D) explicitly as an approximation to full Bayesian meta-learning via a finite number of hyper-posterior draws and then performs direct likelihood comparisons of this family, a no-transfer benchmark, and the full-integration model against observed human choices in the airline-route task. These comparisons are statistical fits to external data rather than any quantity being recovered by construction from its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked to justify the central claim that low-D variants provide a better account of transfer; the derivation from task design through reduced-form cross-route improvement to model ranking therefore remains independent of the reported result.
Axiom & Free-Parameter Ledger
free parameters (1)
- D (number of hyper-posterior draws)
axioms (2)
- domain assumption Consumers perform Bayesian belief updating from noisy feedback
- domain assumption Cross-route knowledge transfer occurs via meta-learning over brand-level regularities
invented entities (1)
-
BRMDP(D) policies
no independent evidence
Reference graph
Works this paper leans on
-
[1]
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47 0 (2-3): 0 235--256, 2002
work page 2002
-
[2]
A. V. Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107 0 (3): 0 797--817, 1992
work page 1992
-
[3]
S. Basu, B. Kveton, M. Zaheer, and C. Szepesv \'a ri. No regrets for learning the prior in bandits. Advances in neural information processing systems, 34: 0 28029--28041, 2021
work page 2021
-
[4]
R. Bellman. A problem in the sequential design of experiments. Sankhy \=a : The Indian Journal of Statistics (1933-1960) , 16 0 (3/4): 0 221--229, 1956
work page 1933
-
[5]
R. Bellman. Dynamic programming. science, 153 0 (3731): 0 34--37, 1966
work page 1966
-
[6]
R. Bhui, L. Lai, and S. J. Gershman. Resource-rational decision making. Current Opinion in Behavioral Sciences, 41: 0 15--21, 2021
work page 2021
-
[7]
M. Binz, I. Dasgupta, A. K. Jagadish, M. Botvinick, J. X. Wang, and E. Schulz. Meta-learned models of cognition. Behavioral and Brain Sciences, 47: 0 e147, 2024
work page 2024
-
[8]
J. A. Bohren and D. N. Hauser. Misspecified models in learning and games. Annual Review of Economics, 17, 2025
work page 2025
-
[9]
T. Bondi. Alone, together: A model of social (mis) learning from consumer reviews. Marketing Science, 44 0 (6): 0 1258--1277, 2025
work page 2025
- [10]
-
[11]
F. Callaway, B. Van Opheusden, S. Gul, P. Das, P. M. Krueger, T. L. Griffiths, and F. Lieder. Rational use of cognitive resources in human planning. Nature human behaviour, 6 0 (8): 0 1112--1125, 2022
work page 2022
-
[12]
C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model of games. The quarterly journal of economics, 119 0 (3): 0 861--898, 2004
work page 2004
-
[13]
N. Chater, J.-Q. Zhu, J. Spicer, J. Sundh, P. Le \'o n-Villagr \'a , and A. Sanborn. Probabilistic biases meet the bayesian brain. Current Directions in Psychological Science, 29 0 (5): 0 506--512, 2020
work page 2020
-
[14]
H. Che, T. Erdem, and T. S. \"O nc \"u . Consumer learning and evolution of consumer brand preferences. Quantitative Marketing and Economics, 13: 0 173--202, 2015
work page 2015
-
[15]
A. T. Ching, T. Erdem, and M. P. Keane. Learning models: An assessment of progress, challenges, and new developments. Marketing Science, 32 0 (6): 0 913--938, 2013
work page 2013
-
[16]
A. T. Ching, T. Erdem, and M. P. Keane. Empirical models of learning dynamics: A survey of recent developments. Handbook of marketing decision models, pages 223--257, 2017
work page 2017
-
[17]
A. Coscelli and M. Shum. An empirical model of learning and patient spillovers in new drug entry. Journal of econometrics, 122 0 (2): 0 213--246, 2004
work page 2004
-
[18]
G. S. Crawford and M. Shum. Uncertainty and learning in pharmaceutical demand. econometrica, 73 0 (4): 0 1137--1173, 2005
work page 2005
-
[19]
N. D. Daw et al. Trial-by-trial data analysis using computational models. Decision making, affect, and learning: Attention and performance XXIII, 23 0 (1): 0 3--38, 2011
work page 2011
-
[20]
M. H. DeGroot. Optimal statistical decisions. John Wiley & Sons, 2005
work page 2005
-
[21]
M. Eluchans, G. L. Lancia, A. Maselli, M. D’Alessandro, J. R. Gordon, and G. Pezzulo. Adaptive planning depth in human problem-solving. Royal Society Open Science, 12 0 (4), 2025
work page 2025
-
[22]
T. Erdem. An empirical analysis of umbrella branding. Journal of Marketing Research, 35 0 (3): 0 339--351, 1998
work page 1998
-
[23]
T. Erdem and M. P. Keane. Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing science, 15 0 (1): 0 1--20, 1996
work page 1996
- [24]
- [25]
-
[26]
S. J. Gershman. Deconstructing the human algorithms for exploration. Cognition, 173: 0 34--42, 2018
work page 2018
-
[27]
A. Goldfarb, T.-H. Ho, W. Amaldoss, A. L. Brown, Y. Chen, T. H. Cui, A. Galasso, T. Hossain, M. Hsu, N. Lim, et al. Behavioral models of managerial decision-making. Marketing Letters, 23 0 (2): 0 405--421, 2012
work page 2012
-
[28]
T. L. Griffiths, F. Callaway, M. B. Chang, E. Grant, P. M. Krueger, and F. Lieder. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29: 0 24--30, 2019
work page 2019
-
[29]
T. L. Griffiths, N. Chater, and J. B. Tenenbaum. Bayesian models of cognition: Reverse engineering the mind. MIT Press, 2024
work page 2024
-
[30]
J. Guan and H. Xiong. Improved bayes regret bounds for multi-task hierarchical bayesian bandit algorithms. Advances in Neural Information Processing Systems, 37: 0 72964--72999, 2024
work page 2024
-
[31]
M. K. Ho, J. D. Cohen, and T. L. Griffiths. Rational simplification and rigidity in human planning. Psychological Science, 34 0 (11): 0 1281--1292, 2023
work page 2023
-
[32]
T. H. Ho, N. Lim, and C. F. Camerer. Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of marketing Research, 43 0 (3): 0 307--331, 2006
work page 2006
-
[33]
C. F. Hofacker, H. N. Nguyen, and M. Fina. Bayesian inference and consumer behavioral theory. Psychology & Marketing, 41 0 (12): 0 3144--3156, 2024
work page 2024
-
[34]
J. Hong, B. Kveton, M. Zaheer, and M. Ghavamzadeh. Hierarchical bayesian bandits. In International Conference on Artificial Intelligence and Statistics, pages 7724--7741. PMLR, 2022
work page 2022
-
[35]
J. W. Hutchinson and E. M. Eisenstein. Consumer learning and expertise. Handbook of consumer psychology, 4: 0 103--132, 2008
work page 2008
-
[36]
J. W. Hutchinson and R. J. Meyer. Dynamic decision making: Optimal policies and actual behavior in sequential choice problems. Marketing Letters, 5: 0 369--382, 1994
work page 1994
-
[37]
R. E. Kass and A. E. Raftery. Bayes factors. Journal of the american statistical association, 90 0 (430): 0 773--795, 1995
work page 1995
-
[38]
C. Kemp, A. Perfors, and J. B. Tenenbaum. Learning overhypotheses with hierarchical bayesian models. Developmental science, 10 0 (3): 0 307--321, 2007
work page 2007
-
[39]
D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009
work page 2009
- [40]
- [41]
-
[42]
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6 0 (1): 0 4--22, 1985
work page 1985
-
[43]
T. Lattimore and C. Szepesv \'a ri. Bandit algorithms. Cambridge University Press, 2020
work page 2020
-
[44]
F. Lieder and T. L. Griffiths. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and brain sciences, 43: 0 e1, 2020
work page 2020
-
[45]
S. Lin, J. Zhang, and J. R. Hauser. Learning from experience, simply. Marketing Science, 34 0 (1): 0 1--19, 2015
work page 2015
- [46]
- [48]
-
[49]
R. J. Meyer and J. W. Hutchinson. (when) are we dynamically optimal? a psychological field guide for marketing modelers. Journal of Marketing, 80 0 (5): 0 20--33, 2016
work page 2016
-
[50]
R. J. Meyer and Y. Shi. Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Management science, 41 0 (5): 0 817--834, 1995
work page 1995
-
[51]
C. A. Montgomery and B. Wernerfelt. Risk reduction and umbrella branding. Journal of Business, pages 31--50, 1992
work page 1992
-
[52]
S. Nabi, H. Nassif, J. Hong, H. Mamani, and G. Imbens. Bayesian meta-prior learning using empirical bayes. Management Science, 68 0 (3): 0 1737--1755, 2022
work page 2022
-
[53]
J. Rust. Optimal replacement of gmc bus engines: An empirical model of harold zurcher. Econometrica: Journal of the Econometric Society, pages 999--1033, 1987
work page 1987
-
[54]
R. Salakhutdinov, J. B. Tenenbaum, and A. Torralba. Learning with hierarchical-deep models. IEEE transactions on pattern analysis and machine intelligence, 35 0 (8): 0 1958--1971, 2012
work page 1958
-
[55]
E. Schulz and S. J. Gershman. The algorithmic architecture of exploration in the human brain. Current opinion in neurobiology, 55: 0 7--14, 2019
work page 2019
- [56]
-
[57]
D. R. Shanks, R. J. Tunney, and J. D. McCarthy. A re-examination of probability matching and rational choice. Journal of Behavioral Decision Making, 15 0 (3): 0 233--250, 2002
work page 2002
-
[58]
H. A. Simon. A behavioral model of rational choice. The quarterly journal of economics, pages 99--118, 1955
work page 1955
-
[59]
K. Sridhar, R. Bezawada, and M. Trivedi. Investigating the drivers of consumer cross-category learning for new products using multiple data sets. Marketing Science, 31 0 (4): 0 668--688, 2012
work page 2012
-
[60]
R. S. Sutton, A. G. Barto, et al. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998
work page 1998
-
[61]
S. S. Tehrani and A. T. Ching. A heuristic approach to explore: The value of perfect information. Management Science, 70 0 (5): 0 3200--3224, 2024
work page 2024
-
[62]
J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. science, 331 0 (6022): 0 1279--1285, 2011
work page 2011
- [63]
-
[64]
E. Vul, N. Goodman, T. L. Griffiths, and J. B. Tenenbaum. One and done? optimal decisions from very few samples. Cognitive science, 38 0 (4): 0 599--637, 2014
work page 2014
-
[65]
B. Wernerfelt. Umbrella branding as a signal of new product quality: An example of signalling by posting a bond. The RAND Journal of Economics, pages 458--466, 1988
work page 1988
-
[66]
C. M. Wu, E. Schulz, M. Speekenbrink, J. D. Nelson, and B. Meder. Generalization guides human exploration in vast decision spaces. Nature human behaviour, 2 0 (12): 0 915--924, 2018
work page 2018
- [67]
-
[68]
L. Yang, O. Toubia, and M. G. De Jong. A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research, 52 0 (2): 0 166--183, 2015
work page 2015
-
[69]
Nature Human Behaviour, 1, 0017 , author=
Coherency maximizing exploration in the supermarket. Nature Human Behaviour, 1, 0017 , author=
-
[70]
International Conference on Machine Learning , pages=
Meta-thompson sampling , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[71]
Journal of marketing Research , volume=
Modeling the psychology of consumer and firm behavior with behavioral economics , author=. Journal of marketing Research , volume=. 2006 , publisher=
work page 2006
-
[72]
Behavioral models of managerial decision-making , author=. Marketing Letters , volume=. 2012 , publisher=
work page 2012
-
[73]
Royal Society Open Science , volume=
Adaptive planning depth in human problem-solving , author=. Royal Society Open Science , volume=. 2025 , publisher=
work page 2025
-
[74]
The quarterly journal of economics , volume=
A simple model of herd behavior , author=. The quarterly journal of economics , volume=. 1992 , publisher=
work page 1992
-
[75]
Psychological Science , volume=
Rational simplification and rigidity in human planning , author=. Psychological Science , volume=. 2023 , publisher=
work page 2023
-
[76]
Nature human behaviour , volume=
Rational use of cognitive resources in human planning , author=. Nature human behaviour , volume=. 2022 , publisher=
work page 2022
-
[77]
PLOS Computational Biology , volume=
Human decision making balances reward maximization and policy compression , author=. PLOS Computational Biology , volume=. 2024 , publisher=
work page 2024
-
[78]
Current Directions in Psychological Science , volume=
Probabilistic biases meet the Bayesian brain , author=. Current Directions in Psychological Science , volume=. 2020 , publisher=
work page 2020
-
[79]
Behavioral and Brain Sciences , volume=
Meta-learned models of cognition , author=. Behavioral and Brain Sciences , volume=. 2024 , publisher=
work page 2024
-
[80]
Annual Review of Economics , volume=
Misspecified models in learning and games , author=. Annual Review of Economics , volume=. 2025 , publisher=
work page 2025
-
[81]
Psychology & Marketing , volume=
Bayesian inference and consumer behavioral theory , author=. Psychology & Marketing , volume=. 2024 , publisher=
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.