Large Language Models for Market Research: A Data-augmentation Approach

Arizona State University); Dennis J. Zhang (Olin School of Business; Heng Zhang (W. P. Carey School of Business; Mengxin Wang (Naveen Jindal School of Management; The University of Texas at Dallas); Washington University in St. Louis)

arxiv: 2412.19363 · v3 · submitted 2024-12-26 · 💻 cs.AI · cs.LG· stat.ME· stat.ML

Large Language Models for Market Research: A Data-augmentation Approach

Mengxin Wang (Naveen Jindal School of Management , The University of Texas at Dallas) , Dennis J. Zhang (Olin School of Business , Washington University in St. Louis) , Heng Zhang (W. P. Carey School of Business , Arizona State University) This is my paper

Pith reviewed 2026-05-23 07:07 UTC · model grok-4.3

classification 💻 cs.AI cs.LGstat.MEstat.ML

keywords conjoint analysisdata augmentationlarge language modelsmarket researchconsumer preferencesstatistical estimationbias correctionsurvey methods

0 comments

The pith

A statistical data augmentation method combines LLM-generated responses with real survey data to produce consistent estimators for conjoint analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Conjoint analysis requires large numbers of human respondents to map consumer trade-offs, but running such surveys is slow and expensive. Large language models can generate preference data at scale, yet simply swapping in LLM responses for human ones creates bias that breaks standard statistical procedures. The paper develops an augmentation procedure that mixes the two sources so the bias is corrected rather than amplified. The resulting estimators are consistent, asymptotically normal, and come with a finite-sample error bound. Tests on COVID-19 vaccine choices and sports-car preferences show the approach cuts real-data needs by 25 to 80 percent while naive mixing saves nothing.

Core claim

The paper proposes a statistical data augmentation approach that integrates LLM-generated data with real data in conjoint analysis. This yields estimators that are consistent and asymptotically normal, along with a finite-sample performance bound on estimation error. In contrast, naive substitution of human data with LLM data exacerbates bias. Validation on COVID-19 vaccine preferences shows cost savings of 24.9% to 79.8%, with similar robustness in sports car choice data.

What carries the argument

The statistical data augmentation procedure that integrates LLM-generated responses with real human responses to correct bias in preference estimation.

If this is right

The estimators are consistent and asymptotically normal, supporting standard inference.
A finite-sample bound quantifies the reduction in estimation error.
Real data collection costs can be reduced by 24.9% to 79.8% while preserving accuracy.
Naive substitution approaches fail to reduce data needs because they leave bias uncorrected.
The method maintains performance across different product categories such as vaccines and cars.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same augmentation logic could be tested on other survey formats that collect stated preferences.
If the bias structure between LLM and human data shifts with model updates, the procedure would require re-calibration on fresh human samples.
Sequential designs that use LLM data to decide which additional real respondents to query could further lower costs.
The framework might be adapted to correct bias when mixing synthetic data with real observations in non-choice survey settings.

Load-bearing premise

There exists a statistical relationship between LLM-generated responses and human responses that the augmentation procedure can exploit to remove bias without introducing uncorrectable distortions.

What would settle it

Observing that the proposed estimators remain biased and do not converge to the true parameters as the number of real respondents grows would falsify the consistency claim.

read the original abstract

Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks. Their ability to generate human-like text has opened new possibilities for market research, particularly in conjoint analysis, where understanding consumer preferences is essential but often resource-intensive. Traditional survey-based methods face limitations in scalability and cost, making LLM-generated data a promising alternative. However, while LLMs have the potential to simulate real consumer behavior, recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when substituting between the two. In this paper, we address this gap by proposing a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis. This results in statistically robust estimators with consistent and asymptotically normal properties, in contrast to naive approaches that simply substitute human data with LLM-generated data, which can exacerbate bias. We further present a finite-sample performance bound on the estimation error. We validate our framework through an empirical study on COVID-19 vaccine preferences, demonstrating its superior ability to reduce estimation error and save data and costs by 24.9% to 79.8%. In contrast, naive approaches fail to save data due to the inherent biases in LLM-generated data compared to human data. Another empirical study on sports car choices validates the robustness of our results. Our findings suggest that while LLM-generated data is not a direct substitute for human responses, it can serve as a valuable complement when used within a robust statistical framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a data-augmentation procedure meant to let LLM responses supplement human conjoint data while keeping the estimator consistent and asymptotically normal, but the abstract supplies no model or derivation to show how.

read the letter

The core claim is that a specific augmentation step can fuse LLM-generated choices with real survey responses in conjoint analysis so the resulting estimates remain consistent and asymptotically normal, while simple substitution does not. Two empirical cases—one on COVID-19 vaccine preferences and one on sports-car choices—are used to show data-cost reductions between 25 and 80 percent that naive replacement cannot achieve. A finite-sample error bound is also stated. That combination of a claimed statistical fix plus concrete cost numbers is what is new here; prior work on LLM substitution is treated as the baseline that fails on bias. The empirical demonstrations are the part that could interest applied researchers who already run conjoint studies and want cheaper data collection. The main limitation visible in the abstract is the absence of any explicit model for the relationship between LLM and human responses, any moment conditions, or even a proof sketch for the consistency result. Without those pieces it is impossible to judge whether the augmentation actually corrects bias or simply assumes the right parametric link exists. The reported savings also come without sample sizes, standard errors, or exclusion criteria, so the practical magnitude is hard to assess. This work sits squarely in the applied market-research niche that already uses conjoint methods and is looking for ways to cut survey costs with AI. A reader in that area would find the empirical examples useful to see, but only after the identification argument is checked. The stress-test concern about an unstated parametric relationship between LLM and human data appears to match what is shown so far. I would send the paper to referees so they can examine the derivations and the exact conditions required for consistency.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a statistical data-augmentation framework for integrating LLM-generated responses with real human data in conjoint analysis. It claims that the resulting estimators are consistent and asymptotically normal (in contrast to naive substitution), supplies a finite-sample performance bound on estimation error, and reports empirical cost/data savings of 24.9%–79.8% on two studies (COVID-19 vaccine preferences and sports-car choices) while naive substitution yields no savings due to LLM bias.

Significance. If the consistency and normality claims can be established under explicit, testable assumptions on the LLM–human response relationship, the framework would provide a principled route to reducing survey costs in preference elicitation while retaining statistical guarantees. The two empirical applications on real preference data add practical weight, but the absence of any derivation or model specification prevents assessment of whether the result is robust or merely an artifact of an unstated parametric link.

major comments (3)

[Abstract / theoretical claims] Abstract and theoretical development: the central claims of consistency, asymptotic normality, and a finite-sample bound are asserted without any derivation, set of identifying assumptions, moment conditions, or proof sketch. Because these properties are the load-bearing contribution distinguishing the method from naive substitution, their absence makes it impossible to verify the result or its scope.
[Method / augmentation procedure] Augmentation procedure: the method is described as exploiting a statistical relationship between LLM-generated and human responses to correct bias, yet no explicit parametric form, conditional-expectation model, or bias-correction term is supplied. Without this, it cannot be shown that consistency holds for the proposed estimator while failing for naive substitution, as required by the skeptic’s concern.
[Empirical studies] Empirical validation: the reported savings range (24.9%–79.8%) is presented without error bars, sample sizes, exclusion rules, or details on how the finite-sample bound was evaluated. These omissions directly affect the claim that the method outperforms naive substitution in reducing estimation error.

minor comments (1)

[Notation / model setup] Notation for the conjoint model and the augmentation weights is introduced without a clear table or equation reference, making it difficult to trace how the estimator is constructed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and outline the revisions that will be made to strengthen the theoretical and empirical sections.

read point-by-point responses

Referee: [Abstract / theoretical claims] Abstract and theoretical development: the central claims of consistency, asymptotic normality, and a finite-sample bound are asserted without any derivation, set of identifying assumptions, moment conditions, or proof sketch. Because these properties are the load-bearing contribution distinguishing the method from naive substitution, their absence makes it impossible to verify the result or its scope.

Authors: We acknowledge the omission of explicit derivations in the submitted manuscript. The consistency and asymptotic normality results are obtained under the identifying assumption that LLM responses satisfy E[LLM response | covariates, human response] = human response + bias term, where the bias is a function of observable features. Standard two-sample semiparametric arguments then deliver consistency, with asymptotic normality following from a joint central limit theorem on the combined estimating equations. A finite-sample bound follows from concentration inequalities applied to the bias-corrected estimator. We will insert a new theoretical section containing the full set of assumptions, moment conditions, and proof sketch. revision: yes
Referee: [Method / augmentation procedure] Augmentation procedure: the method is described as exploiting a statistical relationship between LLM-generated and human responses to correct bias, yet no explicit parametric form, conditional-expectation model, or bias-correction term is supplied. Without this, it cannot be shown that consistency holds for the proposed estimator while failing for naive substitution, as required by the skeptic’s concern.

Authors: The augmentation models the conditional expectation E[human response | LLM response, covariates] via a linear specification estimated on the paired subsample; the resulting bias-correction term is subtracted from LLM predictions before they enter the conjoint likelihood. This explicit form ensures the augmented estimator remains consistent for the human parameter while naive substitution is inconsistent under nonzero LLM bias. The revised manuscript will state the parametric model, the estimation procedure for the correction term, and the resulting estimating equations. revision: yes
Referee: [Empirical studies] Empirical validation: the reported savings range (24.9%–79.8%) is presented without error bars, sample sizes, exclusion rules, or details on how the finite-sample bound was evaluated. These omissions directly affect the claim that the method outperforms naive substitution in reducing estimation error.

Authors: We agree that these implementation details are required for assessment. The vaccine study used 500 human respondents and the car study used 300; savings were computed via bootstrap standard errors on the mean squared error of the preference parameters. Exclusion followed standard conjoint protocols (incomplete or straight-line responses removed). The finite-sample bound was evaluated by plugging the estimated bias variance into the derived inequality. The revision will add a table with sample sizes, exclusion counts, bootstrap standard errors on the savings figures, and the numerical evaluation of the bound. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The abstract and provided text present the consistency and asymptotic normality of the augmented estimator as consequences of a statistical data augmentation procedure grounded in general properties of estimators, without any equations, fitted parameters, or self-citations that reduce the claimed result to a quantity defined by the same inputs. No load-bearing step equates the prediction to a fit by construction or imports uniqueness via author overlap. The framework is described as deriving from external statistical theory applied to the integration of LLM and real data, rendering the central claim self-contained against benchmarks outside the paper's own fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the central claim rests on an unstated modeling relationship between LLM and human responses that is treated as domain_assumption rather than derived.

axioms (1)

domain assumption A statistical relationship exists between LLM-generated and human responses that permits bias correction via augmentation while preserving consistency.
Required for the claim that the method yields consistent estimators unlike naive substitution.

pith-pipeline@v0.9.0 · 5845 in / 1184 out tokens · 36097 ms · 2026-05-23T07:07:23.705676+00:00 · methodology

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys
cs.AI 2026-04 unverdicted novelty 7.0

A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.
Adaptive Budget Allocation in LLM-Augmented Surveys
cs.LG 2026-04 unverdicted novelty 7.0

An adaptive budget allocation algorithm for LLM-augmented surveys learns question-level LLM reliability on the fly from human labels and reduces labeling waste from 10-12% to 2-6% compared to uniform allocation.
Generative Augmented Inference
cs.LG 2026-04 unverdicted novelty 6.0

GAI uses orthogonal moment conditions to integrate arbitrary AI-generated auxiliary data into human-label models, delivering consistent estimates, asymptotic normality, and a safe-default efficiency improvement over h...
How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective
stat.ME 2025-02 unverdicted novelty 6.0

A data-driven method adaptively selects the number of LLM-simulated responses to form confidence sets with nominal coverage for human survey parameters and equates that number to the LLM's effective human-equivalent s...

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · cited by 4 Pith papers · 6 internal anchors

[1]

Allenby, Greg M, Peter E Rossi. 2006. Hierarchical bayes models. The handbook of marketing research: Uses, misuses, and future advances\/ 418--440

work page 2006
[2]

Argyle, Lisa P, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, David Wingate. 2023. Out of one, many: Using language models to simulate human samples. Political Analysis\/ 31 (3) 337--351

work page 2023
[3]

Bastani, Hamsa, Dennis J Zhang, Heng Zhang. 2022. Applied machine learning in operations management. Innovative Technology at the Interface of Finance and Operations: Volume I\/ 189--222

work page 2022
[4]

Beltagy, Iz, Kyle Lo, Arman Cohan. 2019. Scibert: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676\/

work page arXiv 2019
[5]

Bound, John, Charles Brown, Nancy Mathiowetz. 2001. Measurement error in survey data. Handbook of econometrics\/ , vol. 5. Elsevier, 3705--3843

work page 2001
[6]

Brand, James, Ayelet Israeli, Donald Ngwe. 2023. Using GPT for market research. Available at SSRN 4395751\/

work page 2023
[7]

Brown, Tom B. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165\/

work page internal anchor Pith review Pith/arXiv arXiv 2020
[8]

Chen, Yiting, Tracy Xiao Liu, You Shan, Songfa Zhong. 2023. The emergence of economic rationality of gpt. Proceedings of the National Academy of Sciences\/ 120 (51) e2316205120

work page 2023
[9]

Choi, Tsan-Ming, Subodha Kumar, Xiaohang Yue, Hau-Ling Chan. 2022. Disruptive technologies and operations management in the industry 4.0 era and beyond. Production and Operations Management\/ 31 (1) 9--31

work page 2022
[10]

Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on information theory\/ 2 (3) 113--124

work page 1956
[11]

Connell, Paul, Jonathan H Choi. 2024. Estimating and correcting for misclassification error in empirical textual research. Available at SSRN\/

work page 2024
[12]

Devlin, Jacob. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805\/

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Diederik, P Kingma. 2014. Adam: A method for stochastic optimization. (No Title)\/

work page 2014
[14]

Dzyabura, Daria, Srikanth Jagabathula. 2018. Offline assortment optimization in the presence of an online channel. Management Science\/ 64 (6) 2767--2786

work page 2018
[15]

Eggers, Felix, Henrik Sattler, Thorsten Teichert, Franziska V \"o lckner. 2021. Choice-based conjoint analysis. Handbook of market research\/ . Springer, 781--819

work page 2021
[16]

Girotra, Karan, Lennart Meincke, Christian Terwiesch, Karl T Ulrich. 2023. Ideas are dimes a dozen: Large language models for idea generation in innovation. Available at SSRN 4526071\/

work page 2023
[17]

Goli, Ali, Amandeep Singh. 2024. Frontiers: Can large language models capture human preferences? Marketing Science\/

work page 2024
[18]

Green, Paul E, Venkat Srinivasan. 1990. Conjoint analysis in marketing: new developments with implications for research and practice. Journal of marketing\/ 54 (4) 3--19

work page 1990
[19]

Green, Paul E, Venkatachary Srinivasan. 1978. Conjoint analysis in consumer research: issues and outlook. Journal of consumer research\/ 5 (2) 103--123

work page 1978
[20]

Gui, George, Olivier Toubia. 2023. The challenge of using llms to simulate human behavior: A causal inference perspective. arXiv preprint arXiv:2312.15524\/

work page arXiv 2023
[21]

Gururangan, Suchin, Ana Marasovi \'c , Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A Smith. 2020. Don't stop pretraining: Adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964\/

work page arXiv 2020
[22]

Hair Jr, Joe, Michael Page, Niek Brunsveld. 2019. Essentials of business research methods\/ . Routledge

work page 2019
[23]

Hinton, Geoffrey. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531\/

work page internal anchor Pith review Pith/arXiv arXiv 2015
[24]

Horton, John J. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Tech. rep., National Bureau of Economic Research

work page 2023
[25]

Huang, Yue, Zhengqing Yuan, Yujun Zhou, Kehan Guo, Xiangqi Wang, Haomin Zhuang, Weixiang Sun, Lichao Sun, Jindong Wang, Yanfang Ye, et al. 2024. Social science meets llms: How reliable are large language models in social simulations? arXiv preprint arXiv:2410.23426\/

work page arXiv 2024
[26]

HuggingFace. 2024. meta-llama. https://huggingface.co/meta-llama/Meta-Llama-3-8B#: :text=Training Accessed: 08/31/2024

work page 2024
[27]

Kessels, Roselinde, Peter Goos, Martina Vandebroek. 2008. Optimal designs for conjoint experiments. Computational statistics & data analysis\/ 52 (5) 2369--2387

work page 2008
[28]

Kohli, Rajeev, Ramamirtham Sukumar. 1990. Heuristics for product-line design using conjoint analysis. Management Science\/ 36 (12) 1464--1478

work page 1990
[29]

Brownstein, Yulin Hswen, Brian T

Kreps, Sarah, Sandip Prasad, John S. Brownstein, Yulin Hswen, Brian T. Garibaldi, Baobao Zhang, Douglas L. Kriner. 2020. Factors associated with us adults’ likelihood of accepting covid-19 vaccination. JAMA Network Open\/ 3 (10) e2025594--e2025594

work page 2020
[30]

Ludwig, Jens, Sendhil Mullainathan, Ashesh Rambachan. 2024. Large language models: An applied econometric framework. arXiv preprint arXiv:2412.07031\/

work page arXiv 2024
[31]

Naveed, Humza, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian. 2023. A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435\/

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

Newey, Whitney K, Daniel McFadden. 1994. Large sample estimation and hypothesis testing. Handbook of econometrics\/ 4 2111--2245

work page 1994
[33]

Olsen, Tava Lennon, Brian Tomlin. 2020. Industry 4.0: Opportunities and challenges for operations management. Manufacturing & Service Operations Management\/ 22 (1) 113--122

work page 2020
[34]

OpenAI, R. 2023. Gpt-4 technical report. arxiv 2303.08774. View in Article\/ 2 (5)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[35]

Pan, Sinno Jialin, Qiang Yang. 2009. A survey on transfer learning. IEEE Transactions on knowledge and data engineering\/ 22 (10) 1345--1359

work page 2009
[36]

Parthasarathy, Venkatesh Balavadhani, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid. 2024. The ultimate guide to fine-tuning llms from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities. arXiv preprint arXiv:2408.13296\/

work page arXiv 2024
[37]

Peng, Andrew, John Allard, Steven Heidel. 2024. Fine-tuning now available for GPT -4o. https://openai.com/index/gpt-4o-fine-tuning/. Accessed: 2024-12-15

work page 2024
[38]

Radford, A. 2018. Improving language understanding by generative pre-training

work page 2018
[39]

Shane, Scott A, Karl T Ulrich. 2004. 50th anniversary article: Technological innovation, product development, and entrepreneurship in management science. Management science\/ 50 (2) 133--144

work page 2004
[40]

Solomon, Michael R. 2020. Consumer behavior: Buying, having, and being\/ . Pearson

work page 2020
[41]

Spencer, Vic. 2019. Choice modeling sports cars. https://github.com/spensorflow/Marketing-Analytics---Choice-Modeling-Sports-Car-Sales. Accessed: 2024-10-09

work page 2019
[42]

Sutskever, I. 2014. Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215\/

work page internal anchor Pith review Pith/arXiv arXiv 2014
[43]

Terwiesch, Christian. 2019. Om forum—empirical research in operations management: From field studies to analyzing digital exhaust. Manufacturing & Service Operations Management\/ 21 (4) 713--722

work page 2019
[44]

Van der Vaart, Aad W. 2000. Asymptotic statistics\/ , vol. 3. Cambridge university press

work page 2000
[45]

Vaswani, A. 2017. Attention is all you need. Advances in Neural Information Processing Systems\/

work page 2017
[46]

Wang, Xinfang, Jeffrey D Camm, David J Curry. 2009. A branch-and-price approach to the share-of-choice product line design problem. Management Science\/ 55 (10) 1718--1728

work page 2009
[47]

Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems\/ 35 24824--24837

work page 2022
[48]

Yang, Kaiqi, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu. 2024. Are large language models (llms) good social predictors? arXiv preprint arXiv:2402.12620\/

work page arXiv 2024
[49]

Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, Karthik Narasimhan. 2024. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems\/ 36

work page 2024
[50]

Yoo, Youngjin, Ola Henfridsson, Jannis Kallinikos, Robert Gregory, Gordon Burtch, Sutirtha Chatterjee, Suprateek Sarker. 2024. The next frontiers of digital innovation research. Information Systems Research\/

work page 2024
[51]

Zhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, Qing He. 2020. A comprehensive survey on transfer learning. Proceedings of the IEEE\/ 109 (1) 43--76

work page 2020
[52]

Ziems, Caleb, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang. 2024. Can large language models transform computational social science? Computational Linguistics\/ 50 (1) 237--291

work page 2024
[53]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter doi edition editor eid howpublished institution journal key month note number organization pages publisher school series title type url volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sent...

work page
[54]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in "" FUNCTION format.date year ...

work page

[1] [1]

Allenby, Greg M, Peter E Rossi. 2006. Hierarchical bayes models. The handbook of marketing research: Uses, misuses, and future advances\/ 418--440

work page 2006

[2] [2]

Argyle, Lisa P, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, David Wingate. 2023. Out of one, many: Using language models to simulate human samples. Political Analysis\/ 31 (3) 337--351

work page 2023

[3] [3]

Bastani, Hamsa, Dennis J Zhang, Heng Zhang. 2022. Applied machine learning in operations management. Innovative Technology at the Interface of Finance and Operations: Volume I\/ 189--222

work page 2022

[4] [4]

Beltagy, Iz, Kyle Lo, Arman Cohan. 2019. Scibert: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676\/

work page arXiv 2019

[5] [5]

Bound, John, Charles Brown, Nancy Mathiowetz. 2001. Measurement error in survey data. Handbook of econometrics\/ , vol. 5. Elsevier, 3705--3843

work page 2001

[6] [6]

Brand, James, Ayelet Israeli, Donald Ngwe. 2023. Using GPT for market research. Available at SSRN 4395751\/

work page 2023

[7] [7]

Brown, Tom B. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165\/

work page internal anchor Pith review Pith/arXiv arXiv 2020

[8] [8]

Chen, Yiting, Tracy Xiao Liu, You Shan, Songfa Zhong. 2023. The emergence of economic rationality of gpt. Proceedings of the National Academy of Sciences\/ 120 (51) e2316205120

work page 2023

[9] [9]

Choi, Tsan-Ming, Subodha Kumar, Xiaohang Yue, Hau-Ling Chan. 2022. Disruptive technologies and operations management in the industry 4.0 era and beyond. Production and Operations Management\/ 31 (1) 9--31

work page 2022

[10] [10]

Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on information theory\/ 2 (3) 113--124

work page 1956

[11] [11]

Connell, Paul, Jonathan H Choi. 2024. Estimating and correcting for misclassification error in empirical textual research. Available at SSRN\/

work page 2024

[12] [12]

Devlin, Jacob. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805\/

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Diederik, P Kingma. 2014. Adam: A method for stochastic optimization. (No Title)\/

work page 2014

[14] [14]

Dzyabura, Daria, Srikanth Jagabathula. 2018. Offline assortment optimization in the presence of an online channel. Management Science\/ 64 (6) 2767--2786

work page 2018

[15] [15]

Eggers, Felix, Henrik Sattler, Thorsten Teichert, Franziska V \"o lckner. 2021. Choice-based conjoint analysis. Handbook of market research\/ . Springer, 781--819

work page 2021

[16] [16]

Girotra, Karan, Lennart Meincke, Christian Terwiesch, Karl T Ulrich. 2023. Ideas are dimes a dozen: Large language models for idea generation in innovation. Available at SSRN 4526071\/

work page 2023

[17] [17]

Goli, Ali, Amandeep Singh. 2024. Frontiers: Can large language models capture human preferences? Marketing Science\/

work page 2024

[18] [18]

Green, Paul E, Venkat Srinivasan. 1990. Conjoint analysis in marketing: new developments with implications for research and practice. Journal of marketing\/ 54 (4) 3--19

work page 1990

[19] [19]

Green, Paul E, Venkatachary Srinivasan. 1978. Conjoint analysis in consumer research: issues and outlook. Journal of consumer research\/ 5 (2) 103--123

work page 1978

[20] [20]

Gui, George, Olivier Toubia. 2023. The challenge of using llms to simulate human behavior: A causal inference perspective. arXiv preprint arXiv:2312.15524\/

work page arXiv 2023

[21] [21]

Gururangan, Suchin, Ana Marasovi \'c , Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A Smith. 2020. Don't stop pretraining: Adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964\/

work page arXiv 2020

[22] [22]

Hair Jr, Joe, Michael Page, Niek Brunsveld. 2019. Essentials of business research methods\/ . Routledge

work page 2019

[23] [23]

Hinton, Geoffrey. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531\/

work page internal anchor Pith review Pith/arXiv arXiv 2015

[24] [24]

Horton, John J. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Tech. rep., National Bureau of Economic Research

work page 2023

[25] [25]

Huang, Yue, Zhengqing Yuan, Yujun Zhou, Kehan Guo, Xiangqi Wang, Haomin Zhuang, Weixiang Sun, Lichao Sun, Jindong Wang, Yanfang Ye, et al. 2024. Social science meets llms: How reliable are large language models in social simulations? arXiv preprint arXiv:2410.23426\/

work page arXiv 2024

[26] [26]

HuggingFace. 2024. meta-llama. https://huggingface.co/meta-llama/Meta-Llama-3-8B#: :text=Training Accessed: 08/31/2024

work page 2024

[27] [27]

Kessels, Roselinde, Peter Goos, Martina Vandebroek. 2008. Optimal designs for conjoint experiments. Computational statistics & data analysis\/ 52 (5) 2369--2387

work page 2008

[28] [28]

Kohli, Rajeev, Ramamirtham Sukumar. 1990. Heuristics for product-line design using conjoint analysis. Management Science\/ 36 (12) 1464--1478

work page 1990

[29] [29]

Brownstein, Yulin Hswen, Brian T

Kreps, Sarah, Sandip Prasad, John S. Brownstein, Yulin Hswen, Brian T. Garibaldi, Baobao Zhang, Douglas L. Kriner. 2020. Factors associated with us adults’ likelihood of accepting covid-19 vaccination. JAMA Network Open\/ 3 (10) e2025594--e2025594

work page 2020

[30] [30]

Ludwig, Jens, Sendhil Mullainathan, Ashesh Rambachan. 2024. Large language models: An applied econometric framework. arXiv preprint arXiv:2412.07031\/

work page arXiv 2024

[31] [31]

Naveed, Humza, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian. 2023. A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435\/

work page internal anchor Pith review Pith/arXiv arXiv 2023

[32] [32]

Newey, Whitney K, Daniel McFadden. 1994. Large sample estimation and hypothesis testing. Handbook of econometrics\/ 4 2111--2245

work page 1994

[33] [33]

Olsen, Tava Lennon, Brian Tomlin. 2020. Industry 4.0: Opportunities and challenges for operations management. Manufacturing & Service Operations Management\/ 22 (1) 113--122

work page 2020

[34] [34]

OpenAI, R. 2023. Gpt-4 technical report. arxiv 2303.08774. View in Article\/ 2 (5)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[35] [35]

Pan, Sinno Jialin, Qiang Yang. 2009. A survey on transfer learning. IEEE Transactions on knowledge and data engineering\/ 22 (10) 1345--1359

work page 2009

[36] [36]

Parthasarathy, Venkatesh Balavadhani, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid. 2024. The ultimate guide to fine-tuning llms from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities. arXiv preprint arXiv:2408.13296\/

work page arXiv 2024

[37] [37]

Peng, Andrew, John Allard, Steven Heidel. 2024. Fine-tuning now available for GPT -4o. https://openai.com/index/gpt-4o-fine-tuning/. Accessed: 2024-12-15

work page 2024

[38] [38]

Radford, A. 2018. Improving language understanding by generative pre-training

work page 2018

[39] [39]

Shane, Scott A, Karl T Ulrich. 2004. 50th anniversary article: Technological innovation, product development, and entrepreneurship in management science. Management science\/ 50 (2) 133--144

work page 2004

[40] [40]

Solomon, Michael R. 2020. Consumer behavior: Buying, having, and being\/ . Pearson

work page 2020

[41] [41]

Spencer, Vic. 2019. Choice modeling sports cars. https://github.com/spensorflow/Marketing-Analytics---Choice-Modeling-Sports-Car-Sales. Accessed: 2024-10-09

work page 2019

[42] [42]

Sutskever, I. 2014. Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215\/

work page internal anchor Pith review Pith/arXiv arXiv 2014

[43] [43]

Terwiesch, Christian. 2019. Om forum—empirical research in operations management: From field studies to analyzing digital exhaust. Manufacturing & Service Operations Management\/ 21 (4) 713--722

work page 2019

[44] [44]

Van der Vaart, Aad W. 2000. Asymptotic statistics\/ , vol. 3. Cambridge university press

work page 2000

[45] [45]

Vaswani, A. 2017. Attention is all you need. Advances in Neural Information Processing Systems\/

work page 2017

[46] [46]

Wang, Xinfang, Jeffrey D Camm, David J Curry. 2009. A branch-and-price approach to the share-of-choice product line design problem. Management Science\/ 55 (10) 1718--1728

work page 2009

[47] [47]

Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems\/ 35 24824--24837

work page 2022

[48] [48]

Yang, Kaiqi, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu. 2024. Are large language models (llms) good social predictors? arXiv preprint arXiv:2402.12620\/

work page arXiv 2024

[49] [49]

Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, Karthik Narasimhan. 2024. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems\/ 36

work page 2024

[50] [50]

Yoo, Youngjin, Ola Henfridsson, Jannis Kallinikos, Robert Gregory, Gordon Burtch, Sutirtha Chatterjee, Suprateek Sarker. 2024. The next frontiers of digital innovation research. Information Systems Research\/

work page 2024

[51] [51]

Zhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, Qing He. 2020. A comprehensive survey on transfer learning. Proceedings of the IEEE\/ 109 (1) 43--76

work page 2020

[52] [52]

Ziems, Caleb, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang. 2024. Can large language models transform computational social science? Computational Linguistics\/ 50 (1) 237--291

work page 2024

[53] [53]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter doi edition editor eid howpublished institution journal key month note number organization pages publisher school series title type url volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sent...

work page

[54] [54]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in "" FUNCTION format.date year ...

work page