Artificial intelligence (AI), and more specifically modern large language models (LLMs), is changing the market research industry by enabling the generation of synthetic data of such quality that it can augment, if not replace, human respondents.
Around the world, researchers are discovering and reporting just how accurate and useful LLMs are at replicating human opinions.
This blog entry aims to be a living repository of the scientific research that is moving the industry, and guiding our team as we build the OpinioAI platform. I'll include some of the more interesting abstracts and quotes for easier digestion.
I wish to personally thank all the researchers working and publishing on this topic. The studies are fascinating to read and extremely useful. Thank you!
The article will be updated continuously as new research is discovered. If you know of any, please share it with me at nikola [at] opinio.ai!
Research, studies & articles:
- Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2022). Out of one, many: using language models to simulate human samples. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2209.06899
Summary:
We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models.
We show that the “algorithmic bias” within one such tool — the GPT-3 language model — is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property “algorithmic fidelity” and explore its extent in GPT-3.
We create “silicon samples” by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States.
We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.
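The conditioning step behind these "silicon samples" can be sketched as follows. This is a minimal illustration, not the authors' actual prompt template: the field names and backstory wording are my own assumptions.

```python
def backstory_prompt(profile: dict, question: str) -> str:
    """Build a first-person demographic backstory to condition the model,
    followed by the survey question (illustrative wording only, not the
    paper's exact template)."""
    backstory = (
        f"I am {profile['age']} years old. I am {profile['gender']}. "
        f"Racially, I identify as {profile['race']}. "
        f"Politically, I consider myself {profile['ideology']}."
    )
    return f"{backstory}\n\nQuestion: {question}\nAnswer:"

# One hypothetical "silicon" respondent
prompt = backstory_prompt(
    {"age": 42, "gender": "female", "race": "white", "ideology": "a moderate"},
    "In the 2020 presidential election, which candidate did you vote for?",
)
print(prompt)
```

In the study, thousands of such prompts (one per real survey respondent's backstory) are sent to GPT-3, and the distribution of model answers is compared against the humans' actual answers.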
- Hämäläinen, P., Tavast, M., & Kunnari, A. (2023). Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. ACM Digital Library. https://doi.org/10.1145/3544548.3580688
- Savage, N. (2023). Synthetic data could be better than real data. Nature. https://doi.org/10.1038/d41586-023-01445-8
- Can AI chatbots replace human subjects in behavioral experiments? (n.d.). Science | AAAS. https://www.science.org/content/article/can-ai-chatbots-replace-human-subjects-behavioral-experiments
- Aher, G. (2022, August 18). Using large language models to simulate multiple humans and replicate human subject studies. arXiv.org. https://arxiv.org/abs/2208.10264
- Horton, J. J. (2023, January 18). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv.org. https://arxiv.org/abs/2301.07543
- Andreas, J. (2022, December 3). Language models as agent models. arXiv.org. https://arxiv.org/abs/2212.01681
- Brand, J., Israeli, A., & Ngwe, D. (2023). Using GPT for market research. Social Science Research Network. https://doi.org/10.2139/ssrn.4395751
Summary:
Large language models (LLMs) have quickly become popular as labor-augmenting tools for programming, writing, and many other processes that benefit from quick text generation. In this paper we explore the uses and benefits of LLMs for researchers and practitioners who aim to understand consumer preferences.
We focus on the distributional nature of LLM responses, and query the Generative Pre-trained Transformer 3.5 (GPT-3.5) model to generate hundreds of survey responses to each prompt. We offer two sets of results to illustrate our approach and assess it.
First, we show that GPT-3.5, a widely-used LLM, responds to sets of survey questions in ways that are consistent with economic theory and well-documented patterns of consumer behavior, including downward-sloping demand curves and state dependence.
Second, we show that estimates of willingness-to-pay for products and features generated by GPT-3.5 are of realistic magnitudes and match estimates from a recent study that elicited preferences from human consumers.
We also offer preliminary guidelines for how best to query information from GPT-3.5 for marketing purposes and discuss potential limitations.
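The distributional querying idea can be sketched like this: ask the model the same purchase question many times at each price point and treat the share of "yes" answers as demand. The `ask_llm` callable is a stand-in for a real GPT-3.5 API call; the stub below and its price-sensitivity rule are invented for demonstration, not the authors' code.

```python
import random
from collections import Counter

def estimate_demand(ask_llm, product: str, prices, n_draws: int = 100):
    """For each price, sample n_draws yes/no purchase intents from the
    model and return the share of 'yes' answers (an empirical demand curve)."""
    curve = {}
    for price in prices:
        prompt = (f"A customer considers buying {product} for ${price}. "
                  f"Do they buy it? Answer yes or no.")
        answers = Counter(ask_llm(prompt) for _ in range(n_draws))
        curve[price] = answers["yes"] / n_draws
    return curve

# Deterministic-seeded stub standing in for GPT-3.5: purchase probability
# falls as the quoted price rises (for demonstration only).
def stub_llm(prompt: str) -> str:
    price = float(prompt.split("$")[1].split(".")[0])
    return "yes" if random.random() < max(0.0, 1 - price / 10) else "no"

random.seed(0)
demand = estimate_demand(stub_llm, "a tube of toothpaste", [2, 5, 8])
print(demand)  # shares should decline as price rises
```

With a real LLM behind `ask_llm`, the paper's finding is that such curves slope downward in line with economic theory.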
- Jiang, G. (2022, May 20). Evaluating and inducing personality in pre-trained language models. arXiv.org. https://arxiv.org/abs/2206.07550
- Karra, S. R. (2022, April 25). Estimating the personality of White-Box language models. arXiv.org. https://arxiv.org/abs/2204.12000
- Artificial Intelligence from a Psychologist’s Point of View. (n.d.). Max Planck Institute for Biological Cybernetics Tübingen. https://www.kyb.tuebingen.mpg.de/679134/news_publication_19846421_transferred
- Li, P., Castelo, N., Katona, Z., & Sárváry, M. (2022). Language Models for Automated Market Research: A new way to generate Perceptual maps. Social Science Research Network. https://doi.org/10.2139/ssrn.4241291
- Chu, E. (2023, March 28). Language models trained on media diets can predict public opinion. arXiv.org. https://arxiv.org/abs/2303.16779
Summary:
Public opinion reflects and shapes societal behavior, but the traditional survey-based tools to measure it are limited.
We introduce a novel approach to probe media diet models — language models adapted to online news, TV broadcast, or radio show content — that can emulate the opinions of subpopulations that have consumed a set of media.
To validate this method, we use as ground truth the opinions expressed in U.S. nationally representative surveys on COVID-19 and consumer confidence.
Our studies indicate that this approach is (1) predictive of human judgements found in survey response distributions and robust to phrasing and channels of media exposure, (2) more accurate at modeling people who follow media more closely, and (3) aligned with literature on which types of opinions are affected by media consumption.
- Marjieh, R. (2023, February 2). Large language models predict human sensory judgments across six modalities. arXiv.org. https://arxiv.org/abs/2302.01308v2
- Wilbanks, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences, 27(7), 597–600. https://doi.org/10.1016/j.tics.2023.04.008
- Jansen, B. J., Jung, S. G., & Salminen, J. (2023). Employing large language models in survey research. Natural Language Processing Journal, 4, 100020. https://doi.org/10.1016/j.nlp.2023.100020
- Li, P., Castelo, N., Katona, Z., & Sárváry, M. (2024). Frontiers: Determining the validity of large language models for automated perceptual analysis. Marketing Science. https://doi.org/10.1287/mksc.2023.0454
- Arora, N., Chakraborty, I., & Nishimura, Y. (2024). Hybrid Marketing Research: Large Language Models as an Assistant. Available at SSRN. https://doi.org/10.2139/ssrn.4683054
Summary:
An area within marketing that is well poised for adoption of large language models (LLMs) is marketing research. In this paper the authors empirically investigate how LLMs could potentially assist at different stages of the marketing research process.
They partnered with a Fortune 500 food company and replicated a qualitative and a quantitative study that the company conducted using GPT-4. The authors designed the system architecture and prompts necessary to create personas, ask questions, and obtain answers from synthetic respondents. Their findings suggest that LLMs present a big opportunity, especially for qualitative research.
The LLMs can help determine the profile of individuals to interview, generate synthetic respondents, interview them, and even moderate a depth interview. The LLM-assisted responses are superior in terms of depth and insight.
The authors conclude that the AI-human hybrid has great promise and LLMs could serve as an excellent collaborator/assistant for a qualitative marketing researcher. The findings for the quantitative study are less impressive.
The LLM correctly picked the answer direction and valence but does not recover the true response distributions well. In the future, approaches such as few-shot learning and fine-tuning may result in synthetic survey data that mimic human data more accurately.
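Few-shot learning, one of the improvements the authors point to, can be sketched as follows: a handful of real human answers are prepended as in-context examples before the model answers as a new synthetic respondent. The wording and example data here are invented for illustration.

```python
def few_shot_survey_prompt(question: str, human_examples, persona: str) -> str:
    """Prepend real human question/answer pairs as in-context examples,
    then ask the model to answer as a new synthetic respondent."""
    shots = "\n".join(
        f"Respondent ({ex['persona']}): {ex['answer']}" for ex in human_examples
    )
    return (
        f"Survey question: {question}\n"
        f"Example answers from real respondents:\n{shots}\n"
        f"Respondent ({persona}):"
    )

prompt = few_shot_survey_prompt(
    "How likely are you to buy this snack again? (1-5)",
    [{"persona": "age 34, urban", "answer": "4"},
     {"persona": "age 57, rural", "answer": "2"}],
    "age 29, suburban",
)
print(prompt)
```

The intuition is that anchoring the model on a few genuine responses nudges the synthetic answer distribution toward the human one.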
- Robison, G. (2024, February 9). What’s ChatGPT’s favorite ice cream flavor? An investigation into synthetic respondents [Online forum post]. https://www.lesswrong.com/posts/2hsrPrsJNgNxBSFs4/what-s-chatgpt-s-favorite-ice-cream-flavor-an-investigation
- Sun, S., Lee, E., Nan, D., Zhao, X., Lee, W., Jansen, B. J., & Kim, J. H. (2024, February 28). Random silicon sampling: Simulating human sub-population opinion using a large language model based on group-level demographic information. arXiv.org. https://arxiv.org/abs/2402.18144
Summary:
Large language models exhibit societal biases associated with demographic information, including race, gender, and others. Endowing such language models with personalities based on demographic data can enable generating opinions that align with those of humans.
Building on this idea, we propose “random silicon sampling,” a method to emulate the opinions of the human population sub-group. Our study analyzed 1) a language model that generates the survey responses that correspond with a human group based solely on its demographic distribution and 2) the applicability of our methodology across various demographic subgroups and thematic questions.
Through random silicon sampling and using only group-level demographic information, we discovered that language models can generate response distributions that are remarkably similar to the actual U.S. public opinion polls.
Moreover, we found that the replicability of language models varies depending on the demographic group and topic of the question, and this can be attributed to inherent societal biases in the models. Our findings demonstrate the feasibility of mirroring a group’s opinion using only demographic distribution and elucidate the effect of social biases in language models on such simulations.
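The "group-level only" idea can be sketched as follows: instead of conditioning on real individual backstories, synthetic respondents are drawn from the group's marginal demographic distributions. A minimal sketch, assuming independent marginals (the paper's exact sampling scheme may differ); the distributions below are illustrative, not real survey weights.

```python
import random

def random_silicon_sample(marginals: dict, n: int, seed: int = 0):
    """Draw n synthetic respondent profiles, sampling each attribute
    independently from its group-level marginal distribution."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n):
        profile = {
            attr: rng.choices(list(dist), weights=list(dist.values()))[0]
            for attr, dist in marginals.items()
        }
        profiles.append(profile)
    return profiles

# Illustrative marginals for a hypothetical subgroup
marginals = {
    "gender": {"female": 0.52, "male": 0.48},
    "age_band": {"18-34": 0.30, "35-54": 0.35, "55+": 0.35},
}
for profile in random_silicon_sample(marginals, n=5):
    print(profile)
```

Each sampled profile would then be turned into a conditioning prompt, as in individual-level silicon sampling, and the model's answers aggregated into a response distribution.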
- Kim, J., & Lee, B. (2023, May 16). AI-Augmented Surveys: Leveraging large language models and surveys for opinion prediction. arXiv.org. https://arxiv.org/abs/2305.09620
- Sarstedt, M., Adler, S. J., Rau, L. A., & Schmitt, B. H. (2024). Using large language models to generate silicon samples in consumer and marketing research: Challenges, opportunities, and guidelines. Psychology & Marketing. https://doi.org/10.1002/mar.21982
- Schoenegger, P., Tuminauskaite, I., Park, P. S., Bastos, R. V. S., & Tetlock, P. E. (2024, February 29). Wisdom of the Silicon Crowd: LLM ensemble prediction capabilities rival human crowd accuracy. arXiv.org. https://arxiv.org/abs/2402.19379
Summary:
In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of 12 LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament.
Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark, and is not statistically different from the human crowd. We also observe a set of human-like biases in machine responses, such as an acquiescence effect and a tendency to favour round numbers.
In Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models’ forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%, though this leads to less accurate predictions than simply averaging human and machine forecasts.
Our results suggest that LLMs can achieve forecasting accuracy rivaling that of the human crowd: via the simple, practically applicable method of forecast aggregation.
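The aggregation methods in the two studies reduce to simple statistics over probability forecasts; a minimal sketch, with all numbers invented for illustration:

```python
from statistics import median

def llm_crowd_forecast(llm_probs):
    """Aggregate individual LLM probability forecasts by taking the
    median (a sketch of Study 1's ensemble approach)."""
    return median(llm_probs)

def human_machine_blend(llm_prob, human_median):
    """Average a machine forecast with the human crowd's median
    (the simple human-machine averaging discussed alongside Study 2)."""
    return (llm_prob + human_median) / 2

# Twelve hypothetical LLM forecasts for one binary question
probs = [0.62, 0.55, 0.70, 0.58, 0.65, 0.60,
         0.52, 0.68, 0.57, 0.63, 0.61, 0.59]
crowd = llm_crowd_forecast(probs)
blended = human_machine_blend(crowd, human_median=0.45)
print(crowd, blended)
```

The appeal of the method is exactly this simplicity: forecast aggregation needs no model access beyond the individual predictions.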
- Arora, N., Chakraborty, I., & Nishimura, Y. (2024). EXPRESS: AI-Human Hybrids for Marketing Research: Leveraging LLMs as Collaborators. Journal of Marketing. https://doi.org/10.1177/00222429241276529
Summary:
The authors’ central premise is that a human-LLM hybrid approach leads to efficiency and effectiveness gains in the marketing research process. In qualitative research, they show that LLMs can assist in both data generation and analysis; LLMs effectively create sample characteristics, generate synthetic respondents, and conduct and moderate in-depth interviews.
The AI-human hybrid generates information-rich, coherent data that surpasses human-only data in depth and insightfulness and matches human performance in data analysis tasks of generating themes and summaries. Evidence from expert judges shows that humans and LLMs possess complementary skills; the human-LLM hybrid outperforms its human-only or LLM-only counterpart.
For quantitative research, the LLM correctly picks the answer direction and valence, with the quality of synthetic data significantly improving through few-shot learning and retrieval-augmented generation. The authors demonstrate the value of the AI-human hybrid by collaborating with a Fortune 500 food company and replicating a 2019 qualitative and quantitative study using GPT-4.
For their empirical investigation, the authors design the system architecture and prompts to create personas, ask questions, and obtain responses from synthetic respondents. They provide roadmaps for integrating LLMs into qualitative and quantitative marketing research and conclude that LLMs serve as valuable collaborators in the insight generation process.
- Kumar, A., & Lakkaraju, H. (2024, April 11). Manipulating large language models to increase product visibility. arXiv.org. https://arxiv.org/abs/2404.07981
- Viglia, G., Adler, S. J., Miltgen, C. L., & Sarstedt, M. (2024). The use of synthetic data in tourism. Annals of Tourism Research, 108, 103819. https://doi.org/10.1016/j.annals.2024.103819
- Park, J. S., Zou, C. Q., Shaw, A., Hill, B. M., Cai, C., Morris, M. R., Willer, R., Liang, P., & Bernstein, M. S. (2024, November 15). Generative agent simulations of 1,000 people. arXiv.org. https://arxiv.org/abs/2411.10109
Summary:
We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals, applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent.
The generative agents replicate participants’ responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications.