Simulating Social Perceptions with LLMs: From a Policy Case to a Full-Pipeline Benchmark

Zhuoren Jiang and Chenxi Lin

People can experience the same public policy very differently. Some feel their lives are improving; others feel left behind. This is not simply disagreement; it reflects a core part of policy impact that is hard to capture with objective indicators alone: public perception. Traditional social surveys are designed for this purpose, but they are often slow, expensive, and hard to adapt quickly. They also face challenges such as fixed question formats, limited flexibility, and limited cross-cultural comparability.

With the rise of large language models (LLMs), many researchers have begun asking: Could we use LLMs as “virtual respondents” to reduce the cost and time of surveys? However, significant challenges remain. Off-the-shelf LLMs tend to reflect the voices of digitally active, highly educated populations more than those of underrepresented groups, which can amplify representational bias. Even more importantly, many existing efforts evaluate LLMs only on simplified, structured questionnaire tasks, whereas real social surveys also rely heavily on interviews, follow-up questions, and contextual reasoning.

—Could we use LLMs as “virtual respondents” to reduce the cost and time of surveys?—

Two recent studies offer a more grounded answer: LLMs should not “replace” human surveys, but they can become more reliable survey assistants if we (1) align them to a domain and (2) evaluate them on the entire survey pipeline. One paper demonstrates this with a real policy topic regarding perceptions of China’s “common prosperity,” a major initiative aimed at narrowing the wealth gap. The other builds a general benchmark and training framework called AlignSurvey that replicates the full social survey workflow within the LLM evaluation and alignment setting.

1) Start with a real policy question: how do people perceive “common prosperity”?

The common-prosperity study argues that public perception is not merely an optional add-on to policy evaluation but a key form of evidence. To study this with LLMs, the authors propose a domain alignment approach that mirrors established social science practices by combining qualitative interviews with quantitative questionnaires.

Concretely, the paper designs a dual-track setup: an expert-led semi-structured interview track (qualitative), and a structured questionnaire track (quantitative).

To make LLM outputs more interpretable and less “generic,” the dataset includes attitude labels and expert-annotated reasoning chains, aiming to improve both explainability and performance across population subgroups.

The method is then organized into a six-task pipeline, spanning social-background and interview simulation as well as individual- and group-level attitude and questionnaire simulation. The domain-aligned models form a clear family (CPPL, CPPQ, CPPM), each built by fine-tuning a different open-source backbone.

In plain terms: the goal is not for a model to “guess answers,” but to produce responses that are consistent with the social context and can scale from individual-level reasoning to group-level patterns.

2) How do we verify effectiveness? By analyzing distributions instead of relying solely on individual responses

In real surveys, what matters is not only whether a single respondent chooses the “right” option, but whether the aggregated distribution across groups matches reality. For example, do rural and urban populations show different response patterns? Do different income groups diverge in predictable ways?

To evaluate this, the common-prosperity study uses Wasserstein distance (WD) as a measure of how close the model-generated group distribution is to the real survey distribution. The result is striking: general-purpose LLMs can be far from the real group distribution, while domain-aligned models substantially reduce the gap (e.g., WD around 0.30–0.46 for aligned models vs. 1.44 for a zero-shot GPT-5 setting in the reported table).
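As a rough illustration (not the paper's actual code), comparing a simulated group distribution to a real survey distribution on an ordinal scale comes down to a small 1-D Wasserstein computation. The response shares and function name below are hypothetical:

```python
def wasserstein_1d(p, q, support):
    """1-D Wasserstein (earth mover's) distance between two discrete
    distributions p and q defined over the same ordered support."""
    wd = 0.0
    cum_p = cum_q = 0.0
    # Integrate |CDF_p - CDF_q| over the gaps between adjacent options.
    for i in range(len(support) - 1):
        cum_p += p[i]
        cum_q += q[i]
        wd += abs(cum_p - cum_q) * (support[i + 1] - support[i])
    return wd

# Hypothetical 5-point Likert shares: real survey vs. model-simulated group.
real = [0.10, 0.20, 0.40, 0.20, 0.10]
simulated = [0.05, 0.15, 0.50, 0.20, 0.10]
print(wasserstein_1d(real, simulated, [1, 2, 3, 4, 5]))  # ≈ 0.15
```

The appeal of this metric for survey work is that it respects the ordering of the options: placing mass on option 2 when the truth is option 1 is penalized far less than placing it on option 5.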

Even more policy-relevant: improvements are often larger for underrepresented groups. The paper reports notable gains for low-income, informally employed, and lower-education populations, among others.

This matters because policy evaluation often fails exactly where it hurts most: when averages look fine but vulnerable groups are not properly represented.

3) From one topic to a general solution: AlignSurvey replicates the full survey pipeline

The common-prosperity work shows what domain alignment can do in one high-stakes topic. But social science needs something broader: a reusable framework that can be applied across topics, populations, and cultures.

This is where AlignSurvey comes in. Instead of treating “survey simulation” as a single task, AlignSurvey decomposes professional social surveys into four stages and maps them into four modeling tasks:

  • social role modeling,

  • semi-structured interview modeling,

  • attitude/stance modeling,

  • structured questionnaire response modeling.

The motivation is explicit: focusing only on structured questionnaire items is insufficient, because real surveys include interviews, probing questions, and context-dependent reasoning.

To support this, AlignSurvey builds a layered dataset system:

  • the Social Foundation Corpus, with 44,000+ interview dialogues and 400,000+ structured survey records covering multiple countries;

  • Entire-Pipeline Survey Datasets, centered on AlignSurvey-Expert (ASE) with 161 semi-structured interviews and 1,679 questionnaires, plus demographic metadata and expert-annotated attitude and reasoning chains;

  • additional cross-cultural validation using GSS (US) and CHIP (China).

AlignSurvey also emphasizes that evaluation must include demographic diversity and fairness, not just average scores.

4) Alignment outperforms “just use a bigger model”: SurveyLM and a two-stage alignment strategy

AlignSurvey proposes a practical training path via SurveyLM, trained with a two-stage alignment approach: first align on the Social Foundation Corpus to acquire survey-relevant social concepts and cross-cultural expression patterns, then fine-tune on the four pipeline tasks with supervised signals.

The paper reports consistent improvements: SurveyLM brings gains of roughly 10–15 percentage points on multiple tasks and reduces distribution gaps as measured by WD. It also helps mitigate a common LLM behavior in surveys: over-selecting the "middle option" to stay safe.
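One way to make that middle-option tendency measurable is a simple rate check over simulated answers. This is a hypothetical sketch, not the benchmark's actual metric code:

```python
from collections import Counter

def middle_option_rate(responses, scale=5):
    """Share of answers landing on the neutral midpoint of an odd-length
    Likert scale (e.g., option 3 on a 1-to-5 scale)."""
    midpoint = (scale + 1) // 2
    counts = Counter(responses)
    return counts[midpoint] / len(responses)

# Hypothetical simulated answers on a 5-point scale.
simulated = [3, 3, 2, 4, 3, 1, 5, 3]
print(middle_option_rate(simulated))  # → 0.5
```

Comparing this rate for model-generated responses against the rate observed in the real survey gives a quick diagnostic of whether a model is hedging toward neutrality more than actual respondents do.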

Most importantly, alignment improves performance for underrepresented populations, including rural groups, older adults (e.g., 76+), self-employed respondents, and lower-to-middle income groups, which in turn shrinks demographic disparities.

5) Three takeaways for public governance and policy research

Takeaway 1: LLMs can assist social surveys, but only with full-pipeline evaluation and domain/pipeline alignment. A “questionnaire-only” benchmark misses what surveys actually do.

Takeaway 2: If you want to simulate society, evaluate distributions—not just individual answers. Distribution metrics like WD make group-level fidelity testable and accountable.

Takeaway 3: Fairness is not optional in survey settings. Without alignment, LLMs can amplify representational bias; with alignment, improvements can be strongest where data are historically weakest.

Conclusion: Toward “perception-aware governance”

Policy outcomes are measured not only by GDP, spending, or coverage rates, but also by how people perceive fairness, opportunity, and security. The future is unlikely to involve LLMs replacing surveys entirely. Instead, a more realistic and responsible path integrates human surveys with aligned LLMs, using grounded empirical data as the foundation and aligned models as scalable tools for rapid pre-testing, subgroup diagnosis, and timely monitoring, all while being continuously audited through full-pipeline benchmarks.

In that sense, the message of these two papers is clear: LLMs should not speak for people, but they can be trained to help us measure perceptions more quickly, transparently, and fairly.

This article is based on:

  • Jiang, Z., Huang, B., Ge, J., Lin, C., Xu, Y., & Yu, J. (2025). Simulating social perception with large language models: perceptions of China’s common prosperity. Journal of Chinese Governance, 1-29.
  • Lin, C., Yuan, W., Jiang, Z., Huang, B., Zhang, R., Ge, J., … & Yu, J. (2025). AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys. arXiv:2511.07871. (Accepted for publication at AAAI 2026)

Cite this article in APA as: Jiang, Z. & Lin, C. (2026, February 20). Simulating social perceptions with LLMs: From a policy case to a full-pipeline benchmark. Information Matters. https://informationmatters.org/2026/02/simulating-social-perceptions-with-llms-from-a-policy-case-to-a-full-pipeline-benchmark/

Authors

  • Zhuoren Jiang

    Dr. Zhuoren Jiang is an Assistant Professor in the Department of Information Resource Management at the School of Public Affairs, Zhejiang University. He has served as a consultant at Alibaba's DAMO Academy, where he contributed to advancing language technology. Currently, he collaborates with Tongyi Lab, focusing on the development of large language models. Dr. Jiang has published over 70 peer-reviewed papers in leading international journals and conferences, including Journal of Informetrics, Journal of the Association for Information Science and Technology, and Information Processing and Management, along with top-tier computer science conferences like SIGIR, WWW, ACL, AAAI, EMNLP, CIKM, and WSDM. He has led multiple research projects funded by esteemed organizations such as the National Natural Science Foundation of China. His contributions have been recognized with several accolades, including the Best Poster Award at the 2013 ACM/IEEE-CS Joint Conference on Digital Libraries and a nomination for Best Short Paper at the 2024 ACM SIGIR Conference. His research interests span computational social science, graph neural networks, and artificial intelligence applications, and he is also involved in various professional organizations, contributing to the advancement of information resource management and AI.

  • Chenxi Lin

    Chenxi Lin is a Public Information Resource Management PhD candidate in the School of Public Affairs at Zhejiang University. Her research interests focus on computational social science and artificial intelligence applications.
