AI Survey 2026: A Framework for Research Teams

Surveys, AI interviews, synthetic users - three approaches compared honestly, with a decision matrix.

Send a survey to 5,000 customers and you get 300 responses, only 40 of which include anything in the open-text field. Let an AI simulate synthetic users and you get perfectly articulated answers that no real person would ever give. Run an in-depth interview and you get depth - for 15 out of 5,000 customers, because that is all the budget allows. Three methods, three promises, none of them delivered.

Key takeaways: AI-powered research is not one method but three fundamentally different approaches, each with distinct strengths and blind spots. AI-enhanced surveys scale breadth; AI interviews with real people scale depth; synthetic users scale speed. This article provides a decision framework and matrix: when to use which method, where each one fails, and how hybrid designs compensate for the weaknesses.

I have been hearing the same debate for months. One camp says AI-powered research finally makes qualitative work scalable. The other says it replaces understanding with data production. What gets lost in this argument: for most teams, the alternative to AI-assisted research is not the perfect interview run by a seasoned moderator. The alternative is no research at all - because the budget for ten in-depth interviews does not exist, because the sprint will not wait, and because the product owner ends up deciding without data. The question is not "AI or craft" but "methodological depth that actually happens in day-to-day project work, or insight that only exists on paper."

Three approaches, one goal

The first approach is the most familiar: the digital survey, enhanced by AI. Qualtrics, SurveyMonkey, Typeform - these platforms now use machine learning for adaptive question logic, automated analysis, and sentiment scoring of open-ended responses. The promise: more insight from the same questionnaires.
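To make "sentiment scoring of open-ended responses" concrete, here is a minimal sketch using an off-the-shelf open-source model. It illustrates the mechanism only; it says nothing about how the platforms named above implement it internally, and the example answers are invented.

```python
# A minimal sketch of sentiment scoring for open-text survey answers, using the
# Hugging Face transformers pipeline with its default model. Illustrative only;
# not the internals of any named platform.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

open_text_answers = [
    "The new dashboard saves me time every morning.",
    "Support took three days to reply, which is unacceptable.",
]

for answer in open_text_answers:
    result = sentiment(answer)[0]  # e.g. {"label": "NEGATIVE", "score": 0.99}
    print(f"{result['label']:>8}  {result['score']:.2f}  {answer}")
```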

The second approach takes a different path. Instead of predefined questions with checkboxes, an AI conducts an open conversation with a real person. It asks a question, listens, probes deeper - five, six layers in, until the actual reason is on the table. Asynchronous, anonymous, in the participant's own language. No scheduled appointment, no moderator, no discussion guide that grows stale after the third interview.
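What that probing loop can look like in code: a minimal sketch in which a stubbed model call stands in for whatever LLM a platform actually uses. The depth limit, the canned follow-ups, and the participant stub are illustrative assumptions, not any vendor's actual logic.

```python
# A minimal sketch of an asynchronous, AI-led interview loop that probes a few
# layers deep. ask_model() is a stand-in for a real LLM call; here it returns
# canned follow-ups so the sketch runs as-is.
MAX_FOLLOW_UPS = 3

CANNED_FOLLOW_UPS = [
    "What made that moment stand out for you?",
    "Can you walk me through what happened just before?",
    "If you could change one thing about that, what would it be?",
]

def ask_model(conversation: list[dict], turn: int) -> str:
    """Stub for the LLM call that would generate the next open follow-up."""
    return CANNED_FOLLOW_UPS[turn % len(CANNED_FOLLOW_UPS)]

def run_interview(core_question: str, get_answer) -> list[dict]:
    """Ask the core question, then probe MAX_FOLLOW_UPS layers deeper."""
    conversation = [
        {"role": "interviewer", "text": core_question},
        {"role": "participant", "text": get_answer(core_question)},
    ]
    for turn in range(MAX_FOLLOW_UPS):
        follow_up = ask_model(conversation, turn)
        conversation.append({"role": "interviewer", "text": follow_up})
        conversation.append({"role": "participant", "text": get_answer(follow_up)})
    return conversation

# A participant stub stands in for the real person answering asynchronously.
transcript = run_interview(
    "Why did you stop using the product?",
    lambda question: f"(participant's answer to: {question})",
)
```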

The most radical is the third approach: it dispenses with real people entirely. Synthetic users - AI-generated personas - answer research questions based on training data. A thousand interviews in a single afternoon, no recruitment, no incentives. The fastest and cheapest option - and the riskiest.

All three promise to make qualitative insight more scalable. But the promises are not equivalent. Surveys scale breadth; AI interviews scale depth; synthetic simulation scales speed. Anyone who lumps all three under "AI research" ends up picking the wrong tool for the question at hand.

What 80,000 real conversations reveal

That AI-led interviews with real people are more than a niche method has now been demonstrated at unprecedented scale. In March 2026, Anthropic published the largest qualitative study to date: 80,508 interviews in 70 languages, conducted across 159 countries - with an AI as the interviewer. No questionnaires, no checkboxes; open conversations that dynamically adapted to participants' answers.

80,000 qualitative interviews would be unthinkable with human moderators - not in years, not with an unlimited budget. The AI did not just scale; it also collected data in languages and regions that traditional research systematically underrepresents. What was surprising: the depth of responses. Participants talked about hopes and fears with a candor rarely seen in standardized surveys.

What the study does not prove is equally telling. Anthropic surveyed its own users - people who already interact with AI and accept an AI as a conversation partner. The study comes from the maker of the AI used; the results are impressive but not independently verified. Whether the same method works with AI-skeptical audiences, older demographics, or highly sensitive contexts such as clinical research remains an open question.

Where each approach fails

Every method has a blind spot, and it sits exactly where the next method has its strength.

AI-enhanced surveys deliver distributions but not motives. Adaptive question logic makes the questionnaire smarter, but it remains a questionnaire. No algorithm turns "How satisfied are you on a scale of 1 to 10?" into an answer that explains why someone gives a 6 instead of an 8. That "why" is precisely the lever product decisions hinge on. AI-led interviews deliver it - two to three targeted follow-ups per core question create a conversational depth that questionnaires structurally cannot reach.

But here is the catch: synthetic users promise the same depth without the effort. And this is where it gets risky. They produce what sounds plausible, not what is true. Their answers skew positive, predictable, mainstream. Edge cases, contradictions, the creative irrationality of real human behavior - all of it is missing. Anyone basing product decisions on synthetic data optimizes for a world that does not exist.

The limitations of AI-led interviews with real people are discussed less often. They require participants to be willing to talk to an AI - and to feel comfortable enough to be honest. With tech-savvy audiences, this works well; with a 62-year-old administrator who is interacting with a chatbot for the first time, it is far from a given. Data quality depends directly on the quality of the interview guide, and a poorly configured AI interview produces results just as shallow as a poorly written questionnaire. Facial expressions, tone of voice, hesitation - everything a skilled moderator reads is lost in text-based AI conversations.

None of this means the methods are interchangeable. A survey that delivers a number but no "why" is not a substitute for an interview - it is a different form of insight with different boundaries. The real question for research teams is therefore not "which method is best" but "which method delivers the most reliable insight for this specific question, given the constraints I have."

The decision matrix

The choice of method depends on four factors: question type, timeline, budget, and validity requirements.

| Criterion | Survey (AI-enhanced) | AI interview (real people) | Synthetic users |
| --- | --- | --- | --- |
| Question type | Distributions, benchmarks, trends | Motives, experiences, the "why" | Hypotheses, early exploration |
| Timeline | 1-2 weeks | 2-5 days | Hours |
| Relative cost | Medium (platform + field time) | Low (tool + recruitment) | Lowest (tool only) |
| Sample size | Hundreds to thousands | Dozens to hundreds | Unlimited (simulated) |
| Validity | High for quantitative claims | High for qualitative insight | Low to speculative |
| Blind spot | No access to the "why" | Depends on willingness to engage | No grounding in reality |
| Best use case | NPS tracking, feature prioritization | Churn analysis, needs research, change monitoring | Guide testing, brainstorming, hypothesis generation |

A team with three days and the question "Why are our best people leaving?" needs an AI interview, not a synthetic user panel. A team that wants to know how many customers are aware of a feature needs a survey, not a depth interview.
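The same logic can be written down as a simple helper, a minimal sketch whose categories mirror the matrix above; the field names and day thresholds are illustrative assumptions drawn loosely from the timeline row, not a validated scoring model.

```python
# Rough minimum field time per method, taken from the timeline row above.
MIN_DAYS = {
    "AI-enhanced survey": 7,
    "AI interview with real people": 2,
    "Synthetic users": 0,
}

def recommend_method(question_type: str, days_available: int) -> str:
    """Map a research question and timeline to the method the matrix points to."""
    if question_type in {"distribution", "benchmark", "trend"}:
        method = "AI-enhanced survey"
    elif question_type in {"motive", "experience", "why"}:
        method = "AI interview with real people"
    elif question_type in {"hypothesis", "exploration", "guide testing"}:
        method = "Synthetic users"
    else:
        return "Clarify the research question first"
    if days_available < MIN_DAYS[method]:
        return f"{method} (timeline too tight: plan at least {MIN_DAYS[method]} days)"
    return method

# The two examples from the text:
print(recommend_method("why", days_available=3))            # AI interview with real people
print(recommend_method("distribution", days_available=14))  # AI-enhanced survey
```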

Hybrid approaches work best

In practice, the strongest research designs combine all three methods - not as a checklist but as a sequence where each phase informs the next.

It starts with synthetic users as sparring partners: stress-test the interview guide, check hypotheses for plausibility, find obvious gaps in the research design. Their job is not research but preparation. From what they deliver emerge the questions that will be put to real people - 50 to 200 asynchronous AI interviews that uncover motives, stories, and unexpected connections no algorithm can predict, in days rather than weeks. And when those conversations reveal three central sources of frustration, a question arises that only a survey can answer: how do these patterns distribute across the entire customer base?
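One way to make the sequence concrete before kickoff is to write it down as a plan. A minimal sketch follows; the phase names, sample sizes, and durations are illustrative assumptions, not recommendations.

```python
# A hybrid research plan as a sequence of phases, each reserved up front.
from dataclasses import dataclass

@dataclass
class Phase:
    method: str
    question: str
    participants: int  # simulated personas or real people, depending on the method
    days: int

hybrid_plan = [
    Phase("synthetic users", "Stress-test the interview guide for gaps", 50, 1),
    Phase("AI interviews", "Why do customers churn after month three?", 120, 4),
    Phase("survey", "How widespread are the top three frustrations?", 2000, 10),
]

# The number to defend before kickoff, so the project plan cannot claw it back.
total_days = sum(phase.days for phase in hybrid_plan)
```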

This sequence - simulate, converse, quantify - sounds coherent in theory. In practice, it rarely gets executed end to end. Not because discipline is lacking but because project plans, budget approvals, and sprint cycles set boundaries that even the most motivated team cannot push past. The synthetic users deliver results in hours; the next milestone is in two weeks. The interviews get dropped because scope shifted before they could start.

Anyone serious about the hybrid approach must therefore make a decision before kicking off: which question gets answered by which method - and which budget is reserved for it before the project plan can claw it back.

FAQ

Are AI-led interviews GDPR-compliant?

Yes, provided the platform ensures data processing within the EU, collects consent properly, and stores responses pseudonymized or anonymized. The critical factor is whether personal data is transmitted to AI models and on what legal basis. The requirements do not differ fundamentally from those for traditional online surveys, but the technical implementation demands careful scrutiny.

How valid are answers people give to an AI?

Research suggests that the absence of a human counterpart increases honesty. Participants in AI-led conversations report more openly on sensitive topics - from job dissatisfaction to product frustration - because there is no social judgment. The caveat: this openness requires that the person feels comfortable with the format.

Can synthetic users replace real interviews?

No. They can generate hypotheses and test interview guides, but their answers reflect probabilities in training data, not lived experience. For decisions that need to be grounded in the behavior of real people, they are not fit for purpose.

How many AI interviews do I need for reliable results?

The number depends on the research objective, not a statistical formula. For exploratory studies, 20 to 30 conversations are often enough to identify the core themes. For studies aiming to compare patterns across segments, 100 to 200 interviews are a solid benchmark. The advantage over traditional interviews: the marginal cost per additional conversation is minimal.

What is the difference between AI-led interviews and AI-enhanced surveys?

AI-enhanced surveys use machine learning to analyze questionnaires more intelligently, but the answer structure remains predefined. AI-led interviews are open conversations in which the AI listens, asks follow-up questions, and follows the participant's train of thought. The difference lies in depth of insight: surveys deliver distributions, interviews deliver motives.

Try it yourself

Frameworks help you decide. But what an AI-led interview actually feels like - whether the follow-up questions are relevant, whether you answer more deeply than in a survey, whether the anonymity truly makes a difference - that you can only experience firsthand. Twenty minutes, no login, right in your browser.


Marcus Volkel