How synthetic data models are shaping the future of research
“Right now is a pivotal time in market research,” says Patrick Estave, Director of Data Innovation at Cint. “We have an opportunity to change both how we think about research and the infrastructure that supports it.”
Like many in the world of contemporary market research, Estave sees the potential in synthetic data — not as a means of replacing human respondents, of course, but as a way of augmenting existing data to build out increasingly sophisticated models that benefit customers across different industries.
Synthetic data is data that has been created using algorithms, simulations, and generative AI techniques with the aim of replicating the patterns found in datasets collated from human responses.
Synthetic models are proving ever more useful in an era where speed is of the essence: organizations of all kinds need access to data faster than ever before.
We spoke with Estave about the benefits of synthetic boosting, the importance of solid prompts, and how synthetic data can help with continuous research.
The ins and outs of synthetic boosting
“When I think about traditional data, it’s two dimensional: rows and columns. Rows are your respondents, and columns are the questions you’re asking them,” Estave says. “We have what we call boosting models which are used for row generation or data imputation.”
Imagine a situation where you require 300 respondents to solidify a hypothesis or research point but you’re only able to get 100 completes. Synthetic boosting could be the answer as it allows researchers to expand on niche data points.
“Synthetic boosting can increase the size of a segmentation by looking at patterns relevant to the underlying data as a whole, as well as the targeted segmentation,” says Estave. “Boosting creates probabilistic responses, and some of our testing has shown incredibly high accuracy levels when compared to ground truth. The result is high fidelity, synthetic data sets.”
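To make the row-generation idea concrete, here is a minimal sketch in Python. It is not Cint's boosting model, which is not described in detail here. It assumes a much simpler stand-in technique: each synthetic respondent is assembled by sampling answers mostly from real respondents in the target segment and occasionally from the dataset as a whole. The function name `boost_rows`, the `segment_weight` parameter, and the field names are all hypothetical.

```python
import random


def boost_rows(rows, segment_key, segment_value, n_new, segment_weight=0.8, seed=0):
    """Generate n_new synthetic rows for one segment (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    segment = [r for r in rows if r[segment_key] == segment_value]
    if not segment:
        raise ValueError("no real respondents in the target segment")
    questions = [k for k in rows[0] if k != segment_key]

    synthetic = []
    for _ in range(n_new):
        new_row = {segment_key: segment_value, "synthetic": True}
        for q in questions:
            # Mostly follow the segment's own answers, sometimes the whole sample.
            pool = segment if rng.random() < segment_weight else rows
            new_row[q] = rng.choice(pool)[q]
        synthetic.append(new_row)
    return synthetic


# Hypothetical survey completes: rows are respondents, columns are questions.
real = [
    {"age_group": "18-24", "uses_app": "yes", "plan": "free"},
    {"age_group": "18-24", "uses_app": "yes", "plan": "paid"},
    {"age_group": "25-34", "uses_app": "no", "plan": "free"},
    {"age_group": "25-34", "uses_app": "yes", "plan": "paid"},
]
boosted = boost_rows(real, "age_group", "18-24", n_new=6)
print(len(real) + len(boosted))  # total rows after boosting
```

Because this sketch samples each answer independently, it ignores relationships between questions. A production model would need to preserve those relationships and be validated against held-out human responses.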
What is synthetic column boosting used for?
When it comes to columns, boosting is also an option. “On this front, one interesting application of use is taking respondents that you already have, and the data associated with them, and creating new responses to questions that these people have never seen,” he says. “Sounds like magic, right?”

One use case for column boosting is building personas. By aggregating thousands of data points into a single “representative” archetype, researchers can query a digital profile as if they were interviewing a real person.
“Think of it as one person representing a hundred or a thousand data points,” explains Estave. “You can start to ask that archetype questions specific to the data to dig into preferences. It’s an incredible tool for quick concept testing; it allows us to ‘poke the data’ we already have and see if a concept makes sense before we spend the time and budget on primary fieldwork.”
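As a rough illustration of the archetype idea, the Python sketch below collapses a group of respondents into a single profile that can be queried per question. It assumes the simplest possible aggregation, taking the most common answer and its share. It is not a description of how Cint builds personas. The names `build_archetype` and `ask` are hypothetical.

```python
from collections import Counter


def build_archetype(rows):
    """Collapse many respondents into one profile: top answer and its share."""
    profile = {}
    for question in rows[0]:
        counts = Counter(r[question] for r in rows)
        answer, n = counts.most_common(1)[0]
        profile[question] = {"typical": answer, "share": n / len(rows)}
    return profile


def ask(profile, question):
    """Query the archetype as if interviewing one representative person."""
    entry = profile[question]
    return f"{entry['typical']} ({entry['share']:.0%} of underlying respondents)"


# Hypothetical segment of real respondents.
segment = [
    {"channel": "mobile", "priority": "price"},
    {"channel": "mobile", "priority": "price"},
    {"channel": "mobile", "priority": "quality"},
    {"channel": "desktop", "priority": "price"},
]
archetype = build_archetype(segment)
print(ask(archetype, "channel"))  # mobile (75% of underlying respondents)
```

Keeping the share alongside the typical answer is a deliberate choice in this sketch: it shows how much of the group the archetype actually speaks for.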
Synthetic modeling also enables “data cloning” at an individualized level. This is a game-changer for rescuing studies where logic failures or missing variables would otherwise require a costly re-contact phase. Since re-contacting the same respondents in a marketplace environment often yields low response rates, synthetic cloning offers a probabilistic alternative.
“It gives you a quick turnaround on how people would respond to those questions that either failed or were missed,” Estave says. “Because we’re leveraging Large Language Models (LLMs), we get those interesting ‘leanings’ and variations that pick up on how a person should be responding based on the data we already have.”
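The probabilistic flavor of this kind of gap-filling can be sketched in Python without an LLM. The example below swaps in a plain nearest-neighbor vote: it finds the respondents whose known answers most resemble the target's and returns the distribution of their answers to the missed question. This is an assumed stand-in for illustration, not the LLM-based approach Estave describes. `impute_answer` and the field names are hypothetical.

```python
from collections import Counter


def impute_answer(target, donors, question, k=3):
    """Estimate a missed answer as a probability distribution over donor answers."""
    shared = [q for q in target if q != question]

    def similarity(donor):
        # Count how many of the target's known answers this donor matches.
        return sum(donor.get(q) == target[q] for q in shared)

    candidates = [d for d in donors if d.get(question) is not None]
    nearest = sorted(candidates, key=similarity, reverse=True)[:k]
    votes = Counter(d[question] for d in nearest)
    total = sum(votes.values())
    return {answer: n / total for answer, n in votes.items()}


# Hypothetical respondents who did answer the question.
donors = [
    {"age_group": "18-24", "region": "north", "owns_car": "no", "would_subscribe": "yes"},
    {"age_group": "18-24", "region": "north", "owns_car": "no", "would_subscribe": "yes"},
    {"age_group": "18-24", "region": "south", "owns_car": "no", "would_subscribe": "no"},
    {"age_group": "55+", "region": "south", "owns_car": "yes", "would_subscribe": "no"},
]
# A respondent for whom the question failed or was missed.
target = {"age_group": "18-24", "region": "north", "owns_car": "no"}

estimate = impute_answer(target, donors, "would_subscribe")
print(estimate)
```

Here the three closest donors split two to one, so the estimate leans toward "yes" at roughly 67 percent rather than asserting a single answer.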
However, even as these tools evolve, Cint remains committed to a human-first approach. Synthetic models are only as good as the human responses that power them. “The hybrid approach is highly necessary,” Estave concludes. “You can’t build synthetic on synthetic. We’re always going to need people involved to understand the human side of things; what they’re thinking and why they’re making decisions.”
Why synthetic modeling needs solid prompts
An LLM is only as good as the prompts you give it, no matter how sophisticated the tech. The same ‘garbage in, garbage out’ rule applies to the use of synthetic data models in market research and media measurement.
“Every bit of synthetic is holding a mirror up against your data set,” says Estave. “Whatever you put into it is what’s going to reflect back. The power of the prompt is incredibly important because, without specific parameters, a model might just start making things up.”
In the world of LLMs, precision is the antidote to the hallucinations that any LLM user will know and loathe. Without strict guardrails in place, models can default to broad training sets rather than the specifics of an individual, and potentially niche, use case.
“When you key it into your data set or a specific archetype of a person, the synthetic model can then start to mold and shift into something that represents that person. That initial prompt is going to be really important as well.”
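One way to picture "keying" a model to a specific respondent is a prompt assembled directly from that person's recorded answers, with explicit limits on what the model may say. The Python sketch below only builds the prompt text and does not call any model. The wording of the instructions, the function name `build_persona_prompt`, and the sample fields are assumptions for illustration, not Cint's actual prompts.

```python
def build_persona_prompt(respondent, question, allowed_answers):
    """Build a prompt grounded in one respondent's known answers."""
    facts = "\n".join(f"- {field}: {value}" for field, value in respondent.items())
    options = ", ".join(allowed_answers)
    return (
        "You are answering a survey as the respondent described below.\n"
        "Use only the facts listed. Do not invent details.\n"
        f"Known answers:\n{facts}\n"
        f"Question: {question}\n"
        f"Reply with exactly one of: {options}, unsure.\n"
        "Reply 'unsure' if the facts do not support an answer."
    )


# Hypothetical respondent record.
respondent = {"age_group": "25-34", "region": "north", "streams_weekly": "yes"}
prompt = build_persona_prompt(
    respondent,
    "Would you pay for an ad-free tier?",
    ["yes", "no"],
)
print(prompt)
```

Restricting the reply to a closed set of options, plus an explicit "unsure", gives the model a way to decline instead of making something up.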
For further information on how seed data can curtail hallucinations in synthetic data models, see Cint's interview with data scientist Imran Anjam.
Synthetic models as a driver for continuous research
Cint Managing Director Ben Hogg describes continuous research as “the unsung hero of an uncertain world.” After all, treating research as an ongoing process rather than as a one-and-done solution allows businesses to better navigate financial uncertainty and instability.
“One of the things that surprised me the most as I’ve been having more and more conversations with customers, with other companies, is how much data is just out there,” Estave says. “There’s so much out there and there’s so much that is just used once.”
When synthetic is used to augment human responses as part of a continuous research process, you are, as Estave puts it, “able to take multiple data sets that share some commonality, on the demographics, the people, the types of questions, and you’re able to tie all that together. That is where the next generation of research is going to be.”