
The Cold Start Problem

Written by Stephen Sklarew | Apr 29, 2025 4:22:05 PM

By far the biggest lesson I’ve learned since getting involved in AI is that quality data is the critical limiting resource for AI success.

One of the main reasons why large language models like ChatGPT are so successful is that they were trained on massive amounts of data (e.g., the internet) and improved with human feedback.

That said, one of the not-so-obvious things I learned shortly after starting Synaptiq is that you will often hit a “cold start problem.”

Let’s take the example of building an AI recommendation engine for an e-commerce site.  There are typically two approaches:

  1. Recommending similar products based on others you’ve purchased in the past
  2. Recommending products based on your profile and what others with similar profiles purchased

 

Two common approaches to building AI recommenders.

In the first approach, so long as you have a past purchase history and a well-defined product catalog, an AI model can start making reasonable recommendations.  In the second approach, so long as you have a profile and there are others with similar profiles who have purchased products in the past, recommendations are possible.
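
To make the first approach concrete, here is a minimal, hypothetical Python sketch of item-to-item recommendations computed from a purchase matrix. The data and names are made up for illustration; they are not from an actual Synaptiq engagement.

```python
import numpy as np

# Toy user-item purchase matrix: rows are users, columns are products,
# and 1 means the user bought that product. All values are illustrative.
purchases = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

# Item-item cosine similarity computed from co-purchase patterns.
norms = np.linalg.norm(purchases, axis=0, keepdims=True)
item_sim = (purchases.T @ purchases) / (norms.T @ norms + 1e-9)

def recommend_for(user_idx, top_k=2):
    """Score products the user hasn't bought by similarity to ones they have."""
    owned = purchases[user_idx].astype(bool)
    scores = item_sim[:, owned].sum(axis=1)
    scores[owned] = -np.inf  # never recommend something they already own
    return np.argsort(scores)[::-1][:top_k]

print(recommend_for(0))  # product indices ranked for user 0
```

Notice that a brand-new shopper would have an all-zero row in this matrix, leaving nothing to score against.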

But, what if you haven’t purchased products in the past, you don’t have a profile yet, or there aren’t others with similar profiles that have purchased products?

This is a classic cold start problem, and it’s a common challenge across a wide range of AI applications like:

  • Healthcare diagnostic tools that require diverse patient data across conditions
  • Financial fraud detection that requires examples of legitimate and fraudulent transactions
  • Customer service chatbots that require historical conversation logs to provide relevant answers to questions

There are three areas where you can get creative to overcome a cold start problem:

  1. Data sourcing
  2. Product design and user experience
  3. Go-to-market

Data sourcing

If you don’t have the data you need on hand, there are a handful of options with varying levels of effort and cost.

  1. Acquire the data somewhere else - look for free, publicly available datasets or data you can purchase through a vendor. You may even be able to set up a data partnership where another company provides you the data you need to solve your cold start problem, while you share the data you create with them.
  2. Have your data scientists or machine learning engineers search for existing models that are pre-trained in adjacent domains.  They may be able to apply “transfer learning” to bootstrap your AI model (see the sketch after this list).
  3. Hire people to generate an initial training dataset - you may be able to augment their work with the help of large language models.
  4. Talk to your domain experts and data scientists about generating representative synthetic data.
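
To illustrate the transfer learning idea from option 2, here is a minimal PyTorch sketch. It assumes an image-classification task, the torchvision library, and a made-up class count, so treat it as an illustration rather than a recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical sketch of option 2: start from a backbone pre-trained on a
# large, adjacent dataset (ImageNet here) and retrain only the final layer
# on the small amount of domain data you do have.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False          # freeze the pre-trained features

NUM_CLASSES = 5                          # assumed number of classes in your domain
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new head gets trained, so even a few hundred labeled examples
# can produce a usable first model.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```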

To dive deeper into this topic, read the How Much Data Do We Need blog post written by my cofounder, Dr. Tim Oates.

Product design & user experience

When you have a cold start problem, it’s important to think carefully about your product’s user experience.  There are smart ways to design an experience that helps you overcome the cold start problem while still engaging your early adopters.

Here are a few suggestions:

  • Set appropriate expectations on how the system will behave while your AI models are being trained
  • Don’t present AI model outputs until enough data is collected and, instead, ensure the experience provides value in non-AI ways (e.g., rule-based before AI, as sketched after this list)
  • Incentivize users who contribute data to train your AI model
  • Gradually and carefully expose AI model outputs as their value increases
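
As a concrete illustration of the second suggestion, here is a small, hypothetical sketch of gating model output behind a data-volume threshold and serving rule-based results in the meantime. The threshold, product fields, and model interface are all assumptions.

```python
from dataclasses import dataclass

MIN_INTERACTIONS = 50  # assumed threshold; tune it for your own product

@dataclass
class Product:
    name: str
    sales_last_30d: int

def recommend(interaction_count, catalog, model=None):
    """Serve rule-based picks until the AI model has enough data to be trusted."""
    if model is None or interaction_count < MIN_INTERACTIONS:
        # Cold start path: fall back to a simple heuristic such as best sellers,
        # so the experience delivers value before any model is trained.
        return sorted(catalog, key=lambda p: p.sales_last_30d, reverse=True)[:3]
    # Warm path: once enough data exists, the trained model's output is shown.
    return model.recommend(interaction_count)  # hypothetical model interface

catalog = [Product("A", 120), Product("B", 340), Product("C", 75)]
print([p.name for p in recommend(interaction_count=2, catalog=catalog)])  # -> ['B', 'A', 'C']
```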

The key here is to invest in upfront product design for a cold start situation and iteratively improve the experience, or you may never get out of the “chicken-and-egg” problem.

Go-to-market

Finally, think carefully about your rollout strategy, pricing strategy, and business case expectations.  For instance, it may be best to start with a small segment of users in a pilot before generating awareness.  Likewise, your pricing may need to be low until your AI models start generating value.  And, whatever you do, don’t set firm expectations for ROI dates until your product has spent time in users’ hands.

For those of you in organizations that handle sensitive information or rely on knowledge workers (e.g., healthcare, legal, finance, professional services), it’s also best practice to pilot your AI models internally first.

Real-life examples

We have run into many cold start problems over the last 10 years.  But there are two that stand out and are easy to explain.

The first was a project we worked on as a subcontractor for the federal government early in our journey.  Back then, the federal government employed a lot of contractors to build cloud applications and struggled to manage all the costs for cloud resources.  The big cloud providers didn’t have any automated tools to help the government optimize its cloud resources.

We built a system that monitored cloud resource consumption and optimized it against its expected quality of service.  That meant we needed a lot of cloud consumption data to prove we had a viable approach, but, unfortunately, the government wasn’t going to give us direct access to that data.  So, we had a cold start problem.  To overcome this challenge, we generated simulated data in our isolated environment, tuned the model until it met expectations, then gave it to the government to deploy in their secure environment.
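
The paper linked below describes the actual approach; purely as an illustration of what generating simulated data can look like, here is a toy, hypothetical environment a reinforcement learning agent could be trained against. Every constant and interface detail is made up.

```python
import random

class SimulatedCloudEnv:
    """Toy stand-in for a simulated environment: each step the agent chooses an
    instance count and is rewarded for meeting a quality-of-service target at
    minimal cost. All numbers are illustrative, not from the actual project."""

    def __init__(self, qos_target=0.95):
        self.qos_target = qos_target
        self.demand = 100.0  # simulated workload, in arbitrary request units

    def step(self, instances):
        self.demand *= random.uniform(0.9, 1.1)   # workload drifts over time
        capacity = instances * 20.0                # assumed capacity per instance
        qos = min(1.0, capacity / self.demand)     # fraction of demand served
        cost = instances * 1.0                     # assumed cost per instance
        reward = (10.0 if qos >= self.qos_target else 0.0) - cost
        return {"demand": self.demand, "qos": qos}, reward

env = SimulatedCloudEnv()
state, reward = env.step(instances=6)
print(state, reward)
```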

You can read more about it in our published research paper, Automated Cloud Provisioning on AWS using Deep Reinforcement Learning.

Shortly after, we worked for a company that sells custom curriculums of training courses to businesses. When we met them, they had realized that their sales and customer success team wasn't going to scale effectively if every sale required manual human curation.  So we built an AI recommender for them fueled by their historical data.

Everything was going great until we learned that they were selling into a wide range of customers with diverse profiles.  There was a high likelihood that a prospective customer wouldn’t be similar to any active customers.  In this cold start situation, we worked with the client to purchase company profile data so that our AI model would work if the prospective customer’s profile wasn’t already in the system.
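
We are not sharing the client’s model here, but the general idea of matching a new prospect against purchased profile data with a nearest-neighbor lookup might be sketched like this (scikit-learn assumed, feature values made up):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical sketch: encode company profiles from a purchased firmographic
# dataset as feature vectors (e.g., industry, headcount, revenue), then map a
# new prospect to its closest known company and borrow that company's
# purchase history to seed recommendations. All values are illustrative.
known_profiles = np.array([
    [0.2, 0.8, 0.1],
    [0.9, 0.3, 0.5],
    [0.4, 0.4, 0.7],
])

nn = NearestNeighbors(n_neighbors=1).fit(known_profiles)

def proxy_company(prospect_features):
    """Return the index of the most similar known company."""
    _, idx = nn.kneighbors([prospect_features])
    return int(idx[0][0])

print(proxy_company([0.3, 0.7, 0.2]))  # -> 0 for this toy data
```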

This company also rolled out our model as a sales and customer success support tool initially, then expanded it into an active customer recommendation system.

Conclusion

A cold start problem is a common hurdle when launching AI solutions, especially for machine learning models that rely on robust datasets to function effectively.  Without adequate initial data, these systems often struggle with accuracy and performance issues.

Fortunately, a multifaceted approach can help organizations navigate this challenge.  By tapping into alternative data sources—whether public repositories, adjacent domain data, or synthetically generated information—companies can build a foundational dataset.  Thoughtful product design that incorporates strategic data collection mechanisms and expert human oversight further strengthens the solution.  Creating intuitive user experiences that naturally encourage data sharing also accelerates the learning curve.

From a strategic perspective, targeting early adopters and focusing on applications that deliver value even with limited data helps establish momentum.  As users engage with the system, the expanding dataset fuels continuous improvement in the AI's capabilities.

With creativity and pragmatism, businesses can successfully implement AI solutions that evolve and mature alongside their growing data resources, ultimately delivering increasingly powerful results over time.

 

About Synaptiq

Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation. 

Contact us if you have a problem to solve, a process to refine, or a question to ask.

You can learn more about our story through our past projects, blog, or podcast.