The Cold Start Problem
High-quality data is the scarcest and most essential ingredient for AI success.
One of the main reasons large language models like ChatGPT perform so well is that they were trained on massive amounts of data (e.g., the internet) and then improved with human feedback.
That said, one of the not-so-obvious things I learned shortly after starting Synaptiq is how often you will hit a “cold start problem.”
Take, for example, building an AI recommendation engine for an e-commerce platform. There are usually two main ways to approach it:
- Recommending products similar to ones you’ve purchased in the past
- Recommending products based on your profile and what others with similar profiles purchased
In the first approach, as long as you have a purchase history and a well-defined product catalog, an AI model can make reasonable recommendations. In the second approach, as long as you have a profile and there are others with similar profiles who have purchased products in the past, recommendations are possible.
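To make the two approaches concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy catalog, the purchase histories, the profile tags, and the function names are hypothetical, and a real engine would use learned similarity rather than exact category or tag matches.

```python
from collections import Counter

# Hypothetical toy data: product -> category, and user -> purchased products.
CATALOG = {
    "trail shoes": "running",
    "running socks": "running",
    "water bottle": "running",
    "yoga mat": "yoga",
    "yoga blocks": "yoga",
}
PURCHASES = {
    "ana": ["trail shoes"],
    "ben": ["yoga mat", "yoga blocks"],
    "cy": ["yoga mat", "water bottle"],
}
# Hypothetical profiles: user -> set of interest tags.
PROFILES = {
    "ana": {"outdoors", "running"},
    "ben": {"yoga", "wellness"},
    "cy": {"yoga", "running"},
    "dee": {"yoga", "wellness"},
}


def recommend_similar_products(user):
    """Approach 1: unpurchased products in categories the user has bought from."""
    bought = PURCHASES.get(user, [])
    categories = {CATALOG[p] for p in bought}
    return sorted(p for p, c in CATALOG.items() if c in categories and p not in bought)


def recommend_from_similar_profiles(user, min_shared_tags=1):
    """Approach 2: products bought by users whose profiles overlap with this one."""
    profile = PROFILES.get(user, set())
    bought = set(PURCHASES.get(user, []))
    counts = Counter()
    for other, other_profile in PROFILES.items():
        if other == user or len(profile & other_profile) < min_shared_tags:
            continue
        counts.update(p for p in PURCHASES.get(other, []) if p not in bought)
    # Most frequently purchased first; ties broken alphabetically.
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

In this toy data, `recommend_similar_products("ana")` works because Ana has a purchase history, and `recommend_from_similar_profiles("dee")` works because Dee has a profile that overlaps with users who have bought something. Each function returns an empty list when the input it depends on is missing.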
But what if you haven’t purchased anything yet, you don’t have a profile, or no one with a similar profile has purchased anything?
This is a classic cold start problem, and it’s a common challenge across a wide range of AI applications like:
- Healthcare diagnostic tools that require diverse patient data across conditions
- Financial fraud detection that requires examples of legitimate and fraudulent transactions
- Customer service chatbots that require historical conversation logs to provide relevant answers to questions
To overcome a cold start problem, there are three creative options:
- Data sourcing
- Product design and user experience
- Go-to-market