Selling and Delivering Data Science and Machine Learning Projects
The other day a friend asked me, "How do you sell and successfully deliver data science / machine learning?" At first I thought it was a simple question, but as we got into it, I realized I've learned several key nuances that contrast sharply with the typical software services model.
First and foremost, I've learned that you're not selling 1s and 0s. You're selling probabilities. In the typical software services project, your primary goal is to properly scope the solution. There are many approaches to doing so, but they all attempt to paint a picture of the future through some methodology (Agile, waterfall, etc.) by working with domain experts (humans). The delivery team's goal is to fill in the details with 1s and 0s and produce a discrete solution that can be tested against the requirements for completeness. This is an engineering-oriented model.
“Prediction is hard. Especially about the future.” - Yogi Berra
For pure data science and machine learning projects, you're not selling engineering, you're selling science. Those of us who came from science backgrounds and studied subjects like statistics learned that the world is messy and full of probabilities. Drawing out requirements for these projects is very different. While humans play a role in setting the vision, it all comes down to data. In other words, you don't know what's possible until you explore the available data. What makes this tricky is that it's hard to get data during the sales process, and even harder to explore what's possible without spending money. This dynamic leads to the next big difference.
In the sales process for data science and machine learning, there is a lot more education than in software services. Many buyers expect staggering success given what they've read in the media, and they aren't used to talking about probabilities. When they hired other software services firms, there wasn't a big dependency on exploring data before locking in scope, timelines, and cost. For this reason, it's critical during the sales process to educate prospective clients about how and why these projects are different. Explaining how this plays out in consumer services they already use (e.g., Amazon's recommendations) is a useful approach.
In the delivery process for data science and machine learning, there is a strong need to manage expectations. As with Agile software development, you don't know exactly what you'll end up with until you're done; the picture becomes clearer as the project progresses. But unlike Agile, the output of your solution is a probability, e.g., "How likely is it that this is an image of a cucumber instead of a squash?" Clients are used to 1s and 0s. So if the best you can do with the given data is 80%, how does your client feel about that? These probabilities often have many downstream impacts on your clients' customers. It's a tough subject to broach, but it's critical to align on it early in a project.
Finally, given the nature of probabilities and the inherent dependency on data, every data science or machine learning effort relies on a data strategy. Whether or not you scope data strategy into your effort, you can't do these projects in a vacuum. While the scientists identify what's possible, someone needs to work with the client to refine the data strategy and connect it to business outcomes. This is very difficult to do offshore (especially with a new client) because it requires meaningful interaction in which, whether you like it or not, context and culture are paramount.