Our client in the automotive space built a mobile appraisal app for dealers. A key requirement of the app was to compare values from several car-valuation books, including Edmunds, Kelley Blue Book, Black Book and Galves. Unfortunately, each of these books uses a slightly different “language” to describe cars.
Finding the value of a given used car across books is therefore a translation problem from one book's language to another's. This translation is needed because the best valuation is based on a scheme that takes into account the valuations from all the major sources. The client spent months building complicated, hard-coded rules to try to solve this challenge.
We used an ensemble of Natural Language Processing (NLP) models to match identical but differently described vehicles in multiple used-car valuation books.
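To make the matching idea concrete, here is a minimal sketch of scoring one book's description against candidates from another book and keeping the best-scoring one. It is not the models used in this project: two simple string-similarity measures (token overlap and the standard library's `difflib`) stand in for the NLP models, and the function names and vehicle descriptions are hypothetical.

```python
from difflib import SequenceMatcher


def tokens(description: str) -> set[str]:
    """Lowercase a description and split it into a set of word tokens."""
    return set(description.lower().replace("-", " ").split())


def jaccard_score(a: str, b: str) -> float:
    """Token overlap: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sequence_score(a: str, b: str) -> float:
    """Character-level similarity from the standard library's difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ensemble_score(a: str, b: str) -> float:
    """Average the individual scorers; a real ensemble might weight them."""
    scorers = (jaccard_score, sequence_score)
    return sum(score(a, b) for score in scorers) / len(scorers)


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate description with the highest ensemble score."""
    return max(candidates, key=lambda c: ensemble_score(query, c))


# Hypothetical descriptions of the same cars as two books might phrase them.
book_a_entry = "2015 Honda Accord EX-L Sedan 4D"
book_b_entries = [
    "2015 Honda Accord 4dr Sedan EX-L",
    "2015 Honda Civic 4dr Sedan LX",
    "2014 Toyota Camry 4dr Sedan SE",
]
print(best_match(book_a_entry, book_b_entries))
```

Averaging the scorers is the simplest way to combine them. Weighting each scorer by how well it performs on validated matches would be a natural next step.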
Using electronic versions of the 4 common automobile-valuation books, we created a dataset of car descriptions based on the following features:
This dataset is tantamount to having the same automobiles described in 4 different languages. We tried several language-translation models and chose the best one for the task by measuring and validating each model's performance against the judgment of domain experts: people in the car industry who are intimately familiar with the ways in which the same automobile is described across the books.
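The selection step can be sketched as follows, assuming the experts' judgments are available as pairs of matching descriptions. The `Model` interface, the toy models, and the example pairs are all hypothetical, and exact-match accuracy is only one possible metric, not necessarily the one used here.

```python
from typing import Callable

# Hypothetical interface: a model maps a description from one book
# to its predicted description in another book.
Model = Callable[[str], str]


def accuracy(model: Model, expert_pairs: list[tuple[str, str]]) -> float:
    """Fraction of expert-labeled pairs the model reproduces exactly."""
    if not expert_pairs:
        return 0.0
    correct = sum(1 for source, target in expert_pairs if model(source) == target)
    return correct / len(expert_pairs)


def select_best(
    models: dict[str, Model], expert_pairs: list[tuple[str, str]]
) -> tuple[str, float]:
    """Score every candidate model and return the best name with its score."""
    scores = {name: accuracy(model, expert_pairs) for name, model in models.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


def identity(description: str) -> str:
    """Toy baseline: assumes both books use the same wording."""
    return description


def reorder(description: str) -> str:
    """Toy model: rewrites 'Name Trim Body 4D' as 'Name 4dr Body Trim'."""
    name, trim, body, _doors = description.split()
    return f"{name} 4dr {body} {trim}"


# Hypothetical expert-validated matches between two books.
expert_pairs = [
    ("Accord EX-L Sedan 4D", "Accord 4dr Sedan EX-L"),
    ("Civic LX Sedan 4D", "Civic 4dr Sedan LX"),
    ("Camry SE Sedan 4D", "Camry 4dr Sedan SE"),
]

best_name, best_score = select_best(
    {"identity": identity, "reorder": reorder}, expert_pairs
)
print(best_name, best_score)
```

In practice the expert-labeled pairs would be kept separate from any data used to train the models, so that the score reflects performance on descriptions the models have not seen.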
Within a few weeks, we trained models that automatically classified a majority of the matches between books. Our client was able to eliminate a substantial number of its hard-coded rules and had to maintain rules only for exceptional cases.