Adrian Rosebrock, Tim Oates, Jesus Caban
Abstract. Constructing an image classification system using strong, local invariant descriptors is time-consuming and tedious, requiring extensive experimentation and parameter tuning to obtain an adequately performing model. Furthermore, training a system in a given domain and then migrating the model to a different domain will likely yield poor performance. As the investigation into the recent Boston Marathon attacks demonstrated, large, unstructured image databases from traffic cameras, security systems, law enforcement officials, and citizens can be beneficial to authorities when amassed for review. However, reviewing every image is an expensive undertaking in terms of time and human intervention.
Reviewing crime scene images is inherently a classification task. For example, authorities may want to know if a given image contains a suspect, a suspicious package, or injured people. Given an emergency situation, these classifications must be generated as quickly and accurately as possible. In this case study, we present a rapidly deployable image classification system using “feature-views”, in which each view consists of a set of weak, global features. These weak global descriptors are computationally simple to extract, intuitive to understand, and require substantially less parameter tuning than their local invariant counterparts. We demonstrate that we can outperform current state-of-the-art methods or achieve comparable accuracy with much less effort and domain knowledge by combining weak features with ensemble methods. Additionally, we provide both theoretical and empirical justification for our ensemble framework that can be used to construct rapidly deployable image classification systems called “Ecosembles”.
In an emergency where image classification systems are needed, time is crucial. The current state-of-the-art technique for constructing an image classification system using local invariant descriptors is time-consuming and tedious, requiring extensive experimentation and parameter tuning to obtain an adequately performing model. The typical pipeline for constructing such a classification model consists of:
- Detecting keypoints in a given image
- Describing each keypoint in a manner invariant to rotation, translation, scaling, illumination, and viewpoint
- Constructing a vocabulary of visual words via clustering and vector quantization
- Identifying visual stop words
- Pre-processing visual words (e.g. scaling, normalization, dimensionality reduction)
- Selecting a machine learning model and training the model on the processed vocabulary
Each of these steps requires a non-trivial amount of parameter tuning, which can only be determined through experimentation and validation.
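To make the cost of this pipeline concrete, the core codebook-construction and vector-quantization steps can be sketched as below. This is a minimal, illustrative sketch using NumPy only: the random arrays stand in for local invariant descriptors (e.g. 128-D SIFT vectors), and `kmeans`, `quantize`, and the parameter values (`k=8`, 16-D descriptors) are hypothetical choices for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    # Build the visual vocabulary: initialize centroids from random
    # descriptors, then refine them with Lloyd's algorithm.
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def quantize(descriptors, codebook):
    # Vector quantization: map each local descriptor to its nearest
    # visual word, then build a normalized word histogram -- the
    # image's bag-of-visual-words feature vector.
    words = np.argmin(((descriptors[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Stand-ins for local descriptors pooled from a training corpus.
all_descriptors = rng.normal(size=(500, 16))
codebook = kmeans(all_descriptors, k=8)                # visual vocabulary
bovw = quantize(rng.normal(size=(40, 16)), codebook)   # one image's vector
```

Note that even this stripped-down sketch exposes several parameters (vocabulary size `k`, iteration count, descriptor choice) that in practice each require tuning and validation, which is exactly the burden the feature-views approach aims to avoid.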
In this case study, we present a rapidly deployable image classification system using “feature-views,” where each view consists of weak global descriptors. Using weak features allows us to skip steps such as keypoint detection, codebook construction, vector quantization, and many other pre-processing steps. This, in turn, makes our system quicker and easier to deploy, requiring substantially less effort and domain knowledge. Even given the known limitations of weak features, we can obtain equal or comparable accuracy to the current state-of-the-art techniques…
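The feature-views idea can be sketched as follows: each view is a cheap global descriptor computed over the whole image (no keypoints, no codebook), one simple learner is trained per view, and the per-view predictions are combined by majority vote. Everything here is a hypothetical illustration, not the paper's actual features, learners, or data: the three view functions, the nearest-centroid learner, and the synthetic bright/dark images are assumptions chosen so the sketch is self-contained and runnable.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical weak global "feature-views": an intensity histogram,
# per-channel color statistics, and a coarse gradient-orientation histogram.
def intensity_view(img):
    hist = np.histogram(img.mean(axis=2), bins=8, range=(0, 1))[0].astype(float)
    return hist / hist.sum()

def color_view(img):
    # Per-channel mean and standard deviation over the whole image.
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

def gradient_view(img):
    gy, gx = np.gradient(img.mean(axis=2))
    hist = np.histogram(np.arctan2(gy, gx), bins=8,
                        range=(-np.pi, np.pi))[0].astype(float)
    return hist / max(hist.sum(), 1.0)

views = [intensity_view, color_view, gradient_view]

class NearestCentroid:
    # A deliberately simple per-view learner; the strength comes from
    # combining many weak views, not from any single model.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None] - self.centroids_[None]) ** 2).sum(-1)
        return self.classes_[np.argmin(d, axis=1)]

# Synthetic stand-in data: dark 16x16 RGB images are class 0, bright are class 1.
train = np.concatenate([rng.uniform(0.0, 0.4, (20, 16, 16, 3)),
                        rng.uniform(0.6, 1.0, (20, 16, 16, 3))])
labels = np.array([0] * 20 + [1] * 20)

models = [NearestCentroid().fit(np.array([v(img) for img in train]), labels)
          for v in views]

def classify(img):
    # Majority vote across the per-view classifiers.
    votes = [m.predict(v(img)[None])[0] for m, v in zip(models, views)]
    return int(np.bincount(votes).argmax())
```

Each view extracts in a single pass over the image with essentially no parameters to tune, which is what makes this style of system fast to deploy relative to the local-descriptor pipeline above.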
Complete technical paper available as a PDF.