Adrian Rosebrock, Tim Oates, Jesus Caban
Abstract—Constructing an image classification system using strong, local invariant descriptors is both time consuming and tedious, requiring extensive experimentation and parameter tuning to obtain an adequately performing model. Furthermore, training a system in a given domain and then migrating the model to a separate domain will likely yield poor performance. As the recent Boston Marathon attacks demonstrated, large, unstructured image databases from traffic cameras, security systems, law enforcement officials, and citizens can be quickly amassed for authorities to review; however, reviewing each and every image is an expensive undertaking, in terms of both time and human intervention. Inherently, reviewing crime scene images is a classification task. For example, authorities may want to know if a given image contains a suspect, a suspicious package, or if there are injured people in the photo. Given an emergency situation, these classifications will be needed as quickly and accurately as possible. In this work we present a rapidly deployable image classification system using “feature-views”, where each view consists of a set of weak, global features. These weak global descriptors are computationally simple to extract, intuitive to understand, and require substantially less parameter tuning than their local invariant counterparts. We demonstrate that by combining weak features with ensemble methods we are able to outperform current state-of-the-art methods or achieve comparable accuracy with much less effort and domain knowledge. Finally, we provide both theoretical and empirical justification for our ensemble framework, called “Ecosembles”, which can be used to construct rapidly deployable image classification systems.
In an emergency situation where image classification systems are needed, time is crucial. The current state-of-the-art technique for constructing an image classification system using local invariant descriptors is both time consuming and tedious, requiring extensive experimentation and parameter tuning to obtain an adequately performing model. The typical pipeline for constructing such a classification model includes (1) extracting key points from a given image, (2) describing each key point in a manner invariant to location, translation, scaling, illumination, and viewpoint, (3) constructing a vocabulary of visual words via clustering and vector quantization, (4) identifying visual stop words, (5) pre-processing visual words (e.g. scaling, normalization, dimensionality reduction), and finally, (6) selecting a machine learning model and training it on the processed vocabulary. Each of these steps requires a non-trivial amount of parameter tuning that can only be determined through extensive experimentation and validation.
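The steps above can be sketched end-to-end in a few dozen lines. The following is a minimal illustration, assuming synthetic 128-D descriptors in place of real keypoint descriptors (e.g. SIFT) and a plain NumPy k-means; the vocabulary size, descriptor dimensionality, and the `simple_kmeans` helper are illustrative choices, not the pipeline parameters used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps (1)-(2): in a real system, keypoints would be detected and
# described with a local invariant descriptor such as 128-D SIFT.
# Here we stand in synthetic descriptors pooled from training images.
descriptors = rng.normal(size=(500, 128))

# Step (3): build a visual vocabulary by clustering the descriptors.
def simple_kmeans(X, k, iters=20):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

vocab = simple_kmeans(descriptors, k=32)

# Steps (4)-(5): quantize one image's descriptors against the vocabulary
# and build a normalized bag-of-visual-words histogram.
def bovw_histogram(img_descriptors, vocab):
    dists = np.linalg.norm(img_descriptors[:, None] - vocab[None], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

image_desc = rng.normal(size=(40, 128))
hist = bovw_histogram(image_desc, vocab)
# Step (6) would train any off-the-shelf classifier on these histograms.
```

Even in this toy form, steps (3)-(5) hide several free parameters (vocabulary size, clustering initialization, normalization scheme), each of which must be tuned per domain.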
In this work, we present a rapidly deployable image classification system using “feature-views”, where each view consists of weak global descriptors. Using weak features allows us to skip steps such as keypoint detection, codebook construction, vector quantization, and many other pre-processing steps, making our system easier and faster to deploy and requiring substantially less effort and domain knowledge. Even given the known limitations of weak features, we are able to obtain equal or comparable accuracy to the current state-of-the-art techniques…
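By contrast, a weak global feature-view needs no keypoint detection, vocabulary, or quantization. A minimal sketch follows; the two specific features here (per-channel color statistics and a grayscale intensity histogram) are illustrative stand-ins, not the paper's exact views:

```python
import numpy as np

def color_mean_std(image):
    """Global color statistics: per-channel mean and std (6-D vector)."""
    chans = image.reshape(-1, 3).astype(float)
    return np.concatenate([chans.mean(axis=0), chans.std(axis=0)])

def gray_histogram(image, bins=16):
    """Global intensity distribution, normalized to sum to 1."""
    gray = image.astype(float).mean(axis=2)
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Each "feature-view" is one such global descriptor; an ensemble
# trains one classifier per view and combines their predictions.
view_a = color_mean_std(image)
view_b = gray_histogram(image)
```

Note that each view is a fixed-length vector computed in a single pass over the image, with essentially no parameters to tune beyond the histogram bin count.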
Complete technical paper available as a PDF.