A Real-Time Deformable Detector

Overview

Our framework mixes three types of edge-counting features. Every row shows an example feature from each type, along with its responses for three samples: an open hand, the same hand with the thumb moved, and a rotated version of the latter. The example features are shown in the left column: the solid box shows the support of the feature, while the solid line within shows the extracted edge orientation. The dashed box shows the area of the image from which the pose estimate, here the dominant edge orientation, is computed. This area is also highlighted in every sample by the bolded outline of the hand. Note how the first feature effectively tracks the back of the hand, the second tracks the thumb, and the third tracks the forefinger. Complete freedom is given to the learning procedure to select both the pose estimator and the pose-indexed features.
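To make the idea concrete, here is a minimal sketch (not the authors' code) of a pose-indexed edge-counting feature: the pose estimate is the dominant gradient orientation inside one box, and the feature response is the count of edges in a second box whose orientation matches that estimate. The function names, the box layout, the histogram binning, and the magnitude threshold are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dominant_orientation(grad_dx, grad_dy, box, n_bins=8):
    """Pose estimate: dominant gradient orientation inside `box`
    (y0, y1, x0, x1), quantized into n_bins over [0, pi).
    Illustrative sketch; binning and weighting are assumptions."""
    y0, y1, x0, x1 = box
    dx = grad_dx[y0:y1, x0:x1].ravel()
    dy = grad_dy[y0:y1, x0:x1].ravel()
    theta = np.arctan2(dy, dx) % np.pi            # orientation, sign-invariant
    mag = np.hypot(dx, dy)                        # edge strength as weight
    hist, edges = np.histogram(theta, bins=n_bins,
                               range=(0.0, np.pi), weights=mag)
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])        # center of the winning bin

def pose_indexed_edge_count(grad_dx, grad_dy, support, pose, tol=np.pi / 8):
    """Feature response: number of edge pixels inside `support` whose
    orientation matches the pose estimate `pose` up to `tol`."""
    y0, y1, x0, x1 = support
    dx = grad_dx[y0:y1, x0:x1].ravel()
    dy = grad_dy[y0:y1, x0:x1].ravel()
    theta = np.arctan2(dy, dx) % np.pi
    diff = np.abs(theta - pose)
    diff = np.minimum(diff, np.pi - diff)         # wrap-around angular distance
    strong = np.hypot(dx, dy) > 1e-3              # keep actual edges only
    return int(np.count_nonzero(strong & (diff < tol)))
```

Because the counted orientation is read off the pose estimate rather than fixed in advance, the same feature keeps responding as the hand rotates or deforms.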

[Figure: framework overview]

Qualitative Results

Hand Video Sequences
Typical results obtained with our framework for hand detection. In these sequences, green squares indicate correct detections whereas red squares indicate false alarms. Note how our method is robust to strong changes in the appearance of the hand, even though these changes were not annotated in the training sequence. Detection proceeds frame by frame, independently, without background subtraction. We expect that adding temporal coherence would significantly improve results.


Aerial Image (Google Earth) of Cars
Typical results obtained with our framework for car detection. In these images, obtained from Google Earth over Geneva, green squares indicate correct detections whereas red squares indicate false alarms. Training was also done on Google Earth images, but over a different city. Note how our method detects cars at any orientation even though the training data was not annotated for orientation: our framework learns the pose variations present in the training data, adapts to them, and provides reliable detections.

Face Detection on MIT-CMU
Typical results obtained with our framework for face detection. In these images, obtained from MIT-CMU, green squares indicate a correct detection whereas red squares indicate a false alarm.

Quantitative Results

We compared our framework with the state of the art in object detection. Both methods have access to the same ground truth for training, namely data annotated only for location, not for additional pose parameters such as deformations (hands), in-plane rotations (cars), and rigid rotations (faces). The Receiver Operating Characteristic curves are shown below.
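For readers who want to reproduce curves of this kind from their own detector scores, here is a minimal sketch (not tied to the paper's evaluation code) of how ROC points are obtained by sweeping the detection threshold. The function name and input format are illustrative assumptions.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the detection threshold over all scores and return
    (false-positive rate, true-positive rate) pairs.
    `labels` is 1 for a true object, 0 for a non-object."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest score first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)           # true detections kept at each threshold
    fp = np.cumsum(1 - labels)       # false alarms kept at each threshold
    tpr = tp / max(tp[-1], 1)        # fraction of objects detected
    fpr = fp / max(fp[-1], 1)        # fraction of negatives accepted
    return fpr, tpr
```

Each threshold value yields one operating point; plotting all of them gives the curves shown below.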

[Figures: ROC curves for hand, car, and face detection]

Links

K. Ali, F. Fleuret, D. Hasler and P. Fua
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012

K. Ali, F. Fleuret, D. Hasler and P. Fua
IEEE International Conference on Computer Vision, 2009