Camshift -- initial results and discussion

The images below show a few frames of tracking results from each of three test runs with my Camshift implementation.

Run 1: moving slowly toward and away from the camera.
The Camshift algorithm includes an adaptive resizing step. It assumes a fixed ratio between width and height and sets each dimension proportional to the square root of the skin-probability area (the summed skin probability) inside its search region. This resizing worked fairly well, as the frames from Run 1 show, although Camshift sized the region in the lower center frame of this run a bit too small.
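To make the resizing rule concrete, here is a minimal sketch in Python. It is not the code used for these runs. The function name `resize_window`, the 0-to-1 probability scale, and the constants `scale=2.0` and `aspect=1.2` are assumptions chosen for illustration, and a real implementation may use different values.

```python
import math

def resize_window(prob, x, y, w, h, scale=2.0, aspect=1.2):
    """Return (new_w, new_h) for the search window.

    prob is a 2D list of skin probabilities in the range 0..1 (assumed).
    scale and aspect are illustrative constants, not values from these runs.
    """
    # Zeroth moment: total skin probability inside the current window.
    rows = prob[max(y, 0):y + h]
    m00 = sum(sum(row[max(x, 0):x + w]) for row in rows)
    # Width grows with the square root of that area; height keeps a fixed ratio.
    new_w = max(1, round(scale * math.sqrt(m00)))
    new_h = max(1, round(aspect * new_w))
    return new_w, new_h

# A 20x20 map containing a 10x10 block of certain skin pixels.
prob = [[0.0] * 20 for _ in range(20)]
for row in range(5, 15):
    for col in range(5, 15):
        prob[row][col] = 1.0

print(resize_window(prob, 0, 0, 20, 20))  # (20, 24)
```

Because the size depends on the square root of the area, a face that covers four times as many pixels produces a window only twice as wide.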

 

Run 2: swaying side to side.
In Run 2, I deliberately sized and positioned the initial region poorly. Within 5 frames, the tracker adapted automatically to capture the face region well (second image).

The left and right frames in the bottom row of this run illustrate a limitation of a naive Mean Shift tracker. This algorithm always converges to the nearest mode (local peak) of the skin-probability distribution. It doesn't look beyond that mode's location to see if there's a better match nearby.
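The nearest-mode behavior can be reproduced on a toy probability map. The sketch below is a simplified, fixed-size Mean Shift written for illustration, not the tracker used in these runs. The function `mean_shift`, the square window, and the synthetic two-blob map are all assumptions.

```python
def mean_shift(prob, cx, cy, half, max_iter=20):
    """Move a square window centered at (cx, cy) to the centroid of the
    probability mass inside it, repeating until the center stops moving."""
    height, width = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for yy in range(max(cy - half, 0), min(cy + half + 1, height)):
            for xx in range(max(cx - half, 0), min(cx + half + 1, width)):
                p = prob[yy][xx]
                m00 += p
                m10 += xx * p
                m01 += yy * p
        if m00 == 0:
            break  # no probability mass in the window, nothing to follow
        nx, ny = round(m10 / m00), round(m01 / m00)
        if (nx, ny) == (cx, cy):
            break  # converged
        cx, cy = nx, ny
    return cx, cy

# Synthetic map: a weak 3x3 blob centered at (8, 5) and a strong 7x7 blob at (30, 5).
prob = [[0.0] * 40 for _ in range(11)]
for row in range(4, 7):
    for col in range(7, 10):
        prob[row][col] = 0.5
for row in range(2, 9):
    for col in range(27, 34):
        prob[row][col] = 1.0

print(mean_shift(prob, 11, 5, 4))  # (8, 5): settles on the weak blob
print(mean_shift(prob, 27, 5, 4))  # (30, 5): finds the strong blob only when started near it
```

In the first call the strong blob never enters the window, so the tracker has no way to discover it, even though it holds far more probability mass than the blob it settles on.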

 

Run 3: approaching from a distance.
In Run 3, I tried to test resizing over a broader range of sizes, but after about 50 frames, the tracker started sliding away toward a region of warm lighting in the background. Interestingly, if I run this video sequence in reverse, the tracker does very well. I believe that's because the search region for each frame is larger when the face image recedes than when it approaches. When the search region is too small, the tracker drifts away more easily. Bradski mentions that as an issue in his algorithm article.

 
