Face Tracking

The application context for face tracking is a mobile robot that interacts socially with users.

Robust tracking methods have several potential uses in the context of a mobile robot. Initially, I'm interested in two of them: keeping a user's face well-centered in the video frame while the robot interacts with that user, and doing offline learning that depends on well-localized face tracking.

Summary

The specific tracking goals for this project are:

  1. Robustly follow a user's face in real time, even when the robot (and therefore the camera) is moving
  2. Reliably follow a face in a short video clip to extract face samples for learning

I'm currently using two tracking methods: 1) Bayesian Mean Shift for real-time tracking, and 2) Birchfield's head tracking method for offline processing to learn a face model.

Project History and Status

Project Proposal
I initiated this project in the context of Serge Belongie's Computer Vision Project course, CSE190a, at UCSD, in Spring 2006. Serge had us apply for his course by submitting a proposal the quarter before the class started. My project proposal is here.

Literature Survey
This project began with a literature survey of existing tracking methods. The results of this survey are summarized here. Because of time limitations, the literature survey isn't intended to be exhaustive.

Camshift
OpenCV's face tracker, Camshift, is a Mean Shift tracker. I implemented and evaluated this method first, since it's commonly used. It runs easily in real time, but I wasn't satisfied with its reliability. My Camshift notes are here.

Birchfield's Tracker
Since I'd found Camshift too unreliable, Serge suggested looking at Birchfield's tracking method. My notes on Birchfield's method are here.

I found this method to be more reliable than Camshift, but not quite fast enough for real-time face tracking.

Birchfield's metric for tracking quality includes an ellipse-matching term. I found that this term sometimes caused the tracker to wander away from the face when a user was in front of a background (such as a bookcase) that includes many vertical edges.

In fairness to this method, however, I should point out that Birchfield intended it to be used as a head tracker, not as a face tracker. For it to work as designed, the user must first offer a 3/4 profile view so that both hair and face are well represented in the color model. Requiring users to pose like this is not appropriate for my application context, so I instead initialize it with a frontal face view.

Update note, June 2006: I recently made two changes to the Birchfield tracker that seem to have noticeably improved its robustness. First, the color histogram is now a joint histogram of hue and saturation -- the same histogram that Camshift uses. Second, instead of computing similarity using histogram intersection, I'm now computing it as Bhattacharyya similarity. Separately, I've made the implementation much faster by precomputing ellipse pixels.
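
To make the difference between the two similarity measures concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the tracker's actual code: the function names are hypothetical, the joint hue-saturation histogram is assumed to be flattened into a simple list of bin counts, and since the note doesn't say which ellipse pixels are precomputed, the sketch caches the offsets of pixels inside an axis-aligned ellipse.

```python
import math


def normalize(hist):
    """Scale bin counts so they sum to 1 (an all-zero histogram is returned unchanged)."""
    total = float(sum(hist))
    return [v / total for v in hist] if total > 0 else list(hist)


def histogram_intersection(p, q):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i); equals 1.0 for identical normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def ellipse_offsets(a, b):
    """(dx, dy) offsets of pixels inside an ellipse with semi-axes a (x) and b (y).

    Assumption: computing these once per ellipse size and reusing them for every
    candidate position is the kind of precomputation meant in the note above.
    """
    return [(dx, dy)
            for dy in range(-b, b + 1)
            for dx in range(-a, a + 1)
            if (dx / a) ** 2 + (dy / b) ** 2 <= 1.0]


# Two hypothetical 3-bin histograms that differ in their first and last bins.
model = normalize([1, 1, 2])
candidate = normalize([2, 1, 1])
print(histogram_intersection(model, candidate))
print(bhattacharyya_coefficient(model, candidate))
```

Both measures reach 1.0 when the histograms are identical, so either can be used as a drop-in score for comparing a candidate region against the stored color model.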

Bayesian Mean Shift
To meet the goal of tracking a user's face in real time, even when the background is cluttered and the robot's camera is in motion, I modified the Camshift method to compute the face probability of pixels in a more principled way, using Bayes' rule. This approach, explained in detail here, takes the background distribution into account explicitly. During tracking, the face histogram is drawn from the initial input region. The background histogram, in contrast, is updated each frame. The background histogram for each frame is based on a small region exterior to the tracked face region in the preceding frame. There's a detailed explanation of this algorithm in the presentation I prepared for CSE190a.
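
The per-pixel computation described above can be sketched roughly as follows. This is a minimal plain-Python illustration, not the tracker's actual code: the function names, the flat list-of-bins histogram layout, and the equal prior `prior_face=0.5` are all assumptions, and the mean-shift step uses a fixed-size window with no scale adaptation.

```python
def face_posterior(face_hist, bg_hist, prior_face=0.5):
    """Per-bin P(face | color bin) via Bayes' rule.

    face_hist and bg_hist are normalized histograms over the same color bins,
    i.e. P(bin | face) and P(bin | background). prior_face is an assumed value.
    """
    posterior = []
    for pf, pb in zip(face_hist, bg_hist):
        num = pf * prior_face
        den = num + pb * (1.0 - prior_face)
        # A bin seen in neither histogram carries no evidence; score it 0.
        posterior.append(num / den if den > 0 else 0.0)
    return posterior


def backproject(bin_image, posterior):
    """Replace each pixel's color-bin index with that bin's face posterior."""
    return [[posterior[b] for b in row] for row in bin_image]


def mean_shift_window(prob, window, max_iter=20):
    """Move an (x, y, w, h) window to the centroid of the weights beneath it.

    prob is a list of rows of non-negative weights. Iteration stops when the
    window no longer moves, when it covers no weight, or after max_iter steps.
    """
    rows, cols = len(prob), len(prob[0])
    x, y, w, h = window
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += p * c
                m01 += p * r
        if m00 == 0.0:
            break  # nothing face-like under the window
        new_x = int(round(m10 / m00 - (w - 1) / 2.0))
        new_y = int(round(m01 / m00 - (h - 1) / 2.0))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return (x, y, w, h)
```

In a full tracking loop, the background histogram would be rebuilt each frame from a region just outside the previous frame's window before calling `face_posterior`; that step is omitted here because its exact geometry isn't specified in this summary.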

Current Status
The Bayesian Mean Shift tracker performs well for real-time face tracking, provided scale stays roughly constant. I found that it usually recovers readily from small localization errors. This behavior is a benefit for real-time interactions, but it's actually a drawback during offline learning, where small errors that correct themselves are harder to detect than an outright failure.

Birchfield's method, in contrast, while too slow for real-time social interactions, and subject to drift under certain conditions, is nevertheless a good option for use during offline learning. When it does drift, it doesn't easily recover. This behavior is a benefit during offline learning, since complete tracker failure is more detectable than small tracking errors that correct themselves.

Future Directions
I started off intending to either find and implement, or else design, a single face-tracking method. Initially, I focussed only on real-time methods, and made real-time performance a requirement. I've since realized I might be better off using multiple tracking methods, and choosing the method best suited for each separate task.

With this perspective in mind, future directions include 1) improving the Bayesian Mean Shift tracker for real-time interactions by adding scale adaptation and 2) optimizing Birchfield's method for use during offline learning.

 
