Face Tracking
The application context for face tracking is a mobile robot that interacts socially with users.
Robust tracking methods have several potential uses in the context of a mobile robot.
I'm initially interested in interacting with users, in keeping a user's face well-centered
in the video frame, and in doing offline learning that depends on well-localized face tracking.
Summary
The specific tracking goals for this project are:
- Robustly follow a user's face in real time, even when the robot (and therefore, the camera) is moving
- Reliably follow a face in a short video clip to extract face samples for learning
I'm currently using two tracking methods: 1) Bayesian Mean Shift for
real-time tracking, and 2) Birchfield's head tracking method for
offline processing to learn a face model.
Project History and Status
Project Proposal
I initiated this project in the context of Serge Belongie's Computer Vision
Project course, CSE190a, at UCSD, in Spring 2006. Serge had us apply for his
course by submitting
a proposal the quarter before the class started. My project proposal is
here.
Literature Survey
This project began with a literature survey of existing tracking methods.
The results of this survey are summarized
here. Because of time limitations, the literature survey
isn't intended to be exhaustive.
Camshift
OpenCV's face tracker,
Camshift, is a Mean Shift tracker.
I implemented and evaluated this method first, since it's
commonly used. It runs easily
in real time, but I wasn't satisfied with its reliability. My Camshift notes are here.
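In a Mean Shift tracker such as Camshift, a face color histogram is back-projected into each frame to give a per-pixel face probability image, and the search window is then repeatedly moved to the centroid of the probability mass under it. The sketch below shows only that inner loop, in plain Python, assuming the probability image has already been computed. It is not OpenCV's implementation: it omits Camshift's window-size and orientation estimation, and `mean_shift`, `max_iter`, and `eps` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical type: a row-major face probability image with values in [0, 1].
ProbImage = List[List[float]]


def mean_shift(prob: ProbImage, x: int, y: int, w: int, h: int,
               max_iter: int = 10, eps: float = 1.0) -> Tuple[int, int]:
    """Shift a w-by-h window (top-left corner at x, y) toward the centroid
    of the probability mass inside it until it moves less than eps pixels."""
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0  # zeroth and first moments inside the window
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += c * p
                m01 += r * p
        if m00 == 0:
            break  # no face probability under the window; stop searching
        cx, cy = m10 / m00, m01 / m00  # centroid of the probability mass
        new_x = int(round(cx - (w - 1) / 2))  # recenter the window on it
        new_y = int(round(cy - (h - 1) / 2))
        shift = abs(new_x - x) + abs(new_y - y)
        x, y = new_x, new_y
        if shift < eps:
            break  # converged
    return x, y
```

Each frame's search starts from the previous frame's window, so this loop relies on the face moving only a modest distance between frames.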
Birchfield's Tracker
Since I'd found Camshift too unreliable, Serge suggested looking at
Birchfield's tracking method.
My notes on Birchfield's method are
here.
I found this method to be more reliable than Camshift, but not quite fast enough for real-time
face tracking.
Birchfield's metric for tracking quality includes an ellipse-matching term.
I found that this
term sometimes caused the tracker to wander away from the face when a user was
in front of a background (such
as a bookcase) that includes many vertical edges.
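As described in Birchfield's 1998 paper, the gradient term is the average, over pixels on the ellipse perimeter, of the magnitude of the dot product between the ellipse's unit normal and the image intensity gradient. This is consistent with the bookcase problem: strong vertical edges produce horizontal gradients that line up with the normals on the left and right sides of the ellipse. Below is a simplified sketch of that term under my own assumptions (a fixed number of perimeter samples, a central-difference gradient, and the hypothetical names `ellipse_perimeter`, `gradient`, and `gradient_score`); Birchfield's implementation differs in details, for example in how it normalizes this term before combining it with the color term. Because the perimeter offsets and normals depend only on the ellipse size, they can be computed once per scale and reused at every candidate center.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale intensities, row-major
Perimeter = List[Tuple[int, int, float, float]]  # (dx, dy, nx, ny)


def ellipse_perimeter(a: float, b: float, n: int = 64) -> Perimeter:
    """Precompute integer offsets of n perimeter samples and the unit outward
    normal at each, for an ellipse with semi-axes a (x) and b (y)."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        px, py = a * math.cos(t), b * math.sin(t)
        # The outward normal of x^2/a^2 + y^2/b^2 = 1 is along (x/a^2, y/b^2).
        gx, gy = px / a ** 2, py / b ** 2
        norm = math.hypot(gx, gy)
        pts.append((int(round(px)), int(round(py)), gx / norm, gy / norm))
    return pts


def gradient(img: Image, x: int, y: int) -> Tuple[float, float]:
    """Central-difference intensity gradient; zero on or outside the border."""
    if 0 < x < len(img[0]) - 1 and 0 < y < len(img) - 1:
        return ((img[y][x + 1] - img[y][x - 1]) / 2.0,
                (img[y + 1][x] - img[y - 1][x]) / 2.0)
    return 0.0, 0.0


def gradient_score(img: Image, cx: int, cy: int, perim: Perimeter) -> float:
    """Mean |normal . gradient| over the ellipse perimeter centered at (cx, cy)."""
    total = 0.0
    for dx, dy, nx, ny in perim:
        gx, gy = gradient(img, cx + dx, cy + dy)
        total += abs(nx * gx + ny * gy)
    return total / len(perim)
```

A full search would evaluate this score at each candidate center and ellipse size near the previous position and combine it with the color score.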
In fairness to this method, however, I should point out that Birchfield intended
it to be used as a head tracker, not as a face tracker. For it to work as designed,
the user must first offer a 3/4 profile view so that both hair and face are well represented
in the color model. Requiring users to pose like this is not appropriate for my application context,
so I instead initialize it with a frontal face view.
Update note, June 2006: I recently made two changes to the Birchfield tracker
that seem to have noticeably improved its robustness. First, the color
histogram is now a joint histogram of hue and saturation (the same histogram that
Camshift uses). Second, instead of computing similarity using histogram intersection,
I'm now computing it as
Bhattacharyya similarity.
I've also made the implementation much faster by precomputing ellipse pixels.
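The two histogram changes are easy to state in code. The sketch below is a hypothetical pure-Python version, not the tracker's actual code: it builds a normalized joint hue-saturation histogram and compares two histograms with both measures. The bin counts and the OpenCV-style 8-bit HSV ranges (hue 0 to 179, saturation 0 to 255) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def hs_histogram(pixels: List[Tuple[int, int]], h_bins: int = 30,
                 s_bins: int = 32) -> List[float]:
    """Normalized joint hue-saturation histogram, flattened row-major.
    Assumes OpenCV-style 8-bit HSV: hue in [0, 180), saturation in [0, 256)."""
    hist = [0.0] * (h_bins * s_bins)
    for h, s in pixels:
        hb = min(h * h_bins // 180, h_bins - 1)  # hue bin
        sb = min(s * s_bins // 256, s_bins - 1)  # saturation bin
        hist[hb * s_bins + sb] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram intersection: sum of bin-wise minima (1 if identical)."""
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient: sum of sqrt(p_i * q_i) (1 if identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

For normalized histograms, both measures equal 1 for identical histograms and 0 for disjoint ones, but they treat partial overlap differently: if half of one histogram's mass falls in the single bin occupied by the other, intersection gives 0.5 while the Bhattacharyya coefficient gives about 0.71.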
Bayesian Mean Shift
To meet the goal of tracking a user's face in real time, even when the background is
cluttered and the robot's camera is in motion, I modified the Camshift method to
compute the face probability of pixels in a more
principled way, using Bayes' rule. This approach, explained in detail
here,
takes the background distribution into account explicitly. During tracking,
the face histogram is computed once, from the initial input region. The background
histogram, in contrast, is updated each frame. The background histogram for each frame
is based on a small region exterior to the tracked face region in the preceding
frame. There's a detailed explanation of this algorithm in the
presentation
I prepared for CSE190a.
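The per-pixel computation is Bayes' rule, P(face | c) = P(c | face) P(face) / (P(c | face) P(face) + P(c | bg) P(bg)), where c is a pixel's color bin. A minimal sketch follows, assuming pixels have already been quantized to histogram bin indices (for example, joint hue-saturation bins). The face histogram would be computed once from the initial region, while the hypothetical helper `background_ring` rebuilds the background histogram each frame from a band around the previous frame's window. The function names, the band width `margin`, and the equal prior `prior_face = 0.5` are my assumptions, not details taken from the presentation.

```python
from typing import List, Sequence

BinImage = List[List[int]]  # each pixel already mapped to a histogram bin index


def face_probability(bins: BinImage, face_hist: Sequence[float],
                     bg_hist: Sequence[float],
                     prior_face: float = 0.5) -> List[List[float]]:
    """P(face | color) per pixel via Bayes' rule, given normalized histograms
    P(color | face) and P(color | background). prior_face is an assumption."""
    out = []
    for row in bins:
        out_row = []
        for b in row:
            pf = face_hist[b] * prior_face
            pb = bg_hist[b] * (1.0 - prior_face)
            # Colors seen in neither histogram get probability 0 by convention.
            out_row.append(pf / (pf + pb) if pf + pb > 0 else 0.0)
        out.append(out_row)
    return out


def background_ring(bins: BinImage, x: int, y: int, w: int, h: int,
                    margin: int, n_bins: int) -> List[float]:
    """Normalized histogram of a band `margin` pixels wide just outside the
    window (x, y, w, h) tracked in the previous frame."""
    hist = [0.0] * n_bins
    rows, cols = len(bins), len(bins[0])
    for r in range(max(0, y - margin), min(rows, y + h + margin)):
        for c in range(max(0, x - margin), min(cols, x + w + margin)):
            if not (x <= c < x + w and y <= r < y + h):  # skip the face window
                hist[bins[r][c]] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

The resulting probability map takes the place of Camshift's back-projection as the input to the mean shift step, so colors that are common in the surrounding background are down-weighted even if they also appear in the face histogram.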
Current Status
The Bayesian Mean Shift tracker performs well for real-time face tracking,
provided scale stays roughly constant. I found it usually recovers readily from small
localization errors. This behavior is a benefit for real-time interactions, but it's
actually a drawback during offline learning: errors that correct themselves are hard
to detect, so mislocalized face samples can end up in the training data.
Birchfield's method, in contrast, while too slow for real-time social interactions
and subject to drift under certain conditions, is nevertheless a good option
for use during offline learning. When it does drift, it doesn't easily recover.
This behavior is a benefit during offline learning, since complete tracker failure
is more detectable than small tracking errors that correct themselves.
Future Directions
I started off intending to either find and implement, or else design,
a single face-tracking method. Initially, I focused only on real-time
methods, and made real-time performance a requirement. I've since realized
I might be better off using multiple tracking methods, and choosing
the method best suited for each separate task.
With this perspective in mind, future directions include
1) improving the Bayesian
Mean Shift tracker for real-time interactions by
adding scale adaptation and
2) optimizing Birchfield's method for use during offline learning.
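Scale adaptation could take several forms. One simple possibility, in the spirit of Camshift's use of the zeroth moment to size its window, is sketched below: sum the face probability in a slightly enlarged region, treat that sum as the face area in pixels, and rescale the window to match while capping the change per frame. This is a hypothetical sketch, not a planned design; `adapt_scale`, `pad`, and `max_change` are invented names, and treating the probability sum as an area assumes the map is close to 1 on the face and close to 0 elsewhere.

```python
import math
from typing import List, Tuple

ProbImage = List[List[float]]  # face probability map, values in [0, 1]


def adapt_scale(prob: ProbImage, x: int, y: int, w: int, h: int,
                pad: float = 0.25,
                max_change: float = 0.1) -> Tuple[int, int, int, int]:
    """Resize window (x, y, w, h) so its area tracks the probability mass
    (zeroth moment) in a padded region around it, keeping its center."""
    rows, cols = len(prob), len(prob[0])
    # Pad the region so the window can grow when the face extends past it.
    px, py = int(w * pad), int(h * pad)
    m00 = sum(prob[r][c]
              for r in range(max(0, y - py), min(rows, y + h + py))
              for c in range(max(0, x - px), min(cols, x + w + px)))
    if m00 <= 0:
        return x, y, w, h  # nothing to measure; keep the old window
    # Treat m00 as the face area in pixels; keep the window's aspect ratio.
    scale = math.sqrt(m00 / (w * h))
    scale = max(1 - max_change, min(1 + max_change, scale))  # cap per-frame change
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    cx, cy = x + w / 2, y + h / 2
    return int(round(cx - new_w / 2)), int(round(cy - new_h / 2)), new_w, new_h
```

Capping the per-frame change trades slower response to real scale changes for less window jitter; a background region with face-like colors would still inflate the estimate, which is one reason the background-aware probability map matters here.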
All Project Links
- Proposal document
- Existing tracking methods
- Bayesian Mean Shift Tracker (my real-time face-tracking method)