Mouth Detection and Tracking References


Mouth Detection and Tracking Based on Boosted Cascaded Classifier

The functions described in this section integrate the boosted cascade classifiers [Lienhart, 02] into a general framework for mouth detection and tracking. As shown in Fig. 1, the core of the framework is a finite state machine with two states: detection and tracking. The system starts with multi-scale face detection using a boosted cascade classifier based on Haar-like features [Lienhart, 02]. Then, two cascade classifiers, one for the plain mouth and the other for the mouth with beard, locate the mouth within the lower region of the face. If the mouth is detected successfully in several consecutive frames, the state machine enters the tracking state.
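
The state transition rule can be sketched as follows. This is only an illustration, not the library implementation: the thresholds DETECT_HITS and TRACK_MISSES and the helper next_state are assumptions, while the CvTrackState enum values (FACEMOUTHDETECTION, MOUTHTRACKING) come from the structures described below.

enum { DETECT_HITS = 3, TRACK_MISSES = 2 };   /* illustrative thresholds */

CvTrackState next_state( CvTrackState state, int mouth_found,
                         int* init_counter, int* error_counter )
{
    if( state == FACEMOUTHDETECTION )
    {
        /* count consecutive successful detections */
        *init_counter = mouth_found ? *init_counter + 1 : 0;
        if( *init_counter >= DETECT_HITS )
        {
            *error_counter = 0;
            return MOUTHTRACKING;
        }
    }
    else /* MOUTHTRACKING */
    {
        /* count consecutive tracking failures */
        *error_counter = mouth_found ? 0 : *error_counter + 1;
        if( *error_counter >= TRACK_MISSES )
        {
            *init_counter = 0;
            return FACEMOUTHDETECTION;
        }
    }
    return state;
}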

In the tracking state, the mouth detection algorithm is applied only to a small region around the mouth location predicted from the previous frames. The center of the search region is estimated with a linear Kalman filter [Cordea, 01].
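
A minimal sketch of a constant-velocity Kalman predictor for the search-region center, written against the standard OpenCV CvKalman API (cvCreateKalman, cvKalmanPredict, cvKalmanCorrect). The state is [x, y, vx, vy], the measurement is [x, y]; the noise covariances chosen here are illustrative assumptions, not the values used by the library.

#include <cv.h>
#include <string.h>

CvKalman* create_xy_kalman( float x0, float y0 )
{
    /* constant-velocity transition matrix for the state [x, y, vx, vy] */
    static const float A[] = { 1,0,1,0,  0,1,0,1,  0,0,1,0,  0,0,0,1 };
    CvKalman* kalman = cvCreateKalman( 4, 2, 0 );

    memcpy( kalman->transition_matrix->data.fl, A, sizeof(A) );
    cvSetIdentity( kalman->measurement_matrix,     cvRealScalar(1) );
    cvSetIdentity( kalman->process_noise_cov,      cvRealScalar(1e-4) );
    cvSetIdentity( kalman->measurement_noise_cov,  cvRealScalar(1e-1) );
    cvSetIdentity( kalman->error_cov_post,         cvRealScalar(1) );

    kalman->state_post->data.fl[0] = x0;   /* initial position           */
    kalman->state_post->data.fl[1] = y0;
    kalman->state_post->data.fl[2] = 0.f;  /* initial velocity unknown   */
    kalman->state_post->data.fl[3] = 0.f;
    return kalman;
}

/* Per frame: predict the search-region center, then correct the filter
   with the location actually measured by the mouth detector. */
void predict_and_correct( CvKalman* kalman, float meas_x, float meas_y,
                          float* pred_x, float* pred_y )
{
    float m[2];
    CvMat measurement = cvMat( 2, 1, CV_32FC1, m );
    const CvMat* prediction = cvKalmanPredict( kalman, 0 );

    *pred_x = prediction->data.fl[0];
    *pred_y = prediction->data.fl[1];

    m[0] = meas_x;
    m[1] = meas_y;
    cvKalmanCorrect( kalman, &measurement );
}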

The mouth locations over time are smoothed and outliers are rejected by a three-stage post-processing module. First, linear interpolation fills in the gaps in the trajectory caused by detection failures. Then, a median filter eliminates incorrect detections. Finally, a Gaussian filter suppresses the jitter in the trajectory.
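
A minimal sketch of the three smoothing stages applied to one coordinate of the mouth center. The names (traj, valid, TRAJ_LEN) and the 3-tap/5-tap filter sizes are illustrative assumptions, not the library's actual parameters.

#include <string.h>

#define TRAJ_LEN 256   /* n must not exceed TRAJ_LEN */

/* Stage 1: fill detection gaps (valid[i] == 0) by linear interpolation
   between the nearest located samples. */
static void fill_gaps( float traj[], const int valid[], int n )
{
    int i, j, left;
    for( i = 0; i < n; i++ )
    {
        if( valid[i] )
            continue;
        left = i - 1;
        for( j = i + 1; j < n && !valid[j]; j++ )
            ;
        if( left >= 0 && j < n )
            traj[i] = traj[left] + (traj[j] - traj[left]) * (i - left) / (float)(j - left);
        else if( left >= 0 )
            traj[i] = traj[left];   /* trailing gap */
        else if( j < n )
            traj[i] = traj[j];      /* leading gap  */
    }
}

/* Stage 2: 3-tap median filter removes isolated wrong detections. */
static float median3( float a, float b, float c )
{
    float t;
    if( a > b ) { t = a; a = b; b = t; }
    if( b > c ) { t = b; b = c; c = t; }
    return a > b ? a : b;
}

static void median_filter( float traj[], int n )
{
    float tmp[TRAJ_LEN];
    int i;
    memcpy( tmp, traj, n * sizeof(float) );
    for( i = 1; i + 1 < n; i++ )
        traj[i] = median3( tmp[i-1], tmp[i], tmp[i+1] );
}

/* Stage 3: 5-tap Gaussian (binomial) filter suppresses the remaining jitter. */
static void gaussian_filter( float traj[], int n )
{
    static const float k[5] = { 1.f/16, 4.f/16, 6.f/16, 4.f/16, 1.f/16 };
    float tmp[TRAJ_LEN];
    int i;
    memcpy( tmp, traj, n * sizeof(float) );
    for( i = 2; i + 2 < n; i++ )
        traj[i] = k[0]*tmp[i-2] + k[1]*tmp[i-1] + k[2]*tmp[i]
                + k[3]*tmp[i+1] + k[4]*tmp[i+2];
}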

 

Fig. 1: The mouth detection and tracking framework. The tree classifier represents the mouth detector.


Mouth Detection and Tracking Structures

To support the mouth detection and tracking functions, the structure CvMouthTracker is defined to represent the mouth tracker; it contains the cascaded classifiers, the tracking status, buffers and counters. The structure is filled by the function cvInitTrackMouth:

typedef struct _CvMouthTracker
{
    // Downsampled image and its parameters
    float     DownSmpRate;
    IplImage* iplDownSmpImage;

    // Face detector
    CvHidHaarClassifierCascade* hidCascadeFace;

    // Mouth detectors
    CvHidHaarClassifierCascade* hidCascadeMouth0;
    CvHidHaarClassifierCascade* hidCascadeMouth1;
    CvMemStorage* StorageDual0;
    CvMemStorage* StorageDual1;
    int       MouthWndWidth;

    // Tracking state
    CvTrackState TrackState;
    int       InitialCounter;
    int       ErrorCounter;

    // Kalman filter
    CvKalman* KalmanXY;
    int       KinematicOrder;
    CvMat     MeasureVectXY;
    float     PrevX, PrevY;
    float     PrevVx, PrevVy;
    int       iskalmanCreated;

    // Post-processing of the tracking results
    int       RltPoolLen;
    int       RltPoolHeader;
    int       isRltPoolFull;
    CvMouthLocateResult ResultPool[CV_MOUTH_TRAJECTORY_LIST_SIZE];
}
CvMouthTracker;

Below is the description of the CvMouthTracker fields:

DownSmpRate         Downsampling rate of the image used for face detection

iplDownSmpImage     Pointer to the downsampled image used for face detection

hidCascadeFace      Pointer to the cascaded classifier for face detection

hidCascadeMouth0    Pointer to the cascaded classifier for the mouth-only / mouth-with-beard pattern

hidCascadeMouth1    Pointer to the cascaded classifier for the mouth-with-beard / mouth-only pattern

StorageDual0        Buffer for mouth detection

StorageDual1        Buffer for mouth detection

MouthWndWidth       Window width of the cascaded classifier for mouth detection

TrackState          State of the finite state machine for mouth tracking, either FACEMOUTHDETECTION or MOUTHTRACKING of the enum type CvTrackState

InitialCounter      Counter that controls the state transition from detection to tracking

ErrorCounter        Counter that controls the state transition from tracking to detection

KalmanXY            Pointer to the structure CvKalman for the Kalman filter

KinematicOrder      Order of the kinematic model used in the Kalman filter

MeasureVectXY       Matrix that stores the measurement data for the Kalman filter

PrevX, PrevY        Location of the mouth in the previous frame

PrevVx, PrevVy      Velocity of the mouth in the previous frame

iskalmanCreated     Indicates whether the Kalman filter is initialized

RltPoolLen          Size of the result buffer ResultPool

RltPoolHeader       Index of the header of the result buffer ResultPool

isRltPoolFull       Indicates whether the result buffer ResultPool is full

ResultPool          List that temporarily stores the mouth detection and tracking results; array of the structure CvMouthLocateResult

For representation of the mouth locating result the following structure is defined:

typedef struct _CvMouthLocateResult
{
    float     CenterX;
    float     CenterY;
    float     Width;
    float     Height;
    float     RotAngle;
    float     Likeness;
    int       face_left;
    int       face_right;
    int       face_top;
    int       face_bottom;
}
CvMouthLocateResult;

Below is the description of the CvMouthLocateResult fields:

CenterX             X coordinate of the center of the mouth region

CenterY             Y coordinate of the center of the mouth region

Width               Width of the mouth region

Height              Height of the mouth region

RotAngle            Rotation angle of the mouth region, always 0

Likeness            Likeness of the resulting mouth region, always 1.0

face_left           Left boundary of the face region corresponding to the mouth

face_right          Right boundary of the face region corresponding to the mouth

face_top            Top boundary of the face region corresponding to the mouth

face_bottom         Bottom boundary of the face region corresponding to the mouth

For representation of the mouth detection and tracking result the following structure is defined:

typedef struct _CvTrackResult
{
    CvTrackState        TrackState;
    CvMouthLocateResult MouthResult;
    CvTrackRltSts       TrackRltSts;
}
CvTrackResult;

Below is the description of the CvTrackResult fields:

TrackState          State of the finite state machine for mouth tracking, either FACEMOUTHDETECTION or MOUTHTRACKING of the enum type CvTrackState

MouthResult         Mouth location result

TrackRltSts         Result status of the mouth detection and tracking, one of TRACKSUCCESSFUL, FACEDETECERROR, MOUTHDETECTERROR or MOUTHTRACKERROR of the enum type CvTrackRltSts
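
The snippet below illustrates how a caller might inspect one tracking result. The helper report_result and the printed text are illustrative; the enum values and fields come from the structures described above.

#include <stdio.h>

void report_result( const CvTrackResult* result )
{
    if( result->TrackRltSts == TRACKSUCCESSFUL )
    {
        /* the mouth was located; report whether it came from detection or tracking */
        printf( "%s mouth at (%.1f, %.1f), size %.0f x %.0f\n",
                result->TrackState == MOUTHTRACKING ? "tracked" : "detected",
                result->MouthResult.CenterX, result->MouthResult.CenterY,
                result->MouthResult.Width, result->MouthResult.Height );
    }
    else
    {
        printf( "mouth not located in this frame\n" );
    }
}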


HaarDetectObjectsR

Detect objects within a limited size range in the image using a cascade of boosted classifiers based on Haar-like features

cvHaarDetectObjectsR( const IplImage* img, CvHidHaarClassifierCascade* cascade, 
                      CvMemStorage* storage, double scale_factor, int min_neighbors, 
                      int flags, int min_width, int max_width );

 

img

Input Ipl image.

cascade

Haar classifier cascade in internal representation.

storage

Memory storage to store the resultant sequence of the object candidate rectangles.

scale_factor

The factor by which the search window is scaled between the subsequent scans, for example, 1.1 means increasing window by 10%.

min_neighbors

Minimum number (minus 1) of neighbor rectangles that makes up an object. All groups with fewer rectangles than min_neighbors-1 are rejected. If min_neighbors is 0, the function does no grouping at all and returns all the detected candidate rectangles, which may be useful if the user wants to apply a customized grouping procedure.

flags

Mode of operation. Currently the only flag that may be specified is CV_HAAR_DO_CANNY_PRUNING. If it is set, the function uses a Canny edge detector to reject image regions that contain too few or too many edges and thus cannot contain the searched object. The particular threshold values are tuned for face detection, and in this case the pruning speeds up the processing.

min_width

Minimum width of the object to detect.

max_width

Maximum width of the object to detect.

The function cvHaarDetectObjectsR finds rectangular regions in the given image that are likely to contain objects the cascade has been trained for and returns those regions as a sequence of rectangles. cvHaarDetectObjectsR has the same functionality as cvHaarDetectObjects from OpenCV except that it restricts the width of the detected object to the range [min_width, max_width].
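
An illustrative call restricting the face width to 40..200 pixels. It is assumed here that the function returns a CvSeq* whose elements are CvAvgComp entries (rectangle plus neighbor count), as cvHaarDetectObjects does; the cascade and the image are expected to be prepared beforehand.

void find_faces( const IplImage* img, CvHidHaarClassifierCascade* cascade )
{
    CvMemStorage* storage = cvCreateMemStorage( 0 );
    CvSeq* faces = cvHaarDetectObjectsR( img, cascade, storage,
                                         1.2,   /* enlarge window by 20% per scan */
                                         2,     /* min_neighbors                  */
                                         CV_HAAR_DO_CANNY_PRUNING,
                                         40,    /* min_width in pixels            */
                                         200 ); /* max_width in pixels            */
    int i;
    for( i = 0; i < (faces ? faces->total : 0); i++ )
    {
        CvRect r = ((CvAvgComp*)cvGetSeqElem( faces, i ))->rect;
        /* r bounds one face candidate; use it to restrict the mouth search */
        (void)r;
    }
    cvReleaseMemStorage( &storage );
}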


InitTrackMouth

Initialize the mouth tracker

cvInitTrackMouth( CvHidHaarClassifierCascade* hidCascadeFace,
                  CvHidHaarClassifierCascade* hidCascadeMouth0,
                  CvHidHaarClassifierCascade* hidCascadeMouth1,
                  CvMouthTracker* MouthTracker );
 

hidCascadeFace

Input pointer to the boosted cascade classifier for face detection.

hidCascadeMouth0

Input pointer to the boosted cascade classifier for the mouth without beard.

hidCascadeMouth1

Input pointer to the boosted cascade classifier for the mouth with beard.

MouthTracker

Output data structure that contains the pointers to the mouth tracking classifiers, the status and the records.

The function cvInitTrackMouth initializes the data structure MouthTracker that contains the pointers to the classifiers, the tracking status and the records used in mouth detection and tracking. The input arguments hidCascadeFace, hidCascadeMouth0 and hidCascadeMouth1 must be created with the OpenCV functions cvLoadHaarClassifierCascade and cvCreateHidHaarClassifierCascade before calling this function. The structure MouthTracker must be allocated before using this function.
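
A minimal initialization sketch. The hidden cascades are assumed to have been created already with cvLoadHaarClassifierCascade / cvCreateHidHaarClassifierCascade (the exact creation call depends on the OpenCV version), so they appear here as prepared pointers; setup_tracker, frame_width and frame_height are illustrative names.

void setup_tracker( CvMouthTracker* tracker,
                    CvHidHaarClassifierCascade* face_cascade,
                    CvHidHaarClassifierCascade* mouth_cascade,
                    CvHidHaarClassifierCascade* beard_mouth_cascade,
                    int frame_width, int frame_height )
{
    /* tracker must point to caller-allocated storage */
    cvInitTrackMouth( face_cascade, mouth_cascade, beard_mouth_cascade, tracker );

    /* prepare the buffers, the Kalman filter and the state machine for the
       given frame size before the first call to cvUpdateTrackMouth */
    cvResetTrackMouth( frame_width, frame_height, tracker );
}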


ResetTrackMouth

Reset the mouth tracker before processing a new input video sequence

cvResetTrackMouth( const int imagewidth, const int imageheight, CvMouthTracker *MouthTracker);
 

imagewidth

Input image width of the video sequence to process.

imageheight

Input image height of the video sequence to process.

MouthTracker

Input the data structure initialized by the function cvInitTrackMouth. Output the updated data structure.

The function cvResetTrackMouth updates the data structure MouthTracker according to the image size of the video sequence, and resets the Kalman filter and the finite state machine for mouth tracking. The data structure MouthTracker must be initialized by function cvInitTrackMouth before calling this function.


UpdateTrackMouth

Detect or track mouth in one image frame

cvUpdateTrackMouth( IplImage* img, CvMouthTracker* MouthTracker, CvTrackResult& result );
 

img

Input 8-bit grayscale or 24-bit true color image.

MouthTracker

Input the data structure CvMouthTracker that contains the pointers to the classifiers, the tracking status and the records. Output the updated structure.

result

Output the data structure CvTrackResult that contains the mouth tracking result.

The function cvUpdateTrackMouth detects or tracks the mouth region in one given image frame, and updates the data structure CvTrackResult according to the detection/tracking result and the previous state of the finite state machine.
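
A sketch of the per-frame processing loop. Frame acquisition is only indicated by a comment, MAX_FRAMES and process_sequence are illustrative names, and the collected locations are refined afterwards with cvPostProcessTrackMouth and the tracker released with cvReleaseTrackMouth (both described below).

#define MAX_FRAMES 1000   /* illustrative bound on the sequence length */

void process_sequence( CvMouthTracker* tracker, IplImage* frame )
{
    static CvMouthLocateResult track_list[MAX_FRAMES];
    CvTrackResult result;
    int frame_count = 0;

    while( frame_count < MAX_FRAMES /* && another frame was grabbed into 'frame' */ )
    {
        cvUpdateTrackMouth( frame, tracker, result );

        /* store every per-frame location; frames where detection or tracking
           failed are repaired later by the post-processing stage */
        track_list[frame_count++] = result.MouthResult;
    }

    cvPostProcessTrackMouth( track_list, frame_count );
    cvReleaseTrackMouth( tracker );
}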


ReleaseTrackMouth

Release the classifiers, buffers and Kalman filter used in mouth detection and tracking.

cvReleaseTrackMouth ( CvMouthTracker* MouthTracker); 
 

MouthTracker

Input the data structure CvMouthTracker that contains the pointers to the classifiers, the tracking status and the records.

The function cvReleaseTrackMouth releases the three boosted cascade classifiers, the buffers and the Kalman filter used in mouth detection and tracking. The data structure CvMouthTracker itself is not released.


PostProcessTrackMouth

Refine the trajectory of the mouth region after the mouth detection and tracking.

cvPostProcessTrackMouth( CvMouthLocateResult * TrackList, const int framenum ); 
 

TrackList

Input the list of mouth detection and tracking results. Output the updated list.

framenum

Input the number of results in TrackList.

The function cvPostProcessTrackMouth refines the trajectory of the mouth region using linear interpolation, a median filter and a Gaussian filter.

 

Other Image Processing Functions


EqualizeHist

Apply histogram equalization to the image.

cvEqualizeHist(const IplImage *img); 
 

img

The input and output 8-bit grayscale image.

The function cvEqualizeHist applies the histogram equalization to the image img. The ROI of the image will be ignored.


NormalizeIllum

Implement the illumination gradient correction of the image.

cvNormalizeIllum (const IplImage *img, const float destMean, const float destMse); 
 

img

The input and output 8-bit grayscale image.

destMean

Input the desired mean value of the destination image.

destMse

Input the desired mean square error of the destination image.

The function cvNormalizeIllum implements the illumination gradient correction [Rowley, 98], [Sung, 98] to reduce heavy shadows in the image caused by extreme lighting angles. The function adjusts the mean and the mean square error of the image img to the values given by the destMean and destMse parameters. The ROI of the image is ignored.

Let the brightness value of the pixel at position $(x, y)$ in the source image img be $I(x, y)$, and let the parameters of the brightness plane $a x + b y + c$ that best fits all $I(x, y)$ be $(a, b, c)$. Then

    E(a, b, c) = \sum_{x, y} \bigl( a x + b y + c - I(x, y) \bigr)^2 \rightarrow \min,                (1)

and, minimizing $E$ by the least-squares method, the parameters are obtained as the solution of the linear system

    M \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum_{x, y} x \, I(x, y) \\ \sum_{x, y} y \, I(x, y) \\ \sum_{x, y} I(x, y) \end{pmatrix},                (2)

where

    M = \begin{pmatrix} \sum x^2 & \sum x y & \sum x \\ \sum x y & \sum y^2 & \sum y \\ \sum x & \sum y & N \end{pmatrix}                (3)

and $N$ is the number of pixels. Then the image is corrected by subtracting this plane:

    I'(x, y) = I(x, y) - ( a x + b y + c ).                (4)

In most cases $I'(x, y)$ will exceed the valid range of 8-bit pixel values (0~255). Therefore a standardization operation called MSE normalization is used to adjust the mean $\mu$ and the mean square error $\sigma$ of the illumination-corrected image to the values given in the function parameters. The pixel values of the normalized destination image are obtained as

    I''(x, y) = \mu_d + \bigl( I'(x, y) - \mu \bigr) \frac{\sigma_d}{\sigma},                (5)

where $\mu_d$ and $\sigma_d$ are the mean and mean square error of the destination image as described by the parameters destMean and destMse.
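
A minimal sketch of the MSE normalization step (5), applied to the plane-corrected values $I'(x, y)$ held in a float buffer. The function name, the float buffer, the clamped write-back to 8-bit, and the interpretation of destMse as the root-mean-square deviation of pixel values are assumptions for illustration, not the library implementation.

#include <math.h>

void mse_normalize( const float* corrected, unsigned char* dst, int n,
                    float dest_mean, float dest_mse )
{
    int i;
    double sum = 0, sq = 0, mean, mse, v;

    /* mean of the plane-corrected values */
    for( i = 0; i < n; i++ )
        sum += corrected[i];
    mean = sum / n;

    /* root-mean-square deviation of the plane-corrected values */
    for( i = 0; i < n; i++ )
        sq += ( corrected[i] - mean ) * ( corrected[i] - mean );
    mse = sqrt( sq / n );

    /* map to the requested mean and MSE, then clamp to the 8-bit range */
    for( i = 0; i < n; i++ )
    {
        v = dest_mean + ( corrected[i] - mean ) * dest_mse / ( mse > 0 ? mse : 1 );
        dst[i] = (unsigned char)( v < 0 ? 0 : v > 255 ? 255 : v + 0.5 );
    }
}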

 

References

[Cordea, 01] M. D. Cordea, E. M. Petriu, N. D. Georganas, et al., "Real-time 2(1/2)-D head pose recovery for model-based video coding," IEEE Transactions on Instrumentation and Measurement, vol. 50, no. 4, pp. 1007-1013, 2001.

[Lienhart, 02] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," Proc. IEEE ICIP, pp. 900-903, 2002.

[Rowley, 98] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23-38, Jan. 1998.

[Sung, 98] K.-K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998.