The functions described in this section integrate the boosted cascade classifiers [Lienhart, 02] into a general framework for mouth detection and tracking. As shown in Fig. 1, the kernel of the framework is a finite state machine that consists of two states: detection and tracking. The system starts with multi-scale face detection using a boosted cascade classifier based on Haar-like features [Lienhart, 02]. Then two cascade classifiers, one for the mouth and the other for the mouth with beard, locate the mouth within the lower region of the face. If the mouth is detected successfully in several consecutive frames, the state machine enters the tracking state.
In the tracking state, the mouth detection algorithm is applied to a small region around the mouth location predicted from the previous frames. The center of the search region is estimated using a linear Kalman filter [Cordea, 01].
The mouth locations over time are smoothed and outliers are rejected by a three-stage post-processing module. First, linear interpolation is employed to fill in the gaps in the trajectory caused by detection failures. Then, a median filter eliminates incorrect detections. Finally, a Gaussian filter is applied to suppress jitter in the trajectory.
Fig. 1: The mouth detection and tracking framework. The tree classifier represents the mouth detector.
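For illustration, the two-state machine of Fig. 1 can be sketched as below. This is not library code: the state values FACEMOUTHDETECTION and MOUTHTRACKING and the counters InitialCounter and ErrorCounter are taken from the CvMouthTracker fields documented later in this section, while the helper name UpdateStateMachine, the thresholds and the counter reset logic are assumptions.

/* Illustrative sketch of the detection/tracking state machine; the thresholds
   and reset logic are assumed, not taken from the library. */
enum { DETECT_TO_TRACK = 3, TRACK_TO_DETECT = 3 };

void UpdateStateMachine( CvMouthTracker* t, int mouthFound )
{
    if( t->TrackState == FACEMOUTHDETECTION )
    {
        /* count consecutive successful detections */
        t->InitialCounter = mouthFound ? t->InitialCounter + 1 : 0;
        if( t->InitialCounter >= DETECT_TO_TRACK )
        {
            t->TrackState = MOUTHTRACKING;       /* enter the tracking state */
            t->ErrorCounter = 0;
        }
    }
    else /* MOUTHTRACKING */
    {
        /* count consecutive tracking failures */
        t->ErrorCounter = mouthFound ? 0 : t->ErrorCounter + 1;
        if( t->ErrorCounter >= TRACK_TO_DETECT )
        {
            t->TrackState = FACEMOUTHDETECTION;  /* fall back to detection */
            t->InitialCounter = 0;
        }
    }
}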
In order to support the mouth detection and tracking functions, the structure CvMouthTracker is defined to represent the mouth tracker; it contains the cascade classifiers, the tracking status, buffers and counters. The data is filled in by the function cvInitTrackMouth:
typedef struct _CvMouthTracker
{
    // Downsampled image and its parameters
    float DownSmpRate;
    IplImage *iplDownSmpImage;
    // Face detector
    CvHidHaarClassifierCascade *hidCascadeFace;
    // Mouth detectors
    CvHidHaarClassifierCascade *hidCascadeMouth0;
    CvHidHaarClassifierCascade *hidCascadeMouth1;
    CvMemStorage *StorageDual0;
    CvMemStorage *StorageDual1;
    int MouthWndWidth;
    // For the tracking state
    CvTrackState TrackState;
    int InitialCounter;
    int ErrorCounter;
    // For the AvsrKalmanFilter
    CvKalman *KalmanXY;
    int KinematicOrder;
    CvMat MeasureVectXY;
    float PrevX, PrevY;
    float PrevVx, PrevVy;
    int iskalmanCreated;
    // For TrackPostprocess (trajectory post-processing)
    int RltPoolLen;
    int RltPoolHeader;
    int isRltPoolFull;
    CvMouthLocateResult ResultPool[CV_MOUTH_TRAJECTORY_LIST_SIZE];
}
CvMouthTracker;
Below is the description of the CvMouthTracker fields:

DownSmpRate        Resample rate of the downsampled image used for face detection.
iplDownSmpImage    Pointer to the downsampled image used for face detection.
hidCascadeFace     Pointer to the cascade classifier for face detection.
hidCascadeMouth0   Pointer to the cascade classifier for the mouth-only / mouth-with-beard pattern.
hidCascadeMouth1   Pointer to the cascade classifier for the mouth-with-beard / mouth-only pattern.
StorageDual0       Buffer for mouth detection.
StorageDual1       Buffer for mouth detection.
MouthWndWidth      Window width of the cascade classifier for mouth detection.
TrackState         State of the finite state machine for mouth tracking; either FACEMOUTHDETECTION or MOUTHTRACKING of the enum type CvTrackState.
InitialCounter     Counter that controls the state transfer from detection to tracking.
ErrorCounter       Counter that controls the state transfer from tracking to detection.
KalmanXY           Pointer to the structure CvKalman for the Kalman filter.
KinematicOrder     Order of the kinematic model used in the Kalman filter.
MeasureVectXY      Matrix that stores the measurement data for the Kalman filter.
PrevX, PrevY       Location of the mouth in the previous frame.
PrevVx, PrevVy     Velocity of the mouth in the previous frame.
iskalmanCreated    Indicates whether the Kalman filter is initialized.
RltPoolLen         Size of the result buffer ResultPool.
RltPoolHeader      Index of the head of the result buffer ResultPool.
isRltPoolFull      Indicates whether the result buffer ResultPool is full.
ResultPool         Array of the structure CvMouthLocateResult that temporarily stores the mouth detection and tracking results.
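The fields RltPoolHeader and isRltPoolFull describe a circular buffer of recent results. A minimal sketch of how such a ring buffer is typically advanced is given below; it is illustrative only, the helper name PushResult is not part of the library, and the actual internal bookkeeping may differ.

/* Illustrative sketch (not part of the library): append one result to the
   ResultPool ring buffer described by the fields above. */
static void PushResult( CvMouthTracker* t, const CvMouthLocateResult* r )
{
    /* store the newest result at the head position */
    t->ResultPool[t->RltPoolHeader] = *r;
    /* advance the head index, wrapping around the fixed-size pool */
    t->RltPoolHeader = (t->RltPoolHeader + 1) % CV_MOUTH_TRAJECTORY_LIST_SIZE;
    /* once the head wraps, the pool is full and old entries are overwritten */
    if( t->RltPoolHeader == 0 )
        t->isRltPoolFull = 1;
}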
For the representation of the mouth locating result, the following structure is defined:
typedef struct _CvMouthLocateResult
{
    float CenterX;
    float CenterY;
    float Width;
    float Height;
    float RotAngle;
    float Likeness;
    int face_left;
    int face_right;
    int face_top;
    int face_bottom;
}
CvMouthLocateResult;
Below is the description of the CvMouthLocateResult fields:

CenterX       X coordinate of the center of the mouth region.
CenterY       Y coordinate of the center of the mouth region.
Width         Width of the mouth region.
Height        Height of the mouth region.
RotAngle      Rotation angle of the mouth region; always 0.
Likeness      Likeness of the resulting mouth region; always 1.0.
face_left     Left boundary of the face region corresponding to the mouth.
face_right    Right boundary of the face region corresponding to the mouth.
face_top      Top boundary of the face region corresponding to the mouth.
face_bottom   Bottom boundary of the face region corresponding to the mouth.
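When the mouth region needs to be drawn or cropped, the fields can be combined into an OpenCV rectangle. The following is a minimal sketch; the helper name MouthResultToRect is not part of the library, and RotAngle is ignored because it is documented to always be 0.

/* Illustrative helper (not part of the library): convert a CvMouthLocateResult
   into a CvRect describing the mouth bounding box. */
static CvRect MouthResultToRect( const CvMouthLocateResult* r )
{
    CvRect rect;
    rect.x      = cvRound( r->CenterX - r->Width  * 0.5f );
    rect.y      = cvRound( r->CenterY - r->Height * 0.5f );
    rect.width  = cvRound( r->Width );
    rect.height = cvRound( r->Height );
    return rect;
}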
For the representation of the mouth detection and tracking result, the following structure is defined:
typedef struct _CvTrackResult
{
    CvTrackState TrackState;
    CvMouthLocateResult MouthResult;
    CvTrackRltSts TrackRltSts;
}
CvTrackResult;
Below is the description of the CvTrackResult fields:

TrackState    State of the finite state machine for mouth tracking; either FACEMOUTHDETECTION or MOUTHTRACKING of the enum type CvTrackState.
MouthResult   Mouth location result.
TrackRltSts   Result of the mouth detection and tracking; one of TRACKSUCCESSFUL, FACEDETECERROR, MOUTHDETECTERROR or MOUTHTRACKERROR of the enum type CvTrackRltSts.
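A caller typically branches on TrackRltSts after each frame. The following minimal sketch assumes only the enum values listed above; the function name ReportTrackResult and the printed messages are purely illustrative.

#include <stdio.h>

/* Illustrative sketch: react to the per-frame tracking status. */
void ReportTrackResult( const CvTrackResult* res )
{
    switch( res->TrackRltSts )
    {
    case TRACKSUCCESSFUL:
        printf( "mouth at (%.1f, %.1f)\n",
                res->MouthResult.CenterX, res->MouthResult.CenterY );
        break;
    case FACEDETECERROR:
        printf( "face detection failed\n" );
        break;
    case MOUTHDETECTERROR:
        printf( "mouth detection failed\n" );
        break;
    case MOUTHTRACKERROR:
        printf( "mouth tracking failed\n" );
        break;
    }
}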
Detect objects with limited scales in an image using a boosted cascade classifier based on Haar-like features
cvHaarDetectObjectsR( const IplImage* img, CvHidHaarClassifierCascade* cascade,
                      CvMemStorage* storage, double scale_factor, int min_neighbors,
                      int flags, int min_width, int max_width );
img
Input IPL image.
cascade
Haar classifier cascade in internal representation.
storage
Memory storage used to store the resultant sequence of object candidate rectangles.
scale_factor
The factor by which the search window is scaled between subsequent scans; for example, 1.1 means increasing the window by 10%.
min_neighbors
Minimum number (minus 1) of neighbor rectangles that make up an object. All groups containing fewer than min_neighbors-1 rectangles are rejected. If min_neighbors is 0, the function does not perform any grouping and returns all the detected candidate rectangles, which may be useful if the user wants to apply a customized grouping procedure.
flags
Mode of operation. Currently the only flag that may be specified is CV_HAAR_DO_CANNY_PRUNING. If it is set, the function uses a Canny edge detector to reject image regions that contain too few or too many edges and thus cannot contain the searched object. The particular threshold values are tuned for face detection, in which case the pruning speeds up the processing.
min_width
Minimum width of the object to detect.
max_width
Maximum width of the object to detect.
The function cvHaarDetectObjectsR finds rectangular regions in the given image that are likely to contain objects the cascade has been trained for and returns those regions as a sequence of rectangles. cvHaarDetectObjectsR has the same functionality as cvHaarDetectObjects from OpenCV, except that it restricts the width of the detected object to the range [min_width, max_width].
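A minimal usage sketch follows. It assumes that the detected regions are returned as a CvSeq* whose elements are CvRect, per the description above, that hidCascade and storage were created beforehand, and that the width bounds are arbitrary example values.

/* Sketch (assumptions noted above): detect objects between 40 and 200 pixels wide. */
CvSeq* objects = cvHaarDetectObjectsR( img, hidCascade, storage,
                                       1.1,   /* scale_factor: grow the window by 10% per scan */
                                       3,     /* min_neighbors */
                                       CV_HAAR_DO_CANNY_PRUNING,
                                       40,    /* min_width, example value */
                                       200 ); /* max_width, example value */

for( int i = 0; objects && i < objects->total; i++ )
{
    /* the element type is assumed to be CvRect, as the sequence holds rectangles */
    CvRect r = *(CvRect*)cvGetSeqElem( objects, i );
    /* ... use r.x, r.y, r.width, r.height ... */
}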
Initialize the mouth tracker
cvInitTrackMouth( CvHidHaarClassifierCascade* hidCascadeFace,
                  CvHidHaarClassifierCascade* hidCascadeMouth0,
                  CvHidHaarClassifierCascade* hidCascadeMouth1,
                  CvMouthTracker* MouthTracker );
hidCascadeFace
Pointer to the boosted cascade classifier for face detection.
hidCascadeMouth0
Pointer to the boosted cascade classifier for the mouth without beard.
hidCascadeMouth1
Pointer to the boosted cascade classifier for the mouth with beard.
MouthTracker
Output data structure that contains the pointers to the mouth tracking classifiers, the tracking status and the records.
The function cvInitTrackMouth initializes the data structure MouthTracker that contains the pointers to the classifiers, the tracking status and the records used in mouth detection and tracking. The input arguments hidCascadeFace, hidCascadeMouth0 and hidCascadeMouth1 must be initialized by the OpenCV functions cvLoadHaarClassifierCascade and cvCreateHidHaarClassifierCascade before calling this function. The structure MouthTracker must be allocated before using this function.
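A minimal initialization sketch is shown below. The preparation of the three hidden cascades via cvLoadHaarClassifierCascade and cvCreateHidHaarClassifierCascade is only indicated by comments, because their exact arguments depend on the OpenCV version in use; the variable names are placeholders.

/* Initialization sketch; variable names are placeholders. */
CvMouthTracker tracker;                          /* must be allocated by the caller */
CvHidHaarClassifierCascade* hidFace       = 0;   /* prepared with cvLoadHaarClassifierCascade  */
CvHidHaarClassifierCascade* hidMouth      = 0;   /* and cvCreateHidHaarClassifierCascade, as   */
CvHidHaarClassifierCascade* hidMouthBeard = 0;   /* described above (OpenCV-version dependent) */

/* ... load and convert the three cascades into hidFace, hidMouth, hidMouthBeard ... */

cvInitTrackMouth( hidFace, hidMouth, hidMouthBeard, &tracker );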
Reset the mouth tracker before processing a new input video sequence
cvResetTrackMouth( const int imagewidth, const int imageheight, CvMouthTracker* MouthTracker );
imagewidth
Image width of the video sequence to process.
imageheight
Image height of the video sequence to process.
MouthTracker
Input data structure initialized by the function cvInitTrackMouth; on output, the updated data structure.
The function cvResetTrackMouth updates the data structure MouthTracker
according to the image size of the video sequence, and resets the Kalman filter
and the finite state machine for mouth tracking. The data structure MouthTracker
must be initialized by function cvInitTrackMouth before calling this function.
Detect or track the mouth in one image frame
cvUpdateTrackMouth( IplImage* img, CvMouthTracker* MouthTracker, CvTrackResult& result );
img
Input 8-bit grayscale or 24-bit true color image.
MouthTracker
Input data structure CvMouthTracker that contains the pointers to the classifiers, the tracking status and the records; on output, the updated structure.
result
Output data structure CvTrackResult that contains the mouth tracking result.
The function cvUpdateTrackMouth detects or tracks the mouth region in one given image frame and updates the data structure CvTrackResult according to the detection/tracking result and the previous state of the finite state machine.
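The following sketch shows a typical per-frame loop: the tracker is reset once for the video size, cvUpdateTrackMouth is called for every frame, and the result is inspected. Frame acquisition is application specific; GrabNextFrame, frameWidth, frameHeight and tracker are placeholders, not part of the library.

/* Per-frame processing sketch. GrabNextFrame() stands in for the application's
   own frame source. */
CvTrackResult result;
IplImage* frame = 0;

cvResetTrackMouth( frameWidth, frameHeight, &tracker );   /* once per video sequence */

while( (frame = GrabNextFrame()) != 0 )    /* 8-bit grayscale or 24-bit color frames */
{
    cvUpdateTrackMouth( frame, &tracker, result );

    if( result.TrackRltSts == TRACKSUCCESSFUL )
    {
        /* mouth location for this frame */
        float x = result.MouthResult.CenterX;
        float y = result.MouthResult.CenterY;
        /* ... use (x, y) together with result.MouthResult.Width / Height ... */
    }
    /* other TrackRltSts values report face/mouth detection or tracking failures */
}

cvReleaseTrackMouth( &tracker );   /* release classifiers, buffers and the Kalman filter */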
Release the classifiers, buffers and Kalman filter used in mouth detection and tracking
cvReleaseTrackMouth( CvMouthTracker* MouthTracker );
MouthTracker
Input data structure CvMouthTracker that contains the pointers to the classifiers, the tracking status and the records.
The function cvReleaseTrackMouth releases the three boosted cascade classifiers, the buffers and the Kalman filter used in mouth detection and tracking. The data structure CvMouthTracker itself is not released.
Refine the trajectory of the mouth region after mouth detection and tracking
cvPostProcessTrackMouth( CvMouthLocateResult* TrackList, const int framenum );
TrackList
Input list of mouth detection and tracking results; on output, the updated list.
framenum
Number of results in TrackList.
The function cvPostProcessTrackMouth refines the trajectory of the mouth region using linear interpolation, a median filter and a Gaussian filter.
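A minimal sketch of the intended use follows: the per-frame results are collected into an array and the whole trajectory is refined once at the end. The array size MAX_FRAMES and the collection step are assumptions for illustration only.

/* Sketch: collect per-frame results and refine the whole trajectory at the end. */
enum { MAX_FRAMES = 1024 };                 /* assumed capacity */
CvMouthLocateResult trajectory[MAX_FRAMES];
int frameCount = 0;

/* ... inside the per-frame loop, after cvUpdateTrackMouth: ... */
if( frameCount < MAX_FRAMES )
    trajectory[frameCount++] = result.MouthResult;

/* ... once the sequence is finished: fill gaps, remove outliers, smooth jitter */
cvPostProcessTrackMouth( trajectory, frameCount );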
Apply histogram equalization to the image
cvEqualizeHist( const IplImage* img );
img
The input and output 8-bit grayscale image.
The function cvEqualizeHist applies histogram equalization to the image img. The ROI of the image is ignored.
Apply illumination gradient correction to the image
cvNormalizeIllum( const IplImage* img, const float destMean, const float destMse );
img
The input and output 8-bit grayscale image.
destMean
Desired mean value of the destination image.
destMse
Desired mean square error of the destination image.
The function cvNormalizeIllum implements the illumination gradient correction of [Rowley, 98] and [Sung, 98] to reduce the heavy shadows caused by extreme lighting angles. The function adjusts the mean and mean square error of the image img according to the destMean and destMse parameters. The ROI of the image is ignored.
Let the brightness values of the pixels in the source image img be $I(x, y)$, and let the parameters of the brightness plane that best fits all $I(x, y)$ be $(a, b, c)$. Then,

$$E(a, b, c) = \sum_{x, y} \left[ I(x, y) - (a x + b y + c) \right]^2 \qquad (1)$$

and, using the Lagrange method, the parameters can be obtained as

$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = A^{-1} \sum_{x, y} I(x, y) \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (2)$$

where

$$A = \sum_{x, y} \begin{pmatrix} x^2 & x y & x \\ x y & y^2 & y \\ x & y & 1 \end{pmatrix}. \qquad (3)$$

Then the image is corrected by subtracting this plane:

$$I_c(x, y) = I(x, y) - (a x + b y + c). \qquad (4)$$

In most cases $I_c(x, y)$ will exceed the valid range of 8-bit pixel values (0~255). Therefore a standardization operation called MSE normalization is used to adjust the mean and mean square error of the illumination-corrected image to the values given in the function parameters. The pixel values of the normalized destination image are obtained as

$$I_d(x, y) = \mu_d + \frac{\sigma_d}{\sigma_c} \left( I_c(x, y) - \mu_c \right), \qquad (5)$$

where $\mu_c$ and $\sigma_c$ are the mean and mean square error of the corrected image $I_c(x, y)$, and $\mu_d$ and $\sigma_d$ are the mean and mean square error of the destination image, as given by the parameters destMean and destMse.
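For illustration, equations (1)-(5) can be sketched in code as below. This is not the library's cvNormalizeIllum; it is a minimal version under the following assumptions: the image is 8-bit single-channel, its ROI is ignored, "mean square error" is interpreted as the standard deviation of the pixel values, and the plane fit uses the same least-squares criterion solved with cvSolve.

#include <stdlib.h>
#include <math.h>
#include "cv.h"

/* Illustrative sketch of equations (1)-(5); not the library implementation. */
void IllumCorrectSketch( IplImage* img, float destMean, float destMse )
{
    int w = img->width, h = img->height, x, y;
    double A[9] = {0}, r[3] = {0}, p[3];
    double* Ic = (double*)malloc( (size_t)w * h * sizeof(double) );

    /* accumulate the normal equations A * (a, b, c)^T = r of eq. (2)-(3) */
    for( y = 0; y < h; y++ )
    {
        uchar* row = (uchar*)(img->imageData + y * img->widthStep);
        for( x = 0; x < w; x++ )
        {
            double I = row[x];
            A[0] += (double)x * x;  A[1] += (double)x * y;  A[2] += x;
            A[4] += (double)y * y;  A[5] += y;              A[8] += 1;
            r[0] += I * x;          r[1] += I * y;          r[2] += I;
        }
    }
    A[3] = A[1];  A[6] = A[2];  A[7] = A[5];   /* the matrix is symmetric */

    CvMat Am = cvMat( 3, 3, CV_64FC1, A );
    CvMat rm = cvMat( 3, 1, CV_64FC1, r );
    CvMat pm = cvMat( 3, 1, CV_64FC1, p );
    cvSolve( &Am, &rm, &pm, CV_LU );           /* plane parameters (a, b, c) */

    /* subtract the plane, eq. (4), and measure mean / MSE of the corrected image */
    double sum = 0, sumSq = 0, n = (double)w * h;
    for( y = 0; y < h; y++ )
    {
        uchar* row = (uchar*)(img->imageData + y * img->widthStep);
        for( x = 0; x < w; x++ )
        {
            double v = row[x] - (p[0] * x + p[1] * y + p[2]);
            Ic[y * w + x] = v;
            sum += v;  sumSq += v * v;
        }
    }
    double meanC = sum / n;
    double mseC  = sqrt( sumSq / n - meanC * meanC );   /* interpreted as standard deviation */
    if( mseC < 1e-6 )
        mseC = 1e-6;

    /* MSE normalization, eq. (5), with clamping to the 8-bit range */
    for( y = 0; y < h; y++ )
    {
        uchar* row = (uchar*)(img->imageData + y * img->widthStep);
        for( x = 0; x < w; x++ )
        {
            double v = destMean + destMse / mseC * (Ic[y * w + x] - meanC);
            row[x] = (uchar)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
    free( Ic );
}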
References

[Cordea, 01] M. D. Cordea, E. M. Petriu, N. D. Georganas, et al., "Real-time 2(1/2)-D head pose recovery for model-based video coding," IEEE Transactions on Instrumentation and Measurement, 50(4): pp. 1007-1013, 2001.
[Lienhart, 02] R. Lienhart, J. Maydt, "An extended set of Haar-like features for rapid object detection," IEEE ICIP, pp. 900-903, 2002.
[Rowley, 98] H. A. Rowley, S. Baluja, T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1): pp. 23-38, Jan. 1998.
[Sung, 98] K.-K. Sung, T. Poggio, "Example-based learning for view-based human face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1): pp. 39-51, Jan. 1998.