Visual Speech Feature Extraction References



 

Introduction to Visual Speech Feature Extraction

The functions described in this section can be integrated into a cascade visual speech feature extraction system similar to the ones described in [Neti, 00] and [Liang, 02]. Given the sequence of mouth regions obtained by the mouth tracking functions, the images are normalized to 32×32 pixels and fed into the cascade feature extraction steps shown in Fig. 1. First, each input mouth ROI image is mapped to a 32-dimensional feature space using the principal component analysis (PCA) functions. The resulting vector sequence is then upsampled to 100 Hz by the function cvInterpolateFT to match the audio feature rate, and standardized using the feature mean normalization (FMN) function cvNormalizeMean, whose algorithm is described in [Neti, 00]. Next, feature vectors formed by concatenating N consecutive vectors are obtained with the function cvConcatVectors. Finally, the viseme-based linear discriminant analysis (LDA) functions are applied to these features, yielding the observation vectors used for visual-only or audio-visual speech recognition.

Fig. 1. Flow chart of visual feature extraction
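To make the data flow in Fig. 1 concrete, the following minimal run-time sketch chains the functions documented below. The 32-dimensional PCA space, the upsampling factor of 4 (25 Hz video to 100 Hz features), the choice of J = 15 concatenated frames, the CV_32FC1 element type, and the output row counts are illustrative assumptions; the trained PCA and LDA matrices are assumed to have been produced by the training functions described later in this section.

/* Hypothetical run-time cascade: PCA -> upsampling -> FMN -> concatenation -> LDA.
   All sizes and the CV_32FC1 element type are assumptions made for illustration. */
void extractVisualFeatures( const IplImage** mouthROI, int numFrames,
                            const CvMat* pcaEigVals,  /* 32x1, trained                  */
                            const CvMat* pcaEigVecs,  /* 32x1024, rows = eigenvectors   */
                            const CvMat* pcaMean,     /* mean of the training images    */
                            const CvMat* ldaMatrix,   /* m x (32*15), trained           */
                            CvMat* obsVectors )       /* (numFrames*4) x m, preallocated */
{
    const float rate = 4.0f;   /* 25 Hz video -> 100 Hz feature rate      */
    const int   J    = 15;     /* number of concatenated frames (assumed) */

    CvMat* pcaFeat = cvCreateMat( numFrames,             32,     CV_32FC1 );
    CvMat* upFeat  = cvCreateMat( numFrames * (int)rate, 32,     CV_32FC1 );
    CvMat* catFeat = cvCreateMat( numFrames * (int)rate, 32 * J, CV_32FC1 );

    cvPCAProjection( mouthROI, numFrames, pcaEigVals, pcaEigVecs,
                     pcaMean, pcaFeat, true );            /* Equations (5)-(6)   */
    cvInterpolateFT( pcaFeat, rate, upFeat );             /* upsample to 100 Hz  */
    cvNormalizeMean( upFeat );                            /* FMN [Neti, 00]      */
    cvConcatVectors( upFeat, J, catFeat );                /* dynamic features    */
    cvLDAProjection( ldaMatrix, catFeat, obsVectors );    /* observation vectors */

    cvReleaseMat( &pcaFeat );
    cvReleaseMat( &upFeat );
    cvReleaseMat( &catFeat );
}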

 

PCA Functions for Visual Speech Feature Extraction

The functions described in this section perform principal component analysis (PCA) for a set of 8-bit images.

Let u be an N-dimensional vector that consists of the pixel values of an image of size w×h (N = w·h) arranged columnwise. Given a set of M input vectors u_1, \dots, u_M, the mean vector and the covariance matrix are defined as follows:

m = \frac{1}{M} \sum_{i=1}^{M} u_i                                                         (1)

\Sigma = \frac{1}{M} \sum_{i=1}^{M} (u_i - m)(u_i - m)^T                                   (2)

In addition we define the sum vector s and the partial covariance matrix C:

s = \sum_{i=1}^{M} u_i                                                                     (3)

C = \sum_{i=1}^{M} u_i u_i^T                                                               (4)

so that m = s / M and \Sigma = C / M - m m^T, which allows the statistics to be accumulated incrementally over several blocks of images.

Let the p largest eigenvalues of \Sigma and the corresponding eigenvectors be \lambda_1, \dots, \lambda_p and \phi_1, \dots, \phi_p. The projection of an input vector u onto the p-dimensional subspace is the vector v = [v_1, \dots, v_p]^T computed as:

v_k = \phi_k^T (u - m), \quad k = 1, \dots, p                                              (5)

The result vector can also be normalized using the eigenvalues as follows:

\tilde{v}_k = v_k / \sqrt{\lambda_k}, \quad k = 1, \dots, p                                (6)

The final projection result is v = [v_1, \dots, v_p]^T, or \tilde{v} = [\tilde{v}_1, \dots, \tilde{v}_p]^T when the normalization is applied.


InitPCA

Calculates the sum vector and the partial covariance matrix for a set of IplImage images

cvInitPCA(const IplImage** srcImg, const int blockImgNum, 
          CvMat* sumVector, CvMat* pcovMatrix);

 

srcImg

pointer to the array of IplImage input images. All the images must be of the same size, 8-bit and without ROI selected.

blockImgNum

number of images in srcImg.

sumVector

the sum vector s defined in Equation (3).

pcovMatrix

the partial covariance matrix C defined in Equation (4).

The function cvInitPCA calculates the sum vector in Equation (3) and the partial covariance matrix in Equation (4). The output matrices sumVector and pcovMatrix must be allocated before calling the function.
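As an illustration, a minimal allocation-and-call sketch for the first training block is given below. The 32×32 image size (N = 1024), the CV_32FC1 element type, and the N×1/N×N matrix shapes are assumptions, not part of the interface.

/* Hypothetical first training block for PCA accumulation. */
const int N = 32 * 32;            /* pixels per normalized mouth image        */
const int blockImgNum = 200;      /* images in the first block (example)      */
IplImage* block[200];             /* filled elsewhere with 8-bit 32x32 images */

CvMat* sumVector  = cvCreateMat( N, 1, CV_32FC1 );  /* s, Equation (3) */
CvMat* pcovMatrix = cvCreateMat( N, N, CV_32FC1 );  /* C, Equation (4) */

cvInitPCA( (const IplImage**)block, blockImgNum, sumVector, pcovMatrix );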


UpdatePCA

Updates the sum vector and the partial covariance matrix using a new group of input images.

cvUpdatePCA(const IplImage** srcImg, const int blockImgNum, 
             int &totalImgNum, CvMat* sumVector, CvMat* pcovMatrix);

 

srcImg

pointer to the array of input IplImage images. All the images must be of the same size, 8-bit and with no ROI selected.

blockImgNum

number of images in srcImg.

totalImgNum

on input, the number of images used so far to calculate sumVector and pcovMatrix; on output, the updated number.

sumVector

the sum vector. The function updates the current value of sumVector.

pcovMatrix

the partial covariance matrix. The function updates the current value of pcovMatrix.

 

The function cvUpdatePCA updates the sum vector s (Equation (3)) and the partial covariance matrix C (Equation (4)). The sum vector and the partial covariance matrix are updated “in place” in sumVector and pcovMatrix respectively. The total number of images totalImgNum is also updated “in place”.
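Continuing the sketch above, subsequent blocks can be accumulated as follows; numBlocks, blocks[] and blockSize[] are illustrative names for the remaining training data.

/* Hypothetical accumulation of further training blocks after cvInitPCA. */
int totalImgNum = blockImgNum;                 /* images accumulated so far     */
for( int b = 1; b < numBlocks; b++ )           /* blocks[], blockSize[] assumed */
{
    cvUpdatePCA( (const IplImage**)blocks[b], blockSize[b],
                 totalImgNum, sumVector, pcovMatrix );
}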


CalcPCAMatrix

Calculates the mean vector, the covariance matrix, and the eigenvalues and eigenvectors of the covariance matrix.

cvCalcPCAMatrix( const int totalImgNum, const CvMat* sumVector, const CvMat* pcovMatrix,
                 CvMat* meanVector, CvMat* covMatrix, CvMat* eigenValues, CvMat* eigenVectors);

 

totalImgNum

the number of input images used to calculate the matrices sumVector and pcovMatrix.

sumVector

the sum vector calculated by function cvInitPCA or cvUpdatePCA.

pcovMatrix

the partial covariance matrix calculated by function cvInitPCA or cvUpdatePCA.

meanVector

the mean vector.

covMatrix

the covariance matrix.

eigenValues

the eigenvalues in descending order.

eigenVectors

the eigenvectors corresponding to the eigenvalues in eigenValues. The eigenvectors are stored as the rows of the eigenVectors matrix.

The function cvCalcPCAMatrix calculates the mean vector, the covariance matrix, and the eigenvalues and eigenvectors of the covariance matrix. The function takes as inputs totalImgNum, sumVector and pcovMatrix obtained with cvInitPCA or cvUpdatePCA.
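A possible continuation of the training sketch is shown below. Whether the function fills all N eigenpairs or only as many as the output matrices hold is not stated here, so keeping p = 32 rows in eigenVectors is an assumption, as is the CV_32FC1 element type.

/* Hypothetical call once all training blocks have been accumulated. */
const int p = 32;                                    /* retained PCA components     */
CvMat* meanVector   = cvCreateMat( N, 1, CV_32FC1 );
CvMat* covMatrix    = cvCreateMat( N, N, CV_32FC1 );
CvMat* eigenValues  = cvCreateMat( p, 1, CV_32FC1 ); /* largest eigenvalues first   */
CvMat* eigenVectors = cvCreateMat( p, N, CV_32FC1 ); /* eigenvectors stored as rows */

cvCalcPCAMatrix( totalImgNum, sumVector, pcovMatrix,
                 meanVector, covMatrix, eigenValues, eigenVectors );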


PCAProjection

Transforms the input images to the subspace obtained by PCA.

cvPCAProjection(const IplImage** srcImg, const int imgNum, const CvMat* eigenValues, 
                const CvMat* eigenVectors, const CvMat* meanVector, CvMat* projVectors, 
                bool eigenNorm );

 

srcImg

pointer to the array of input IplImage images. All the images must be of the same size, and have 8-bit depth.

imgNum

the number of input images.

eigenValues

the p largest eigenvalues obtained by PCA, where p is the dimension of the target subspace.

eigenVectors

the matrix in which each row is the eigenvector corresponding to the eigenvalue in eigenValues.

meanVector

the mean vector of the training images used in PCA to obtain the eigenvalues and eigenvectors.

projVectors

output matrix in which each row is the p-dimensional feature vector obtained from the corresponding image in srcImg.

eigenNorm

if true, the projection result is normalized using the eigenvalues (Equation (6)).

The function cvPCAProjection transforms the input images to the p-dimensional subspace using Equation (5). If eigenNorm is true, the result is normalized using Equation (6).
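A minimal projection sketch reusing the trained matrices from the sketches above; testImages and numFrames are illustrative names for the data to be projected.

/* Hypothetical projection of numFrames mouth images onto the 32-dimensional subspace. */
CvMat* projVectors = cvCreateMat( numFrames, p, CV_32FC1 );   /* one row per image */

cvPCAProjection( (const IplImage**)testImages, numFrames,
                 eigenValues, eigenVectors, meanVector,
                 projVectors, true );   /* true: apply the Equation (6) normalization */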

 

LDA Functions for Visual Speech Feature Extraction

The functions in this section perform linear discriminant analysis (LDA) on the given sample data. The objective of LDA is to find a projection matrix A that maximizes the ratio of the between-class scatter S_b to the within-class scatter S_w (Fisher's criterion):

A = \arg\max_A \frac{|A S_b A^T|}{|A S_w A^T|}                                             (1)

Given a set of C classes and a set of n-dimensional vectors x_1, \dots, x_M with their corresponding class labels c_1, \dots, c_M, the within-class scatter and the between-class scatter are defined as:

S_w = \sum_{c=1}^{C} p(c) \Sigma_c                                                         (2)

S_b = \sum_{c=1}^{C} p(c) (\mu_c - \mu)(\mu_c - \mu)^T                                     (3)

where \mu_c and \Sigma_c are the sample mean and covariance matrix of the vectors in class c, \mu is the sample mean of all training vectors, and p(c) = M_c / M is the class empirical probability mass function, with M_c the number of samples in class c.

Matrix A is found through the following steps [Yu, 01]:

·        determine a matrix V such that V^T S_b V = \Lambda_b, where \Lambda_b is a diagonal matrix with the diagonal elements sorted in descending order,

·        construct Z = Y \Lambda_p^{-1/2}, where Y consists of the first p columns of V and \Lambda_p is the corresponding p×p upper-left block of \Lambda_b. Then Z^T S_b Z = I,

·        determine a matrix U such that U^T (Z^T S_w Z) U = \Lambda_w, where \Lambda_w is a diagonal matrix,

·        construct the matrix A = U^T Z^T, which satisfies A S_b A^T = I and A S_w A^T = \Lambda_w. The rows of A are sorted from top to bottom according to the descending order of the eigenvalues of S_w^{-1} S_b, that is, the ascending order of the diagonal elements of \Lambda_w,

·        construct the normalized matrix

\hat{A} = \Lambda_w^{-1/2} A                                                               (4)

The projection of a vector x into the LDA space is given by:

y = \hat{A} x                                                                              (5)


InitLDA

Calculates the number of samples, the sum vector and the partial covariance matrix for each class, and the normalization factor associated with all input vectors

cvInitLDA( const CvMat* labels, const CvMat* inVectors, const int numClasses, 
           CvMat *numSamplesInClass, CvMat *sumVectorInClass, 
           CvMat **pCovMatrixInClass, CvMat *normFactor);

 

labels

class labels; each element corresponds to one input vector in inVectors. For C classes the label values are in the range [0, C-1].

inVectors

input data, each row represents one sample vector.

numClasses

number of classes

numSamplesInClass

the number of samples in each class; the count for the ith class is stored in the ith row.

sumVectorInClass

the sum vectors for each class; the ith row contains the sum of all vectors in the ith class.

pCovMatrixInClass

the sequence of partial covariance matrices for each class

normFactor

normalization factor for each dimension of the input vectors

The function cvInitLDA calculates the number of samples, sum vectors and partial covariance matrices for each class, and the normalization factor of all input vectors. The output matrices must be allocated before calling this function. For C classes and n-dimensional input vectors, numSamplesInClass is a C×1 matrix, sumVectorInClass is a C×n matrix, pCovMatrixInClass is a list of C matrices of size n×n, and normFactor is a 1×n matrix.
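An allocation-and-call sketch following the sizes above; the class count, the feature dimension, the CV_32FC1 element type, and the labels/trainVectors inputs are illustrative assumptions.

/* Hypothetical LDA accumulation; matrix shapes follow the description above. */
const int numClasses = 13;    /* number of viseme classes (example value)      */
const int n = 32 * 15;        /* dimension of the concatenated feature vectors */

CvMat*  numSamplesInClass = cvCreateMat( numClasses, 1, CV_32FC1 );
CvMat*  sumVectorInClass  = cvCreateMat( numClasses, n, CV_32FC1 );
CvMat*  normFactor        = cvCreateMat( 1, n, CV_32FC1 );
CvMat** pCovMatrixInClass = new CvMat*[numClasses];
for( int c = 0; c < numClasses; c++ )
    pCovMatrixInClass[c] = cvCreateMat( n, n, CV_32FC1 );

/* labels: one class label per row of trainVectors (both assumed to exist). */
cvInitLDA( labels, trainVectors, numClasses,
           numSamplesInClass, sumVectorInClass, pCovMatrixInClass, normFactor );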


UpdateLDA

Updates the number of samples, the sum vectors and partial covariance matrices for each class and the normalization factor.

cvUpdateLDA( const CvMat* labels, const CvMat* inVectors, const int numClasses, 
             CvMat *numSamplesInClass, CvMat *sumVectorInClass, 
             CvMat **pCovMatrixInClass, CvMat *normFactor);

 

labels

class labels; each element corresponds to one input vector in inVectors. For C classes the label values are in the range [0, C-1].

inVectors

input data, each row represents one sample vector.

numClasses

number of classes

numSamplesInClass

the number of samples in each class updated “in place”.

sumVectorInClass

the values of the sum vectors updated “in place”.

pCovMatrixInClass

the partial covariance matrices for each class updated “in place”.

normFactor

the normalization factor updated “in place”.

The function cvUpdateLDA updates the number of samples, sum vectors and partial covariance matrices for each class, and the normalization factor, from the values obtained with cvInitLDA. The sizes of the matrices numSamplesInClass, sumVectorInClass, pCovMatrixInClass and normFactor must match the dimensionality of the input feature data and the number of classes. See the description of each matrix size in the function cvInitLDA.
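Continuing the sketch above, additional labelled training blocks can be folded in as follows; numBlocks, blockLabels[] and blockVectors[] are illustrative names.

/* Hypothetical accumulation of further labelled feature blocks after cvInitLDA. */
for( int b = 1; b < numBlocks; b++ )
{
    cvUpdateLDA( blockLabels[b], blockVectors[b], numClasses,
                 numSamplesInClass, sumVectorInClass, pCovMatrixInClass, normFactor );
}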


CalcLDASwSb

Calculates the within-class scatter and between-class scatter matrices

cvCalcLDASwSb( const CvMat* numSamplesInClass, const CvMat* sumVectorInClass, 
               CvMat** pCovMatrixInClass, const CvMat* normFactor,
                CvMat* withinClassScatter, CvMat* betweenClassScatter,
               CvMat* meanVectors, CvMat* mse);
 

numSamplesInClass

the number of samples in each class calculated by function cvInitLDA or cvUpdateLDA.

sumVectorInClass

the sum vectors of the samples in each class calculated by function cvInitLDA or cvUpdateLDA.

pCovMatrixInClass

the partial covariance matrices for each class calculated by function cvInitLDA or cvUpdateLDA.

normFactor

the normalization factor calculated by function cvInitLDA or cvUpdateLDA.

withinClassScatter

the within-class scatter matrix.

betweenClassScatter

the between-class scatter matrix.

meanVectors

the mean vector of all the sample data.

mse

the mean square error of all the sample data used.

The function cvCalcLDASwSb calculates the within-class and between-class scatter matrices as well as the mean and the mean square error of all the sample data used. The output matrices must be allocated before calling this function. If the input features are n-dimensional, the matrices withinClassScatter, betweenClassScatter, meanVectors and mse are of size n×n, n×n, n×1 and n×1 respectively.
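A brief continuation, allocating the outputs with the sizes given above (CV_32FC1 elements assumed).

/* Hypothetical computation of the scatter matrices from the accumulated statistics. */
CvMat* withinClassScatter  = cvCreateMat( n, n, CV_32FC1 );
CvMat* betweenClassScatter = cvCreateMat( n, n, CV_32FC1 );
CvMat* meanVectors         = cvCreateMat( n, 1, CV_32FC1 );
CvMat* mse                 = cvCreateMat( n, 1, CV_32FC1 );

cvCalcLDASwSb( numSamplesInClass, sumVectorInClass, pCovMatrixInClass, normFactor,
               withinClassScatter, betweenClassScatter, meanVectors, mse );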


CalcLDAMatrix

Calculates the LDA projection matrix A and the eigenvalues

cvCalcLDAMatrix( CvMat* withinClassScatter, CvMat* betweenClassScatter,
                   CvMat* projMatrix, CvMat* eigenvalues);

withinClassScatter

the within-class scatter matrix.

betweenClassScatter

the between-class scatter matrix.

projMatrix

the LDA projection matrix A.

eigenvalues

the resulting eigenvalues.

The function cvCalcLDAMatrix calculates the LDA projection matrix A and the corresponding eigenvalues using the algorithm described above. The output matrices must be allocated before calling this function. If the input features are n-dimensional and the destination feature space is m-dimensional, the sizes of projMatrix and eigenvalues should be m×n and m×1 respectively.
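A possible call, assuming an m-dimensional target space; m = 24 is an example value only.

/* Hypothetical computation of the LDA projection matrix and its eigenvalues. */
const int m = 24;                                    /* target LDA dimension (example) */
CvMat* projMatrix  = cvCreateMat( m, n, CV_32FC1 );
CvMat* eigenvalues = cvCreateMat( m, 1, CV_32FC1 );

cvCalcLDAMatrix( withinClassScatter, betweenClassScatter, projMatrix, eigenvalues );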


NormLDAMatrix

Divides the rows of the LDA projection matrix by the square roots of the corresponding eigenvalues

cvNormLDAMatrix( const CvMat* eigenvalues, CvMat* projMatrix ); 
 

eigenvalues

the eigenvalues obtained using function cvCalcLDAMatrix.

projMatrix

the LDA projection matrix obtained by function cvCalcLDAMatrix. After the function call projMatrix is normalized.

The function cvNormLDAMatrix computes the normalized matrix \hat{A} in Equation (4).


LDAProjection

Transforms the input data to the feature space obtained by LDA

cvLDAProjection( const CvMat* projMatrix, const CvMat* inVectors, CvMat* outVectors );
 

projMatrix

the LDA projection matrix obtained by function cvCalcLDAMatrix or cvNormLDAMatrix.

inVectors

 the input vectors stored as the rows of inVectors.

outVectors

the transformed vectors stored as the rows of outVectors.

The function cvLDAProjection transforms the input data into the space obtained by LDA (Equation (5)). If the input vector space is n-dimensional, the LDA space is m-dimensional, and the number of input vectors is p, then the matrices projMatrix, inVectors and outVectors are of size m×n, p×n and p×m respectively.
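A short projection sketch that first normalizes the projection matrix and then transforms a batch of feature vectors; numVectors and featVectors are illustrative names for the concatenated visual features.

/* Hypothetical normalization and projection of concatenated visual features. */
cvNormLDAMatrix( eigenvalues, projMatrix );                 /* Equation (4) */

CvMat* outVectors = cvCreateMat( numVectors, m, CV_32FC1 );
cvLDAProjection( projMatrix, featVectors, outVectors );     /* featVectors: numVectors x n */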

 

Other Functions for Visual Speech Feature Extraction

InterpolateFT

Upsamples the input vectors

cvInterpolateFT( const CvMat* inVectors, const float R, CvMat* outVectors );

 

inVectors

Input vectors, stored as rows of inVectors

R

Interpolation rate, must be larger than 1

outVectors

Output vectors stored as the rows of outVectors

The function cvInterpolateFT interpolates the input vectors using the Fourier transform. The interpolation is performed on each column of inVectors and the result is stored in outVectors, which must be allocated before calling this function. For M N-dimensional input vectors stored in an M×N matrix inVectors, the matrix outVectors must be of size (M·R)×N.
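For example, upsampling 25 Hz visual features to the 100 Hz audio frame rate corresponds to R = 4; pcaFeat (a numFrames×32 matrix of PCA features) and the CV_32FC1 element type are illustrative assumptions.

/* Hypothetical upsampling of 25 Hz visual features to 100 Hz (R = 4). */
const float R = 4.0f;
CvMat* outVectors = cvCreateMat( (int)(numFrames * R), 32, CV_32FC1 );

cvInterpolateFT( pcaFeat, R, outVectors );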


NormalizeMean

Performs the mean normalization of the input vectors

cvNormalizeMean( CvMat* inVectors );

 

inVectors

Feature data matrix, each row is a feature vector

The function cvNormalizeMean subtracts the mean from the input feature vectors [Neti, 00]. The mean is computed over the whole set of input vectors.
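The operation amounts to per-dimension mean removal over the whole feature sequence; a rough equivalent is sketched below for illustration only (the library function may differ in details such as the element type).

/* A rough per-dimension mean-removal equivalent of cvNormalizeMean (illustrative only). */
void meanRemoveSketch( CvMat* feat )         /* feat: numVectors x dim, CV_32FC1 assumed */
{
    for( int j = 0; j < feat->cols; j++ )
    {
        double mean = 0.0;
        for( int i = 0; i < feat->rows; i++ )
            mean += cvmGet( feat, i, j );
        mean /= feat->rows;
        for( int i = 0; i < feat->rows; i++ )
            cvmSet( feat, i, j, cvmGet( feat, i, j ) - mean );
    }
}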


ConcatVectors

Concatenates J consecutive vectors

cvConcatVectors(const CvMat* inVectors, const int J, CvMat* outVectors );

 

inVectors

the input vectors stored as the rows of the inVectors matrix.

J

the number of input vectors concatenated to form one output vector.

outVectors

the output concatenated vectors stored as the rows of outVectors.

The function cvConcatVectors concatenates J consecutive feature vectors into one new vector in order to capture the dynamic information in the feature data, as described in [Neti, 00]. The matrix outVectors must be allocated before the function call.
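For example, with J = 15 and 32-dimensional input features each output row is a 480-dimensional vector. How boundary frames are handled (and hence the exact number of output rows) is not specified here, so keeping the same row count below is an assumption.

/* Hypothetical concatenation of J = 15 consecutive 100 Hz feature vectors. */
const int J = 15;
CvMat* catVectors = cvCreateMat( upFeat->rows, upFeat->cols * J, CV_32FC1 );

cvConcatVectors( upFeat, J, catVectors );   /* upFeat: mean-normalized 100 Hz features */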

 

References

[Liang, 02] L. H. Liang, X. X. Liu, Y. B. Zhao, X. Pi and A. V. Nefian, "Speaker independent audio-visual continuous speech recognition", in Proc. IEEE ICME, Lausanne, Switzerland, 2002.

[Neti, 00] C. Neti, G. Potamianos, J. Luettin, et al., "Audio-visual speech recognition", Final Workshop 2000 Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, Oct. 12, 2000.

[Yu, 01] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition", Pattern Recognition, 34(10), 2001, pp. 2067-2070.