The functions described in this section can be integrated into a cascade visual speech feature extraction system similar to that described in [Neti, 00] and [Liang, 02]. Given the sequence of mouth regions obtained by the mouth tracking functions, the images are normalized to 32×32 pixels in size and fed into the cascade of feature extraction steps shown in Fig. 1. First, each input mouth ROI image is mapped to a 32-dimensional feature space using the principal component analysis (PCA) functions. The vector sequence is then upsampled to 100 Hz by the function cvInterpolateFT to match the audio feature rate, and standardized using the feature mean normalization (FMN) function cvMeanNorm, which implements the algorithm described in [Neti, 00]. After that, features consisting of the concatenation of N consecutive feature vectors are obtained with the function cvConcatVectors. Finally, the viseme-based linear discriminant analysis (LDA) functions are applied to these features to obtain the observation vectors for visual-only or audio-visual speech recognition.
Fig. 1 Flow chart of visual feature extraction
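Since the documented functions operate on IplImage/CvMat data, the cascade above can be illustrated with a self-contained NumPy sketch. Every dimension, the upsampling factor (an integer 2 instead of the 100 Hz target), the sliding-window concatenation and the random stand-in PCA/LDA matrices are assumptions for illustration only, not the library's behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (random, for illustration only): a trained
# 32-dimensional PCA basis over 32x32 = 1024-pixel mouth images and an
# LDA projection matrix; N = 5 consecutive frames are concatenated.
N         = 5
pca_mean  = rng.normal(size=1024)
pca_basis = rng.normal(size=(32, 1024))     # rows = eigenvectors
lda_A     = rng.normal(size=(41, 32 * N))   # 41 output dims, arbitrary here

frames = rng.normal(size=(30, 1024))        # sequence of vectorized mouth ROIs

# 1. PCA: project each frame into the 32-dimensional subspace
feat = (frames - pca_mean) @ pca_basis.T                    # 30 x 32

# 2. Upsample each column by a factor of 2 via Fourier interpolation
#    (zero padding in the frequency domain; Nyquist handling simplified)
M = feat.shape[0]
F = np.fft.fft(feat, axis=0)
h = (M + 1) // 2
Fp = np.zeros((M * 2, feat.shape[1]), dtype=complex)
Fp[:h], Fp[-(M - h):] = F[:h], F[h:]
feat = np.real(np.fft.ifft(Fp, axis=0)) * 2                 # 60 x 32

# 3. Feature mean normalization (FMN): subtract the per-dimension mean
feat -= feat.mean(axis=0)

# 4. Concatenate N consecutive vectors (sliding window, an assumption)
T = feat.shape[0] - N + 1
stacked = np.hstack([feat[i:i + T] for i in range(N)])      # 56 x 160

# 5. LDA: project to the final observation vectors
obs = stacked @ lda_A.T                                     # 56 x 41
```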
The functions described in this
section perform principal component analysis (PCA) for a set of 8-bit images.
Let u be an N-dimensional vector that consists of the pixel values of an image of size W×H (N = W·H) arranged columnwise. Given a set of M input vectors u_1, ..., u_M, the mean vector μ and the covariance matrix Σ are defined as follows:

μ = (1/M) Σ_{i=1}^{M} u_i    (1)

Σ = (1/M) Σ_{i=1}^{M} (u_i − μ)(u_i − μ)^T    (2)

In addition we define the sum vector s and the partial covariance matrix P:

s = Σ_{i=1}^{M} u_i    (3)

P = Σ_{i=1}^{M} u_i u_i^T    (4)

so that μ = s/M and Σ = P/M − μ μ^T.

Let the first p largest eigenvalues of Σ and the corresponding eigenvectors be λ_1 ≥ ... ≥ λ_p and φ_1, ..., φ_p. The projection of an input vector u in the p-dimensional subspace is û = [û_1, ..., û_p]^T, which can be calculated by:

û_k = φ_k^T (u − μ),  k = 1, ..., p    (5)

The result vector can also be normalized using the eigenvalues as follows:

ũ_k = û_k / √λ_k,  k = 1, ..., p    (6)

The final projection result is ũ = [ũ_1, ..., ũ_p]^T.
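The accumulators, the sum vector s of Equation (3) and the partial covariance matrix P of Equation (4), determine the statistics via μ = s/M and Σ = P/M − μ μ^T, and the projection then follows Equations (5) and (6). This can be checked numerically with a small NumPy sketch (all dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, p = 100, 16, 4                 # images, pixels per image, subspace dim
U = rng.normal(size=(M, N))          # rows = vectorized images

s = U.sum(axis=0)                    # sum vector, Equation (3)
P = U.T @ U                          # partial covariance matrix, Equation (4)

mu    = s / M                        # mean vector, Equation (1)
Sigma = P / M - np.outer(mu, mu)     # covariance, Equation (2) via s and P

# Eigendecomposition; keep the p largest eigenvalues/eigenvectors
w, V = np.linalg.eigh(Sigma)
lam, Phi = w[::-1][:p], V[:, ::-1][:, :p]

u_hat   = Phi.T @ (U[0] - mu)        # projection, Equation (5)
u_tilde = u_hat / np.sqrt(lam)       # eigenvalue normalization, Equation (6)
```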
Calculates the sum vector and the partial covariance matrix for a set of IplImage images
cvInitPCA(const IplImage** srcImg, const int blockImgNum,
CvMat* sumVector, CvMat* pcovMatrix);
srcImg
pointer to the array of IplImage input images. All the images must be of the same size, 8-bit and without ROI selected.
blockImgNum
number of images in srcImg.
sumVector
the sum vector defined in Equation
3.
pcovMatrix
the partial covariance matrix defined in Equation
4.
The function cvInitPCA calculates the sum vector in Equation (3) and the partial covariance matrix in Equation (4). The output matrices sumVector and pcovMatrix must be allocated before calling the function.
Updates the sum vector and the partial covariance matrix using a new group of input images.
cvUpdatePCA(const IplImage** srcImg, const int blockImgNum,
int &totalImgNum, CvMat* sumVector, CvMat* pcovMatrix);
srcImg
pointer to the array of input IplImage images. All the images must be of the same size, 8-bit and with no ROI selected.
blockImgNum
number of images in srcImg.
totalImgNum
on input, the number of images used so far to calculate the input matrices; on output, the updated count.
sumVector
the sum vector. The function updates the current value of sumVector.
pcovMatrix
the partial covariance matrix. The function updates the current value of pcovMatrix.
The function cvUpdatePCA updates the sum vector (Equation 3) and the partial covariance matrix (Equation 4). The sum vector and the partial covariance matrix are computed "in place" in sumVector and pcovMatrix respectively. The total number of images totalImgNum is also updated "in place".
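The accumulation pattern the cvInitPCA/cvUpdatePCA pair performs can be sketched as follows: adding up the sum vector and the partial covariance matrix block by block yields exactly the batch quantities (a NumPy stand-in, not the C API):

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.normal(size=(50, 8))            # 50 vectorized images, 8 "pixels"

# Block-wise accumulation, as cvInitPCA followed by cvUpdatePCA would do
s = np.zeros(8)                         # sum vector (Equation 3)
P = np.zeros((8, 8))                    # partial covariance (Equation 4)
total = 0                               # plays the role of totalImgNum
for block in np.array_split(U, 5):      # five blocks of ten images each
    s += block.sum(axis=0)
    P += block.T @ block
    total += len(block)
```

Because s and P are plain sums over the images, the result is independent of how the data is split into blocks.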
Calculates the mean vector and covariance matrix, the eigenvalues and eigenvectors of the covariance.
cvCalcPCAMatrix( const int totalImgNum, const CvMat* sumVector, const CvMat* pcovMatrix,
CvMat* meanVector, CvMat* covMatrix, CvMat* eigenValues, CvMat* eigenVectors);
totalImgNum
the number of input images used to calculate the matrices sumVector and pcovMatrix.
sumVector
the summary vector calculated by function cvInitPCA or cvUpdatePCA.
pcovMatrix
the partial covariance matrix calculated by function cvInitPCA or cvUpdatePCA.
meanVector
the mean vector.
covMatrix
the covariance matrix.
eigenValues
the eigenvalues in descending order.
eigenVectors
the eigenvectors corresponding to the eigenvalues in eigenValues. The eigenvectors are stored as the rows of the eigenVectors matrix.
The function cvCalcPCAMatrix calculates the mean vector, the covariance matrix, and the eigenvalues and eigenvectors of the covariance matrix. The function takes as inputs totalImgNum, sumVector and pcovMatrix obtained with cvInitPCA or cvUpdatePCA.
Transforms the input images into the subspace obtained by PCA.
cvPCAProjection(const IplImage** srcImg, const int imgNum, const CvMat* eigenValues,
const CvMat* eigenVectors, const CvMat* meanVector, CvMat* projVectors,
bool eigenNorm );
srcImg
pointer to the array of input IplImage images. All the images must be of the same size, and have 8-bit depth.
imgNum
the number of input images.
eigenValues
the p largest eigenvalues obtained by PCA, where p is the dimension of the target subspace.
eigenVectors
the matrix in which each row is the eigenvector corresponding to the eigenvalue in eigenValues.
meanVector
the mean vector of the training images that was used in PCA to obtain the eigenvalues and eigenvectors.
projVectors
output matrix in which each row is one feature vector in p-dimensional subspace transformed from the corresponding image in srcImg.
eigenNorm
if true, the transform result is normalized using the eigenvalues (Equation 6).
The function cvPCAProjection transforms the input images into the p-dimensional subspace using Equation (5). If the input eigenNorm is true, the transform result is normalized using Equation (6).
The functions in this section perform Linear Discriminant Analysis (LDA) on the given sample data. The objective of LDA is to find a projection matrix A that maximizes the ratio of the between-class scatter S_b to the within-class scatter S_w (Fisher's criterion):

A = argmax_A |A S_b A^T| / |A S_w A^T|    (1)
Given a set of C classes and a set of n-dimensional vectors X = {x_1, ..., x_M} with their corresponding labels l_1, ..., l_M ∈ {0, ..., C − 1}, the within-class scatter and the between-class scatter are defined as:

S_w = Σ_{c=1}^{C} P(c) Σ_c    (2)

S_b = Σ_{c=1}^{C} P(c) (μ_c − μ)(μ_c − μ)^T    (3)

where μ_c, Σ_c are the sample mean and covariance matrix of the vectors in class c, μ is the sample mean of all training vectors, and P(c) = M_c/M is the class empirical probability mass function, where M_c is the number of samples in class c.
The matrix A is found through the following steps [Yu, 01]:

· determine a matrix V such that V^T S_b V = Λ, where Λ is a diagonal matrix with the diagonal elements sorted in descending order,

· construct Z = Y Λ_p^{−1/2}, where Y consists of the first p columns of V and Λ_p is the corresponding p×p block of Λ. Then Z^T S_b Z = I,

· determine a matrix U such that U^T (Z^T S_w Z) U = Λ_w, where Λ_w is a diagonal matrix,

· construct the matrix A = U^T Z^T, which satisfies A S_b A^T = I and A S_w A^T = Λ_w. The rows of A are sorted from top to bottom according to the descending order of the eigenvalues in Λ_w,

· construct the normalized matrix

Â = Λ_w^{−1/2} A    (4)
The projection of a vector x in the LDA space is given by:

y = A x    (5)
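The [Yu, 01] direct-LDA steps can be sketched in NumPy as below. This is a compact reading of the procedure under the stated assumptions (positive-definite S_w, S_b of rank at least p); tolerance handling for near-zero eigenvalues of S_b is omitted:

```python
import numpy as np

def direct_lda(Sb, Sw, p):
    # Step 1: diagonalize Sb and keep the p leading eigenpairs
    lam, V = np.linalg.eigh(Sb)
    lam, Y = lam[::-1][:p], V[:, ::-1][:, :p]
    Z = Y / np.sqrt(lam)                    # Z^T Sb Z = I
    # Step 2: diagonalize the reduced within-class scatter
    lam_w, U = np.linalg.eigh(Z.T @ Sw @ Z)
    idx = np.argsort(lam_w)[::-1]           # descending, as in the text
    lam_w, U = lam_w[idx], U[:, idx]
    A = U.T @ Z.T                           # A Sb A^T = I, A Sw A^T = diag(lam_w)
    # Normalized matrix of Equation (4): divide rows by sqrt(eigenvalues)
    return A / np.sqrt(lam_w)[:, None]
```

After the final normalization the within-class scatter in the projected space becomes the identity, which is easy to verify numerically.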
Calculates the number of samples, the sum vector and the partial covariance matrix for each class, and the normalization factor associated with all input vectors
cvInitLDA( const CvMat* labels, const CvMat* inVectors, const int numClasses,
CvMat *numSamplesInClass, CvMat *sumVectorInClass,
CvMat **pCovMatrixInClass, CvMat *normFactor);
labels
class labels; each element corresponds to one input vector in inVectors. For C classes the labels take values in the range [0, C−1].
inVectors
input data, each row represents one sample vector.
numClasses
number of classes.
numSamplesInClass
the number of samples in each class; the count for the ith class is stored in the ith row.
sumVectorInClass
the sum vector for each class; the ith row contains the sum vector of all vectors in the ith class.
pCovMatrixInClass
the sequence of partial covariance matrices for each class
normFactor
normalization factor for each dimension of the input vectors
The function cvInitLDA calculates the number of samples, the sum vector and the partial covariance matrix for each class, and the normalization factor of all input vectors. The output matrices must be allocated before calling this function. Given C-class n-dimensional input vectors, the matrices numSamplesInClass, sumVectorInClass, pCovMatrixInClass and normFactor are of size C×1, C×n, a list of C matrices of size n×n, and 1×n respectively.
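The per-class accumulation can be sketched in NumPy as follows. The exact definition of normFactor is not given in the text; a per-dimension sum of squares is used here purely as an assumed, accumulatable stand-in:

```python
import numpy as np

def init_lda(labels, X, C):
    n = X.shape[1]
    num  = np.zeros((C, 1))                       # numSamplesInClass, C x 1
    sums = np.zeros((C, n))                       # sumVectorInClass,  C x n
    pcov = [np.zeros((n, n)) for _ in range(C)]   # C matrices of size n x n
    for c in range(C):
        Xc = X[labels == c]                       # samples with label c
        num[c, 0] = len(Xc)
        sums[c]   = Xc.sum(axis=0)
        pcov[c]   = Xc.T @ Xc                     # partial covariance
    norm = (X ** 2).sum(axis=0, keepdims=True)    # assumed normFactor, 1 x n
    return num, sums, pcov, norm
```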
Updates the number of samples,
the sum vectors and partial covariance matrices for each class and the normalization factor.
cvUpdateLDA( const CvMat* labels, const CvMat* inVectors, const int numClasses,
CvMat *numSamplesInClass, CvMat *sumVectorInClass,
CvMat **pCovMatrixInClass, CvMat *normFactor);
labels
class labels; each element corresponds to one input vector in inVectors. For C classes the labels take values in the range [0, C−1].
inVectors
input data, each row represents one sample vector.
numClasses
number of classes.
numSamplesInClass
the number of samples in each class updated “in place”.
sumVectorInClass
the values of the sum vectors updated “in place”.
pCovMatrixInClass
the partial covariance matrices for each class, updated "in place".
normFactor
the
normalization factor updated “in place”.
The function cvUpdateLDA updates the number of samples, the sum vectors and the partial covariance matrices for each class, and the normalization factor, starting from the values obtained with cvInitLDA. The sizes of the matrices numSamplesInClass, sumVectorInClass, pCovMatrixInClass and normFactor must match the dimension of the input feature data and the number of classes. See the description of the size of each matrix in the function cvInitLDA.
Calculates the
within-class scatter and between-class scatter matrices
cvCalcLDASwSb( const CvMat* numSamplesInClass, const CvMat* sumVectorInClass,
CvMat** pCovMatrixInClass, const CvMat* normFactor,
CvMat* withinClassScatter, CvMat* betweenClassScatter,
CvMat* meanVectors, CvMat* mse);
numSamplesInClass
the number of samples in each class calculated by function cvInitLDA
or cvUpdateLDA.
sumVectorInClass
the sum vectors of the samples in each class calculated by function cvInitLDA
or cvUpdateLDA.
pCovMatrixInClass
the partial covariance matrices for each class calculated by function cvInitLDA
or cvUpdateLDA.
normFactor
the normalization factor calculated by function cvInitLDA or cvUpdateLDA.
withinClassScatter
the within-class scatter matrix.
betweenClassScatter
the between-class scatter matrix.
meanVectors
the mean vector of all the sample data.
mse
the mean square error of all the sample data used.
The function cvCalcLDASwSb calculates the within-class scatter and between-class scatter matrices as well as the mean and mean square error of all the sample data used. The output matrices must be allocated before calling this function. Suppose the input features are n-dimensional; then the matrices withinClassScatter, betweenClassScatter, meanVectors and mse are of size n×n, n×n, n×1 and n×1 respectively.
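Given the per-class accumulators (counts, sum vectors, partial covariance matrices), the scatter matrices of Equations (2) and (3) can be recovered as sketched below (a NumPy stand-in for the C API). With these definitions, S_w + S_b equals the total covariance of the data, which gives a handy sanity check:

```python
import numpy as np

def lda_scatter(num, sums, pcov):
    # num: C x 1 counts, sums: C x n class sum vectors,
    # pcov: list of C partial covariance matrices (Xc^T Xc)
    M  = num.sum()
    mu = sums.sum(axis=0) / M                         # overall sample mean
    n  = sums.shape[1]
    Sw, Sb = np.zeros((n, n)), np.zeros((n, n))
    for c in range(len(pcov)):
        Mc   = num[c, 0]
        mu_c = sums[c] / Mc                           # class mean
        Sigma_c = pcov[c] / Mc - np.outer(mu_c, mu_c) # class covariance
        Pc = Mc / M                                   # empirical class probability
        Sw += Pc * Sigma_c                            # Equation (2)
        Sb += Pc * np.outer(mu_c - mu, mu_c - mu)     # Equation (3)
    return Sw, Sb, mu
```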
Calculates the LDA
projection matrix A and the eigenvalues
cvCalcLDAMatrix( CvMat* withinClassScatter, CvMat* betweenClassScatter,
CvMat* projMatrix, CvMat* eigenvalues);
withinClassScatter
the within-class scatter matrix.
betweenClassScatter
the between-class scatter matrix.
projMatrix
the LDA projection matrix A.
eigenvalues
the resulting eigenvalues.
The function cvCalcLDAMatrix calculates the LDA
projection matrix A and the corresponding eigenvalues using the
algorithm described above. The output matrices must be allocated before calling
this function. Suppose the input features are n-dimensional and the destination feature space is m-dimensional; then the sizes of projMatrix and eigenvalues should be m×n and m×1 respectively.
Divides the vectors in
the rows of LDA projection matrix by the square roots of the corresponding
eigenvalues
cvNormLDAMatrix( const CvMat* eigenvalues, CvMat* projMatrix );
eigenvalues
the eigenvalues obtained using function cvCalcLDAMatrix.
projMatrix
the LDA projection matrix obtained by function cvCalcLDAMatrix. After the function call projMatrix
is normalized.
The function cvNormLDAMatrix computes the normalized matrix in Equation 4.
Transforms the input
data to the feature space obtained by LDA
cvLDAProjection( const CvMat* projMatrix, const CvMat* inVectors, CvMat* outVectors );
projMatrix
the LDA projection matrix obtained by function cvCalcLDAMatrix or cvNormLDAMatrix.
inVectors
the input vectors stored as the
rows of inVectors.
outVectors
the transformed vectors stored as the rows of outVectors.
The function cvLDAProjection transforms the input data into the transformed
space obtained by LDA. Suppose that the input vector space is n-dimensional, the LDA space is m-dimensional and the number of input vectors is p;
then the matrices projMatrix, inVectors and outVectors are
of size m×n, p×n and p×m
respectively.
Upsamples the input vectors
cvInterpolateFT( const CvMat* inVectors, const float R, CvMat* outVectors );
inVectors
Input vectors, stored as rows of inVectors
R
Interpolation rate, must be larger than 1
outVectors
Output vectors stored as the rows of outVectors
The function cvInterpolateFT interpolates the input vectors using the Fourier transform. The interpolation is done on each column of the input matrix inVectors and the result is stored in outVectors, which must be allocated before calling this function. Assuming M N-dimensional input vectors stored in an M×N matrix inVectors, the matrix outVectors must be of size (MR)×N.
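The Fourier-domain upsampling can be sketched as follows, under two simplifying assumptions: an integer rate R (the documented function accepts a float rate larger than 1) and odd M (even M would additionally require splitting the Nyquist bin):

```python
import numpy as np

def interpolate_ft(inVectors, R):
    # Upsample each column by an integer factor R via zero padding
    # in the frequency domain, then rescale for the added samples.
    M, N = inVectors.shape
    X = np.fft.fft(inVectors, axis=0)
    h = (M + 1) // 2                          # non-negative frequency bins
    Xp = np.zeros((M * R, N), dtype=complex)
    Xp[:h] = X[:h]                            # keep the positive frequencies
    Xp[M * R - (M - h):] = X[h:]              # and the negative frequencies
    return np.real(np.fft.ifft(Xp, axis=0)) * R
```

For a band-limited input with odd M the interpolation is exact; e.g. a sampled cosine upsampled by R = 2 lands exactly on the denser cosine samples.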
Performs the mean normalization of the input vectors
cvMeanNorm( CvMat* inVectors );
inVectors
Feature data matrix, each row is a feature vector
The function cvMeanNorm subtracts the mean from the input feature vectors [Neti, 00]. The mean is computed over the whole set of input vectors.
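With feature vectors stored as rows, the normalization amounts to subtracting the column mean, e.g.:

```python
import numpy as np

feats = np.array([[1., 3.], [2., 5.], [3., 7.]])   # rows = feature vectors
feats -= feats.mean(axis=0)     # FMN: subtract the per-dimension mean
# each column of feats now has zero mean
```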
Concatenates J consecutive vectors
cvConcatVectors(const CvMat* inVectors, const int J, CvMat* outVectors );
inVectors
the input vectors stored as the rows of the inVectors matrix.
J
the number of input vectors concatenated to form one output vector.
outVectors
the output concatenated vector stored as the rows of outVectors.
The function cvConcatVectors concatenates J consecutive feature vectors into one new vector to capture the dynamic information in the feature data, as described in [Neti, 00]. The matrix outVectors must be allocated before the function call.
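One plausible reading of the concatenation, a sliding window of J frames (the exact window alignment and boundary handling are not specified in the text and are assumptions here), can be sketched as:

```python
import numpy as np

def concat_vectors(inVectors, J):
    # Stack J consecutive rows into one row; with T input rows this
    # yields T - J + 1 output rows of J times the original width.
    T = inVectors.shape[0] - J + 1
    return np.hstack([inVectors[i:i + T] for i in range(J)])
```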
References
[Liang, 02] L. H. Liang, X. X. Liu, Y. B. Zhao, X. Pi and A. V. Nefian, "Speaker independent audio-visual continuous speech recognition", in Proc. of IEEE ICME, Lausanne, Switzerland, 2002.
[Neti, 00] C. Neti, G. Potamianos, J. Luettin, et al., "Audio-visual speech recognition", Final Workshop 2000 Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, Oct. 12, 2000.
[Yu, 01] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data — with application to face recognition", Pattern Recognition 34(10), 2001, pp. 2067-2070.