The API functions described in this section extract mel-frequency cepstral
coefficients (MFCC) [Stork, 96] from audio sequences. The MFCCs serve as audio
features in audio-visual continuous speech recognition (AVCSR).
Initializes the MFCC transform module. Returns true on success.
bool InitMFCC();
Extracts MFCCs from the input audio data. Returns true on success.
bool ExtractMFCC(short *pWaveData, int nSmpCount, float *pMfcc, int &nFeatureNum, long lSmpRate);
pWaveData
pointer to the buffer that contains the input audio data.
nSmpCount
number of samples in the input audio data (the size of the buffer pWaveData).
pMfcc
pointer to the buffer that receives the resulting MFCC feature vectors. The result is a 2-dimensional array in which each row is a 13-dimensional feature vector. Before calling this function, allocate the buffer with a size of (13 × 100 × nSmpCount / lSmpRate) × sizeof(float) bytes.
nFeatureNum
actual number of MFCC feature vectors written to pMfcc (output).
lSmpRate
sample rate of the input data, in Hz. The current API supports only 16000 Hz, 44100 Hz, and 48000 Hz.
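The buffer size formula in the pMfcc description and the list of supported sample rates can be checked before the call. The following is a minimal sketch, not part of the API: the helper names MfccBufferFloats and IsSupportedSampleRate are hypothetical, and rounding the vector count up (so that a partial trailing frame cannot overflow the buffer) is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>

// Feature dimension and vectors-per-second factor, both taken from the
// size formula given for pMfcc: 13 x 100 x nSmpCount / lSmpRate.
const std::size_t kMfccDim = 13;
const std::size_t kVectorsPerSecond = 100;

// Hypothetical helper (not part of the API): number of floats to allocate
// for pMfcc. The vector count is rounded up; this rounding is an assumption.
std::size_t MfccBufferFloats(int nSmpCount, long lSmpRate) {
    assert(nSmpCount >= 0 && lSmpRate > 0);
    const std::size_t samples = static_cast<std::size_t>(nSmpCount);
    const std::size_t rate = static_cast<std::size_t>(lSmpRate);
    const std::size_t vectors = (kVectorsPerSecond * samples + rate - 1) / rate;
    return kMfccDim * vectors;
}

// Hypothetical helper: true only for the sample rates listed for lSmpRate.
bool IsSupportedSampleRate(long lSmpRate) {
    return lSmpRate == 16000 || lSmpRate == 44100 || lSmpRate == 48000;
}
```

For one second of audio at 16000 Hz, the helper yields 13 × 100 = 1300 floats; multiply by sizeof(float) to obtain the size in bytes.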
Releases the MFCC transform module. Returns true on success.
bool ReleaseMFCC();
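The three functions are presumably called in the order initialize, extract, release. The sketch below illustrates that sequence under stated assumptions: the wrapper ComputeMfcc is hypothetical, the one-vector safety margin on the buffer is an assumption, and the function bodies are stand-in stubs that exist only so the example compiles on its own. The stubs do not compute MFCCs and must be removed when linking against the actual library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signatures as documented in this section.
bool InitMFCC();
bool ExtractMFCC(short *pWaveData, int nSmpCount, float *pMfcc,
                 int &nFeatureNum, long lSmpRate);
bool ReleaseMFCC();

// Stand-in stubs, NOT the real implementation: they only report a vector
// count so the sketch can run without the library.
bool InitMFCC() { return true; }
bool ExtractMFCC(short *, int nSmpCount, float *, int &nFeatureNum,
                 long lSmpRate) {
    nFeatureNum = static_cast<int>(100LL * nSmpCount / lSmpRate);
    return true;
}
bool ReleaseMFCC() { return true; }

// Hypothetical wrapper: runs init, extract, release and returns the
// feature vectors as consecutive rows of 13 floats (empty on failure).
std::vector<float> ComputeMfcc(std::vector<short> &wave, long lSmpRate) {
    assert(lSmpRate > 0);
    std::vector<float> mfcc;
    if (wave.empty() || !InitMFCC()) return mfcc;

    // Buffer size from the pMfcc description, plus one extra vector as a
    // safety margin (assumption).
    const std::size_t maxVectors =
        100 * wave.size() / static_cast<std::size_t>(lSmpRate) + 1;
    mfcc.resize(13 * maxVectors);

    int nFeatureNum = 0;
    const bool ok = ExtractMFCC(wave.data(), static_cast<int>(wave.size()),
                                mfcc.data(), nFeatureNum, lSmpRate);
    ReleaseMFCC();  // release the module whether or not extraction succeeded

    if (!ok || nFeatureNum < 0 ||
        static_cast<std::size_t>(nFeatureNum) > maxVectors) {
        mfcc.clear();
        return mfcc;
    }
    // Keep only the rows that were actually produced.
    mfcc.resize(13 * static_cast<std::size_t>(nFeatureNum));
    return mfcc;
}
```

Feature vector i then occupies elements 13 × i through 13 × i + 12 of the returned vector, matching the row layout described for pMfcc.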
References
[Stork, 96] D.G. Stork and M.E. Hennecke, eds. Speechreading by Humans and Machines. Springer, Berlin, 1996.