Audio Speech Feature Extraction Reference


API functions for MFCC extraction

The API functions described in this section extract mel-frequency cepstral coefficients (MFCCs) [Stork, 96] from audio sequences. The MFCCs serve as the audio features in audio-visual continuous speech recognition (AVCSR).


InitMFCC

Initializes the MFCC transform module. Returns true on success.

bool InitMFCC();

 


ExtractMFCC

Extracts MFCC feature vectors from the input audio data. Returns true on success.

bool ExtractMFCC(short *pWaveData, int nSmpCount, float *pMfcc, int &nFeatureNum, long lSmpRate);

 

pWaveData

            pointer to the buffer that contains the input audio data.

nSmpCount

            number of samples in the input audio data (the length of the buffer pWaveData, in samples).

pMfcc

            pointer to the buffer that receives the resulting MFCC feature vectors. The result is a 2-dimensional array in which each row is a 13-dimensional feature vector. The caller must allocate the buffer before calling this function, with a size of at least (13 × 100 × nSmpCount / lSmpRate) × sizeof(float) bytes, which is room for 100 feature vectors per second of audio.

nFeatureNum

            on return, the actual number of MFCC feature vectors written to pMfcc.

lSmpRate

            sample rate of the input data, in Hz. The current API supports only 16000 Hz, 44100 Hz, and 48000 Hz.
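
The sizing rule for pMfcc and the sample-rate restriction can be checked before calling ExtractMFCC. The following is a minimal sketch, not part of the API: the helper names MfccBufferFloats and IsSupportedSampleRate are hypothetical, and rounding the vector count up to a whole vector is a conservative assumption added here rather than something the API description requires.

```cpp
#include <cassert>
#include <cstddef>

// Floats per feature vector, as stated in the pMfcc description.
constexpr std::size_t kMfccDim = 13;
// Feature vectors per second of audio implied by the sizing formula.
constexpr std::size_t kVectorsPerSecond = 100;

// Hypothetical helper: true for the three sample rates listed for lSmpRate.
bool IsSupportedSampleRate(long lSmpRate) {
    return lSmpRate == 16000 || lSmpRate == 44100 || lSmpRate == 48000;
}

// Hypothetical helper: number of floats to allocate for pMfcc, following
// 13 * 100 * nSmpCount / lSmpRate. The vector count is rounded up
// (an assumption) so that a partial trailing vector never overflows.
std::size_t MfccBufferFloats(int nSmpCount, long lSmpRate) {
    assert(nSmpCount >= 0 && lSmpRate > 0);
    const std::size_t samples = static_cast<std::size_t>(nSmpCount);
    const std::size_t rate = static_cast<std::size_t>(lSmpRate);
    const std::size_t vectors = (kVectorsPerSecond * samples + rate - 1) / rate;
    return kMfccDim * vectors;
}
```

For one second of 16000 Hz audio this gives 13 × 100 = 1300 floats. A caller could allocate the buffer with std::vector<float> mfcc(MfccBufferFloats(nSmpCount, lSmpRate)) and pass mfcc.data() as pMfcc.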


ReleaseMFCC

Releases the MFCC transform module. Returns true on success.

bool ReleaseMFCC();
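
The three functions suggest an initialize, extract, release sequence. The sketch below shows one way a caller might wrap that sequence; it is an illustration under stated assumptions, not the library's own usage code. The wrapper name ExtractMfccOnce is hypothetical, and the three function bodies at the top are placeholder stubs (they compute no real MFCCs) included only so the example compiles and links without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- Placeholder stubs (assumptions) so this sketch builds on its own.
// --- Remove them and link the real module in an actual build.
bool InitMFCC() { return true; }
bool ReleaseMFCC() { return true; }
bool ExtractMFCC(short *pWaveData, int nSmpCount, float *pMfcc,
                 int &nFeatureNum, long lSmpRate) {
    if (pWaveData == nullptr || pMfcc == nullptr || lSmpRate <= 0) return false;
    // The stub only reports a vector count and zero-fills the output.
    nFeatureNum = static_cast<int>(100LL * nSmpCount / lSmpRate);
    for (int i = 0; i < nFeatureNum * 13; ++i) pMfcc[i] = 0.0f;
    return true;
}
// --- End of stubs.

// Hypothetical wrapper: runs initialize / extract / release once and
// returns the feature vectors as a flat row-major array (13 floats per
// vector). Returns an empty vector on any failure.
std::vector<float> ExtractMfccOnce(std::vector<short> &wave, long lSmpRate) {
    if (wave.empty() || lSmpRate <= 0) return {};
    if (!InitMFCC()) return {};

    const int nSmpCount = static_cast<int>(wave.size());
    // Allocate per the documented formula, plus one spare vector as margin.
    const std::size_t vectors =
        static_cast<std::size_t>(100LL * nSmpCount / lSmpRate) + 1;
    std::vector<float> mfcc(13 * vectors);

    int nFeatureNum = 0;
    const bool ok = ExtractMFCC(wave.data(), nSmpCount, mfcc.data(),
                                nFeatureNum, lSmpRate);

    // Release the module whether or not extraction succeeded.
    const bool released = ReleaseMFCC();
    if (!ok || !released || nFeatureNum < 0 ||
        static_cast<std::size_t>(nFeatureNum) > vectors) {
        return {};
    }

    // Keep only the rows that were actually written.
    mfcc.resize(13 * static_cast<std::size_t>(nFeatureNum));
    assert(mfcc.size() % 13 == 0);
    return mfcc;
}
```

Row i of the result starts at index 13 × i. In a long-running application it would likely make more sense to call InitMFCC once at startup and ReleaseMFCC once at shutdown rather than around every extraction.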

 

References

[Stork, 96] D.G. Stork and M.E. Hennecke, eds. Speechreading by Humans and Machines. Springer, Berlin, 1996.