Audio Visual Continuous Speech Recognition Application

 

Software Installation

Install the latest version of OpenCV

1. Download the latest version of OpenCV from http://sourceforge.net/projects/opencvlibrary/.

2. Install the OpenCV library. With the default settings, OpenCV is installed in C:\Program Files\OpenCV.

3. Add the paths C:\Program Files\OpenCV\cv\include, C:\Program Files\OpenCV\cv\src, C:\Program Files\OpenCV\cvaux\include and C:\Program Files\OpenCV\cvaux\src to the VC++ Include files list (Tools\Options, Directories tab).

4. Add the path C:\Program Files\OpenCV\lib to the VC++ Library files list (Tools\Options, Directories tab).

Copy data and compile the program

5. The following steps assume that the AVCSR open source package is installed in C:\AvcsrDemo.

6. Open the workspace app\AvcsrDemoWin\AvsrDemo.dsw (relative to the package folder), select the project AvsrDemo in Build\Set Active Configuration…, then use Build\Rebuild All to recompile the program.

7. Copy all your test data (see Before you start: data preparation) into the folder C:\AvcsrDemo\Test\AvsrDemo\WinData.

Set the configuration files

If you installed the package with autorun.exe, the configuration files have already been set automatically.

8. Check/modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\WinData\DemoConfig.cfg as shown in Figure 1:

Figure 1. An example configuration file for the AVCSR application

 

9. Check/modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\VHMM_Model\VHMMConfig.cfg. Set the item REFS_PATH to C:\AvcsrDemo\Test\AvsrDemo\VHMM_Model.

10. Check/modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\AHMM_Model\AHMMConfig.cfg. Set the item REFS_PATH to C:\AvcsrDemo\Test\AvsrDemo\AHMM_Model.

11. Check/modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\CHMM_Model\CHMMConfig.cfg. Set the item REFS_PATH to C:\AvcsrDemo\Test\AvsrDemo\CHMM_Model.

12. Check/modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\Front-End\MouthTrackSetting.cfg. Change the following items:

HAARSAMPLEBASEPATH=C:\AvcsrDemo\Test\AvsrDemo\Front-End\HaarBase

DUALCASCADEPATH0=C:\AvcsrDemo\Test\AvsrDemo\Front-End\Mouth32_norm

DUALCASCADEPATH1=C:\AvcsrDemo\Test\AvsrDemo\Front-End\Mouth32_beard
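Steps 9 to 12 can also be scripted. The following Python sketch is not part of the AVCSR package: it assumes that each .cfg file stores one KEY=value pair per line and that the package is installed in C:\AvcsrDemo (step 5). The names SETTINGS, set_config_values and configure_all are hypothetical. Only the keys listed in steps 9 to 12 are rewritten; every other line is left untouched.

```python
import ntpath
import os

# Assumed install location (step 5); change it if the package is elsewhere.
DEMO_ROOT = r"C:\AvcsrDemo\Test\AvsrDemo"

# Hypothetical table of the files and keys from steps 9-12. Values are Windows
# paths, so they are built with ntpath to get backslashes on any host OS.
SETTINGS = {
    ("VHMM_Model", "VHMMConfig.cfg"): {"REFS_PATH": ntpath.join(DEMO_ROOT, "VHMM_Model")},
    ("AHMM_Model", "AHMMConfig.cfg"): {"REFS_PATH": ntpath.join(DEMO_ROOT, "AHMM_Model")},
    ("CHMM_Model", "CHMMConfig.cfg"): {"REFS_PATH": ntpath.join(DEMO_ROOT, "CHMM_Model")},
    ("Front-End", "MouthTrackSetting.cfg"): {
        "HAARSAMPLEBASEPATH": ntpath.join(DEMO_ROOT, "Front-End", "HaarBase"),
        "DUALCASCADEPATH0": ntpath.join(DEMO_ROOT, "Front-End", "Mouth32_norm"),
        "DUALCASCADEPATH1": ntpath.join(DEMO_ROOT, "Front-End", "Mouth32_beard"),
    },
}


def set_config_values(path, values):
    """Replace the value of every KEY=value line whose KEY is in `values`.

    Assumes one KEY=value pair per line; all other lines are kept as they are.
    Returns the set of keys that were found and updated.
    """
    with open(path, "r", encoding="latin-1") as f:
        lines = f.read().splitlines()
    updated = set()
    for i, line in enumerate(lines):
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key in values:
            lines[i] = f"{key}={values[key]}"
            updated.add(key)
    with open(path, "w", encoding="latin-1") as f:
        f.write("\n".join(lines) + "\n")
    return updated


def configure_all(root=DEMO_ROOT):
    """Apply SETTINGS to the config files under `root` and report missing keys."""
    for parts, values in SETTINGS.items():
        found = set_config_values(os.path.join(root, *parts), values)
        missing = set(values) - found
        if missing:
            print(f"{ntpath.join(*parts)}: keys not found: {sorted(missing)}")
```

Calling configure_all() on the Windows machine where the package is installed would update all four files and print any key it could not find; compare the result with the manual steps before running the demo.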

Register the ActiveX control

13. At the command prompt, run RegSvr32 C:\AvcsrDemo\Bin\AviPlayer.ocx. If you installed the package with autorun.exe, the ActiveX control has already been registered automatically.

Run the application

14. Run C:\AvcsrDemo\Bin\AvcsrDemo.exe.

 

Usage

 

The components and files used in the AVCSR system are shown in Figure 2.

 

Figure 2. The components and files used in the AVCSR system

1. Before you start: data preparation

Two sets of data files are needed for the AVCSR application: the input audio-video files and their corresponding transcript files. Each transcript file has the same base name as its data file and the extension .txt (for example, the transcript file for test.avi must be named test.txt).

The current application supports only uncompressed AVI files as audio-video input. The video can be true color or grayscale. For best accuracy, the speaker's face in the video sequence should be frontal and upright, and wider than 128 pixels. The audio track is expected to be 16-bit stereo sampled at 32 kHz; the current application uses only the left channel.
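Because only the left channel is used, recognition effectively works on a mono stream. The minimal Python sketch below shows that selection, assuming the raw PCM bytes of the AVI audio stream have already been extracted (parsing the AVI container is not shown) and that they are little-endian 16-bit samples interleaved as L, R, L, R, ..., the usual PCM layout. The helpers check_audio_format and left_channel are hypothetical names, not functions of the application.

```python
import sys
from array import array

# Audio format of the AVCSR input files, as stated above.
EXPECTED_RATE_HZ = 32000
EXPECTED_CHANNELS = 2
EXPECTED_BITS = 16


def check_audio_format(rate_hz, channels, bits_per_sample):
    """Compare audio stream parameters (read elsewhere) with the expected format."""
    problems = []
    if rate_hz != EXPECTED_RATE_HZ:
        problems.append(f"sample rate {rate_hz} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channel(s), expected {EXPECTED_CHANNELS}")
    if bits_per_sample != EXPECTED_BITS:
        problems.append(f"{bits_per_sample}-bit samples, expected {EXPECTED_BITS}-bit")
    return problems


def left_channel(pcm_bytes):
    """Return the left-channel samples of interleaved 16-bit stereo PCM data.

    Assumes little-endian samples interleaved as L, R, L, R, ...
    """
    if len(pcm_bytes) % 4:
        raise ValueError("16-bit stereo data must be a multiple of 4 bytes long")
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian; fix it on big-endian hosts
    return samples[0::2]  # every second sample, starting with the left one
```

At 32 kHz, one second of 16-bit stereo audio is 32000 × 2 × 2 = 128000 bytes, which left_channel reduces to 32000 samples.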

The transcript files, one for each audio-visual input file, provide the word transcript and the time boundaries of the words in the audio-visual sequence. Figure 3 shows a sample transcript file. The first two lines are the file header; each remaining line gives the start point and end point of one word, followed by the word label. Time boundaries are counted in units of 10 milliseconds: for example, the line “94 165 six” means that the word “six” starts at 94 × 10 = 940 ms and ends at 165 × 10 = 1650 ms. Because the system only supports digit recognition, the word labels in the current application are zero, one, two, three, four, five, six, seven, eight and nine. To recognize other words, the system must be retrained.
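A transcript in this format can be read with a few lines of Python. This sketch assumes whitespace-separated fields and exactly two header lines, as described above; the header content shown in Figure 3 is not interpreted. Word, parse_transcript and out_of_vocabulary are hypothetical names, and times are converted from 10 ms units to milliseconds.

```python
from dataclasses import dataclass

# Transcript times are counted in units of 10 ms.
TIME_UNIT_MS = 10

# Digit vocabulary supported by the current application.
VOCABULARY = {"zero", "one", "two", "three", "four",
              "five", "six", "seven", "eight", "nine"}


@dataclass
class Word:
    start_ms: int
    end_ms: int
    label: str


def parse_transcript(text):
    """Parse transcript text: two header lines, then 'start end label' per line.

    Assumes whitespace-separated fields; blank lines are skipped.
    """
    words = []
    for line_no, line in enumerate(text.splitlines()[2:], start=3):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 'start end label', got {line!r}")
        start, end = int(fields[0]), int(fields[1])
        if end < start:
            raise ValueError(f"line {line_no}: end {end} is before start {start}")
        words.append(Word(start * TIME_UNIT_MS, end * TIME_UNIT_MS, fields[2]))
    return words


def out_of_vocabulary(words):
    """Return the labels that the digit recognizer cannot produce."""
    return [w.label for w in words if w.label.lower() not in VOCABULARY]
```

For the line “94 165 six”, parse_transcript yields Word(start_ms=940, end_ms=1650, label='six'); a non-empty result from out_of_vocabulary flags words that would require retraining.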

Note that both the AVI files and the transcript files must be placed in the same folder as the configuration file DemoConfig.cfg. If you install the package in C:\AvcsrDemo, the data folder is C:\AvcsrDemo\Test\AvsrDemo\WinData.
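Before opening the data in the application, a quick Python check can confirm that every AVI file has its transcript and that DemoConfig.cfg sits in the same folder. The function check_data_folder is a hypothetical helper, not part of the package; it matches the .avi extension case-insensitively but expects the transcript extension to be exactly .txt.

```python
from pathlib import Path


def check_data_folder(folder):
    """Return a list of problems found in an AVCSR data folder (empty if none)."""
    folder = Path(folder)
    problems = []
    if not (folder / "DemoConfig.cfg").is_file():
        problems.append("DemoConfig.cfg not found")
    for path in sorted(folder.iterdir()):
        # Each AVI file needs a transcript with the same base name: test.avi -> test.txt.
        if path.suffix.lower() == ".avi" and not path.with_suffix(".txt").is_file():
            problems.append(f"missing transcript for {path.name}")
    return problems
```

On a default installation, check_data_folder(r"C:\AvcsrDemo\Test\AvsrDemo\WinData") should return an empty list once the data has been copied.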

Figure 3. Content of a sample transcript file

 

2. User interface and controls

 

Figure 4. User interface

Open data file: Use File/Open to open an AVI file in C:\AvcsrDemo\Test\AvsrDemo\WinData. The system loads the content of this file and lists all the AVI files in this folder, as well as the file DemoConfig.cfg. At the same time, it loads the corresponding transcript file (*.txt).

Recognition: When you press the Start button in the control region, the system starts recognizing the words spoken in the selected file. The recognition progress is shown in the process indicator region. When recognition finishes, the recognized word sequences for video-only (VSR), audio-only (ASR) and audio-visual (AVCSR) recognition are shown in the recognition results region, and the corresponding word error rates (WER) are displayed in the statistic results region.
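The WER is conventionally WER = (S + D + I) / N: the number of word substitutions S, deletions D and insertions I needed to turn the recognized sequence into the reference transcript, divided by the number N of reference words. The Python sketch below implements that standard definition with a word-level edit distance. It is an assumption that the application computes WER exactly this way; WerCounter, a hypothetical class that accumulates errors over several files until reset() is called, only mirrors the Reset control by analogy.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # empty reference: every hyp word is an insertion
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # empty hypothesis: every ref word is a deletion
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (r != h))    # substitution or match
        prev = cur
    return prev[-1]


class WerCounter:
    """Accumulates WER over several files (hypothetical helper)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.errors = 0
        self.ref_words = 0

    def add(self, reference, hypothesis):
        self.errors += word_errors(reference, hypothesis)
        self.ref_words += len(reference.split())

    @property
    def wer(self):
        return self.errors / self.ref_words if self.ref_words else 0.0
```

For example, recognizing "one two three" as "one too three four" costs one substitution and one insertion, so word_errors returns 2 and the WER for that file is 2/3.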

Batch recognition: When the Batch button is pressed down, the system runs recognition in batch mode on all the audio-visual sequences in the current directory, starting with the selected file. To switch off batch mode, stop the process (see Stop process), then press the Batch button again.

Stop process: When you press the Stop button, recognition stops after the results for the current audio-visual file have been released.

Reset the WER: Pressing Reset sets the word error rates of AVCSR, ASR and VSR back to zero. The button is active when the recognition process is stopped.

Change the acoustic noise: Pressing the Noisy button adds noise to, or removes noise from, the audio channel of the audio-visual sequence. While the Noisy button is pressed, the signal-to-noise ratio (SNR) can be adjusted with the slider bar.
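The SNR compares signal power with noise power, usually in decibels: SNR_dB = 10 log10(P_signal / P_noise), where P is the mean squared amplitude. The following Python sketch shows how noise can be added to 16-bit samples at a chosen SNR. The noise type used by the application is not documented here, so white Gaussian noise is an assumption, as is the helper name add_noise.

```python
import math
import random


def add_noise(samples, snr_db, seed=None):
    """Return 16-bit samples with white Gaussian noise added at `snr_db` dB.

    The noise is scaled so that 10*log10(P_signal / P_noise) == snr_db before
    the result is rounded and clipped back to the 16-bit range.
    """
    n = len(samples)
    if n == 0:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / n
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_power = sum(v * v for v in noise) / n
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power) if noise_power > 0 else 0.0
    return [max(-32768, min(32767, round(s + scale * v)))
            for s, v in zip(samples, noise)]
```

Lowering snr_db by 10 dB multiplies the noise power by 10 for the same signal, which is the kind of change the slider bar makes audible.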

File selection: To select an audio-visual file, double-click its name in the file name list in the data file list region.