Audio Visual Continuous Speech Recognition Application
Software
Installation
Install the latest version of OpenCV
1. Download the latest version of OpenCV from http://sourceforge.net/projects/opencvlibrary/.
2. Install the OpenCV library. With the default settings OpenCV will be installed in C:\program files\OpenCV.
3. Add the paths C:\program files\OpenCV\cv\include, C:\program files\OpenCV\cv\src, C:\program files\OpenCV\cvaux\include and C:\program files\OpenCV\cvaux\src to the VC++ include files setting (in the Directories tab of Tools\Options).
4. Add the path C:\program files\OpenCV\lib to the VC++ Library files setting (in the Directories tab of Tools\Options).
Copy data and compile the program
5. The following steps assume that the AVCSR open source package is installed in C:\AvcsrDemo.
6. Open the workspace in app\AvcsrDemoWin\AvsrDemo.dsw, select the project AvsrDemo in Build\Set Active Configuration…, then use Build\Rebuild All to re-compile the program.
7. Copy all your test data (see Before you start: data preparation) into folder C:\AvcsrDemo\Test\AvsrDemo\WinData.
Set configuration files
If you use autorun.exe to install the package, the configuration files have been set automatically.
8. Check/Modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\WinData\DemoConfig.cfg as follows:
Figure 1. An example configuration file for the AVCSR application
9. Check/Modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\VHMM_Model\VHMMConfig.cfg. Change the item REFS_PATH= to C:\AvcsrDemo\Test\AvsrDemo\VHMM_Model.
10. Check/Modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\AHMM_Model\AHMMConfig.cfg. Change the item REFS_PATH= to C:\AvcsrDemo\Test\AvsrDemo\AHMM_Model.
11. Check/Modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\CHMM_Model\CHMMConfig.cfg. Change the item REFS_PATH= to C:\AvcsrDemo\Test\AvsrDemo\CHMM_Model.
12. Check/Modify the configuration file C:\AvcsrDemo\Test\AvsrDemo\Front-End\MouthTrackSetting.cfg. Change the items:
HAARSAMPLEBASEPATH= C:\AvcsrDemo\Test\AvsrDemo\Front-End\HaarBase
DUALCASCADEPATH0= C:\AvcsrDemo\Test\AvsrDemo\Front-End\Mouth32_norm
DUALCASCADEPATH1= C:\AvcsrDemo\Test\AvsrDemo\Front-End\Mouth32_beard
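The path edits in steps 9 to 12 can also be scripted. The following Python sketch is only an illustration: it assumes each setting sits on its own line in the form KEY=value, as in the entries above, and it does not account for comments, sections, or encodings that the real files may contain. The function name set_config_value and the key OTHER are hypothetical.

```python
def set_config_value(text, key, value):
    """Return config text with the line that starts with `key=` set to `value`.

    Assumes one KEY=value entry per line (an assumption about the file format).
    """
    prefix = key + "="
    out = []
    found = False
    for line in text.splitlines():
        if line.strip().startswith(prefix):
            out.append(prefix + value)  # replace the old value
            found = True
        else:
            out.append(line)            # keep every other line unchanged
    if not found:
        raise KeyError(key)             # do not silently add unknown items
    return "\n".join(out) + "\n"


# Hypothetical file content; only REFS_PATH is taken from the steps above.
sample = "REFS_PATH=D:\\old\\path\nOTHER=1\n"
updated = set_config_value(
    sample, "REFS_PATH", r"C:\AvcsrDemo\Test\AvsrDemo\VHMM_Model"
)
print(updated)
```

To apply it to a real file, read the file into a string, call the function, and write the result back, keeping a backup copy of the original.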
Register the ActiveX control
13. At the command line, run RegSvr32 C:\AvcsrDemo\Bin\AviPlayer.ocx. If you use autorun.exe to install the package, the ActiveX control has been registered automatically.
Run the application
14. Run C:\AvcsrDemo\Bin\AvcsrDemo.exe.
Usage
The components and files used in the AVCSR system are shown in Figure 2.
Figure 2. The components and files used in the AVCSR system
1. Before you start: data preparation
Two sets of data files are needed for the AVCSR application: the input audio-video files and their corresponding transcript files (e.g. for the data file test.avi, the corresponding transcript file should be named test.txt).
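Before a run, it can help to confirm that every AVI file has a transcript. The Python sketch below works on a list of file names and reports the AVI files with no matching .txt file. The function names and the sample file names other than test.avi and test.txt are hypothetical, and the case-insensitive comparison is an assumption about Windows file systems.

```python
from pathlib import PureWindowsPath


def transcript_name(avi_name):
    """Return the transcript name expected for an AVI file (test.avi -> test.txt)."""
    return str(PureWindowsPath(avi_name).with_suffix(".txt"))


def missing_transcripts(file_names):
    """List the AVI files in `file_names` that have no matching .txt transcript."""
    lowered = {name.lower() for name in file_names}
    return [
        name
        for name in file_names
        if name.lower().endswith(".avi")
        and transcript_name(name).lower() not in lowered
    ]


files = ["test.avi", "test.txt", "digits2.avi", "DemoConfig.cfg"]
print(missing_transcripts(files))  # ['digits2.avi']
```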
The current application supports uncompressed AVI input files. The input video can be true color or grayscale. For increased accuracy, the face of the speaker in the video sequence must be frontal, upright, and wider than 128 pixels. The audio is 16-bit stereo, sampled at 32 kHz. The current application uses only the left channel.
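To illustrate what using only the left channel means for 16-bit stereo audio, the Python sketch below takes every other sample from an interleaved PCM buffer. It assumes little-endian samples stored in left, right order, which is the usual PCM layout but is not stated in this document. It does not read AVI files; the input is a raw byte buffer built for the example.

```python
import struct

SAMPLE_RATE_HZ = 32000  # sampling rate stated above


def left_channel(pcm_bytes):
    """Extract left-channel samples from interleaved 16-bit stereo PCM.

    Assumes little-endian samples in left, right order (an assumption).
    """
    n_frames = len(pcm_bytes) // 4  # 2 channels x 2 bytes per sample
    samples = struct.unpack("<%dh" % (2 * n_frames), pcm_bytes[: 4 * n_frames])
    return list(samples[0::2])      # even positions hold the left channel


# Three stereo frames: (100, -100), (200, -200), (300, -300)
frames = struct.pack("<6h", 100, -100, 200, -200, 300, -300)
left = left_channel(frames)
print(left)                          # [100, 200, 300]
print(len(left) / SAMPLE_RATE_HZ)    # duration in seconds
```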
The transcript files, one for each audio-visual input data file, provide the word transcript and the time boundaries of the audio-visual sequence. Figure 3 shows a sample transcript file. The first two lines are the file header, and the remaining lines describe the transcripts and the time boundaries of the words. In each line, the first two numbers give the start and end points of one word, followed by the word label. Time boundaries are counted in units of 10 milliseconds. For example, the line “94 165 six” means that the word “six” starts at 94×10=940 ms and ends at 165×10=1650 ms. In the current application, the word labels are zero, one, two, three, four, five, six, seven, eight, or nine, since the system only supports digit recognition. To recognize other words, the system must be retrained.
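The transcript layout described above can be read with a few lines of Python. This is a minimal sketch, assuming a two-line header followed by whitespace-separated "start end word" lines. The header content and the second word line in the sample are placeholders, not taken from a real transcript.

```python
def parse_transcript(text, header_lines=2, unit_ms=10):
    """Parse transcript text into (start_ms, end_ms, word) tuples.

    Skips `header_lines` header lines; times are in units of `unit_ms`.
    """
    words = []
    for line in text.splitlines()[header_lines:]:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        words.append((start * unit_ms, end * unit_ms, label))
    return words


# Placeholder header lines; "94 165 six" is the example from the text.
sample = "header line 1\nheader line 2\n94 165 six\n170 230 two\n"
print(parse_transcript(sample))
# [(940, 1650, 'six'), (1700, 2300, 'two')]
```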
Note that both the AVI files and the transcript files must be placed in the same folder as the configuration file DemoConfig.cfg. If you install the package in C:\AvcsrDemo, the data folder should be C:\AvcsrDemo\Test\AvsrDemo\WinData.
Figure 3. Content of a sample transcript file
2. User interface and controls
Figure 4. User interface
Open data file: Use File/Open to open an AVI file in C:\AvcsrDemo\Test\AvsrDemo\WinData. The system loads the content of this file and lists all the AVI files in this folder, as well as the file DemoConfig.cfg. The system also loads the corresponding transcript file (*.txt) at the same time.
Recognition: When the Start button in the control region is pressed, the system starts recognizing the words spoken in the selected file. The recognition progress is shown in the process indicator region. When recognition finishes, the recognized word sequences for video-only (VSR), audio-only (ASR), and AVCSR are shown in the recognition results region, and the corresponding word error rates (WER) are displayed in the statistic results region.
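This document does not say how the application computes the word error rate. A common definition divides the word-level edit distance (substitutions, deletions, and insertions) between the transcript and the recognized sequence by the number of transcript words. The Python sketch below follows that common definition and may differ from the application's own calculation.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words
print(word_error_rate("six two nine one", "six two five one"))  # 0.25
```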
Batch recognition: When the Batch button is pressed down, the system runs recognition in batch mode for all the audio-visual sequences in the current directory, starting with the selected file. To switch off batch mode, stop the process (see Stop process), then press the Batch button again.
Stop process: When the Stop button is pressed, recognition stops after the results of the current audio-visual file are released.
Reset the WER: When Reset is pressed, the word error rates of AVCSR, ASR, and VSR are reset to zero. The button is active when the recognition process is stopped.
Change the acoustic noise: Pressing the Noisy button adds noise to, or removes it from, the audio channel of the audio-visual sequence. While the Noisy button is pressed, the signal-to-noise ratio (SNR) can be adjusted using the slider bar.
File selection: An audio-visual file can be selected by double-clicking its name in the file name list in the data file list region.
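The kind of noise the Noisy button adds, and how the slider maps to an SNR value, are not described in this document. As a general illustration of what an SNR setting means, the Python sketch below adds white Gaussian noise (an assumption) to a signal, scaled so that the ratio of signal power to noise power matches a target SNR in decibels. The function names are hypothetical.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise scaled to the target SNR (dB)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is repeatable
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale so that signal_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# 0.1 s of a 440 Hz tone at the 32 kHz sampling rate used by the application
tone = [math.sin(2 * math.pi * 440 * t / 32000) for t in range(3200)]
noisy = add_noise(tone, snr_db=10.0)
print(round(measured_snr_db(tone, noisy), 3))  # 10.0
```

A lower SNR value means relatively more noise, so lowering the target makes the audio-only task harder.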