Monday, September 19, 2011

htkdict load error in Julius

Some people have run into this error while running Julius 4.1.x:

Error: voca_load_htkdict: the line content was: DECORATE [DECORATE] d eh k er ey t sp
Error: voca_load_htkdict: line 3118: triphone "ax-d+sp" not found
Error: voca_load_htkdict: line 3118: triphone "d-sp+*" or biphone "d-sp" not found
Error: voca_load_htkdict: the line content was: DECORATED [DECORATED] d eh k er ey dx ax d sp
Error: voca_load_htkdict: line 3119: triphone "iy-s+sp" not found
Error: voca_load_htkdict: line 3119: triphone "s-sp+*" or biphone "s-sp" not found
Error: voca_load_htkdict: the line content was: DECREASE [DECREASE] d ix k r iy s sp
Error: voca_load_htkdict: line 3120: triphone "ax-z+sp" not found
Error: voca_load_htkdict: line 3120: triphone "z-sp+*" or biphone "z-sp" not found
Error: voca_load_htkdict: the line content was: DECREASES [DECREASES] d ix k r iy s ax z sp
Error: voca_load_htkdict: line 3121: triphone "ix-ng+sp" not found
Error: voca_load_htkdict: line 3121: triphone "ng-sp+*" or biphone "ng-sp" not found
Error: voca_load_htkdict: the line content was: DECREASING [DECREASING] d ix k r iy s ix ng sp
Error: voca_load_htkdict: line 3122: triphone "r-iy+sp" not found
Error: voca_load_htkdict: line 3122: triphone "iy-sp+*" or biphone "iy-sp" not found

...and so on (the full error output is much longer).
************************************

Error reason:
Julius prints these messages when your word dictionary contains words whose pronunciations need triphones that are not defined in the acoustic model (AM). The loader in "voca_load_htkdict.c" tries to match every triphone generated from the dictionary file against the triphone (HMM) list of the AM. When it cannot find one, it prints this error and stops the program.
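To make the check concrete, here is a minimal Python sketch of the kind of lookup the loader performs. It is not Julius's code. It assumes HTK-style "L-C+R" triphone names, dictionary lines of the form "WORD [OUTPUT] p1 p2 ..." with no spaces inside the brackets, and an HMM list in which the first field of each line is a logical name. It only checks word-internal triphones. Julius also checks the first and last phones of each word using biphones or "*" contexts (as in "d-sp+*" or biphone "d-sp" in the log above), and this sketch leaves that out. The function names are hypothetical.

```python
"""Sketch: check word-internal triphones of an HTK-format dictionary against an HMM list."""


def load_hmm_names(tiedlist_lines):
    """Collect logical HMM names (the first field of each non-empty HMM list line)."""
    names = set()
    for line in tiedlist_lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def parse_dict_line(line):
    """Split 'WORD [OUTPUT] p1 p2 ...' into (word, phones).

    Assumption: the bracketed output string contains no spaces.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    if rest and rest[0].startswith("["):
        rest = rest[1:]  # drop the bracketed output string
    return word, rest


def word_internal_triphones(phones):
    """Return 'L-C+R' names for every phone whose two neighbours are inside the word."""
    return [
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    ]


def missing_triphones(dict_lines, hmm_names):
    """Yield (line_number, triphone) for each word-internal triphone not in hmm_names."""
    for lineno, line in enumerate(dict_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        _, phones = parse_dict_line(line)
        for tri in word_internal_triphones(phones):
            if tri not in hmm_names:
                yield lineno, tri


if __name__ == "__main__":
    # Hypothetical HMM list; the last line maps a logical triphone to a physical one.
    tiedlist = ["d-ix+k", "ix-k+r", "k-r+iy", "r-iy+s", "iy-s+ax", "iy-s+sp iy-s+ax"]
    dictionary = [
        "DECREASE [DECREASE] d ix k r iy s sp",
        "DECREASES [DECREASES] d ix k r iy s ax z sp",
    ]
    for lineno, tri in missing_triphones(dictionary, load_hmm_names(tiedlist)):
        print(f'line {lineno}: triphone "{tri}" not found')
```

With this hypothetical list, DECREASE passes because "iy-s+sp" is listed as a logical name. Only line 2 is reported, for "s-ax+z" and "ax-z+sp".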

Possible solutions:
1. Enable the -forcedict option (on the command line, or by uncommenting it in your jconf file). Julius will then skip the erroneous words in the dictionary and run anyway.
or...
2. Map each missing triphone to the closest physical triphone in the HMM list file "tiedlist".
For example:
b-ey+t v-eh+t
The first column is the name of the triphone (generated from your dictionary), and the second column is the name of the HMM actually defined in your AM.

This solution is only practical when there are just a few missing triphones.

3. The best solution is to leave out of your dictionary file any words that are not covered by the AM.
Note that the first two solutions are only meant for testing Julius. For production or commercial projects, you must train the acoustic model and the language model on the same corpus.
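For solution 2, the following Python sketch proposes "logical physical" lines to append to the HMM list. The choice of the "closest" physical triphone uses a naive heuristic of my own, not anything from Julius or HTK. It assumes HTK-style "L-C+R" names, and it treats single-field lines and the second field of two-field lines as physical HMMs. A candidate must share the centre phone. Among those, a matching right context scores higher than a matching left context. Names that are already logical entries are skipped so no duplicates are added. The author's own example (b-ey+t mapped to v-eh+t) uses a different centre phone, so treat the output only as a starting point for manual, phonetically informed choices. All function names are hypothetical.

```python
"""Sketch: propose 'logical physical' HMM list lines for missing triphones (naive heuristic)."""


def split_triphone(name):
    """Split 'L-C+R' into (L, C, R); an absent context becomes None."""
    left, sep, rest = name.partition("-")
    if not sep:
        left, rest = None, name
    center, sep, right = rest.partition("+")
    if not sep:
        right = None
    return left, center, right


def physical_hmms(tiedlist_lines):
    """Physical HMMs: single-field lines plus the second field of mapping lines."""
    phys = set()
    for line in tiedlist_lines:
        fields = line.split()
        if len(fields) == 1:
            phys.add(fields[0])
        elif len(fields) >= 2:
            phys.add(fields[1])
    return phys


def closest_physical(missing, physical):
    """Pick a physical HMM with the same centre phone; prefer matching right, then left context."""
    left, center, right = split_triphone(missing)
    best, best_score = None, -1
    for cand in sorted(physical):  # sorted for deterministic tie-breaking
        c_left, c_center, c_right = split_triphone(cand)
        if c_center != center:
            continue
        score = 2 * (c_right == right) + (c_left == left)
        if score > best_score:
            best, best_score = cand, score
    return best


def mapping_lines(missing, tiedlist_lines):
    """Return 'logical physical' lines for missing triphones that have a candidate."""
    logical = {line.split()[0] for line in tiedlist_lines if line.split()}
    physical = physical_hmms(tiedlist_lines)
    lines = []
    for tri in missing:
        if tri in logical:
            continue  # already listed; do not add a duplicate entry
        target = closest_physical(tri, physical)
        if target is not None:
            lines.append(f"{tri} {target}")
    return lines


if __name__ == "__main__":
    tiedlist = ["ax-d+ih", "eh-d+sp", "iy-s+ax", "b-ey+t v-eh+t"]  # hypothetical
    for line in mapping_lines(["ax-d+sp", "iy-s+sp", "ix-ng+sp"], tiedlist):
        print(line)
```

Here "ax-d+sp" maps to "eh-d+sp" and "iy-s+sp" maps to "iy-s+ax". "ix-ng+sp" gets no line because no physical HMM has "ng" as its centre phone. That word would still need -forcedict or removal from the dictionary.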

How software is produced

This is how software gets produced, from what the user wanted to what actually gets built. I found it while browsing the net.
And believe me, as a developer, I can say that this is really true :D