ATK-HTK
ATK: An Application Toolkit for HTK
HTK: The Hidden Markov Model Toolkit
Training Strategy
• Monophone Training
• Making Triphones from Monophones
• Unclustered Triphone Training
• Making Tied-state Triphones
Data Preparation
• Step 1 - the Task Grammar
• Step 2 - the Dictionary
• Step 3 - Recording the Data
• Step 4 - Creating the Transcription Files
• Step 5 - Coding the Data
Step 1 - the Task Grammar
Gram.txt
$digit = MOOJT | HAI | BA | BOOSN | NAWM |
SASU | BARY | TASM | CHISN | KHOONG;
$name = [ THAAFY ] QUAAN |
[ HOAFNG ] HAJ;
( SENT-START ( NOOSI [MASY] TOWSI [SOOS] <$digit> |
(LIEEN LAJC | GOJI) $name ) SENT-END )
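The grammar can be read as a small sentence generator: square brackets mark optional words, angle brackets mark one or more repetitions, and vertical bars separate alternatives. The Python below is only a hand-translated sketch of Gram.txt for checking prompts by eye. It is not HParse or HSGen, and the names `random_sentence`, `random_name` and the `max_digits` cap are assumptions made for illustration.

```python
import random

# $digit alternatives, copied from Gram.txt
DIGITS = ["MOOJT", "HAI", "BA", "BOOSN", "NAWM",
          "SASU", "BARY", "TASM", "CHISN", "KHOONG"]


def random_name(rng):
    # $name = [ THAAFY ] QUAAN | [ HOAFNG ] HAJ
    first, last = rng.choice([("THAAFY", "QUAAN"), ("HOAFNG", "HAJ")])
    return ([first] if rng.random() < 0.5 else []) + [last]


def random_sentence(rng, max_digits=4):
    """Return one random word sequence allowed by the grammar."""
    words = ["SENT-START"]
    if rng.random() < 0.5:
        # NOOSI [MASY] TOWSI [SOOS] <$digit>
        words.append("NOOSI")
        if rng.random() < 0.5:
            words.append("MASY")
        words.append("TOWSI")
        if rng.random() < 0.5:
            words.append("SOOS")
        # <...> means one or more; the upper bound is an arbitrary cap
        count = rng.randint(1, max_digits)
        words += [rng.choice(DIGITS) for _ in range(count)]
    else:
        # (LIEEN LAJC | GOJI) $name
        words += rng.choice([["LIEEN", "LAJC"], ["GOJI"]])
        words += random_name(rng)
    words.append("SENT-END")
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(" ".join(random_sentence(rng)))
```

Each printed line is one sentence the recogniser should accept, which makes it easy to compare against the recorded prompts.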
Step 1 - the Task Grammar
I=25 W=SENT-START
I=24 W=NOOSI
I=0 W=SENT-END
I=26 W=!NULL
…
J=60
J=61
Step 2 - the Dictionary
HDMan.exe -m -w wlist -n monophones1 -l dlog dict beep names
Step 3 - Recording the Data
S006 GOJI HOAFNG HAJ
S007 NOOSI TOWSI CHISN
S008 LIEEN LAJC THAAFY QUAAN
S009 LIEEN LAJC HOAFNG HAJ
S010 LIEEN LAJC QUAAN
Step 4 – Creating the Transcription Files
Words.mlf
#!MLF!#
"S001.lab"
NOOSI
MASY
TOWSI
TASM
BA
.
"S002.lab"
GOJI
QUAAN
.
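In an HTK master label file each label normally sits on its own line and every entry is closed by a line containing a single period. The sketch below shows that conversion for prompt lines of the form used in Step 3 (utterance ID followed by the words). It is an illustrative stand-in for a prompts-to-MLF script, not the script itself, and the function name `prompts_to_mlf` is an assumption.

```python
def prompts_to_mlf(prompts):
    """Build word-level MLF text from lines such as 'S001 NOOSI MASY TOWSI'."""
    lines = ["#!MLF!#"]
    for prompt in prompts:
        utt_id, *words = prompt.split()
        lines.append(f'"{utt_id}.lab"')  # name of the label entry
        lines.extend(words)              # one word per line
        lines.append(".")                # a period terminates the entry
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(prompts_to_mlf(["S001 NOOSI MASY TOWSI TASM BA",
                          "S002 GOJI QUAAN"]), end="")
```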
Step 4 – Creating the Transcription Files
HLEd.exe -l '*' -d dict.txt -i phones0.mlf
<S>
I
sp
MA
<S>
Y
sp
TOW
…
Step 5 - Coding the Data
wav2mfc.scp
S001.wav S001.mfc
S002.wav S002.mfc
S003.wav S003.mfc
S004.wav S004.mfc
…
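A script file of this kind pairs each source waveform with its target feature file, one pair per line. Writing it by hand is error-prone once there are many recordings, so a short generator may help. This is only a sketch: the function name `make_scp` and the default extensions are assumptions chosen to match the listing above.

```python
def make_scp(utt_ids, src_ext=".wav", dst_ext=".mfc"):
    """Return script-file text with one 'source target' pair per line."""
    return "".join(f"{utt}{src_ext} {utt}{dst_ext}\n" for utt in utt_ids)


if __name__ == "__main__":
    ids = [f"S{n:03d}" for n in range(1, 5)]  # S001 .. S004
    print(make_scp(ids), end="")
```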
Step 5 - Coding the Data
• WAVEFORM sampled waveform
• LPC linear prediction filter coefficients
• LPREFC linear prediction reflection coefficients
• LPCEPSTRA LPC cepstral coefficients
• LPDELCEP LPC cepstra plus delta coefficients
• USER user defined sample kind
• DISCRETE vector quantised data
• E has energy
• N absolute energy suppressed
• D has delta coefficients
• A has acceleration coefficients
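The qualifiers determine the length of each feature vector. As a worked example, assume 12 static coefficients (the NUMCEPS = 12 setting in Step 6 suggests this): adding energy gives 13, and appending delta and acceleration coefficients triples that to 39, which matches the vector size 39 seen in the model files later in these slides. The helper below is a sketch of that arithmetic only. The name `vector_size` is hypothetical, and the handling of N follows the usual HTK convention that it removes just the absolute energy term.

```python
def vector_size(num_static, qualifiers=""):
    """Feature vector length for a set of qualifier letters (E, N, D, A)."""
    q = set(qualifiers.upper())
    base = num_static + (1 if "E" in q else 0)   # E appends an energy term
    blocks = 1 + ("D" in q) + ("A" in q)         # static, delta, acceleration
    size = base * blocks
    if "N" in q:
        # N only makes sense when energy and deltas are both present:
        # the absolute energy is dropped, its delta is kept.
        if not {"E", "D"} <= q:
            raise ValueError("N requires both E and D")
        size -= 1
    return size


if __name__ == "__main__":
    print(vector_size(12, "EDA"))
    print(vector_size(12, "EDN"))
```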
Creating Monophone HMMs
• Step 6 – Creating Flat Start Monophones
• Step 7 – Fixing the Silence Models
• Step 8 – Realigning the Training Data
Step 6 – Creating Flat Start Monophones
HCompV -C config_HCompV.txt -f 0.01 -m -S train.scp -M hmm0 proto.txt
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
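HCompV reads a prototype model (proto.txt here) that fixes the topology and vector size before the global mean and variance are filled in. The slides do not show the prototype, so the following is a sketch of what such a file commonly looks like: a 5-state left-to-right model with 3 emitting states, zero means and unit variances. The parameter kind `MFCC_E_D_A`, the transition values 0.6/0.4 and the function name `make_proto` are all assumptions for illustration.

```python
def make_proto(vec_size=39, num_states=5, kind="MFCC_E_D_A", name="proto"):
    """Return the text of a simple left-to-right prototype HMM definition."""
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [f"~o <VecSize> {vec_size} <{kind}>",
             f'~h "{name}"',
             "<BeginHMM>",
             f"<NumStates> {num_states}"]
    # States 1 and num_states are non-emitting entry and exit states.
    for state in range(2, num_states):
        lines += [f"<State> {state}",
                  f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                   # entry state always moves on
        elif i < num_states - 1:
            row[i], row[i + 1] = 0.6, 0.4  # self-loop or advance
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_proto(), end="")
```

The exit state's row is left at zero, which is the usual convention for the final non-emitting state.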
A Re-Estimation Tool - HERest
HERest.exe [options] hmmList trainFiles ...
The flat start monophones stored in the directory hmm0 are re-estimated using HERest:
HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
Step 7 – Fixing the Silence Models
1. Run HERest twice with monophones0
<VARIANCE> 39 9.946199e+000 1.149288e+001
<VARIANCE> 39 5.828240e+000 7.320161e+000
<GCONST> 8.172852e+001
<TRANSP> 5 .
Step 8 – Realigning the Training Data
HVite.exe -a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 -y lab -I words.mlf -S train.scp dict.txt monophones1
• multiple pronunciations
Creating Tied-State Triphones
• Step 9 – Making triphones from Monophones
• Step 10 – Making Tied-state Triphones
Step 9 – Making triphones from Monophones
mktri.led aligned.mlf
Cross-word triphones
Word-internal triphones:
“sil b i t sp b u t sil”
t-b+u b-u+t u-t+sil sil”
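The difference between the two expansions is easiest to see on the example phone string. In the sketch below, word-internal expansion treats both sp and sil as word boundaries that block context, while cross-word expansion looks through sp and lets sil act as a context without being expanded itself. These rules are assumptions inferred from the fragment shown above, and the function names are hypothetical. This is not a reimplementation of HLEd.

```python
def _label(left, phone, right):
    name = phone
    if left:
        name = f"{left}-{name}"
    if right:
        name = f"{name}+{right}"
    return name


def word_internal(phones, boundaries=("sp", "sil")):
    """Contexts never cross a boundary symbol."""
    out = []
    for i, p in enumerate(phones):
        if p in boundaries:
            out.append(p)
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        left = None if left in boundaries else left
        right = None if right in boundaries else right
        out.append(_label(left, p, right))
    return out


def cross_word(phones, skip=("sp",), unexpanded=("sil",)):
    """Contexts look through 'sp'; 'sil' stays a monophone but gives context."""
    out = []
    for i, p in enumerate(phones):
        if p in skip or p in unexpanded:
            out.append(p)
            continue
        left = next((q for q in reversed(phones[:i]) if q not in skip), None)
        right = next((q for q in phones[i + 1:] if q not in skip), None)
        out.append(_label(left, p, right))
    return out


if __name__ == "__main__":
    seq = "sil b i t sp b u t sil".split()
    print(" ".join(word_internal(seq)))
    print(" ".join(cross_word(seq)))
```

Under these assumptions the cross-word output ends in `t-b+u b-u+t u-t+sil sil`, which agrees with the tail of the example above.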
Word Network Expansion
FORCECXTEXP = F ALLOWXWRDEXP = F
FORCECXTEXP = T ALLOWXWRDEXP = F
FORCECXTEXP = T
Step 9 – Making triphones from Monophones
HHEd -H hmm9/macros -H hmm9/hmmdefs -M hmm10
Step 10 – Making Tied-state Triphones
HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1
Step 10 – Making Tied-state Triphones
fulllist: monophones + biphones + triphones
Recogniser Evaluation
Step 11 – Recognising the Test Data
HVite.exe -C config_hvite -H hmm15/macros -H hmm15/hmmdefs -S test.scp
rec_out.mlf
====================== HTK Results Analysis ================
Date: Thu Dec 01 11:42:28 2005
Ref : words.mlf
Rec : rec_out.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=83.33 [H=15, S=3, N=18]
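The sentence-level figure can be checked by hand: H is the number of sentences recognised correctly, N the total number of test sentences, and S = N - H the sentences containing an error. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
def sentence_correct(hits, total):
    """Sentence-level %Correct = 100 * H / N."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


if __name__ == "__main__":
    H, N = 15, 18
    print(f"SENT: %Correct={sentence_correct(H, N):.2f} [H={H}, S={N - H}, N={N}]")
```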
Mixture Incrementing
-H hmm15/hmmdefs -M hmm16
<VARIANCE> 39 7.328565e+000 5.521523e+000
<MIXTURE> 2 5.000000e-001
<MIXTURE> 3 2.500000e-001
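Weights such as 0.5 and 0.25 are what one would expect when a single Gaussian is split repeatedly. As far as the HTK Book describes it, mixture incrementing takes the component with the largest weight, halves its weight, copies it, and moves the two means apart by 0.2 standard deviations. The sketch below implements that idea for diagonal-covariance components. The function names and the tuple layout are assumptions, and it is not HHEd's actual code.

```python
import math


def split_heaviest(mixture, perturb=0.2):
    """Return a new mixture with its heaviest component split in two.

    mixture: list of (weight, means, variances) with diagonal variances.
    """
    i = max(range(len(mixture)), key=lambda k: mixture[k][0])
    weight, means, variances = mixture[i]
    offsets = [perturb * math.sqrt(v) for v in variances]
    lower = (weight / 2, [m - o for m, o in zip(means, offsets)], list(variances))
    upper = (weight / 2, [m + o for m, o in zip(means, offsets)], list(variances))
    # The first half replaces the original, the second half is appended.
    return mixture[:i] + [lower] + mixture[i + 1:] + [upper]


def mix_up(mixture, target):
    """Split repeatedly until the mixture has `target` components."""
    while len(mixture) < target:
        mixture = split_heaviest(mixture)
    return mixture


if __name__ == "__main__":
    single = [(1.0, [0.0], [4.0])]
    for weight, means, _ in mix_up(single, 3):
        print(weight, means)
```

Starting from one component and going to three gives weights 0.25, 0.5 and 0.25 in this sketch, so the second and third weights agree with the excerpt above.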
Adapting the HMMs
• Step 12 – Preparation of the Adaptation Data
• Step 13 – Generating the Transforms
• Step 14 – Evaluation of the Adapted System
Step 12 – Preparation of the Adaptation Data
The same as steps 3, 4 and 5:
1. Prompt lists are generated using HSGen:
HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsADapt.txt
HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsTest.txt
2. Record the associated speech from the new user.
3. Both sets of speech are then coded using HCopy:
HCopy.exe -C config -S codeAdapt.scp
HCopy.exe -C config -S codeTest.scp
4. Both transcriptions are obtained using the prompts2mlf Perl script.
5. Use HVite to perform a forced alignment of the adaptation data to minimize the problem of multiple pronunciations.
Step 13 – Generating the Transforms
Create a regression class tree to cluster the mixture components:
HHEd -H hmm15/macros -H hmm15/hmmdefs -M hmm16 regtree.hed tiedlist
Generate a global transform
~r "rtree_32"
<REGTREE> 32
<NODE> 1 2 3
N: vecsize
global.tmf: a global transform
rc.tmf: K transforms
Step 13 – Generating the Transforms
[Figure: a binary regression tree with four base classes]
Step 14 – Evaluation of the Adapted System
-p 0.0 -s 5.0 dict.txt tiedlist
HResults -f -t -I testWords.mlf tiedlist rec_out_adapt.mlf
A speech corpus is very important and useful!
20 hours of DTNVN broadcast news are available.
The Gram of the PaintDemo
[Figure: word network with nodes FN_LINE, !NULL, TỚI, SỐ TỪ]
The Gram of the PaintDemo
“HÃY VẼ ĐOẠN THẲNG
TỪ ĐIỂM MỘT HAI TỚI ĐIỂM BA BỐN”
(“Draw a line segment from point one two to point three four”)
“HÃY VẼ ĐOẠN THẲNG
[TỪ ĐIỂM
[MỘT]X1 [HAI]Y1
TỚI ĐIỂM
[BA]X2 [BỐN]Y2 ]LINE”
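In the tagged form, each bracketed span is followed by the name of the slot it fills, and spans can nest (the coordinates sit inside the LINE span). An application has to pull those slots back out of the string. The following is a sketch of one way to do that with a small stack-based scan. It assumes the `[words]TAG` notation exactly as written on this slide, and it is not ATK's own parser. The function name `parse_tagged` is hypothetical.

```python
import re

# Tokens: an opening bracket, a closing bracket with its tag, or a plain word.
TOKEN = re.compile(r"\[|\][A-Za-z0-9_]*|[^\s\[\]]+")


def parse_tagged(text):
    """Map each tag to the words enclosed by its bracket pair."""
    slots = {}
    stack = []  # one word list per currently open bracket
    for tok in TOKEN.findall(text):
        if tok == "[":
            stack.append([])
        elif tok.startswith("]"):
            words = stack.pop()
            slots[tok[1:]] = " ".join(words)
            if stack:
                stack[-1].extend(words)  # inner words also belong to the outer span
        elif stack:
            stack[-1].append(tok)        # words outside any bracket are ignored
    return slots


if __name__ == "__main__":
    tagged = ("HÃY VẼ ĐOẠN THẲNG [TỪ ĐIỂM [MỘT]X1 [HAI]Y1 "
              "TỚI ĐIỂM [BA]X2 [BỐN]Y2 ]LINE")
    for tag, words in parse_tagged(tagged).items():
        print(tag, "=", words)
```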
I=5 L=FN_LINE s=LINE
I=6 L=FN_CIRCLE s=CIRCLE
J=0 S=0 E=1
J=1 S=0 E=2
The Gram of the PaintDemo
[Figure: wav, mfcc, phrase]
HLStats -b bigfn -o wlist words.mlf
HBuild -n bigfn wlist wdnet_bigram