The Hidden Markov Model Toolkit


ATK-HTK

An Application Toolkit for HTK

The Hidden Markov Model Toolkit

Training Strategy

Monophone Training

Making Triphones from Monophones

Unclustered Triphone Training

Making Tied-state Triphones

N


Data Preparation

• Step 1 - the Task Grammar

• Step 2 - the Dictionary

• Step 3 - Recording the Data

• Step 4 - Creating the Transcription Files

• Step 5 - Coding the Data


Step 1 - the Task Grammar

Gram.txt

SASU | BARY | TASM | CHISN | KHOONG;

$name = [ THAAFY ] QUAAN | [ HOAFNG ] HAJ;

( SENT-START ( NOOSI [MASY] TOWSI [SOOS] <$digit> | (LIEEN LAJC | GOJI) $name ) SENT-END )
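To make the grammar notation concrete, here is a minimal Python sketch that draws random sentences from the rules shown above. It follows the HParse conventions that `[ ]` marks an optional item, `|` an alternative, and `< >` one or more repetitions. The start of the `$digit` rule is cut off on the slide, so the digit list below contains only the visible alternatives, and the cap of four repetitions is an arbitrary assumption.

```python
import random

# Only the alternatives visible on the slide; the start of the $digit rule
# is truncated in the source, so this list is knowingly incomplete.
DIGITS = ["SASU", "BARY", "TASM", "CHISN", "KHOONG"]

def maybe(rng, word):
    """Return [word] or [] with equal probability (an optional [ ] item)."""
    return [word] if rng.random() < 0.5 else []

def gen_name(rng):
    # $name = [ THAAFY ] QUAAN | [ HOAFNG ] HAJ;
    if rng.random() < 0.5:
        return maybe(rng, "THAAFY") + ["QUAAN"]
    return maybe(rng, "HOAFNG") + ["HAJ"]

def gen_sentence(rng, max_digits=4):
    """Draw one sentence; max_digits is an assumed cap on the <$digit> loop."""
    if rng.random() < 0.5:
        # NOOSI [MASY] TOWSI [SOOS] <$digit>
        digits = [rng.choice(DIGITS) for _ in range(rng.randint(1, max_digits))]
        body = ["NOOSI"] + maybe(rng, "MASY") + ["TOWSI"] + maybe(rng, "SOOS") + digits
    else:
        # (LIEEN LAJC | GOJI) $name
        verb = ["LIEEN", "LAJC"] if rng.random() < 0.5 else ["GOJI"]
        body = verb + gen_name(rng)
    return ["SENT-START"] + body + ["SENT-END"]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(gen_sentence(rng)))
```

This is roughly the kind of output a prompt generator produces from a word network; it does not replace compiling the grammar into a network.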


I=25 W=SENT-START

I=24 W=NOOSI

I=0 W=SENT-END

I=26 W=!NULL

…

J=60

J=61


Step 2 - the Dictionary

HDMan.exe -m -w wlist -n monophones1 -l dlog dict beep names
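The `-n monophones1` option asks for a list of the phones used in the output dictionary. A minimal Python sketch of that idea is shown below. It assumes dictionary lines of the form `WORD [OUTSYM] PRONPROB P1 P2 ...`, where the bracketed output symbol and the probability are optional. The pronunciations in the example are hypothetical and are not taken from the slides.

```python
def phones_from_dict(lines):
    """Collect the set of phones used in HTK-style dictionary lines."""
    phones = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        pron = fields[1:]                      # drop the word itself
        if pron and pron[0].startswith("["):   # optional [output symbol]
            pron = pron[1:]
        if pron:
            try:                               # optional pronunciation probability
                float(pron[0])
                pron = pron[1:]
            except ValueError:
                pass
        phones.update(pron)
    return sorted(phones)

# Hypothetical entries, for illustration only.
example_dict = [
    "HAJ       h a j sp",
    "QUAAN     q u aa n sp",
    "SENT-END  [] sil",
]
print(phones_from_dict(example_dict))
```

The real tool also merges source dictionaries and applies edit scripts, which this sketch leaves out.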


Step 3 - Recording the Data

S006 GOJI HOAFNG HAJ
S007 NOOSI TOWSI CHISN
S008 LIEEN LAJC THAAFY QUAAN
S009 LIEEN LAJC HOAFNG HAJ
S010 LIEEN LAJC QUAAN

Step 4 – Creating the Transcription Files

Words.mlf

#!MLF!#

"S001.lab"

NOOSI MASY TOWSI TASM BA

"S002.lab"

GOJI QUAAN
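A word-level MLF can be produced mechanically from the prompt list. The following Python sketch assumes prompt lines of the form `S002 GOJI QUAAN` and writes one label per line, closing each entry with a line containing a single period, as the MLF format requires. The function name is hypothetical, and the sketch only illustrates the conversion the tutorial's Perl script performs.

```python
def prompts_to_mlf(prompts):
    """Convert 'UTTID WORD WORD ...' prompt lines into word-level MLF text."""
    out = ["#!MLF!#"]
    for line in prompts:
        fields = line.split()
        if not fields:
            continue
        utt_id, words = fields[0], fields[1:]
        out.append(f'"{utt_id}.lab"')   # label file name for this utterance
        out.extend(words)               # one word label per line
        out.append(".")                 # a lone period ends each entry
    return "\n".join(out) + "\n"

print(prompts_to_mlf(["S002 GOJI QUAAN"]), end="")
```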


HLEd.exe -l '*' -d dict.txt -i phones0.mlf

<S>

I

sp

MA

<S>

Y

sp

TOW

…


Step 5 - Coding the Data

wav2mfc.scp

S001.wav S001.mfc
S002.wav S002.mfc
S003.wav S003.mfc
S004.wav S004.mfc

…
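A script file of this shape is easy to generate rather than type by hand. The sketch below is one way to do it in Python. The `.wav` and `.mfc` extensions follow the listing above, while the function name and the zero-padded ID pattern are assumptions.

```python
def make_scp(utt_ids, src_ext=".wav", dst_ext=".mfc"):
    """Return script-file text with one 'source target' pair per line."""
    return "\n".join(f"{u}{src_ext} {u}{dst_ext}" for u in utt_ids) + "\n"

# IDs S001..S004, matching the naming pattern shown on the slide.
ids = [f"S{n:03d}" for n in range(1, 5)]
print(make_scp(ids), end="")
```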


• WAVEFORM sampled waveform

• LPC linear prediction filter coefficients

• LPREFC linear prediction reflection coefficients

• LPCEPSTRA LPC cepstral coefficients

• LPDELCEP LPC cepstra plus delta

• USER user defined sample kind

• DISCRETE vector quantised data

• E has energy

• N absolute energy suppressed

• D has delta coefficients

• A has acceleration coefficients
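The qualifiers determine the size of each feature vector. As a worked example, the Python sketch below computes the vector size for a kind written as a base name plus `_E`, `_N`, `_D` and `_A` suffixes. It assumes 12 static coefficients, in line with the `NUMCEPS = 12` setting used later, and it assumes that `N` removes exactly one component (the absolute energy) while the energy deltas remain. The function name is hypothetical.

```python
def vector_size(target_kind, num_static):
    """Size of one feature vector for a kind such as 'LPCEPSTRA_E_D_A'."""
    _base, *quals = target_kind.split("_")
    quals = set(quals)
    static = num_static + (1 if "E" in quals else 0)   # E appends an energy term
    blocks = 1 + ("D" in quals) + ("A" in quals)       # static, delta, acceleration
    total = static * blocks
    if "N" in quals:                                   # N drops the absolute energy only
        total -= 1
    return total

print(vector_size("LPCEPSTRA_E_D_A", 12))  # (12 + 1) * 3
```

With 12 static coefficients plus energy, deltas and accelerations, the result is 39, which matches the dimension that appears in the `<VARIANCE> 39` lines of the model files.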


Creating Monophone HMMs

• Step 6 – Creating Flat Start Monophones

• Step 7 – Fixing the Silence Models

• Step 8 – Realigning the Training Data


Step 6 – Creating Flat Start Monophones

HCompV -C config_HCompV.txt -f 0.01 -m -S train.scp -M hmm0 proto.txt

SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
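HTK expresses times in units of 100 ns, so the window size above is easier to read once converted. The Python sketch below parses the configuration entries shown on the slide into a dictionary and converts `WINDOWSIZE` to milliseconds. The parser is a simplified illustration, not HTK's own configuration reader.

```python
CONFIG = """\
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
"""

HTK_UNITS_PER_MS = 10_000  # 1 ms = 10,000 units of 100 ns

def parse_config(text):
    """Parse 'NAME = VALUE' lines into a dict of strings; '#' starts a comment."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        cfg[key] = value
    return cfg

cfg = parse_config(CONFIG)
window_ms = float(cfg["WINDOWSIZE"]) / HTK_UNITS_PER_MS
print(f"analysis window: {window_ms} ms")
```

The value 250000.0 therefore corresponds to a 25 ms analysis window.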


A Re-Estimation Tool - HERest

The flat start monophones stored in the directory hmm0 are re-estimated using HERest:

HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0

General form:

HERest.exe [options] hmmList trainFile ...
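Each re-estimation pass reads the models from one directory and writes them to the next, so the command is usually repeated with the directory numbers advanced. The Python sketch below builds the argument lists for successive passes from the HERest command shown on this slide. The helper names are hypothetical and the number of passes is left to the caller. The commands are only constructed here; they could be handed to `subprocess.run` on a machine where HTK is installed.

```python
def herest_cmd(src_dir, dst_dir, hmm_list="monophones0", mlf="phones0.mlf"):
    """Argument list for one HERest pass, mirroring the command on the slide."""
    return [
        "HERest", "-C", "config", "-I", mlf,
        "-t", "250.0", "150.0", "1000.0",
        "-S", "train.scp",
        "-H", f"{src_dir}/macros", "-H", f"{src_dir}/hmmdefs",
        "-M", dst_dir,
        hmm_list,
    ]

def herest_passes(start, n_passes):
    """Commands for n_passes consecutive passes: hmm<start> -> hmm<start+1> -> ..."""
    return [herest_cmd(f"hmm{i}", f"hmm{i + 1}") for i in range(start, start + n_passes)]

for cmd in herest_passes(0, 2):
    print(" ".join(cmd))
```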


Step 7 – Fixing the Silence Models


1. Run HERest twice for monophones0

<VARIANCE> 39 9.946199e+000 1.149288e+001

<VARIANCE> 39 5.828240e+000 7.320161e+000

<GCONST> 8.172852e+001

<TRANSP> 5 .


Step 8 – Realigning the Training Data

-a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 -y lab -I words.mlf -S train.scp dict.txt monophones1

• multiple pronunciations


Creating Tied-State Triphones

• Step 9 – Making Triphones from Monophones

• Step 10 – Making Tied-state Triphones

Step 9 – Making Triphones from Monophones

mktri.led aligned.mlf

Cross-word triphones

Word-internal triphones:

“sil b i t sp b u t sil”

t-b+u b-u+t u-t+sil sil”
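The difference between the two expansions is easiest to see by computing both for the phone string quoted above. The Python sketch below is an illustration under stated assumptions and does not reproduce HLEd's implementation. In the word-internal case, `sil` and `sp` act as word boundaries and block context. In the cross-word case, `sp` is skipped when looking for neighbours, and `sil` supplies context but is not itself expanded. All function and parameter names are hypothetical.

```python
def _label(left, phone, right):
    """Build an HTK-style name: left-phone+right, omitting missing contexts."""
    name = phone
    if left is not None:
        name = f"{left}-{name}"
    if right is not None:
        name = f"{name}+{right}"
    return name

def word_internal(phones, boundaries=("sil", "sp")):
    """Contexts never cross a boundary symbol, so word edges become biphones."""
    out = []
    for i, p in enumerate(phones):
        if p in boundaries:
            out.append(p)
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in boundaries else None
        right = phones[i + 1] if i + 1 < len(phones) and phones[i + 1] not in boundaries else None
        out.append(_label(left, p, right))
    return out

def cross_word(phones, context_free=("sp",), context_independent=("sil", "sp")):
    """Contexts span word boundaries; context-free phones are looked through."""
    out = []
    for i, p in enumerate(phones):
        if p in context_independent:
            out.append(p)
            continue
        left = next((q for q in reversed(phones[:i]) if q not in context_free), None)
        right = next((q for q in phones[i + 1:] if q not in context_free), None)
        out.append(_label(left, p, right))
    return out

phones = "sil b i t sp b u t sil".split()
print(" ".join(word_internal(phones)))
print(" ".join(cross_word(phones)))
```

Under these assumptions the cross-word expansion ends in `t-b+u b-u+t u-t+sil sil`, which agrees with the fragment that survives on the slide.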

Word Network Expansion

FORCECXTEXP = F ALLOWXWRDEXP = F

FORCECXTEXP = T ALLOWXWRDEXP = F

FORCECXTEXP = T

Step 9 – Making Triphones from Monophones

HHEd -H hmm9/macros -H hmm9/hmmdefs -M hmm10


Step 10 – Making Tied-state Triphones

HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1


fulllist: monophones + biphones + triphones
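One way to picture the full list is as every logically possible context-dependent name built from the monophone set. The Python sketch below enumerates monophones, left and right biphones, and triphones for a small phone set. It makes no attempt to exclude combinations that cannot occur in the dictionary and does not treat silence models specially, both of which a real system may do. The function name and the three-phone example are assumptions.

```python
from itertools import product

def full_list(phones):
    """All monophones, left/right biphones and triphones over the phone set."""
    names = set(phones)
    for left, centre in product(phones, repeat=2):
        names.add(f"{left}-{centre}")            # left biphone
    for centre, right in product(phones, repeat=2):
        names.add(f"{centre}+{right}")           # right biphone
    for left, centre, right in product(phones, repeat=3):
        names.add(f"{left}-{centre}+{right}")    # triphone
    return sorted(names)

models = full_list(["b", "i", "t"])
print(len(models))  # n + 2*n**2 + n**3 names for n phones
```

The list grows cubically with the number of phones, which is why unseen triphones have to be synthesised from the clustered states rather than trained directly.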

Recogniser Evaluation

Step 11 – Recognising the Test Data

HVite.exe -C config_hvite -H hmm15/macros -H hmm15/hmmdefs -S test.scp

rec_out.mlf

====================== HTK Results Analysis ================
Date: Thu Dec 01 11:42:28 2005
Ref : words.mlf
Rec : rec_out.mlf
- Overall Results
SENT: %Correct=83.33 [H=15, S=3, N=18]
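The sentence-level figure can be checked by hand: H is the number of correctly recognised sentences, S the number with at least one error, and N the total, so H + S = N and %Correct = 100 · H / N. A short Python check using the numbers reported above (the function name is hypothetical):

```python
def percent_correct(hits, total):
    """Sentence accuracy in percent: 100 * H / N."""
    return 100.0 * hits / total

H, S, N = 15, 3, 18          # values from the SENT line above
assert H + S == N
print(f"%Correct={percent_correct(H, N):.2f}")
```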


Mixture Incrementing

-H hmm15/hmmdefs -M hmm16

<VARIANCE> 39 7.328565e+000 5.521523e+000

<MIXTURE> 2 5.000000e-001

<MIXTURE> 3 2.500000e-001
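The weights 0.5 and 0.25 in the listing are what one would expect if a single Gaussian is split twice. The Python sketch below illustrates one common splitting rule, loosely modeled on the description of mixture incrementing in the HTK documentation: the heaviest component is copied, both copies receive half of its weight, and their means are moved apart by 0.2 standard deviations. The data layout, the function name and the order of the resulting components are assumptions, not HHEd's actual behaviour.

```python
import math

def split_heaviest(mixtures, perturb=0.2):
    """Split the largest-weight component of a diagonal-covariance mixture.

    mixtures: list of (weight, means, variances) tuples.
    """
    idx = max(range(len(mixtures)), key=lambda k: mixtures[k][0])
    weight, means, variances = mixtures[idx]
    offsets = [perturb * math.sqrt(v) for v in variances]
    upper = (weight / 2, [m + o for m, o in zip(means, offsets)], list(variances))
    lower = (weight / 2, [m - o for m, o in zip(means, offsets)], list(variances))
    return mixtures[:idx] + [upper, lower] + mixtures[idx + 1:]

single = [(1.0, [0.0], [1.0])]   # toy one-dimensional Gaussian
two = split_heaviest(single)
three = split_heaviest(two)
print([w for w, _, _ in three])
```

After two splits the weights are 0.25, 0.25 and 0.5, so they still sum to one. The models would normally be re-estimated after each increment.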


Adapting the HMMs

• Step 12 – Preparation of the Adaptation Data

• Step 13 – Generating the Transforms

• Step 14 – Evaluation of the Adapted System


Step 12 – Preparation of the Adaptation Data

The same as steps 3, 4 and 5:

1. Generate the prompt lists using HSGen:

HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsADapt.txt

HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsTest.txt

2. Record the associated speech from the new user.

3. Code both sets of speech using HCopy:

HCopy.exe -C config -S codeAdapt.scp

HCopy.exe -C config -S codeTest.scp

4. Obtain both transcriptions using the prompts2mlf Perl script.

5. Use HVite to perform a forced alignment of the adaptation data to minimize the problem of multiple pronunciations.


Step 13 – Generating the Transforms

Create a regression class tree to cluster the mixture components:

HHEd -H hmm15/macros -H hmm15/hmmdefs -M hmm16 regtree.hed tiedlist

Generate a global transform

~r "rtree_32"

<REGTREE> 32

<NODE> 1 2 3

N: vecsize

global.tmf: a global transform

rc.tmf: K transforms


Figure: a binary regression tree with four base classes


Step 14 – Evaluation of the Adapted System

-p 0.0 -s 5.0 dict.txt tiedlist

HResults -f -t -I testWords.mlf tiedlist rec_out_adapt.mlf

A speech corpus is very important and useful!

20 hours of DTNVN broadcast news are available,

The Grammar of the PaintDemo


FN_LINE

!NULL

TỚI

SỐ TỪ


“HÃY VẼ ĐOẠN THẲNG TỪ ĐIỂM MỘT HAI TỚI ĐIỂM BA BỐN” (“Draw a line segment from point one two to point three four”)

“HÃY VẼ ĐOẠN THẲNG [TỪ ĐIỂM [MỘT]X1 [HAI]Y1 TỚI ĐIỂM [BA]X2 [BỐN]Y2 ]LINE”

I=5 L=FN_LINE s=LINE

I=6 L=FN_CIRCLE s=CIRCLE

J=0 S=0 E=1

J=1 S=0 E=2
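The bracketed annotation above attaches a semantic tag to each span of words, for example `[MỘT]X1` for the first x coordinate. As a rough illustration of how an application might read the coordinates back out of such a tagged string, the Python sketch below pulls out the innermost `[WORD]TAG` pairs with a regular expression and maps the Vietnamese digit words to numbers. This is not how ATK processes semantic tags. The string format, the digit table and the function name are assumptions made for this example.

```python
import re

# Vietnamese digit words that appear in the example sentence.
DIGIT_VALUES = {"MỘT": 1, "HAI": 2, "BA": 3, "BỐN": 4}

TAGGED = "HÃY VẼ ĐOẠN THẲNG [TỪ ĐIỂM [MỘT]X1 [HAI]Y1 TỚI ĐIỂM [BA]X2 [BỐN]Y2 ]LINE"

def extract_slots(tagged):
    """Map each innermost [WORD]TAG pair to {TAG: numeric value of WORD}."""
    slots = {}
    # [^\[\]]+ forbids nested brackets, so only innermost spans match.
    for word, tag in re.findall(r"\[([^\[\]]+)\](\w+)", tagged):
        slots[tag] = DIGIT_VALUES[word.strip()]
    return slots

print(extract_slots(TAGGED))
```

The outer `LINE` tag, which names the command, is not captured by this pattern and would need separate handling.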


wav

mfcc

phrase


HLStats -b bigfn -o wlist words.mlf

HBuild -n bigfn wlist wdnet_bigram
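The two commands first gather bigram statistics from the word transcriptions and then build a bigram word network from them. The Python sketch below shows the counting idea behind a bigram model, using sentences from the recording prompts. It computes plain relative frequencies only, whereas the HTK tools also apply discounting and back-off. The use of SENT-START and SENT-END as boundary markers and the function names are assumptions.

```python
from collections import Counter

def bigram_counts(sentences, start="SENT-START", end="SENT-END"):
    """Count adjacent word pairs, padding each sentence with boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = [start] + list(words) + [end]
        counts.update(zip(padded, padded[1:]))
    return counts

def bigram_prob(counts, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 if prev is unseen."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0

sentences = [
    "GOJI HOAFNG HAJ".split(),
    "LIEEN LAJC THAAFY QUAAN".split(),
    "LIEEN LAJC HOAFNG HAJ".split(),
    "LIEEN LAJC QUAAN".split(),
]
counts = bigram_counts(sentences)
print(counts[("LIEEN", "LAJC")], bigram_prob(counts, "SENT-START", "LIEEN"))
```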
