The Hidden Markov Model Toolkit

ATK-HTK

An Application Toolkit for HTK

The Hidden Markov Model Toolkit

Training Strategy

• Monophone Training

• Making Triphones from Monophones

• Unclustered Triphone Training

• Making Tied-state Triphones

Data Preparation

• Step 1 - the Task Grammar

• Step 2 - the Dictionary

• Step 3 - Recording the Data

• Step 4 - Creating the Transcription Files

• Step 5 - Coding the Data

Step 1 - the Task Grammar

Gram.txt

$digit = MOOJT | HAI | BA | BOOSN | NAWM |
         SASU | BARY | TASM | CHISN | KHOONG;
$name  = [ THAAFY ] QUAAN |
         [ HOAFNG ] HAJ;
( SENT-START ( NOOSI [MASY] TOWSI [SOOS] <$digit> |
               (LIEEN LAJC | GOJI) $name ) SENT-END )
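
To make the grammar notation concrete, the sketch below hand-translates Gram.txt into a small Python sentence generator. It assumes the usual HTK grammar conventions: [ ] marks an optional item, < > marks one or more repetitions, and | separates alternatives. The function names, the 50% choice probabilities, and the max_digits limit are illustrative assumptions; this is not how HTK itself reads the grammar.

```python
import random

DIGITS = ["MOOJT", "HAI", "BA", "BOOSN", "NAWM",
          "SASU", "BARY", "TASM", "CHISN", "KHOONG"]


def random_name(rng):
    # $name = [ THAAFY ] QUAAN | [ HOAFNG ] HAJ;
    first, last = rng.choice([("THAAFY", "QUAAN"), ("HOAFNG", "HAJ")])
    return ([first] if rng.random() < 0.5 else []) + [last]


def random_sentence(rng, max_digits=5):
    """Return one sentence allowed by the grammar (without SENT-START/SENT-END)."""
    if rng.random() < 0.5:
        # NOOSI [MASY] TOWSI [SOOS] <$digit>
        words = ["NOOSI"]
        if rng.random() < 0.5:
            words.append("MASY")
        words.append("TOWSI")
        if rng.random() < 0.5:
            words.append("SOOS")
        # <$digit>: one or more digits (capped here for illustration only)
        words += [rng.choice(DIGITS) for _ in range(rng.randint(1, max_digits))]
    else:
        # (LIEEN LAJC | GOJI) $name
        words = rng.choice([["LIEEN", "LAJC"], ["GOJI"]]) + random_name(rng)
    return " ".join(words)


rng = random.Random(0)
for _ in range(3):
    print(random_sentence(rng))
```

Every generated sentence starts with NOOSI, LIEEN or GOJI, which matches the three entry paths of the grammar.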

Fragments of the word network (HTK standard lattice format) generated from the grammar:

I=25 W=SENT-START

I=24 W=NOOSI

I=0 W=SENT-END

I=26 W=!NULL

J=60

J=61
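
The lattice lines above are sequences of KEY=value fields (I for a node index, W for its word, J for an arc index). A minimal sketch of reading such lines in Python follows; the helper name parse_slf_line is hypothetical, and the sketch assumes values contain no spaces or quotes, which holds for the fragments shown but not for every lattice file.

```python
def parse_slf_line(line):
    """Split one lattice line such as 'I=25 W=SENT-START' into a field dict."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


node_lines = ["I=25 W=SENT-START", "I=24 W=NOOSI", "I=0 W=SENT-END", "I=26 W=!NULL"]

# Map node index -> word label for the node lines shown on the slide.
nodes = {}
for line in node_lines:
    fields = parse_slf_line(line)
    nodes[int(fields["I"])] = fields["W"]

print(nodes[25])
```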

Step 2 - the Dictionary

HDMan.exe -m -w wlist -n monophones1 -l dlog dict beep names

Step 3 - Recording the Data

S006 GOJI HOAFNG HAJ
S007 NOOSI TOWSI CHISN
S008 LIEEN LAJC THAAFY QUAAN
S009 LIEEN LAJC HOAFNG HAJ
S010 LIEEN LAJC QUAAN

Step 4 – Creating the Transcription Files

Words.mlf

#!MLF!#

"S001.lab"

NOOSI MASY TOWSI TASM BA

"S002.lab"

GOJI QUAAN
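
As an illustration of the word-level MLF layout, the sketch below builds the file text from a prompt table in Python. It assumes the standard HTK master label file conventions: a #!MLF!# header, a quoted label file name per utterance, one word per line, and a line containing a single period to end each entry. The function name write_mlf and the in-memory prompt table are hypothetical; this is not the script used by the author.

```python
def write_mlf(prompts):
    """Build word-level MLF text from {utterance id: prompt sentence}."""
    lines = ["#!MLF!#"]
    for utt_id, sentence in prompts.items():
        lines.append(f'"{utt_id}.lab"')   # label file name for this utterance
        lines.extend(sentence.split())    # one word per line
        lines.append(".")                 # entry terminator
    return "\n".join(lines) + "\n"


mlf_text = write_mlf({
    "S001": "NOOSI MASY TOWSI TASM BA",
    "S002": "GOJI QUAAN",
})
print(mlf_text)
```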

HLEd.exe -l '*' -d dict.txt -i phones0.mlf

Fragment of the resulting phone-level transcription: ... <S> I sp MA <S> Y sp TOW ...

Step 5 - Coding the Data

wav2mfc.scp

S001.wav S001.mfc
S002.wav S002.mfc
S003.wav S003.mfc
S004.wav S004.mfc
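
A script file of this shape (one source/target pair per line) is easy to generate rather than type by hand. The Python sketch below is one way to do it; the function name make_scp and the S001..S004 identifier range are assumptions based only on the listing above.

```python
def make_scp(utterance_ids):
    """Return script-file text with one 'source.wav target.mfc' pair per line."""
    return "".join(f"{utt}.wav {utt}.mfc\n" for utt in utterance_ids)


scp_text = make_scp(f"S{n:03d}" for n in range(1, 5))
print(scp_text, end="")
```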

WAVEFORM    sampled waveform
LPC         linear prediction filter coefficients
LPREFC      linear prediction reflection coefficients
LPCEPSTRA   LPC cepstral coefficients
LPDELCEP    LPC cepstra plus delta coefficients
USER        user defined sample kind
DISCRETE    vector quantised data

• E has energy

• N absolute energy suppressed

• D has delta coefficients

• A has acceleration coefficients
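
The letters E, N, D and A are appended to a base parameter kind with underscores, for example LPCEPSTRA_E_D_A. The Python sketch below shows how such a name decomposes and how the qualifiers affect the feature vector length. The sizing rule is an assumption stated for illustration: E adds one energy term, D and A each add a further copy of the static vector, and N removes the absolute energy term. Only the four qualifiers listed above are handled, and the function names are hypothetical.

```python
KNOWN_QUALIFIERS = {"E", "N", "D", "A"}


def parse_kind(kind):
    """Split e.g. 'LPCEPSTRA_E_D_A' into ('LPCEPSTRA', {'E', 'D', 'A'})."""
    base, *qualifiers = kind.split("_")
    unknown = set(qualifiers) - KNOWN_QUALIFIERS
    if unknown:
        raise ValueError(f"unsupported qualifiers: {sorted(unknown)}")
    return base, set(qualifiers)


def vector_size(kind, num_static):
    """Feature vector length for num_static base coefficients (assumed rule)."""
    _, q = parse_kind(kind)
    static = num_static + (1 if "E" in q else 0)   # E: append energy
    copies = 1 + ("D" in q) + ("A" in q)           # D, A: deltas, accelerations
    size = static * copies
    if "N" in q:                                   # N: drop absolute energy
        size -= 1
    return size


print(vector_size("LPCEPSTRA_E_D_A", 12))
```

With 12 static coefficients plus energy, deltas and accelerations, the sketch yields a 39-dimensional vector.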

Creating Monophone HMMs

• Step 6 – Creating Flat Start Monophones

• Step 7 – Fixing the Silence Models

• Step 8 – Realigning the Training Data

Step 6 – Creating Flat Start Monophones

HCompV -C config_HCompV.txt -f 0.01 -m -S train.scp -M hmm0 proto.txt

SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
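
HTK expresses time values such as WINDOWSIZE in units of 100 ns, so 250000.0 corresponds to a 25 ms analysis window. The Python sketch below reads NAME = value pairs like those above and performs that conversion. The helper name parse_config is hypothetical, and the regular expression is a simplification that ignores comments and quoted values.

```python
import re

HTK_UNITS_PER_MS = 10_000  # HTK times are given in 100 ns units


def parse_config(text):
    """Return {NAME: value string} for every 'NAME = value' pair in text."""
    return dict(re.findall(r"(\w+)\s*=\s*(\S+)", text))


config = parse_config("""
SAVEWITHCRC = T WINDOWSIZE = 250000.0 USEHAMMING = T
PREEMCOEF = 0.97 NUMCHANS = 26
CEPLIFTER = 22 NUMCEPS = 12 ENORMALISE = F
""")

window_ms = float(config["WINDOWSIZE"]) / HTK_UNITS_PER_MS
print(window_ms)  # 25.0
```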

A Re-Estimation Tool - HERest

HERest.exe [options] hmmList trainFile ...

The flat start monophones stored in the directory hmm0 are re-estimated using HERest:

HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
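
Each re-estimation pass reads the models from one directory and writes them to the next (hmm0 to hmm1 above). The Python sketch below builds the argument list for a given pass, using only the options that appear in the command above. The function name herest_command and the idea of numbering passes this way are illustrative assumptions; the list could be handed to subprocess.run if the HTK binaries are installed.

```python
def herest_command(iteration, hmm_list="monophones0", mlf="phones0.mlf"):
    """Argument list for one HERest pass from hmm<iteration> to hmm<iteration+1>."""
    src = f"hmm{iteration}"
    dst = f"hmm{iteration + 1}"
    return [
        "HERest", "-C", "config", "-I", mlf,
        "-t", "250.0", "150.0", "1000.0",   # pruning thresholds from the slide
        "-S", "train.scp",
        "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
        "-M", dst,
        hmm_list,
    ]


for i in range(2):
    print(" ".join(herest_command(i)))
```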

Step 7 – Fixing the Silence Models

1. Run HERest twice for monophones0

<VARIANCE> 39 9.946199e+000 1.149288e+001

<VARIANCE> 39 5.828240e+000 7.320161e+000

<GCONST> 8.172852e+001

<TRANSP> 5 .

Step 8 – Realigning the Training Data

HVite ... -a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 -y lab -I words.mlf -S train.scp dict.txt monophones1

• multiple pronunciations

Creating Tied-State Triphones

• Step 9 – Making Triphones from Monophones

• Step 10 – Making Tied-state Triphones

Step 9 – Making Triphones from Monophones

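... mktri.led aligned.mlf

Monophone sequence: "sil b i t sp b u t sil"

Cross-word triphones (surviving tail of the example): "... t-b+u b-u+t u-t+sil sil"

Within-word triphones: contexts do not cross word boundaries.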
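
The difference between the two expansions can be sketched in Python for the example sequence "sil b i t sp b u t sil". The sketch assumes that sil and sp are never expanded themselves, that within-word contexts stop at sil and sp, and that cross-word contexts skip over sp but may use sil as a neighbour. These rules and the function names are illustrative assumptions, not the exact behaviour of HLEd or HNet.

```python
BOUNDARIES = {"sil", "sp"}


def _triphone(left, phone, right):
    name = phone
    if left:
        name = f"{left}-{name}"
    if right:
        name = f"{name}+{right}"
    return name


def word_internal(phones):
    """Expand phones to triphones whose contexts stop at sil/sp."""
    out = []
    for i, p in enumerate(phones):
        if p in BOUNDARIES:
            out.append(p)
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in BOUNDARIES else None
        right = (phones[i + 1]
                 if i + 1 < len(phones) and phones[i + 1] not in BOUNDARIES else None)
        out.append(_triphone(left, p, right))
    return out


def cross_word(phones, skip=("sp",)):
    """Expand phones to triphones whose contexts look across sp."""
    out = []
    for i, p in enumerate(phones):
        if p in BOUNDARIES:
            out.append(p)
            continue
        left = next((q for q in reversed(phones[:i]) if q not in skip), None)
        right = next((q for q in phones[i + 1:] if q not in skip), None)
        out.append(_triphone(left, p, right))
    return out


seq = "sil b i t sp b u t sil".split()
print(" ".join(word_internal(seq)))
print(" ".join(cross_word(seq)))
```

Under these assumptions the cross-word expansion ends in "t-b+u b-u+t u-t+sil sil", in agreement with the fragment on the slide.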

Word Network Expansion

FORCECXTEXP = F ALLOWXWRDEXP = F

FORCECXTEXP = T ALLOWXWRDEXP = F

FORCECXTEXP = T

Step 9 – Making Triphones from Monophones

HHEd -H hmm9/macros -H hmm9/hmmdefs -M hmm10 ...

Step 10 – Making Tied-state Triphones

HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1

fulllist: monophones + biphones + triphones

Recogniser Evaluation

Step 11 – Recognising the Test Data

HVite.exe -C config_hvite

-H hmm15/macros -H hmm15/hmmdefs -S test.scp

rec_out.mlf

====================== HTK Results Analysis ================
Date: Thu Dec 01 11:42:28 2005
Ref : words.mlf
Rec : rec_out.mlf
- Overall Results
SENT: %Correct=83.33 [H=15, S=3, N=18]
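
The sentence-level figure can be checked by hand: H is the number of correctly recognised sentences, S the number with errors, and N the total, so %Correct = 100 * H / N. A short Python check of the numbers reported above (the function name is hypothetical):

```python
def percent_correct(hits, total):
    """Percentage of correctly recognised items."""
    return 100.0 * hits / total


H, S, N = 15, 3, 18
print(f"SENT: %Correct={percent_correct(H, N):.2f} [H={H}, S={S}, N={N}]")
```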

Mixture Incrementing

-H hmm15/hmmdefs -M hmm16

<VARIANCE> 39 7.328565e+000 5.521523e+000

<MIXTURE> 2 5.000000e-001

<MIXTURE> 3 2.500000e-001
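
Mixture weights of 0.5 and 0.25 are what one would expect when components are added by repeatedly splitting the heaviest one. The Python sketch below illustrates that idea for one-dimensional Gaussians: the component with the largest weight is cloned, both copies receive half the weight, and their means are moved apart. The 0.2 standard deviation offset, the tie-breaking rule, and the function name are assumptions for illustration, not a statement of exactly what HHEd does.

```python
def split_heaviest(mixtures, perturb=0.2):
    """Split the heaviest component of [(weight, mean, variance), ...] in two."""
    idx = max(range(len(mixtures)), key=lambda i: mixtures[i][0])
    weight, mean, var = mixtures[idx]
    offset = perturb * var ** 0.5          # move means apart by +/- offset
    halves = [(weight / 2, mean + offset, var),
              (weight / 2, mean - offset, var)]
    return mixtures[:idx] + halves + mixtures[idx + 1:]


mixtures = [(1.0, 0.0, 4.0)]               # start from a single Gaussian
mixtures = split_heaviest(mixtures)        # 2 components, weights 0.5 / 0.5
mixtures = split_heaviest(mixtures)        # 3 components, weights 0.25 / 0.25 / 0.5
print([w for w, _, _ in mixtures])
```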

Adapting the HMMs

• Step 12 – Preparation of the Adaptation Data

• Step 13 – Generating the Transforms

• Step 14 – Evaluation of the Adapted System

Step 12 – Preparation of the Adaptation Data

The same as steps 3, 4 and 5:

1. Prompt lists are generated using HSGen:

HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsADapt.txt

HSGen.exe -l -n 10 wdnet.txt dict.txt >> promptsTest.txt

2. Record the associated speech from the new user.

3. Both sets of speech can then be coded using HCopy:

HCopy.exe -C config -S codeAdapt.scp

HCopy.exe -C config -S codeTest.scp

4. Both transcriptions are obtained using the prompts2mlf Perl script.

5. HVite is used to perform a forced alignment of the adaptation data, to minimize the problem of multiple pronunciations.

Step 13 – Generating the Transforms

Create a regression class tree to cluster the mixture components:

HHEd -H hmm15/macros -H hmm15/hmmdefs -M hmm16 regtree.hed tiedlist

Generate a global transform

~r "rtree_32"

<REGTREE> 32

<NODE> 1 2 3

N: vecsize

global.tmf: a global transform

rc.tmf: K transforms

Figure: a binary regression tree with four base classes

Step 14 – Evaluation of the Adapted System

-p 0.0 -s 5.0 dict.txt tiedlist

HResults -f -t -I testWords.mlf tiedlist rec_out_adapt.mlf

A speech corpus is very important and useful!

20 hours of DTNVN broadcast news are available.

The Gram of the PaintDemo

Node labels from the grammar network figure: FN_LINE, !NULL, TỚI, SỐ TỪ

"HÃY VẼ ĐOẠN THẲNG TỪ ĐIỂM MỘT HAI TỚI ĐIỂM BA BỐN" (in English: "Draw a line segment from point one two to point three four")

"HÃY VẼ ĐOẠN THẲNG [TỪ ĐIỂM [MỘT]X1 [HAI]Y1 TỚI ĐIỂM [BA]X2 [BỐN]Y2 ]LINE"

I=5 L=FN_LINE s=LINE

I=6 L=FN_CIRCLE s=CIRCLE

J=0 S=0 E=1

J=1 S=0 E=2

Figure labels: wav, mfcc, phrase

HLStats -b bigfn -o wlist words.mlf

HBuild -n bigfn wlist wdnet_bigram
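
HLStats -b gathers bigram statistics from the word-level transcriptions, and HBuild turns them into a bigram word network. The Python sketch below shows the underlying idea only: counting adjacent word pairs and forming unsmoothed maximum-likelihood estimates P(w2 | w1) = count(w1, w2) / count(w1). The SENT-START and SENT-END boundary markers and the function name are assumptions, and the sketch omits the discounting and flooring that a real language-model tool would apply.

```python
from collections import Counter


def bigram_probs(sentences, start="SENT-START", end="SENT-END"):
    """Unsmoothed bigram estimates from a list of word lists."""
    pair_counts, history_counts = Counter(), Counter()
    for words in sentences:
        seq = [start] + list(words) + [end]
        for w1, w2 in zip(seq, seq[1:]):
            pair_counts[(w1, w2)] += 1
            history_counts[w1] += 1
    return {pair: count / history_counts[pair[0]]
            for pair, count in pair_counts.items()}


probs = bigram_probs([
    "GOJI QUAAN".split(),
    "GOJI HOAFNG HAJ".split(),
])
print(probs[("GOJI", "QUAAN")])
```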

Date posted: 31/01/2015, 12:12
