
Visually clustering research papers using an atom model in 3D space


DOCUMENT INFORMATION

Basic information

Title: Visually clustering research papers using an atom model in 3D space
Field: Computer science
Type: Scientific paper
Pages: 8
Size: 382.5 KB


Nghi Vĩnh Khanh¹

Abstract

This paper proposes a new approach to building a visual clustering system that uses 3D imagery to group scientific papers, combining two techniques from the field of artificial intelligence: SOM and k-means. Specifically, the SOM provides a visual image that is used to determine the parameter K for the k-means algorithm in the next step. A graph layout is designed to show the clusters, each of which is represented by an atomic nucleus (the center of the cluster) surrounded by electrons (the papers). The electrons orbit the nucleus under the force of gravity. In addition, the ArcBall technique from 3D computer graphics is used to support user interaction. With this system, users can evaluate the consistency of the cluster structure more simply than with previous methods.

Keywords: clustering, visualization, artificial intelligence, 3D computer graphics, user interaction

1 Introduction

Clustering is one of the essential techniques for knowledge discovery. It separates groups of objects from a data set based on the characteristics they share within a group. Today, clustering techniques are widely used in applications such as data mining, image processing, pattern recognition, statistics, bioinformatics and other fields. In addition, applying visualization techniques to cluster analysis is very important for revealing trends in data sets, giving both an overview and a detailed understanding of the data. Many studies have focused on visual cluster analysis with considerable success, such as Grand Tour, OPTICS, HD-EYE, H-BLOB, Star Coordinate (Ankerst, Grinstein and Keim, 2002), SOM-based techniques (Kohonen, 1997) and HOV3 (Zhang, Orgun and Zhang, 2006).

¹ MSc, Information Technology Systems Development Board, Tra Vinh University

In this paper, we propose a three-dimensional visual graphic design, called the atomic structure model, for data clustering. The model uses the SOM algorithm to estimate the number of clusters to be separated from a set of documents in general, and of scientific papers in particular (written in English), in PDF format. Then, based on the vector space model, specifically tf-idf vectors, we use the k-means algorithm to split the data set into k groups. Finally, the clusters are visualized as atomic structures in three-dimensional space. For simplicity, an atom is organized with at most five energy levels, computed from the cosine similarity between the electron vectors and the nucleus. The approach in this paper addresses the following issues:

- First, it displays a set of high-dimensional vectors (more than 1000 dimensions) in three-dimensional space, clearly showing the data distribution and the relationships among the cluster centers as well as among the scientific papers themselves.

- Second, it avoids an arbitrary choice of the number of clusters k by combining SOM and k-means. The model therefore gives users a purposeful and effective visual method for cluster analysis.

- Third, it overcomes the spatial limits of earlier 2D methods by using the arcball technique for interaction. This gives a sense of immersion in the system and of manipulating each object, as in playing a 3D game or using a virtual reality system such as CAVE (Cruz-Neira, Sandin, DeFanti, Kenyon and Hart, 1992).

The rest of the paper is organized as follows: section 2 reviews previous research, section 3 describes the method, section 4 evaluates the results, and section 5 concludes.

2 Overview of previous research

To build tools for visual cluster analysis, many techniques have been studied for rendering the objects of a data set on a computer screen. Point-based, line-based, region-based and hierarchical techniques (Ward, Grinstein and Keim, 2010) are commonly used to visualize multidimensional data sets. However, most of them struggle with fairly large data sets and with vectors of very high dimension. Some techniques are limited in how clearly their visual forms can be perceived by the user.

Visual clustering techniques have been developed, such as Grand Tour, OPTICS, HD-EYE, H-BLOB, Fastmap, Star Coordinate, SOM-based techniques and HOV3. In general, these techniques contribute significantly to cluster analysis and can address important aspects of visual perception:

- Visualizing large, multidimensional data;

- Providing a clear and detailed overview of the cluster structure;

- Having linear computational complexity when mapping data from a high-dimensional space to a lower-dimensional one;

- Supporting dynamic interaction with the visual representations of clusters;

- Bringing the relevant knowledge of domain experts into cluster exploration;

- Giving users purposeful and precise guidance for investigating and validating clusters, rather than simply exploring clusters at random.

Most of the techniques above meet these requirements, but they remain limited when the size and dimension of the data set are fairly large. Moreover, some of them have difficulty providing a clear overview of the cluster structure, and they are not easy for users to work with.

3 Method

As introduced above, to address these limitations we propose a visualization solution: an interactive three-dimensional visual graph design based on the atomic structure. The steps are as follows:

Figure 1. The information visualization process (Ware, 2004)

[Figure 2 flowchart: extract text from the PDF documents; stemming; build the TF-IDF matrix; reduce the dimension of the TF-IDF vectors (Latent Semantic Indexing); 3D-SOM neurons; graph layout for the atomic structure model]

Figure 2. The system construction process

Preprocessing step:

After extracting the characters from the PDF files, the next task is to process the vocabulary. Essentially, three operations are needed: removing meaningless words, or words that carry no information in the context under consideration (stop words); reducing words to their root form (stemming); and computing the weight of each word relative to the others (term weighting).

Removing stop words and stemming reduce the size of the vocabulary and therefore save computing resources. Because the input papers are written in English, applying a stemming algorithm is not difficult; specifically, the Porter stemming algorithm (Porter, 1980) is used very effectively for some languages such as English, although it does not yet support many of the world's languages. It would be genuinely difficult if the papers were written in Vietnamese, because finding word roots there requires much more in-depth research on Vietnamese natural language processing.
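The first two preprocessing operations can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simplified suffix stripper standing in for the Porter algorithm, which has many more rules.

```python
import re

# Tiny illustrative stop-word list (assumption; real lists hold hundreds of words).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

# Crude suffix rules standing in for a real stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Clustering the papers and showing clusters")` yields `['cluster', 'paper', 'show', 'cluster']`, so both surface forms of "cluster" collapse to one vocabulary entry.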

After stop-word removal and stemming have been applied to all words in all documents, we obtain a single set of unique words, called the Bag-Of-Words. Next, we compute the weights of these words (term weighting). To determine the weight of each word, we use a very common formula, Term Frequency Inverse Document Frequency (TFIDF) (Manning, Raghavan and Schütze, 2009):

$$\mathrm{tfidf}(w) = \mathrm{tf}(w) \times \log \frac{N}{\mathrm{df}(w)}$$

where tf(w) is the term frequency (the number of times the word w appears in a document), df(w) is the document frequency (the number of documents that contain the word w), and N is the total number of documents.
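As a rough sketch of how these three quantities combine into one weight per word and document, the function below builds a small matrix with one row per document. It assumes the common weighting tf(w) * log10(N / df(w)); the base of the logarithm and the helper name `tfidf_matrix` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with one row per document and one column per term.

    Assumes the common weighting tf(w) * log10(N / df(w)).
    """
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # df(w): number of documents that contain term w at least once.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        tf = Counter(doc)  # tf(w): occurrences of w in this document
        matrix.append([tf[w] * math.log10(n_docs / df[w]) for w in vocabulary])
    return vocabulary, matrix


# Three toy documents, already tokenized and stemmed.
docs = [
    ["cluster", "paper", "cluster"],
    ["paper", "atom"],
    ["paper", "som", "som"],
]
vocabulary, matrix = tfidf_matrix(docs)
```

Here "paper" occurs in every document, so its weight is zero everywhere, while "cluster" occurs twice in the first document only and gets the largest weight in that row.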

The quantity tfidf(w) expresses the importance of the word w in a document. From this formula, we compute the TFIDF matrix, in which each row represents a document and the columns hold the tfidf values of the words in the Bag-Of-Words.

The TFIDF matrix can be very large (M documents x N terms in the Bag-Of-Words). In practice, for a set of 100 scientific papers from the same research field, each about 10 pages long, the TFIDF matrix can have a size of 100x10000. Note that using only the keywords, or building the Bag-Of-Words only from the abstracts of the papers (to shrink the matrix), would give a statistically and semantically inaccurate picture of how the papers differ in content. Many methods exist for reducing the size of the TFIDF matrix; among them, Latent Semantic Indexing (LSI) (Manning, Raghavan and Schütze, 2009) is quite widely used. It is a statistical technique that tries to estimate the content structure hidden inside the text by using the linear algebra technique of Singular Value Decomposition.

LSI is very effective at reducing the dimension of the data set. However, it becomes an obstacle when we want to search for a word in the TFIDF matrix. For example, in Figure 3, consider a Bag-Of-Words of dimension 5 (ship, boat, ocean, voyage, trip). After LSI reduces the dimension from 5 to 2, the new dimensions "1" and "2" no longer carry the meanings of "ship", "boat", "ocean", "voyage" and "trip". This means we cannot run a query for an original attribute, for example the word "ship", in the reduced matrix. Many studies have addressed this problem in practice, but we do not discuss them because they are outside the scope of this paper.

[Figure 3: the term-document matrix C for the terms ship, boat, ocean, voyage and trip over documents d1 to d6, and the same documents represented in two dimensions]

Figure 3. Reducing matrix C from 5D to 2D with LSI
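To make the 5D to 2D reduction concrete, here is a small dependency-free sketch. It does not call a library SVD; instead it uses power iteration with deflation on the Gram matrix C^T C, whose eigenvectors are the right singular vectors of C. The function names and the incidence matrix values are assumptions chosen for the demonstration, and the approach is only suitable for tiny matrices.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix.

    Uses power iteration with deflation; fine for small demos, not for production.
    """
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic, non-uniform start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def lsi_document_coords(term_doc, k):
    """Project each document (a column of term_doc) onto k latent dimensions."""
    n_docs = len(term_doc[0])
    gram = [
        [sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
        for i in range(n_docs)
    ]
    pairs = top_eigenpairs(gram, k)
    # Coordinate of document j on dimension d is sigma_d * v_d[j].
    return [
        [math.sqrt(max(lam, 0.0)) * vec[j] for lam, vec in pairs]
        for j in range(n_docs)
    ]


# Illustrative 5-term x 6-document incidence matrix (values assumed for the demo).
C = [
    [1, 0, 1, 0, 0, 0],  # ship
    [0, 1, 0, 0, 0, 0],  # boat
    [1, 1, 0, 0, 0, 0],  # ocean
    [1, 0, 0, 1, 1, 0],  # voyage
    [0, 0, 0, 1, 0, 1],  # trip
]
coords = lsi_document_coords(C, 2)
```

Each entry of `coords` is a two-number position for one document. The two axes are mixtures of all five terms, which is why a query for a single original term such as "ship" can no longer be answered directly in the reduced space.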

After preprocessing, we model and visualize the documents based on the TFIDF matrix, in the following order:

- Build a three-dimensional Self-Organizing Map artificial neural network (3D-SOM).

- Build and display a graph layout for the 3D-SOM, which helps the user decide on the number of groups to cluster; this is the parameter k of the k-means clustering algorithm.

- Apply the k-means algorithm to partition the vectors in the TFIDF matrix (each representing a scientific paper) into K clusters. Papers in the same cluster are close to one another, that is, similar.

- Build the graph layout for the model.

The details are presented below.

Building the 3D-SOM

We build a three-dimensional grid to represent the neurons. Each neuron has coordinates (X, Y, Z), a vector (of the same dimension as the TFIDF vectors of the data set, that is, the dimension of the Bag-Of-Words vector) and weights. For example, the figure below shows a grid of 27 neurons (3x3x3).

Figure 4. 3D SOM with 3x3x3 neurons (left) and 6 winners (red, right)

First, we assign a unique coordinate (taken from the 3D grid) to each neuron. The weights of a neuron, W(d1, d2, d3, ..., dn), take random values in the interval (0, 1).

Next, we train each neuron on the set of samples (the TFIDF matrix) with the SOM algorithm (Kohonen, 1997).

The 3D-SOM is limited by its execution time when the vector dimension is large, and the stability of the network depends on the number of iterations. However, a Self-Organizing Map can classify data without retraining the network once it has stabilized.

From the 3D-SOM, based on the image we can see, we can estimate the value of K (the number of groups to cluster) for the K-means clustering algorithm in the next step. From the figure, we can easily identify K = 3 for the data set.

Figure 5. 10x10x10 3D-SOM grid of 25 …
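The grid construction and training described above can be sketched as follows. This is a generic SOM update under stated assumptions, not the paper's code: the linear decay of the learning rate, the Gaussian neighbourhood, and the parameter names `lr0` and `radius0` are illustrative choices.

```python
import itertools
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_som(size, dim, rng):
    """Create a size x size x size grid; each neuron gets random weights in [0, 1)."""
    return {
        coord: [rng.random() for _ in range(dim)]
        for coord in itertools.product(range(size), repeat=3)
    }


def best_matching_unit(som, sample):
    """Grid coordinate of the neuron whose weight vector is closest to the sample."""
    return min(som, key=lambda c: euclidean(som[c], sample))


def train_som(som, samples, epochs=20, lr0=0.5, radius0=1.5):
    """SOM update with a linearly decaying learning rate and Gaussian neighbourhood."""
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for sample in samples:
            winner = best_matching_unit(som, sample)
            for coord, weights in som.items():
                grid_dist = euclidean(coord, winner)
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                step = lr * influence  # between 0 and lr0, so weights stay bounded
                for i, x in enumerate(sample):
                    weights[i] += step * (x - weights[i])
    return som
```

After training, the set of winning neurons, `{best_matching_unit(som, s) for s in samples}`, is what a 3D view would highlight; groups of winners that sit apart in the grid suggest a value for K.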

Our model uses an atomic structure for display and interaction, so we need to know the center of each cluster, which is shown as the nucleus of the atom. Among clustering methods, we find partitioning with k-means (MacQueen, 1967) the most suitable, because after clustering we also obtain the vector that is the center of each cluster. Many improved variants of k-means now exist (for example k-means++), but for simplicity we use plain k-means.
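A minimal sketch of plain k-means (Lloyd's algorithm) is shown below to make the point about centroids concrete: the algorithm returns both the cluster labels and the center vectors that later serve as nuclei. The optional `init` parameter and the seeded random initialization are assumptions added so that the example is reproducible.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init=None, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in (init or random.Random(seed).sample(points, k))]
    labels = [None] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, 2, init=[points[0], points[3]])
```

With these six points and the two chosen starting centers, the first three points form one cluster and the last three form the other.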

The atomic structure model for clustering a document set:

After building the 3D-SOM and running K-means, we construct the model with the following conventions:

a) Each cluster is an atom

Each atom represents one cluster. Specifically, the nucleus is the centroid μ of the cluster, and the electrons surrounding the nucleus are the vectors x (the TFIDF vector of a paper) in the same group.

The centroid μ of a cluster W:

$$\mu(W) = \frac{1}{|W|} \sum_{x \in W} x$$

- The distance between an electron and its nucleus is measured by the semantic similarity between them (the cosine coefficient between the two vectors), also called the energy of the electron:

$$\mathrm{Energy}(x, \mu) = \mathrm{Distance}(x, \mu) = \frac{x \cdot \mu}{|x|\,|\mu|}$$

- Electrons with the same energy level lie on the same orbit and are evenly distributed on it; when the system is at rest, the distance from an electron to its nucleus equals its energy level.

- All electrons have the same conventional size.

- Size of an atom = number of electrons * electron size
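The conventions in a) can be sketched as three small functions. The centroid and cosine definitions follow the text; the rule that maps an energy value to one of five shells (`energy_level`) is an assumption, since the text only states that there are at most five levels.

```python
import math


def centroid(vectors):
    """mu(W) = (1/|W|) * sum of the vectors in W (the nucleus of the atom)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine coefficient between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def energy_level(x, mu, levels=5):
    """Quantize the cosine energy into one of `levels` shells (binning rule assumed)."""
    e = max(0.0, min(1.0, cosine(x, mu)))
    return min(int(e * levels), levels - 1)


cluster = [[1.0, 0.0], [0.0, 1.0]]
mu = centroid(cluster)  # [0.5, 0.5]
```

For this toy cluster, each electron has a cosine of about 0.707 with the nucleus, which the assumed binning rule places in shell 3 of the shells numbered 0 to 4.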

b) If we have k atoms (clusters):

Their arrangement is computed from their sizes (specifically the number of electrons, that is, the TFIDF vectors of the documents) as follows:

- The nuclei of all atoms are placed on the same plane.

- Create an imaginary circle and divide it into k angles, each containing one atom; the size of each angle is proportional to the size of its atom. The vector c of the center of this circle is computed as:

$$c = \frac{1}{|W|} \sum_{x \in W} x$$

where W is the set of all TFIDF vectors of the document set and x is the TFIDF vector of one document in W.

- The nucleus of each atom (the center μ of each cluster) lies on the bisector of its angle; the distance from the nucleus to the center c of the circle is computed as:

$$\mathrm{Distance}(\mu, c) = \frac{\mu \cdot c}{|\mu|\,|c|}$$

Note that, to preserve the semantics of "similarity", the Euclidean distance formula is not used.

- Based on the nucleus positions just computed, the distribution of the electrons around each nucleus is determined with the energy-level formula given above.
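The placement rules in b) can be sketched as below: sector angles proportional to cluster size, each nucleus on the bisector of its sector, and all nuclei in one plane. The function name `nucleus_positions` is hypothetical, and the distance of each nucleus from the circle center is passed in as `radii` rather than computed, so the sketch does not commit to a particular distance formula.

```python
import math


def nucleus_positions(sizes, radii):
    """Place one nucleus per cluster on the bisector of its sector of a circle.

    sizes[i] is the number of electrons in cluster i; the sector angle is
    proportional to it. radii[i] is the distance of nucleus i from the centre.
    All nuclei lie in the plane z = 0.
    """
    total = sum(sizes)
    positions = []
    start = 0.0
    for size, radius in zip(sizes, radii):
        sector = 2.0 * math.pi * size / total
        bisector = start + sector / 2.0
        positions.append(
            (radius * math.cos(bisector), radius * math.sin(bisector), 0.0)
        )
        start += sector
    return positions


# Three clusters with 1, 1 and 2 electrons: sectors of 90, 90 and 180 degrees.
positions = nucleus_positions([1, 1, 2], [1.0, 1.0, 2.0])
```

When orbits of neighbouring atoms overlap, multiplying every entry of `radii` by a common scale factor moves the nuclei outward without changing their angles.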

As an illustration, in Figure 6 we have four clusters C1, C2, C3, C4 with electron counts n1 < n2 < n3 < n4. On the left of the figure, we can clearly see the regions that will contain the atoms, computed from their numbers of electrons. In Figure 6, the nuclei of clusters C1, C2, C3, C4 are positioned after computing the position of each nucleus relative to the center of the circle. Note that when the orbits of different clusters overlap or interleave, a scale factor must be added to move the nuclei away from the center of the circle and create a wider separating space.

Figure 6. Computing the positions of four clusters based on their sizes

As described above, each atom has at most five energy levels over which its electrons are distributed. After the distances from the electrons to the nucleus are normalized by a conventional maximum distance, each is a real number in the interval [0, max_distance]. Within a given energy level at rest, the electrons are evenly distributed on a sphere whose center is the position of the nucleus and whose radius is the value of the energy level. To solve the problem of distributing points evenly on a sphere, we consulted solutions from discussion forums (Bulatov, 1996).

[Figure 7 panels: evenly distributed points on a sphere; randomly distributed points on a sphere]

Figure 7. Even distribution of points on a sphere
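One simple way to approximate an even distribution is a golden-angle (Fibonacci) spiral, sketched below. This is a swapped-in technique chosen for brevity, not the repulsion-based solution the paper cites; the function name and parameters are illustrative.

```python
import math


def fibonacci_sphere(n, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Return n roughly evenly spaced points on a sphere (golden-angle spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # z runs from just below +1 to just above -1 in equal steps.
        z = 1.0 - (2.0 * i + 1.0) / n
        ring = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the horizontal circle at z
        theta = golden_angle * i
        points.append(
            (
                center[0] + radius * ring * math.cos(theta),
                center[1] + radius * ring * math.sin(theta),
                center[2] + radius * z,
            )
        )
    return points


# 25 electrons on one shell of radius 2 around a nucleus at (1, 2, 3).
electrons = fibonacci_sphere(25, radius=2.0, center=(1.0, 2.0, 3.0))
```

Every returned point lies at exactly the shell radius from the nucleus, so one call per energy level, with the level value as `radius`, places all electrons of an atom.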

Figure 9. 4 atoms with 25 electrons (left) and 5 atoms with …

With this approach, we obtain a complete view for analysing each cluster and the relationships among clusters. Working in three-dimensional space handles the spatial limits of visualization effectively.

Information querying

Because documents are represented as vectors (specifically TFIDF), a query can also be treated as a vector. Consider an example with a query q = (visualization, cluster) over the tf-idf vectors of three documents, as follows:

First, we convert the query into a unit vector:

$$q = (0, \text{visualization}, \text{cluster}) = (0, 1, 1)$$

$$q = \left(0, \frac{1}{\sqrt{0^2 + 1^2 + 1^2}}, \frac{1}{\sqrt{0^2 + 1^2 + 1^2}}\right) = (0, 0.707, 0.707)$$

Then we compute the score Score(q, d) of each document d for the query q with the cosine similarity formula (Manning, Raghavan and Schütze, 2009):

$$\mathrm{Score}(q, d) = \frac{v(q) \cdot v(d)}{|v(q)|\,|v(d)|}$$

According to Table 1, Doc 3 has the highest score for the query q = (visualization, cluster). This means that Doc 3 is the most closely related to this query.

Table 1

| Term | Doc 1 | Doc 2 | Doc 3 |
|---|---|---|---|
| Computer | 0.996 | 0.993 | 0.847 |
| Visualization | 0.087 | 0.120 | 0.466 |
| Cluster | 0.017 | 0 | 0.254 |
| Score(q,d) | 0.074 | 0.085 | 0.509 |
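The scores in Table 1 can be reproduced with a few lines of Python. The sketch below uses the table's values in the term order (computer, visualization, cluster); the helper names `unit` and `score` are illustrative.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def score(q, d):
    """Cosine similarity between query vector q and document vector d."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))


# Term order: (computer, visualization, cluster); values from Table 1.
docs = {
    "Doc 1": [0.996, 0.087, 0.017],
    "Doc 2": [0.993, 0.120, 0.0],
    "Doc 3": [0.847, 0.466, 0.254],
}
q = unit([0, 1, 1])  # (0, 0.707, 0.707)
scores = {name: round(score(q, d), 3) for name, d in docs.items()}
best = max(scores, key=scores.get)
# scores -> {'Doc 1': 0.074, 'Doc 2': 0.085, 'Doc 3': 0.509}; best -> 'Doc 3'
```

Because the cosine formula divides by both vector lengths, normalizing the query first does not change the ranking; it only makes the intermediate vector match the worked example.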

Interaction:

We implemented this system in three-dimensional space with the full feature set of a visualization system: overview, zoom, filter, details-on-demand, relate and extract (Shneiderman, 2010). We chose the Arcball technique (Shoemake, 1992) for intuitive manipulation in a 3D world because it requires no special interaction devices, unlike the 3D ball and Tracer techniques.

Figure 8. The Arcball technique

4 Evaluation of results

To evaluate the system, we consider three criteria:

(1) Visual quality: the first important criterion is the organization and arrangement of objects on a computer screen, which must satisfy the requirements of comprehensibility, usability and aesthetics. By organizing objects according to the atom model, the system effectively presents both an overview of the data set and the relationships among its individual objects. Combined with good interactivity, the system lets us understand more information. For example, we can compare the similarity of the clusters indirectly, based on the distances of their nuclei from the common center. To examine direct relationships, the system supports comparison in the 3D-SOM grid mode.

(2) Execution time: this is one of the essential requirements of the system. We consider the execution time of the following tasks:

Table 2. Execution time of each task for 3 clusters with, in turn, 25x5052, 50x8865 and 100x12348 (papers x dimension of the tfidf vector)

| Task | Complexity | 25x5052 | 50x8865 | 100x12348 |
|---|---|---|---|---|
| Building TFIDF | O(N x M) | 00:00:17.66 | 00:00:53.93 | 00:02:36.86 |
| K-means | O(I x K x N x M) | 00:00:00.23 | 00:00:00.91 | 00:00:03.24 |
| SOM | O(I x N x M x NumNeuron) | 00:00:24.15 | 00:00:41.16 | 00:02:01.55 |
| Even distribution on the sphere | O(I x N^2) | 00:00:00.03 | 00:00:00.10 | 00:00:00.44 |

Unit: hours:minutes:seconds.hundredths of a second. I: number of iterations, K: number of clusters, N: number of papers, M: dimension of a paper's vector.

The execution time of the system is the sum of the task times. Overall, the computation time for visually clustering 100 documents on a machine with a Core i5 CPU and 8 GB of RAM is approximately 5 minutes, which is acceptable. Because no measurements are available for other systems, a general comparison is not yet possible.

(3) Correctness of the algorithms: as mentioned, this approach uses two very common algorithms, SOM and K-means, whose correctness has already been established. We visualize the SOM algorithm to provide a basis for determining the parameter K of the K-means algorithm. The system therefore still guarantees correct clustering.

5 Conclusion

We developed this 3D information visualization system to support the visual clustering of scientific papers in particular, and of documents and texts in general, using an atom model in three-dimensional space. It helps data miners analyse clusters in data sets of very high dimension. The experiments showed that our approach can improve the effectiveness of visualization for cluster analysis. As we have seen, this system shows the relationships among scientific papers in a collection of thousands of papers. The tool will be very useful for displaying clusters and exposing gaps in the data set (the tendency to reveal information …). With this system, data miners can easily estimate the number of clusters and obtain effective guidance for analysing the data in subsequent steps with more accurate information.

References

Ankerst, M., Grinstein, G., Keim, D. 2002. Visual Data Mining: Background, Techniques, and Drug Discovery Application. Alberta, s.n.

Bulatov, V. 1996. The Mathematical Atlas: A Gateway to Modern Mathematics. Accessed 20.01.2013. <http://www.math.niu.edu/~rusin/known-math/96/repulsion>

Cruz-Neira, C., Sandin, D., DeFanti, T., Kenyon, R., Hart, J. 1992. The CAVE. Communications of the ACM 35(6), pp. 64-72.

Kohonen, T. 1997. Self-Organizing Maps. Second extended edition. Berlin: Springer.

MacQueen, J. B. 1967. Some methods for classification and analysis of multivariate observations. Berkeley, University of California Press, pp. 281-297.

Manning, C. D., Raghavan, P., Schütze, H. 2009. An Introduction to Information Retrieval. Cambridge University Press.

Porter, M. F. 1980. An algorithm for suffix stripping. Program, pp. 130-137.

Shneiderman, B. 2010. Designing the User Interface: Strategies for Effective Human-Computer Interaction. 5th ed. s.l.: Addison Wesley.

Shoemake. 1992. Arcball: a user interface for specifying three-dimensional orientation using a mouse. Proceedings of Graphics Interface '92, pp. 151-156.

Ward, M., Grinstein, G., Keim, D. 2010. Interactive Data Visualization: Foundations, Techniques, and Applications. s.l.: A K Peters, Ltd.

Ware, C. 2004. Information Visualization: Perception for Design. 2nd ed. s.l.: Morgan Kaufmann.

Zhang, K.-B., Orgun, M. A., Zhang, K. 2006. HOV3: An Approach for Visual Cluster Analysis. In Proceedings of the 2nd International Conference on Advanced Data Mining and Applications, Volume LNAI 4093, pp. 316-327.

Uploaded: 08/12/2022, 17:25
