Can Tho University Journal of Science, Part A: Natural Sciences, Technology and Environment: 33 (2014): 49-57
website: sj.ctu.edu.vn
DETECTING THE KEY COURSES AFFECTING THE LEARNING OUTCOMES OF INFORMATION TECHNOLOGY STUDENTS
Do Thanh Nghi (1), Pham Nguyen Khang (1), Nguyen Minh Trung (2) and Trinh Trung Hung (3)
(1) College of Information and Communication Technology, Can Tho University
(2) College of Natural Sciences, Can Tho University
(3) Center for Joint Training Programs, Can Tho University
Article info:
Received: 20/01/2014
Accepted: 28/08/2014
Title:
Detection of the key courses
affecting the learning
outcomes of information
technology students
Keywords:
Study program of information technology, Data mining, Random forests, Feature extraction
ABSTRACT
This paper presents a data mining approach for detecting the key courses that affect the learning outcomes of information technology (IT) students. We collect the study results of undergraduate students who graduated from the information technology programs at Can Tho University. A preprocessing step then transforms the dataset into a structured one (i.e. the table format) suited to the data mining algorithm used in the next step. A random forest model is learnt from the dataset to extract the important features (the key courses). The experimental results show that the key courses extracted by the proposed approach provide useful information to educational managers for improving training efficiency.
1 INTRODUCTION
Over many years, even though the number of graduates trained in information technology (IT) by universities and colleges has increased three to four times, the demand for IT human resources has kept growing quickly. According to employers, however, IT training at these institutions does not yet meet practical needs. The main reason is that the quality of IT graduates remains low. Raising that quality to meet practical needs requires close coordination between employers and training institutions, in which educational managers, teaching staff, academic advisors and students all play a role. The question is how academic advisors can help students understand which knowledge is important and affects their results at graduation. With that advice, students can pay more attention to the important courses and so improve the quality of their learning. At the same time, managers have the opportunity to arrange the curriculum and the teaching staff to suit the courses that belong to the important body of knowledge.
We propose an approach for detecting the key courses that affect the learning outcomes of IT students at Can Tho University (CTU), based on knowledge discovery and data mining technology (Fayyad et al., 1996). With its results, managers can adopt suitable strategies for improving teaching quality in the group of important courses, and academic advisors can guide students to concentrate on improving their learning, which raises the output quality of IT students. Our study proceeds as follows: we collect the study results of graduated IT students, preprocess the data into a table structure, and then train the random forest algorithm (Breiman, 2001) on that table to extract the important courses in the curriculum. The extracted courses, which include probability and statistics, discrete mathematics and data structures, can provide useful information to educational managers, lecturers and students for organizing teaching and improving training efficiency.
The rest of the paper is organized as follows: Section 2 briefly reviews related work; Section 3 presents the random forest learning algorithm and how features are extracted; Section 4 presents the experimental results, followed by the conclusion and future work.
2 RELATED WORK
Applying data mining to the management of education and training is considered essential for making educational strategy planning increasingly effective. Recently, a number of studies have applied data mining to extract useful knowledge in education.
The study of (Le, 2002) proposes using association rule mining (Agrawal et al., 1993) and fuzzy logic (Zadeh, 1965) on upper and lower secondary school graduation exam results, with the aim of evaluating training effectiveness and providing the information needed to improve pupil quality. A similar approach by (Nguyen, 2002) uses association rules on grades to detect weak pupils who need additional tutoring.
The master's thesis of (Phan, 2009) studies association rule mining on educational data. It is applied experimentally to the study results of students at Ton Duc Thang University, to support the evaluation and prediction of student results and thereby improve training quality.
The master's thesis of (Nguyen, 2011) focuses on building a system that predicts high school graduation. The author applies a fuzzy association rule mining algorithm to predict high school graduation results from pupils' academic performance and conduct.
Another study by (Nguyen, 2012) reports the results obtained by applying the kMeans clustering algorithm (MacQueen, 1967) to mine information from the grades of pupils at Van Lang Vocational College, Hanoi. The author examines how region, family circumstances, ethnicity and conduct influence pupils' results, and classifies those results in order to assess learners' understanding quickly. Teachers can then adjust their teaching to suit the learners' abilities.
The study of (Nguyen et al., 2007) proposes using decision tree learning algorithms (Breiman et al., 1984), (Quinlan, 1993) and Bayesian networks (Pearl, 1985) to predict the results of undergraduate and postgraduate students at CTU. Another study by (Nguyen et al., 2011) proposes using matrix factorization to predict student performance.
The study of (Pal & Pal, 2013) proposes using decision tree learning algorithms (Breiman et al., 1984), (Quinlan, 1993) and Bagging (Breiman, 1996) to predict the results of students at Purvanchal University, India.
The study of (Bukralia et al., 2012) proposes using machine learning techniques such as neural networks, logistic regression (Hastie et al., 2001), decision trees (Breiman et al., 1984), (Quinlan, 1993) and support vector machines, SVM (Vapnik, 1995), to predict the results of distance-learning students at Midwest University, USA.
All of the studies above concentrate on predicting learning results or course grades. Our study does not aim at predicting learning results accurately. Instead, we are interested in detecting the key courses that affect the learning outcomes of IT students, using the random forest learning algorithm.
3 THE RANDOM FOREST ALGORITHM
The random forest approach proposed by (Breiman, 2001) is one of the most successful ensemble methods. The random forest algorithm builds unpruned decision trees (Breiman et al., 1984), (Quinlan, 1993); each tree is built on a bootstrap sample (drawn with replacement from the training set), and at each node the best split is chosen from a randomly selected subset of the attributes. The generalization error of the forest depends on the accuracy of each member tree and on the dependence between the member trees. The algorithm builds unpruned trees in order to keep the bias component of the error low (the bias is the error component of the learning algorithm itself, independent of the training set) and uses randomness to keep the correlation between the trees in the forest low. The random forest approach achieves accuracy comparable to the best learning methods currently available. As shown in (Breiman, 2001), random forests learn fast, tolerate noise well and do not overfit. The algorithm produces models whose accuracy is high enough to meet practical requirements for classification and regression problems.
3.1 Building a random forest
The random forest learning algorithm (Figure 1) can be summarized as follows:
- From the training set LS with m elements and n variables (attributes), build T decision trees independently of one another
- The t-th decision tree model is built on the t-th bootstrap sample (m elements drawn with replacement from the training set LS)
- At each internal node, select n' variables at random (n' << n) and compute the best split based on these n' variables
- The tree is grown to maximum depth without pruning
Once the T base models have been built, a new element X is classified by majority vote.
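The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the randomForest implementation used in the experiments: the function names (`build_forest`, `predict_forest` and the helpers), the Gini criterion and the threshold splits on numeric attributes are assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def build_tree(rows, labels, n_sub, rng):
    """Grow a tree without pruning; n_sub random variables are tried at each node.
    Leaves are class labels (assumed not to be tuples), inner nodes are tuples."""
    features = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, features)
    if split is None:  # pure node, or no split among the sampled variables helps
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    return (f, thr,
            build_tree([rows[i] for i in left], [labels[i] for i in left], n_sub, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], n_sub, rng))

def predict_tree(tree, x):
    """Follow the splits down to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, thr, low, high = tree
        tree = low if x[f] <= thr else high
    return tree

def build_forest(rows, labels, T, n_sub, seed=0):
    """Build T trees, each on its own bootstrap sample of the m training rows."""
    rng = random.Random(seed)
    m = len(rows)
    forest = []
    for _ in range(T):
        idx = [rng.randrange(m) for _ in range(m)]  # sampling with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, x):
    """Classify x by majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]
```

Here `T` and `n_sub` play the roles of T and n' in the description; a call such as `build_forest(rows, labels, T=200, n_sub=50)` would mirror the settings reported in Section 4.2, assuming `rows` is a list of numeric tuples.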
Figure 1: The random forest algorithm
3.2 Feature extraction
Important features are extracted while the random forest model is being trained. At each step t, the set Bootstrap_t (m elements drawn with replacement from the training set LS) is used to build the tree DT_t of the random forest. The algorithm takes the set Out-Of-Bootstrap_t, OOB_t (the elements of the training set LS that are not in Bootstrap_t), as a test set for computing the classification accuracy of the tree DT_t.
An important attribute is understood as one that strongly affects the classification result of the random forest. Concretely, if the attribute is altered (its values are permuted), the classification accuracy of the random forest drops considerably compared with the accuracy before the attribute was altered.
The importance of an attribute in a random forest is computed as follows. When the t-th tree is built from Bootstrap_t, the accuracy of the tree DT_t on the set OOB_t (Out-Of-Bootstrap) is computed as Acc(DT_t, OOB_t). The values of the i-th attribute in OOB_t are then randomly permuted, giving OOB_t(rand(i)), and the accuracy of DT_t is recomputed on this set as Acc(DT_t, OOB_t(rand(i))). Next, the difference in accuracy before and after permuting the values of the i-th attribute is computed for the tree DT_t. For the attributes i = 1, 2, ..., k we have:
$$\Delta acc_{t,1} = Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(1)))$$
$$\Delta acc_{t,2} = Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(2)))$$
$$\cdots$$
$$\Delta acc_{t,k} = Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(k)))$$
For a random forest model RF with T trees, the total difference in accuracy before and after permuting the values of each attribute is:
$$\text{attribute } 1:\quad a_1 = \Delta acc_{1,1} + \Delta acc_{2,1} + \cdots + \Delta acc_{T,1}$$
$$\text{attribute } 2:\quad a_2 = \Delta acc_{1,2} + \Delta acc_{2,2} + \cdots + \Delta acc_{T,2}$$
$$\cdots$$
$$\text{attribute } k:\quad a_k = \Delta acc_{1,k} + \Delta acc_{2,k} + \cdots + \Delta acc_{T,k}$$
Sorting a_1, a_2, ..., a_k in decreasing order orders the attributes by the total difference in accuracy before and after their values are permuted. The larger the difference, the more important the corresponding attribute.
Based on this idea, we extract the key courses that affect the learning outcomes of IT students. Each student can be seen as a row (a record, an element of the data), the student's courses as attributes (columns, fields), and the graduation classification as the class (label). The students' study data therefore form a data table. We use a random forest to learn to classify students. While the model is being built, the random forest extracts the important courses (attributes) as just described. The important courses extracted from the random forest model can be interpreted as the courses that strongly affect the classification of students' learning results.
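The computation of the totals a_1, ..., a_k can be sketched as follows. This is an illustration under stated assumptions, not the code used in the paper: each tree is represented by a hypothetical `predict` callable, each OOB set is a non-empty pair `(rows, labels)` with rows stored as tuples, and the function names are invented for the example.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permute_column(rows, i, rng):
    """Copy of rows in which the values of attribute i are randomly permuted."""
    column = [r[i] for r in rows]
    rng.shuffle(column)
    return [r[:i] + (v,) + r[i + 1:] for r, v in zip(rows, column)]

def forest_importance(trees, oob_sets, k, seed=0):
    """a_i = sum over trees t of Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(i)))."""
    rng = random.Random(seed)
    a = [0.0] * k
    for predict, (rows, labels) in zip(trees, oob_sets):
        base = accuracy(predict, rows, labels)      # Acc(DT_t, OOB_t)
        for i in range(k):
            permuted = permute_column(rows, i, rng)  # OOB_t(rand(i))
            a[i] += base - accuracy(predict, permuted, labels)
    return a

def rank_attributes(a):
    """Attribute indices sorted by decreasing total accuracy difference."""
    return sorted(range(len(a)), key=lambda i: a[i], reverse=True)
```

An attribute that a tree never uses leaves the accuracy unchanged when permuted, so its contribution is exactly zero; attributes the trees rely on accumulate positive differences and move to the front of the ranking.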
4 EXPERIMENTAL RESULTS
For the experiments, we collected student results from the Office of Academic Affairs at CTU. The data cover the results of IT students in cohorts 20 to 29 (admitted from 1994 to 2003). From cohort 30 onwards, results are assessed on a letter-grade scale (A, B, C, ...), so the data are not homogeneous; since the amount of data already collected was large enough, we left these cohorts out. The collected data are table-structured and organized by semester and academic year. Each semester has data files such as diem (student grades), dtotng (graduated students), ctdt (curriculum) and other data files for that semester. In addition, there are files that store descriptions of the codes used in the system, such as student names, course names and programme names.
Each grade file contains: the student ID, the course code (elective or compulsory) and the exam grade of each course the student took in that semester (Figure 2).
Figure 2: Structure of the grade file (diem)
The dtotng file contains the graduation classification of each student, including: the student ID, the programme code, the graduation grade, the graduation classification, and so on (Figure 3).
Figure 3: Structure of the graduation grade file (dtotng)
We also use the ctdmh file (which stores course names) to look up a course name from its course code (Figure 4).
[Figure 4: the course-name file (ctdmh)]
In addition, we studied how students' grades are computed for graduation classification. The graduation grade (DTN) is the weighted average of the grades of the courses accumulated up to the time of evaluation (excluding conditional courses and courses graded F). The formula is as follows:
$$DTN = \frac{\sum_{i=1}^{n} X_i \, a_i}{\sum_{i=1}^{n} a_i}$$
where X_i is the grade of the i-th course, a_i is the number of credits of the i-th course, and n is the number of courses the student has accumulated.
4.1 Data preprocessing
The goal of preprocessing is to bring the data into a table structure in which each column (field) represents a course and each row (record) describes the results of one student over the whole programme. To do this, we proceed in the two steps described below.
Figure 5: Aggregated data (columns Ten_MH, Diem, Diem_TB, Diem_TB_TN)
- Step 1: select the grade data of IT students and remove students of other programmes; remove invalid data and unimportant attributes; and merge the data from the grade files and the graduation file into a single table (like the table in Figure 5). This step yields a table without unimportant attributes such as ID numbers, student names and programme names.
- Step 2: starting from the table built in Step 1, we reorganize the data so that the columns describe the courses and the rows hold the grades of the courses each student may take during the programme. Some courses are not taken by a student (different cohorts may have different courses) or are exempted; in these cases the grade is NULL, and we convert NULL values to 'NA'. After preprocessing, we obtain a data table with 317 rows and 249 fields (each field corresponds to a course and holds the grade for that course), plus a final field holding the graduation average (see Figure 6).
Figure 6: Learning results obtained after the preprocessing step
Next, we drop the columns for the conditional courses in the curriculum, such as physical education and national defence education. Students only need to pass these courses, and they are not used to classify learning results. The last field (column) is the graduation average rounded to an integer; it is used as the class label, with values up to 8, 9 and 10.
After preprocessing, we obtain a table-structured data file that is used to analyze the students' learning results.
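The reshaping done in preprocessing can be sketched as follows. This is a minimal sketch under assumptions: the function name `pivot_grades`, the input layout (grade triples and a dictionary of graduation averages) and the half-up rounding rule are chosen for illustration, and the course codes used in any call are hypothetical.

```python
def pivot_grades(records, graduation, excluded=()):
    """Turn (student_id, course, grade) triples into one row per student.

    records    -- iterable of (student_id, course, grade) triples
    graduation -- dict mapping student_id to the graduation average
    excluded   -- conditional courses to drop (e.g. physical education)
    Returns (header, table); missing grades become 'NA' and the last
    column is the graduation average rounded to an integer (assumed half-up).
    """
    records = [r for r in records if r[1] not in excluded]
    courses = sorted({course for _, course, _ in records})
    by_student = {}
    for sid, course, grade in records:
        by_student.setdefault(sid, {})[course] = grade
    header = courses + ["label"]
    table = []
    for sid in sorted(by_student):
        if sid not in graduation:          # keep graduated students only
            continue
        row = [by_student[sid].get(c, "NA") for c in courses]
        row.append(int(graduation[sid] + 0.5))  # class label
        table.append(row)
    return header, table
```

Each row of the result corresponds to one student and each column to one course, which is the shape the random forest expects as input.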
4.2 Building the random forest model and extracting the key courses
Our program is based on the randomForest package available in the R environment (Ihaka & Gentleman, 1996). We build a random forest model with 200 decision trees, selecting 50 attributes at random to compute the split at each node. Features (courses) are extracted as described in the previous section, and the courses are then sorted in decreasing order of importance. The 10 most important courses (those that most affect the outcome of IT students) are shown in Figure 7.
Figure 7: Top 10 important courses, ranked by MeanDecreaseAccuracy (plot labels: Xacsuat.thong.ke.A, Diem_LuanVan, Tin.hoc.ly.thuyet, Cau.truc.du.lieu, Lap.trinh.huong.d_Tuong.C, Ngon.ngu.lap.trinh, Toan.roi.rac.1.dS_Bool.TH, Co.so.du.lieu, Chuong.trinh.dich, Ly.thuyet.xep.hang)
Among the 10 most important courses extracted from the random forest model, the courses fall into 3 groups as follows:
- General education courses: probability and statistics, discrete mathematics, quantum physics, ...
- Foundation courses of the programme: theoretical computer science, databases, programming languages, data structures
- Specialized courses: graduation thesis, compilers
According to these results, these are the courses that deserve the most attention, because the knowledge they cover affects the training quality of IT students.
To validate the extracted important features (courses), we re-evaluate classification performance on the dataset with the full set of 249 features (courses) and on the dataset with only the top 10 important features (courses). The error rate on the dataset with all 249 features is 21.14%, while the error rate on the dataset with the top 10 features is only 20.82%. This shows that identifying these 10 important courses is enough to classify students' learning results.
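The selection and comparison described above can be sketched as follows. The helper names (`top_k`, `project`, `error_rate`) and the table layout (a header list plus rows whose last column is the class label) are assumptions made for the example; the classifier that produces the predictions is left out.

```python
def top_k(importance, k):
    """Names of the k attributes with the largest importance, in decreasing order."""
    return sorted(importance, key=importance.get, reverse=True)[:k]

def project(header, table, keep):
    """Keep only the columns named in `keep`, plus the final label column."""
    idx = [header.index(name) for name in keep] + [len(header) - 1]
    return [header[i] for i in idx], [[row[i] for i in idx] for row in table]

def error_rate(predicted, actual):
    """Fraction of elements whose predicted class differs from the true class."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)
```

Training one model on the full table and another on the projected table, then comparing their error rates on the same held-out elements, reproduces the kind of check reported here.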
In summary, the study has identified the key courses that affect the graduation classification of IT students. In other words, students who do well in these courses tend to obtain a good graduation classification. These courses can be used to classify the learning results of graduating students.
5 CONCLUSION AND FUTURE WORK
We have presented a data mining approach for detecting the key courses that affect the learning outcomes of IT students at CTU. The approach consists of collecting the study results of graduated IT students and preprocessing the data so that a random forest model can be trained and used to extract the important courses. The extracted courses can provide useful information to educational managers, lecturers and students for organizing teaching and improving training efficiency.
In the future, we intend to extend the study to other programmes such as economics or environmental studies. In addition, the opinions of education management experts and employers should be consulted to help identify the important courses.
REFERENCES
1. R. Agrawal, T. Imielinski and A. Swami: Mining Associations between Sets of Items in Massive Databases, in proc. of ACM-Management of Data, Washington, USA, pp 207-216 (1993)
2. L. Breiman, J. Friedman, R. Olshen, C. Stone: Classification and Regression Trees. Chapman & Hall (1984)
3. L. Breiman: Bagging predictors. Machine Learning 24(2): 123-140 (1996)
4. L. Breiman: Random forests. Machine Learning 45(1): 5-32 (2001)
5. R. Bukralia, A-V. Deokar, S. Sarnikar, M. Hawkes: Using Machine Learning Techniques in Student Dropout Prediction. Chapter 7 in Cases on Institutional Research Systems, Hansel Burley Eds., IGI Global, pp 117-131 (2012)
6. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth: From Data Mining to Knowledge Discovery in Databases, in AI Magazine, 17(3): 37-54 (1996)
7. T. Hastie, J-H. Friedman, R. Tibshirani: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer (2001)
8. R. Ihaka and R. Gentleman: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3): 299-314 (1996)
9. J. MacQueen: Some methods for classification and analysis of multivariate observations, in proc. of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, Vol.1, pp 281-297 (1967)
10. T-N. Nguyen, L. Drumond, T. Horvath, L. Schmidt-Thieme: Multi-Relational Factorization Models for Predicting Student Performance, in proc. of the KDD 2011 Workshop on Knowledge Discovery in Educational Data (2011)
11. A-K. Pal, S. Pal: Analysis and Mining of Educational Data for Predicting the Performance of Students, in International Journal of Electronics Communication and Computer Engineering Vol.4(5): 2278-4209 (2013)
12. J. Pearl: Bayesian Networks: a Model of Self-Activated Memory for Evidential Reasoning, in proc. of Cognitive Science Society, UC Irvine, pp 329-334 (1985)
13. J.R. Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
14. V. Vapnik: The Nature of Statistical Learning Theory. Springer-Verlag (1995)
15. L.A. Zadeh: Fuzzy sets. Information and Control 8(3): 338-353 (1965)
16. Le Thanh Minh: Mining graduation exam scores to support the evaluation and classification of pupils. Master's thesis, University of Science, Ho Chi Minh City (2002) (in Vietnamese)
17. Nguyen Quoc Thong: Developing some data mining applications for education and training. Master's thesis, University of Science, Ho Chi Minh City (2002) (in Vietnamese)
18. Nguyen Thai Nghe: An analysis of techniques for predicting learning results. Proceedings of the 10th National Conference on Information Technology, pp 19-31 (2007) (in Vietnamese)
19. Phan Dinh The Huan: Research on and application of association rule mining on educational data. Master's thesis, University of Science, Ho Chi Minh City (2009) (in Vietnamese)
20. Nguyen Thi Van Hao: Building a system for predicting high school graduation results. Master's thesis, Lac Hong University, Dong Nai (2011) (in Vietnamese)
21. Nguyen Dang Nhuong: Mining data on the learning results of pupils at Van Lang Vocational College, Hanoi. Master's thesis, University of Engineering and Technology, Vietnam National University, Hanoi (2012) (in Vietnamese)