DATA MINING COURSE ASSIGNMENT
Topic: Meta Data
Student: Vũ Lê Hồng
Class: HTTT6    Supervisor: Ho Nhat Quang
1. Problem description
Meta Data is used to advise which classification method is appropriate for a particular dataset (the data come from the results of the Statlog project).
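Each row of Meta Data pairs a dataset name (DS_Name) with an algorithm name (Alg_Name) and that algorithm's normalized error (Norm_error). The simplest form of the "advice" is a lookup: for a given dataset, recommend the algorithm with the lowest Norm_error. The sketch below assumes that a lower Norm_error is better. The rows and the function name `recommend` are hypothetical and do not come from the dataset.

```python
# Hypothetical (DS_Name, Alg_Name, Norm_error) rows, for illustration only;
# they are NOT values from the Meta Data file.
rows = [
    ("ds_A", "Bayes", 0.3),
    ("ds_A", "C4.5", 0.5),
    ("ds_A", "KNN", 0.4),
    ("ds_B", "Bayes", 0.6),
    ("ds_B", "QuaDisc", 0.2),
]


def recommend(rows):
    """Return {dataset: algorithm with the lowest Norm_error} (lower assumed better)."""
    best = {}  # dataset -> (lowest error so far, algorithm)
    for ds, alg, err in rows:
        if ds not in best or err < best[ds][0]:
            best[ds] = (err, alg)
    return {ds: alg for ds, (_, alg) in best.items()}


print(recommend(rows))  # {'ds_A': 'Bayes', 'ds_B': 'QuaDisc'}
```

A lookup like this only covers datasets that are already in the table. A meta-learner that advises on unseen datasets would instead learn from the numeric attributes.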
2. Building the database
Meta Data 2011
- Dataset used: Meta Data
- Dataset information
Data Set Characteristics: Multivariate
Number of Instances: 528
Area: N/A
Attribute Characteristics: Categorical, Integer, Real
Number of Attributes: 22
Date Donated: 1996-03-01
Associated Tasks: Classification
Missing Values? 10937
- Attributes:
DS_Name {Aust_Credit, BT, Belgian, CUT, Chromosone, Credit, DNA, Diabetes, Digits, Faults, German_Credit, Head, Heart, KlDigits, Letters, NewBelgian, SatImage, Segment, Shuttle, Technical, TseTse, Vehicle}
T real
N real
p real
k real
Bin real
Cost real
SDratio real
correl real
cancor1 real
cancor2 real
fract1 real
fract2 real
skewness real
kurtosis real
Hc real
Hx real
MCx real
EnAtr real
NSRatio real
Alg_Name {Ac2, Alloc80, BackProp, Bayes, BayesTree, C4.5, CART, Cal5, Cascade, Castle, Cn2, Default, Dipol92, Discrim, ITrule, IndCART, KNN, Kohonen, LVQ, LogDisc, NewId, QuaDisc, RBF, Smart}
Norm_error real
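These declarations follow Weka's ARFF attribute syntax with the `@attribute` keyword left out. The small plain-Python parser below is not Weka's loader, and `parse_decls` is an illustrative name. It checks the counts: 22 attributes, 22 dataset names and 24 algorithm names. Because 22 × 24 = 528, the instance count is consistent with one row per (dataset, algorithm) pair. That reading is an inference, not something the dataset description states.

```python
import re

# Attribute declarations of Meta Data as listed in this report
# (ARFF style, without the "@attribute" keyword).
DECLS = """
DS_Name {Aust_Credit,BT,Belgian,CUT,Chromosone,Credit,DNA,Diabetes,Digits,Faults,German_Credit,Head,Heart,KlDigits,Letters,NewBelgian,SatImage,Segment,Shuttle,Technical,TseTse,Vehicle}
T real
N real
p real
k real
Bin real
Cost real
SDratio real
correl real
cancor1 real
cancor2 real
fract1 real
fract2 real
skewness real
kurtosis real
Hc real
Hx real
MCx real
EnAtr real
NSRatio real
Alg_Name {Ac2,Alloc80,BackProp,Bayes,BayesTree,C4.5,CART,Cal5,Cascade,Castle,Cn2,Default,Dipol92,Discrim,ITrule,IndCART,KNN,Kohonen,LVQ,LogDisc,NewId,QuaDisc,RBF,Smart}
Norm_error real
"""

NOMINAL = re.compile(r"(\S+)\s*\{(.*)\}")


def parse_decls(text):
    """Map each attribute name to its list of nominal values, or to its type string."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = NOMINAL.fullmatch(line)
        if m:  # nominal attribute: comma-separated values inside braces
            attrs[m.group(1)] = [v.strip() for v in m.group(2).split(",")]
        else:  # numeric attribute: "<name> real"
            name, kind = line.split()
            attrs[name] = kind
    return attrs


attrs = parse_decls(DECLS)
n_ds, n_alg = len(attrs["DS_Name"]), len(attrs["Alg_Name"])
print(len(attrs), n_ds, n_alg, n_ds * n_alg)  # 22 22 24 528
```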
-Training data
3. Running the experiments in Weka
- Loading the data into Weka
[Figure: Weka Explorer, Preprocess tab. Current relation: meta, Instances: 528, Attributes: 22. Selected attribute: T (Numeric), Missing: 0 (0%), Distinct: 20, Unique: 0 (0%), Minimum: 270, Mean: 4569.045. Histogram coloured by class Norm_error (Num).]
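For the selected attribute, the Preprocess panel reports Missing, Distinct (the number of different values) and Unique (the number of values that occur exactly once). For numeric attributes it also shows the minimum, maximum, mean and standard deviation. The plain-Python sketch below computes the same quantities. The input values are hypothetical, None marks a missing value, and the standard deviation uses the sample (n - 1) denominator. `attribute_summary` is an illustrative name.

```python
import math
from collections import Counter


def attribute_summary(values):
    """Preprocess-style summary of a numeric attribute; None marks a missing value.

    Assumes at least one non-missing value.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    mean = sum(present) / n
    # Sample standard deviation (n - 1 denominator); 0.0 when only one value.
    var = sum((v - mean) ** 2 for v in present) / (n - 1) if n > 1 else 0.0
    return {
        "missing": len(values) - n,
        "distinct": len(counts),                              # different values
        "unique": sum(1 for c in counts.values() if c == 1),  # values seen once
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "stddev": math.sqrt(var),
    }


# Hypothetical values: 1.0 appears twice, so it is distinct but not unique.
print(attribute_summary([1.0, 1.0, 2.0, None, 4.0]))
```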
- Using the entire dataset for training
[Figure: Weka Explorer, Classify tab: rules.ZeroR (run at 07:13:56), Test options: Cross-validation, Folds 10; class attribute Norm_error (Num); Test mode: 10-fold cross-validation.]
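ZeroR, the classifier used in this run, ignores every attribute. For a numeric class such as Norm_error it predicts the mean of the class in the training data, and that value is the "ZeroR predicts class value" line of the output. Weka's Relative absolute error and Root relative squared error compare a model's errors with those of this same mean predictor, which is why ZeroR scores 100 % on both. The sketch below works under that assumption. The targets are hypothetical, and `zero_r` and `regression_metrics` are illustrative names.

```python
import math


def zero_r(train_targets):
    """ZeroR for a numeric class: always predict the training mean."""
    return sum(train_targets) / len(train_targets)


def regression_metrics(actual, predicted, train_mean):
    """MAE, RMSE and relative errors (%) against a mean-predictor baseline."""
    n = len(actual)
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    base_abs = sum(abs(train_mean - a) for a in actual)   # baseline absolute error
    base_sq = sum((train_mean - a) ** 2 for a in actual)  # baseline squared error
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE%": 100 * abs_err / base_abs,
        "RRSE%": 100 * math.sqrt(sq_err / base_sq),
    }


# Hypothetical targets, for illustration only.
train = [1.0, 2.0, 3.0, 6.0]
test = [2.0, 5.0]
m = zero_r(train)  # 3.0
print(regression_metrics(test, [m] * len(test), m))  # MAE 1.5; RAE% and RRSE% 100.0
```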
- Result output
=== Classifier model (full training set) ===
ZeroR predicts class value: 99.55247727272732
Time taken to build model: 0 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient                 -0.1108
Mean absolute error                    151.5401
Root mean squared error                764.994
Relative absolute error                100 %
Root relative squared error            100 %
Total Number of Instances              528
- Decision tree
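J48, used in the decision-tree runs, is Weka's implementation of C4.5. C4.5 chooses each split by gain ratio: the information gain of the split divided by its split information, which is the entropy of the branch sizes. The minimal sketch below handles a nominal split given as class counts. It omits C4.5's numeric thresholds, missing-value handling and pruning, which J48 controls through options such as the -C confidence factor and the -M minimum leaf size. `entropy` and `gain_ratio` are illustrative names.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a vector of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(parent, children):
    """C4.5 gain ratio for splitting `parent` class counts into `children`.

    parent:   class counts before the split, e.g. [2, 2]
    children: one class-count list per branch, e.g. [[2, 0], [0, 2]]
    """
    n = sum(parent)
    sizes = [sum(c) for c in children]
    remainder = sum(s / n * entropy(c) for s, c in zip(sizes, children))
    gain = entropy(parent) - remainder
    split_info = entropy(sizes)  # penalises splits into many small branches
    return gain / split_info if split_info > 0 else 0.0


print(gain_ratio([2, 2], [[2, 0], [0, 2]]))  # 1.0 (pure branches)
print(gain_ratio([2, 2], [[1, 1], [1, 1]]))  # 0.0 (split tells us nothing)
```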
[Figure: Weka Classifier Tree Visualizer: 16:38:21 - trees.J48 (acute), Tree View.]
- 95% of the data for building the model, 5% for testing
[Figure: Weka Explorer, Classify tab: rules.ZeroR (run at 07:17:57), Test options: Percentage split; class attribute Norm_error (Num); Test mode: split 95.0% train, remainder test. ZeroR predicts class value: 99.55247727272732. Time taken to build model: 0 seconds.]
=== Evaluation on test split ===
Correlation coefficient                  0
Mean absolute error                    217.8454
Root mean squared error                603.4871
Relative absolute error                100 %
Root relative squared error            100 %
Total Number of Instances               26
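The 26 test instances reported for the 95 % split are consistent with rounding the training size half up. 528 × 95 / 100 = 501.6 rounds to 502 training instances, which leaves 26 for testing; truncating to 501 would have left 27. The sketch below computes split sizes that way, assuming Java-style `Math.round` rounding. For comparison, it also gives the fold sizes of the earlier 10-fold cross-validation, assuming contiguous folds in which the first n mod k folds receive one extra instance. The function names are illustrative.

```python
import math


def percentage_split(n, train_percent):
    """(train, test) sizes, rounding the training size half up like Java's Math.round."""
    train = math.floor(n * train_percent / 100 + 0.5)
    return train, n - train


def fold_sizes(n, k):
    """Sizes of k contiguous folds; the first n % k folds get one extra instance."""
    return [n // k + (1 if i < n % k else 0) for i in range(k)]


print(percentage_split(528, 95))  # (502, 26)
print(fold_sizes(528, 10))        # [53, 53, 53, 53, 53, 53, 53, 53, 52, 52]
```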
- 90% of the data for building the model, 10% for testing
- 85% of the data for building the model, 15% for testing
[Figure: Weka Explorer, Classify tab: trees.J48 -C 0.25 -M 2, Test options: Percentage split 85%; class attribute (Nom) Viêm thận (nephritis).]
Size of the tree : 7
Time taken to build model: 0 seconds
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances          18      100 %
Incorrectly Classified Instances         0        0 %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0
Root relative squared error              0
Total Number of Instances               18
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
Weighted Avg.   1   0   1   1   1
=== Confusion Matrix ===
 a  b   <-- classified as
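A Kappa statistic of 1 means the predictions agree perfectly with the true classes, beyond anything chance would produce. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The sketch below uses hypothetical matrices. Returning 1.0 when p_e = 1 (only one class present) is this sketch's own convention, and `kappa` is an illustrative name.

```python
def kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    if expected == 1:
        return 1.0  # degenerate case: chance agreement is already perfect
    return (observed - expected) / (1 - expected)


print(kappa([[12, 0], [0, 6]]))   # 1.0 (hypothetical perfect result)
print(kappa([[5, 5], [5, 5]]))    # 0.0 (no better than chance)
```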
- 70% of the data for building the model, 30% for testing
[Figure: Weka Explorer, Classify tab: trees.J48 -C 0.25 -M 2, Test options: Percentage split; class attribute (Nom) Viêm thận (nephritis).]
Number of Leaves : 4
Size of the tree : 7
Time taken to build model: 0 seconds
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances          36      100 %
Incorrectly Classified Instances         0        0 %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0 %
Root relative squared error              0 %
Total Number of Instances               36
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
Weighted Avg.   1   0   1   1   1   1
=== Confusion Matrix ===
  a  b   <-- classified as
 24  0 | a = no
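The Detailed Accuracy By Class rows are derived from the confusion matrix. For each class, TP rate (the same as recall) is TP divided by the number of instances of that class. FP rate is FP divided by the number of instances of the other classes. Precision is TP divided by the number of instances predicted as that class, and F-measure is the harmonic mean of precision and recall. This sketch assumes the Weighted Avg. row weights each class by its number of actual instances. The 2×2 matrix is hypothetical, and the function names are illustrative.

```python
def per_class_metrics(matrix):
    """Per-class TP rate, FP rate, precision, recall, F-measure.

    matrix[i][j] = number of instances of actual class i predicted as class j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    stats = []
    for c in range(k):
        tp = matrix[c][c]
        actual = sum(matrix[c])                          # instances of class c
        predicted = sum(matrix[r][c] for r in range(k))  # predicted as class c
        fp = predicted - tp
        negatives = total - actual
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats.append({"tp_rate": recall, "fp_rate": fp / negatives if negatives else 0.0,
                      "precision": precision, "recall": recall,
                      "f_measure": f, "support": actual})
    return stats


def weighted_average(stats, key):
    """Average one metric over classes, weighted by each class's actual size."""
    total = sum(s["support"] for s in stats)
    return sum(s[key] * s["support"] for s in stats) / total


stats = per_class_metrics([[8, 2], [1, 9]])  # hypothetical matrix
print(weighted_average(stats, "recall"))     # 0.85
```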
- 50% of the data for building the model, 50% for testing
[Figure: Weka Explorer, Classify tab: trees.J48, Test options: Percentage split; class attribute (Nom) Viêm thận (nephritis).]
Size of the tree : 7
Time taken to build model: 0.01 seconds
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances          60      100 %
Incorrectly Classified Instances         0        0 %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0
Root relative squared error              0
Total Number of Instances               60
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
1         0         1           1        1                      no
1         0         1           1        1                      yes
Weighted Avg.   1   0   1   1   1
=== Confusion Matrix ===
  a  b   <-- classified as
 39  0 | a = no
  0 21 | b = yes
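The ROC Area column is the area under the ROC curve, computed for each class with that class treated as positive. It equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counting half. The sketch below therefore uses the pairwise (Mann-Whitney) count. Weka derives the value from its threshold curve instead, so treat this as an equivalent formulation rather than Weka's code. The scores are hypothetical, and `roc_auc` is an illustrative name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    scores: predicted probability of the positive class for each instance
    labels: 1 for positive, 0 for negative
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts half
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0 (perfect ranking)
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```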