Lecture 19. Model Evaluation

Sonpvh.2019.05.May

Evaluation

Why is it so complicated?

1 Offline evaluation (accuracy, precision, recall, MSE, …) vs. online evaluation (business metrics)

2 Distribution drift: the distribution of the data changes over time, so keep tracking the model's performance on the validation metrics using live data.
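A minimal sketch of this monitoring idea in Python, assuming labeled live data arrives in time order and that plain accuracy is the validation metric being tracked. The helper names `windowed_accuracy` and `drift_alerts`, the window size and the 0.05 tolerance are illustrative assumptions, not part of the lecture.

```python
from typing import List, Sequence


def windowed_accuracy(y_true: Sequence[int], y_pred: Sequence[int], window: int) -> List[float]:
    """Hypothetical helper: accuracy on consecutive, non-overlapping windows of live data.

    A trailing window shorter than `window` is ignored.
    """
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        pairs = zip(y_true[start:start + window], y_pred[start:start + window])
        scores.append(sum(t == p for t, p in pairs) / window)
    return scores


def drift_alerts(baseline: float, live_scores: Sequence[float], tolerance: float = 0.05) -> List[int]:
    """Hypothetical helper: windows whose accuracy fell more than `tolerance` below the offline baseline."""
    return [i for i, score in enumerate(live_scores) if baseline - score > tolerance]


# Toy live labels and predictions, in time order (illustrative data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 1, 0, 1]
live = windowed_accuracy(y_true, y_pred, window=4)
print(live)                     # [1.0, 0.5]
print(drift_alerts(0.9, live))  # [1]
```

Comparing against a fixed offline baseline is the simplest choice; a sustained run of alerts, rather than a single window, is usually the better signal that the data has drifted.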

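Confusion Matrix

Threshold: Th = 0.5

                    Predicted positive    Predicted negative
Actual positive     TP = 9                FN = 1
Actual negative     FP = 2                TN = 8

Accuracy = 0.85, Precision = 0.81, Recall = 0.9, F-score = 0.85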
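The reported metrics follow directly from the four cells of the matrix. A short Python sketch, assuming the counts TP = 9, FN = 1, FP = 2, TN = 8 (FP and TN are the only integer values consistent with the reported accuracy and precision); `classification_metrics` is a hypothetical helper name. The slide's 0.81 and 0.85 appear to be values truncated to two decimals.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Hypothetical helper: accuracy, precision, recall and F1 from the confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many are real positives
    recall = tp / (tp + fn)     # of the real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = classification_metrics(tp=9, fp=2, fn=1, tn=8)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.85, 'precision': 0.818, 'recall': 0.9, 'f1': 0.857}
```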

$$\mathrm{FPR} = \frac{FP}{TN + FP}, \qquad \mathrm{TPR} = \frac{TP}{TP + FN} = \text{Recall (sensitivity)}$$
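FPR and TPR trace an ROC curve when the decision threshold is swept instead of being fixed at 0.5. A minimal Python sketch, assuming binary labels in {0, 1} with both classes present and one real-valued score per example; the function names are hypothetical, and the area under the curve is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos  # both classes must be present, otherwise division by zero
    points = [(0.0, 0.0)]    # threshold above every score: nothing predicted positive
    for th in sorted(set(scores), reverse=True):
        # Predict positive when score >= th, then count TP and FP
        tp = sum(1 for y, s in zip(y_true, scores) if s >= th and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= th and y == 0)
        points.append((fp / neg, tp / pos))  # FPR = FP/(TN+FP), TPR = TP/(TP+FN)
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule (points ordered by increasing FPR)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(round(auc(pts), 4))  # 0.8889
```

Tied scores are handled by crossing them together at one threshold, which is why the sweep runs over distinct scores only.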

▪ Ex: prediction…

▪ Ex: spam detection, prostitute detection…

▪ Ex: search ranker, personalized recommendation

"The precision is the proportion of recommendations that are good recommendations, and recall is the proportion of good recommendations that appear in top recommendations."
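For a ranked list, this definition is usually applied to the top k items. A minimal Python sketch, assuming a ranked list of item IDs and a set of items known to be good for the user; the function name and toy data are hypothetical, and precision is divided by k even if the list is shorter.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Hypothetical helper: precision@k and recall@k for one ranked recommendation list."""
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)  # good recommendations in the top k
    precision = hits / k                                 # share of the top k that are good
    recall = hits / len(relevant) if relevant else 0.0   # share of good items that made the top k
    return precision, recall


ranking = ["a", "b", "c", "d", "e"]
good = {"a", "c", "f"}
print(precision_recall_at_k(ranking, good, k=5))  # (0.4, 0.6666666666666666)
```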

▪ Evaluation metrics ≠ model loss function. AVOID: training a personalized recommender by minimizing the loss between its predictions and observed ratings, and then using this recommender to produce a ranked list of recommendations (the training loss measures rating error, not ranking quality).

▪ Skewed data, imbalanced classes, outliers, rare data: analyze carefully before doing anything else.
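Skewed data deserves care first. On a hypothetical data set with 1% positives (for example, spam), a "model" that always predicts the majority class looks excellent by accuracy and useless by recall. A minimal Python sketch; the data and helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of real positives that were predicted positive."""
    predictions_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_on_positives) / len(predictions_on_positives)


# Hypothetical skewed data: 10 positives out of 1,000 examples.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # never flags anything

print(accuracy(y_true, always_negative))  # 0.99
print(recall(y_true, always_negative))    # 0.0
```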

Cross-validation: assumes the data are independently and identically distributed (i.i.d.).
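A minimal k-fold split in Python, assuming i.i.d. rows so that shuffling them is harmless; `k_fold_indices` is a hypothetical helper, not a library API. For time-ordered data this assumption fails, and a split that trains on the past and validates on the future is the usual alternative.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index splits; every row is validated exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # shuffling is only safe if rows are i.i.d.
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


for train, val in k_fold_indices(n=10, k=3):
    print(len(train), len(val))  # 6 4, then 7 3, then 7 3
```

The model is trained k times, once per split, and the k validation scores are averaged.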

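▪ Model parameters: learned from the training data, e.g. the weights W in $y = W^{T}x$

▪ Hyperparameters (nuisance parameters): settings of the optimization (training) procedure, chosen before training rather than learned from the data

▪ Ex:

▪ Linear regression: regularization parameter

▪ Decision trees: desired depth and number of leaves

▪ SVMs: misclassification penalty term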
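Hyperparameters are typically chosen by comparing validation scores, while the model parameters are refit on the training data for each candidate. A minimal Python sketch, assuming one-feature ridge regression without an intercept (closed form w = Σxy / (Σx² + λ)) and hand-made toy data; the helper names and the λ grid are illustrative.

```python
from typing import Dict, Sequence, Tuple


def fit_ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Model parameter w of y ~ w * x, learned from data: minimizes sum((y - w*x)^2) + lam * w^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def mse(x: Sequence[float], y: Sequence[float], w: float) -> float:
    return sum((w * a - b) ** 2 for a, b in zip(x, y)) / len(x)


def grid_search(x_train, y_train, x_val, y_val, lambdas) -> Tuple[float, Dict[float, float]]:
    """Hypothetical helper: pick the hyperparameter lam with the lowest validation MSE."""
    scores = {lam: mse(x_val, y_val, fit_ridge_1d(x_train, y_train, lam)) for lam in lambdas}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data, roughly y = 2x (illustrative only).
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_val, y_val = [5.0, 6.0], [10.1, 11.9]
best, scores = grid_search(x_train, y_train, x_val, y_val, [0.0, 1.0, 10.0, 100.0])
print(best)  # 0.0
```

On this nearly noise-free toy data the smallest penalty wins; with noisier or higher-dimensional data a positive λ often scores better on validation.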

1. Split into randomized control/experimentation groups

2. Observe the behavior of both groups under the proposed methods

3. Compute test statistics

4. Output decision
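Steps 1, 3 and 4 can be sketched in Python for the common case of a conversion-rate metric, assuming a pooled two-proportion z-test (normal approximation, large samples) as the test statistic and a 5% significance level. The counts and helper names are hypothetical; step 2, the experiment itself, is represented only by the observed counts.

```python
import math
import random
from typing import Dict, Iterable, Tuple


def assign_groups(user_ids: Iterable[int], seed: int = 42) -> Dict[int, str]:
    """Step 1 (hypothetical helper): randomized split into control (A) and treatment (B)."""
    rng = random.Random(seed)
    return {uid: ("A" if rng.random() < 0.5 else "B") for uid in user_ids}


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                          alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Steps 3-4: pooled two-proportion z-test; returns (z, two-sided p-value, reject H0?)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value, p_value < alpha


z, p, reject = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=250, n_b=1000)
print(round(z, 2), round(p, 4), reject)  # 2.68 0.0074 True
```

Running the same procedure on two groups that both see the old system (an A/A test) should reject in only about alpha of the runs; frequent rejections point to a broken split or miscounting.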

1. Baggage of the old: run an A/A test first

2. Choose the metrics and indicators (business design)

3. Did you count right?

4. How many observations do you need?

5. Is the distribution of the metric Gaussian?

6. Are the variances equal?

7. Multiple models, multiple hypotheses: A/A1/A2/…/B testing

8. How long to run the test?

9. Catching distribution drift: stationarity assumption
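For question 4 (how many observations?), a standard normal-approximation formula for comparing two proportions gives the per-group size n = (z_{1-α/2} + z_{1-β})² · (p1(1 - p1) + p2(1 - p2)) / (p1 - p2)². A Python sketch, assuming a two-sided test, a baseline rate p1 and the smallest rate p2 worth detecting; the function name and the example rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: observations needed in EACH group to detect p1 vs p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


print(sample_size_per_group(0.20, 0.25))  # 1091
```

Dividing n by the daily traffic each group receives gives a rough lower bound for question 8 (how long to run the test).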

1 (Conditional) independence

2 Common support

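3 TOT (treatment effect on the treated), where P(X) is the propensity score:

$$\mathrm{TOT} = E_{P(X)\mid T=1}\left\{ E[Y(1) \mid T=1, P(X)] - E[Y(0) \mid T=0, P(X)] \right\}$$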
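Under assumptions 1 and 2, TOT can be estimated by matching each treated unit to the control unit with the closest propensity score P(X). A minimal Python sketch of 1-nearest-neighbour matching with replacement. It assumes the propensity scores were already estimated (for example with a logistic regression of T on X) and uses a simple min-max rule for common support. The function name and toy numbers are hypothetical.

```python
from typing import Sequence


def att_nearest_neighbor(p_treated: Sequence[float], y_treated: Sequence[float],
                         p_control: Sequence[float], y_control: Sequence[float]) -> float:
    """Hypothetical helper: TOT/ATT by 1-NN matching on the propensity score, with replacement."""
    lo, hi = min(p_control), max(p_control)
    diffs = []
    for p, y in zip(p_treated, y_treated):
        if not (lo <= p <= hi):
            continue  # common support: skip treated units with no comparable controls
        j = min(range(len(p_control)), key=lambda i: abs(p_control[i] - p))
        diffs.append(y - y_control[j])  # observed Y(1) minus matched stand-in for Y(0)
    if not diffs:
        raise ValueError("no treated units on common support")
    return sum(diffs) / len(diffs)


att = att_nearest_neighbor([0.30, 0.60, 0.85], [5.0, 7.0, 9.0],
                           [0.31, 0.58, 0.90], [3.0, 4.0, 6.5])
print(att)  # 2.75
```

Here the treated unit with P(X) = 0.30 falls outside the controls' range and is dropped, so the estimate averages only the two matched differences (3.0 and 2.5).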

References

1. Alice Zheng, Evaluating Machine Learning Models, O'Reilly Media, Inc., 2015.

Posted: 13/01/2021, 05:11
