
Lesson 2 Slides: Linear Regression


DOCUMENT INFORMATION

Title: Linear Regression
Number of pages: 84
File size: 1.99 MB


Contents

Regression: given data X = {x^(1), …, x^(n)} and corresponding labels y = {y^(1), …, y^(n)}, where each y^(i) ∈ R. (The opening slides show a scatter plot of example data over the years 1975–2005.)

Trang 1

Linear Regression

Trang 3

Prostate Cancer Dataset

• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features):
  – 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
  – lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert

Trang 5

Least Squares Linear Regression

Trang 6

Based on example by Andrew Ng

Intuition Behind Cost Function

(Trang 7–15: a sequence of figures, by Andrew Ng, illustrating the cost function J(θ) for different settings of the parameters θ.)

Trang 15

Basic Search Procedure

• Choose an initial value for θ
• Until we reach a minimum: choose a new value for θ that reduces J(θ)

Since the least squares objective function is convex, we don't need to worry about local minima.

(Trang 16–18: figures by Andrew Ng illustrating this search over the J(θ) surface as a function of θ_0 and θ_1.)

Trang 18

Gradient Descent

Repeat until convergence (simultaneous update for j = 0 … d):

θ_j ← θ_j − α ∂/∂θ_j J(θ)

where α is the learning rate (a small value, e.g., α = 0.05).

Trang 19

Gradient Descent for Linear Regression

Repeat until convergence (simultaneous update for j = 0 … d):

θ_j ← θ_j − α ∂/∂θ_j J(θ)

For linear regression, with J(θ) = 1/(2n) Σ_{i=1}^n (θᵀx^(i) − y^(i))², the partial derivative is

∂/∂θ_j J(θ) = (1/n) Σ_{i=1}^n (θᵀx^(i) − y^(i)) x_j^(i)

so the update becomes

θ_j ← θ_j − α (1/n) Σ_{i=1}^n (θᵀx^(i) − y^(i)) x_j^(i)

Trang 23

Gradient Descent for Linear Regression

• To achieve simultaneous update:
  – At the start of each GD iteration, compute h_θ(x^(i)) for every training instance
  – Use this stored value in the update step loop
• Assume convergence when the parameter vector stops changing, e.g., when ‖θ_new − θ_old‖₂ < ε, where ‖v‖₂ = √(Σ_i v_i²)
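As a concrete illustration (not from the slides; the function and variable names are my own), a minimal NumPy sketch of batch gradient descent for least squares linear regression could look like this. It precomputes h_θ(x^(i)) for all instances at the start of each iteration, as described above.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, num_iters=1000):
    """Batch gradient descent for least squares linear regression.

    X : (n, d+1) design matrix whose first column is all ones
    y : (n,) vector of targets
    """
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)          # initial value for theta
    for _ in range(num_iters):
        h = X @ theta                   # h_theta(x^(i)) for every instance, computed once per iteration
        grad = (X.T @ (h - y)) / n      # (1/n) * sum_i (h_i - y^(i)) * x_j^(i) for all j
        theta = theta - alpha * grad    # simultaneous update for j = 0 .. d
    return theta

# Toy usage: fit y ≈ 2 + 3x on synthetic data
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(50)
theta = gradient_descent(X, y)          # roughly [2, 3]
```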

Trang 24

Gradient Descent

(Trang 25–33: a sequence of figures by Andrew Ng showing the fitted hypothesis h(x) and the corresponding point on the cost contour plot after successive gradient descent steps, starting from the initial hypothesis h(x) = −900 − 0.1x.)

Trang 33

To see if gradient descent is working, print out J(θ) each iteration; the value should decrease on every iteration. An increasing value for J(θ) usually indicates that the learning rate α is too large.

Trang 34

Extending Linear Regression to More Complex Models

• The inputs can first be transformed by non-linear functions, e.g., log, exp, square root, square, etc.
• New features can also be formed as combinations of existing ones, for example x_3 = x_1 · x_2

This allows use of linear regression techniques to fit non-linear datasets.
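For instance, a hypothetical feature-expansion step in NumPy (the helper name and the chosen transforms are illustrative, not from the slides) might look like this; the expanded matrix can then be fed to ordinary linear regression.

```python
import numpy as np

def expand_features(X):
    """Augment raw features [x1, x2] with simple non-linear transforms.

    X : (n, 2) array with columns x1 (> 0, so log is defined) and x2.
    Returns an (n, 5) array [x1, x2, log(x1), x1**2, x1*x2].
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, np.log(x1), x1 ** 2, x1 * x2])

X_raw = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
X_expanded = expand_features(X_raw)     # shape (3, 5)
```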

Trang 35

Linear Basis Function Models

• Generally, h_θ(x) = Σ_{j=0}^d θ_j φ_j(x), where the φ_j(x) are called basis functions
• In the simplest case, we use linear basis functions: φ_j(x) = x_j

Trang 36

Linear Basis Function Models

• Polynomial basis functions: φ_j(x) = x^j
  – These are global: a small change in x affects all basis functions
• Gaussian basis functions: φ_j(x) = exp(−(x − μ_j)² / (2s²))
  – These are local: a small change in x only affects nearby basis functions; μ_j and s control location and scale (width)

Based on slide by Christopher Bishop (PRML)

Trang 37

Linear Basis Function Models

• Sigmoidal basis functions: φ_j(x) = σ((x − μ_j) / s), where σ(a) = 1 / (1 + exp(−a))
  – These are also local: a small change in x only affects nearby basis functions; μ_j and s control location and scale (slope)

Based on slide by Christopher Bishop (PRML)
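As an illustration (not from the slides), Gaussian basis features for a one-dimensional input could be computed as follows; centers and s correspond to the μ_j and s described above.

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """Map a 1-D input array x to Gaussian basis features.

    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) for each center mu_j.
    Returns an (n, len(centers)) feature matrix.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)         # (n, 1)
    mu = np.asarray(centers, dtype=float).reshape(1, -1)  # (1, m)
    return np.exp(-(x - mu) ** 2 / (2.0 * s ** 2))

# Example: 5 basis functions evenly spaced on [0, 1], width s = 0.2
Phi = gaussian_basis(np.linspace(0, 1, 50), centers=np.linspace(0, 1, 5), s=0.2)
```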

Trang 38

Example of Fitting a Polynomial Curve with a Linear Model

Trang 39

Linear Basis Function Models

• Basic linear model: h_θ(x) = Σ_{j=0}^d θ_j x_j
• Generalized linear model: h_θ(x) = Σ_{j=0}^d θ_j φ_j(x)
• Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model

Based on slide by Geoff Hinton

Trang 40

Linear Algebra Concepts

Trang 41

• Transpose: reflect a vector/matrix across its main diagonal:

Trang 42

Based on slides by Joseph Bradley

• Vector dot product: u · v = Σ_i u_i v_i

Trang 43

Based on slides by Joseph Bradley

Linear Algebra Concepts

Trang 44

In matrix/vector form (Trang 45–47):

θ = [θ_0, θ_1, …, θ_d]ᵀ ∈ R^((d+1)×1)

X = ⎡ 1  x_1^(1)  …  x_d^(1) ⎤
    ⎢ 1  x_1^(2)  …  x_d^(2) ⎥  ∈ R^(n×(d+1))
    ⎢ ⋮     ⋮           ⋮    ⎥
    ⎣ 1  x_1^(n)  …  x_d^(n) ⎦

y = [y^(1), y^(2), …, y^(n)]ᵀ ∈ R^n

so the least squares cost can be written compactly as J(θ) = 1/(2n) ‖Xθ − y‖².

Trang 47

Closed Form Solution

Instead of using GD, solve for the optimal θ analytically:

θ = (XᵀX)⁻¹ Xᵀ y

Trang 48

Closed Form Solution

Here X and y are the design matrix and target vector defined on the previous pages. If XᵀX is not invertible (i.e., singular), you may need to use the pseudo-inverse instead of the inverse:

• In Python: numpy.linalg.pinv(a)
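A minimal NumPy sketch of the closed-form fit (illustrative code, not from the slides); using the pseudo-inverse means it also behaves sensibly when XᵀX is singular.

```python
import numpy as np

def fit_closed_form(X, y):
    """Closed-form least squares: theta = pinv(X^T X) @ X^T @ y.

    X : (n, d+1) design matrix with a leading column of ones
    y : (n,) target vector
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Usage on the same kind of toy data as in the gradient descent sketch
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
theta = fit_closed_form(X, y)           # approximately [2, 3]
```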

Trang 49

Gradient Descent vs Closed Form

Trang 50

Improving Learning: Feature Scaling

• Feature scaling makes gradient descent converge much faster

(Figure: contour plot of J(θ) over θ_1 and θ_2.)

Trang 51

Feature Standardization

• Rescale each feature to have zero mean and unit variance:
  – Let μ_j be the mean of feature j and s_j its standard deviation
  – Replace each value with x_j^(i) ← (x_j^(i) − μ_j) / s_j
• Must apply the same transformation to instances for both training and prediction
• Outliers can cause problems
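A small NumPy sketch of this standardization step (illustrative helper names); note that the test set is transformed with the mean and standard deviation computed on the training set.

```python
import numpy as np

def standardize_fit(X_train):
    """Compute per-feature mean and standard deviation from the training set."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def standardize_apply(X, mu, s):
    """Apply the same (mu, s) transformation to any set of instances."""
    return (X - mu) / s

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[1.5, 250.0]])
mu, s = standardize_fit(X_train)
X_train_std = standardize_apply(X_train, mu, s)
X_test_std = standardize_apply(X_test, mu, s)   # reuse training mu and s
```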

Trang 52

Quality of Fit

(Figure panels: underfitting (high bias), correct fit, overfitting.)

Overfitting:
• The learned hypothesis may fit the training set very well (J(θ) ≈ 0)
• but fails to generalize to new examples

Based on example by Andrew Ng

Trang 53

• Can also address overfitting by eliminating features (either manually or via model selection)

Trang 54

Regularization

Add a penalty on large coefficients to the cost function:

J(θ) = 1/(2n) [ Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^d θ_j² ]

– λ is the regularization parameter (λ ≥ 0)
– The intercept θ_0 is not penalized

Trang 55

Understanding Regularization

• Note that Σ_{j=1}^d θ_j² = ‖θ_[1:d]‖₂²
  – This is the squared magnitude of the feature coefficient vector!
• We can therefore also write the penalty term as λ ‖θ_[1:d]‖₂²

Trang 58

Regularized Linear Regression

Gradient descent update for the intercept (which is not regularized):

θ_0 ← θ_0 − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))

Trang 59

Regularized Linear Regression

Gradient descent update for j = 1 … d:

θ_j ← θ_j − α [ (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/n) θ_j ]
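Sketched in NumPy (a hypothetical helper consistent with the earlier gradient descent sketch), one regularized update step could be written as follows; the intercept θ_0 is deliberately left unpenalized.

```python
import numpy as np

def ridge_gradient_step(theta, X, y, alpha, lam):
    """One gradient descent step for L2-regularized least squares."""
    n = X.shape[0]
    h = X @ theta                      # current predictions
    grad = (X.T @ (h - y)) / n         # unregularized gradient
    reg = (lam / n) * theta            # regularization term (lambda / n) * theta_j
    reg[0] = 0.0                       # do not penalize the intercept theta_0
    return theta - alpha * (grad + reg)
```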

Trang 60

Regularized Linear Regression

Trang 61

Regularized Linear Regression

• To incorporate regularization into the closed form solution:

θ = (XᵀX + λ E)⁻¹ Xᵀ y,  where E is the (d+1)×(d+1) identity matrix with its top-left entry set to 0 (so the intercept is not regularized)

• Can derive this the same way as before, by setting the gradient of J(θ) to zero and solving for θ
• Can prove that for λ > 0, the inverse in this equation always exists
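A minimal NumPy sketch of this regularized closed-form solution (illustrative code, with E as described above):

```python
import numpy as np

def fit_ridge_closed_form(X, y, lam):
    """theta = (X^T X + lam * E)^(-1) X^T y, where E is the identity
    with its top-left entry zeroed so the intercept is not regularized."""
    d_plus_1 = X.shape[1]
    E = np.eye(d_plus_1)
    E[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * E, X.T @ y)
```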

Trang 62

Logistic Regression

Trang 63

Classification Based on Probability

• Instead of just predicting the class, give the probability of the instance being that class

• Comparison to the perceptron: the perceptron outputs only a hard class label, whereas logistic regression outputs the probability of the positive class

Trang 64

Logistic / Sigmoid Function: g(z) = 1 / (1 + e^(−z))
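An illustrative NumPy sketch of the sigmoid and of the resulting logistic regression hypothesis (helper names are my own):

```python
import numpy as np

def sigmoid(z):
    """Logistic / sigmoid function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h_logistic(theta, X):
    """Hypothesis h_theta(x) = g(theta^T x), interpreted as p(y = 1 | x; theta)."""
    return sigmoid(X @ theta)
```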

Trang 65

Interpretation of Hypothesis Output

h_θ(x) is interpreted as the estimated probability that y = 1 for input x, i.e., h_θ(x) = p(y = 1 | x; θ). Therefore, p(y = 0 | x; θ) = 1 − p(y = 1 | x; θ).

Based on example by Andrew Ng

Trang 66

Another Interpretation

• Equivalently, logistic regression assumes that

log [ p(y = 1 | x; θ) / p(y = 0 | x; θ) ] = θ_0 + θ_1 x_1 + … + θ_d x_d

• In other words, it assumes that the log odds of y = 1 is a linear function of x

Side note: the odds in favor of an event is the quantity p / (1 − p), where p is the probability of the event. E.g., if I toss a fair die, the odds that I get a 6 are (1/6) / (5/6) = 1/5.

Based on slide by Xiaoli Fern

Trang 67

• θᵀx should be a large negative value for negative instances
• θᵀx should be a large positive value for positive instances

Based on slide by Andrew Ng

Trang 68

Non-Linear Decision Boundary

• Can apply basis function expansion to the features, same as with linear regression, e.g.,

x = [1, x_1, x_2, x_1 x_2, x_1², x_2², x_1² x_2, …]ᵀ

Trang 69


Trang 70

Logistic Regression Objective Function

• Can't just use squared loss as in linear regression: combined with the sigmoid hypothesis it results in a non-convex optimization problem

Trang 71

Deriving the Cost Function via Maximum Likelihood Estimation

• The likelihood of the data is given by: L(θ) = Π_{i=1}^n p(y^(i) | x^(i); θ)

Trang 72

Deriving the Cost Function via Maximum Likelihood Estimation

• Substitute in the model and take the negative (log-)likelihood to yield the logistic regression objective:

J(θ) = − Σ_{i=1}^n [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
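A small NumPy sketch of this objective (illustrative; the clipping guards against log(0) and is not part of the slides):

```python
import numpy as np

def logistic_cost(theta, X, y, eps=1e-12):
    """Negative log-likelihood for logistic regression.

    J(theta) = -sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]
    """
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^(i)) for all i
    h = np.clip(h, eps, 1.0 - eps)           # numerical safety: avoid log(0)
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```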

Trang 73

Intuition Behind the Objective

• Can re-write the objective function as J(θ) = Σ_{i=1}^n cost(h_θ(x^(i)), y^(i))

Trang 74

Intuition Behind the Objective

(Trang 75–76: plots, based on an example by Andrew Ng, of the cost as a function of h_θ(x) for y = 1 and for y = 0.)

Trang 76

Intuition Behind the Objective

cost(h_θ(x), y) = −log(h_θ(x))      if y = 1
                  −log(1 − h_θ(x))  if y = 0

(Figure, based on an example by Andrew Ng: the cost over h_θ(x) ∈ [0, 1] for each case.)

Trang 77

Regularized Logistic Regression

• We can regularize logistic regression exactly as before:

J_regularized(θ) = J(θ) + λ ‖θ_[1:d]‖₂²

Trang 78

Gradient Descent for Logistic Regression

• Repeat until convergence (simultaneously updating every θ_j): θ_j ← θ_j − α ∂/∂θ_j J(θ)

Trang 79

Gradient Descent for Logistic Regression

Trang 80

Gradient Descent for Logistic Regression

• Initialize θ
• Repeat until convergence (simultaneous update for j = 0 … d):

θ_j ← θ_j − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i)

This has the same form as the update for linear regression, but here h_θ(x) = g(θᵀx).
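A compact NumPy sketch of this loop (illustrative, unregularized, with my own helper name):

```python
import numpy as np

def logistic_gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for (unregularized) logistic regression.

    X : (n, d+1) design matrix with a leading column of ones
    y : (n,) vector of 0/1 labels
    """
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)                    # initialize theta
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))    # h_theta(x^(i)) for all i
        theta = theta - alpha * (X.T @ (h - y)) / n   # simultaneous update
    return theta
```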

Trang 81

Multi-Class Classification

• Disease diagnosis: healthy / cold / flu / pneumonia
• Object classification: desk / chair / monitor / bookcase

Trang 83

Multi-Class Logistic Regression

• Train a logistic regression classifier for each class

Trang 84

Implementing Multi-Class Logistic Regression

• Use the softmax of the per-class scores as the hypothesis:

h_c(x) = exp(θ_cᵀ x) / Σ_{c'=1}^{C} exp(θ_{c'}ᵀ x)

• Gradient descent simultaneously updates all parameters for all models
  – Same derivative as before, just with the above h_c(x)
• Predict the class label as the most probable label: ŷ = argmax_c h_c(x)
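An illustrative NumPy sketch of the softmax hypothesis and the prediction step (array shapes and names are my own):

```python
import numpy as np

def softmax_hypothesis(Theta, X):
    """h_c(x) = exp(theta_c^T x) / sum_{c'} exp(theta_{c'}^T x).

    Theta : (C, d+1) matrix with one parameter vector per class
    X     : (n, d+1) design matrix
    Returns an (n, C) matrix of class probabilities.
    """
    scores = X @ Theta.T                                  # (n, C) per-class scores
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def predict(Theta, X):
    """Predict the most probable class label for each instance."""
    return np.argmax(softmax_hypothesis(Theta, X), axis=1)
```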
