Running the Multiple Regression


Recall the nature of the model we wish to run. We can specify the equation for the regression as follows:

GAF' = a + b1(AGE) + b2(PRETHERAPY) + b3(N_THERAPY)

To run the regression:

ANALYZE → REGRESSION → LINEAR

● We move GAF over to the Dependent box (since it is our dependent or “response” variable).

● We move AGE, PRETHERAPY, and N_THERAPY over to the Independent(s) box (since these are our predictors, the variables we wish to have simultaneously predict GAF).

● Below the Independent(s) box is a field labeled Method, which by default is set to Enter. This means that SPSS will conduct the regression on all predictors simultaneously rather than in some stepwise fashion (forward selection, backward selection, and stepwise selection are other options for regression analysis, as we will soon discuss; see the syntax sketch following this list).
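For reference, here is a minimal sketch of how those alternative methods are requested in syntax. The METHOD keywords ENTER, FORWARD, BACKWARD, and STEPWISE are the syntax equivalents of the menu choices; only one of these lines would replace the /METHOD=ENTER line in the full syntax shown shortly:

/METHOD=ENTER AGE PRETHERAPY N_THERAPY
/METHOD=FORWARD AGE PRETHERAPY N_THERAPY
/METHOD=BACKWARD AGE PRETHERAPY N_THERAPY
/METHOD=STEPWISE AGE PRETHERAPY N_THERAPY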

Next, we will click the box Statistics and select some options:

● Under Regression Coefficients, we have selected Estimates and Confidence Intervals (at a level of 95%). We have also selected Model Fit, R‐squared Change, Descriptives, Part and Partial Correlations, and Collinearity Diagnostics. Under Residuals, we have selected Casewise Diagnostics and Outliers outside of three standard deviations. Click on Continue. We would have selected the Durbin–Watson test had we had time series data and wished to learn whether evidence existed that errors were correlated. For details on time series models, see Fox (2016, chapter 16).

● There are other options we can select under Plots and Save in the main Linear Regression window, but since most of this information pertains to evaluating residuals, we postpone this step until later, after we have fit the model. For now, we want to get on with obtaining output for our regression and demonstrating the interpretation of parameter estimates.

When we run the multiple regression, we obtain the following (below is the syntax that represents the selections we have made via the GUI):

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT GAF
  /METHOD=ENTER AGE PRETHERAPY N_THERAPY
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).

Descriptive Statistics

             Mean      Std. Deviation   N
GAF          28.0000   15.89549         10
AGE          26.8000   7.77174          10
PRETHERAPY   54.8000   3.88158          10
N_THERAPY    13.2000   9.01604          10

Shown above are some of the descriptive statistics we had requested for our regression. This is the same information we would obtain in our exploratory survey of the data. It is helpful, however, to verify that N = 10 for each variable; otherwise it would indicate we have missing values or incomplete data. In our output, we see that GAF has a mean of 28.0, AGE has a mean of 26.8, PRETHERAPY has a mean of 54.8, and N_THERAPY has a mean of 13.2. Standard deviations are also provided.

Correlations

Pearson Correlation
             GAF     AGE     PRETHERAPY   N_THERAPY
GAF          1.000   .797    .686         .493
AGE          .797    1.000   .411         .514
PRETHERAPY   .686    .411    1.000        .478
N_THERAPY    .493    .514    .478         1.000

Sig. (1-tailed)
             GAF     AGE     PRETHERAPY   N_THERAPY
GAF          .       .003    .014         .074
AGE          .003    .       .119         .064
PRETHERAPY   .014    .119    .            .081
N_THERAPY    .074    .064    .081         .

N
             GAF     AGE     PRETHERAPY   N_THERAPY
GAF          10      10      10           10
AGE          10      10      10           10
PRETHERAPY   10      10      10           10
N_THERAPY    10      10      10           10

Variables Entered/Removed(a)

Model   Variables Entered               Variables Removed   Method
1       N_THERAPY, PRETHERAPY, AGE(b)   .                   Enter

a. Dependent Variable: GAF
b. All requested variables entered.

SPSS also provides us with a matrix of Pearson correlation coefficients between all variables, along with p‐values (Sig. one‐tailed) denoting whether they are statistically significant. Having already surveyed the general bivariate relationships among variables when we plotted scatterplots, this matrix provides us with further evidence that variables are at least somewhat linearly related in the sample. We do not care about the statistical significance of correlations for the purpose of performing the multiple regression, and since sample size is quite small to begin with (N = 10), it is hardly surprising that many of the correlations are not statistically significant.
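As an illustration of how these one‐tailed p‐values arise (a hand computation, using the usual t test for a correlation): for the correlation of .493 between GAF and N_THERAPY, t = r√(n − 2)/√(1 − r²) = .493 × √8/√(1 − .243) ≈ 1.60 on 8 degrees of freedom, which gives a one‐tailed p of approximately .074, matching the output.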

For details on how statistical significance can be largely a function of sample size, see Denis (2016, chapter 3).

Next, SPSS reports on which variables were entered into the regression and which were left out. Because we conducted a "full‐entry" regression (recall we had selected Enter under Method), all of our variables will be entered into the regression simultaneously, and none removed. When we do forward and stepwise regressions, for instance, this Variables Removed box will be a bit busier!

Model Summary(b)

Model 1: R = .890(a); R Square = .791; Adjusted R Square = .687; Std. Error of the Estimate = 8.89418
Change Statistics: R Square Change = .791; F Change = 7.582; df1 = 3; df2 = 6; Sig. F Change = .018

a. Predictors: (Constant), N_THERAPY, PRETHERAPY, AGE
b. Dependent Variable: GAF

Above is the Model Summary for the regression. For a relatively detailed account of what all of these statistics mean and the theory behind them, consult Denis (2016, chapters 8 and 9) or any book on regression. We interpret each statistic below:

● R of 0.890 represents the coefficient of multiple correlation between the response variable (GAF) and the three predictors considered simultaneously (AGE, PRETHERAPY, N_THERAPY). That is, it is the correlation between GAF and a linear combination of AGE, PRETHERAPY, and N_THERAPY. Multiple R can range in value from 0 to 1.0 (note that it cannot be negative, unlike ordinary Pearson r on two variables, which ranges from −1.0 to +1.0).
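As a quick numerical check, multiple R is simply the square root of R‐square: √0.791 ≈ 0.890. Equivalently, it is the ordinary Pearson correlation between the observed GAF scores and the predicted values produced by the model.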


● R‐square is the coefficient of multiple correlation squared (called the coefficient of multiple determination) and represents the proportion of variance in the response variable accounted for or "explained" by simultaneous knowledge of the predictors. That is, it is the proportion of variance accounted for by the model, the model being the regression of GAF on the linear combination of AGE, PRETHERAPY, and N_THERAPY.

● Adjusted R‐square is an alternative version of R‐square and is smaller than R‐square (recall we had discussed Adjusted R‐square earlier in the context of simple linear regression). Adjusted R‐square takes into consideration the number of parameters being fit in the model relative to the extent to which they contribute to model fit.
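For our data, the usual formula reproduces the reported value as a hand check: Adjusted R‐square = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (1 − 0.791)(9/6) ≈ 0.687.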

● Std. Error of the Estimate (standard error of the estimate) is the standard deviation of residuals for the model (with different degrees of freedom than the typical standard deviation). A very small value here indicates that the model fits fairly well, while a very high value suggests that the model does not provide a very good fit to the data. When we interpret the ANOVA table for the regression shortly, we will discuss its square, which is the Variance of the Estimate.

● Next, SPSS reports "Change Statistics." These are more applicable when we conduct hierarchical, forward, or stepwise regression. When we add predictors to a model, we expect R‐square to increase. These change statistics tell us whether the increment in R‐square is statistically significant, crudely meaning that it is more of a change than we would expect by chance. For our data, since we entered all predictors simultaneously into the model, the R‐square Change is equivalent to the original R‐square statistic. The F‐change of 7.582 is the F‐statistic associated with the model, on the given degrees of freedom of 3 and 6, along with the p‐value of 0.018. Notice that this information duplicates the information found in the ANOVA table to be discussed shortly. Again, the reason for this is that we had performed a full‐entry regression. Keep an eye on your Change Statistics when you do not enter your predictors simultaneously to get an idea of how much more variance is accounted for by each predictor entered into the model.

Next, SPSS reports the ANOVA summary table for our analysis:

ANOVA(a)

Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    1799.362         3    599.787       7.582   .018(b)
Residual      474.638          6    79.106
Total         2274.000         9

a. Dependent Variable: GAF
b. Predictors: (Constant), N_THERAPY, PRETHERAPY, AGE

The ANOVA table for regression reveals how the variance in the regression was partitioned, analogous to how the ANOVA table does the same in the Analysis of Variance procedure. Briefly, here is what these numbers indicate:

● SS Total of 2274.000 is partitioned into SS Regression (1799.362) and SS Residual (474.638). That is, 1799.362 + 474.638 = 2274.000.

● What makes our model successful in accounting for variance in GAF? What would make it successful is if SS Regression were large relative to SS Residual. SS Regression measures the variability due to imposing the linear regression equation on the data. SS Residual gives us a measure of all the variability not accounted for by the model. Naturally, then, our hope is that SS Regression is large relative to SS Residual. For our data, it is.

● To get a measure of how much SS Regression is large relative to the total variation in the data, we can take the ratio SS Regression/SS Total, which yields 1799.362/2274.000 = 0.7913. Note that this value of 0.7913 is, in actuality, the R‐square value we found in our Model Summary Table. It means that approximately 79% of the variance in GAF is accounted for by our three predictors simultaneously.

● The degrees of freedom for Regression, equal to 3, are equal to the number of predictors in the model (3).

● The degrees of freedom for Residual are equal to n – k – 1, where “n” is sample size. For our data, we have 10 – 3 – 1 = 6.

● The degrees of freedom for Total are equal to the sum of the above degrees of freedom (i.e. 3 + 6 = 9). They are also equal to the number of cases in the data minus 1 (i.e. 10 − 1 = 9).

● The Mean Square for Regression, equal to 599.787, is computed as SS Regression/df = 1799.362/3 = 599.787.

● The Mean Square for Residual, equal to 79.106, is computed as SS Residual/df = 474.638/6 = 79.106.

The number 79.106 is called the variance of the estimate and is the square of the standard error of the estimate we considered earlier in the Model Summary output. Recall that number was 8.89418; the square root of 79.106 is equal to that number.

● The F‐statistic, equal to 7.582, is computed as the ratio of MS Regression to MS Residual. For our data, the computation is 599.787/79.106 = 7.582.

● The p‐value of 0.018 indicates whether the obtained F is statistically significant. Conventional significance levels are usually set at 0.05 or less. What the number 0.018 literally means is that the probability of obtaining an F‐statistic as extreme as the one we obtained (i.e. 7.582) or more extreme is equal to 0.018. Since this value is less than a preset level of 0.05, we deem F to be statistically significant and reject the null hypothesis that multiple R in the population from which these data were drawn is equal to zero. That is, we have evidence to suggest that multiple R in the population is unequal to zero.

Next, SPSS reports the coefficients for the model, along with other information we requested, such as confidence intervals; zero‐order, partial, and part correlations; and collinearity statistics:

Coefficients(a)

Model 1       B          Std. Error   Beta    t        Sig.   95.0% CI for B (Lower, Upper)
(Constant)    −106.167   45.578               −2.329   .059   (−217.692, 5.357)
AGE           1.305      .456         .638    2.863    .029   (.190, 2.421)
PRETHERAPY    1.831      .891         .447    2.054    .086   (−.350, 4.011)
N_THERAPY     −.086      .408         −.049   −.210    .840   (−1.084, .912)

              Correlations                     Collinearity Statistics
              Zero-order   Partial   Part      Tolerance   VIF
AGE           .797         .760      .534      .700        1.429
PRETHERAPY    .686         .643      .383      .735        1.361
N_THERAPY     .493         −.086     −.039     .650        1.538

a. Dependent Variable: GAF

We interpret the numbers above:

● SPSS reports that this is Model 1, which consists of a constant, AGE, PRETHERAPY, and N_THERAPY. The fact that it is "Model 1" is not important, since it is the only model we are running. Had we performed a hierarchical regression where we were comparing alternative models, then we might have 2 or 3 or more models, and hence the identification of "Model 1" would be more relevant and important.

● The Constant in the model is the intercept of the model. It is the predicted value for the response variable GAF when AGE, PRETHERAPY, and N_THERAPY are all equal to 0. That is, it answers the question, What is the predicted value for someone of zero age, zero on PRETHERAPY, and zero on N_THERAPY? Of course, the question makes little sense, since nobody can be of age zero! For this reason, predictors in a model are sometimes mean centered if one wishes to interpret the intercept in a meaningful way. Mean centering subtracts the mean of each variable from the given score, so that a value of AGE = 0 would no longer correspond to actual zero on age, but rather would indicate MEAN AGE. Regressions with mean centering are beyond the scope of our current chapter, however, so we leave this topic for now (a brief syntax sketch follows below). For details, see Draper and Smith (1995). As it stands, the coefficient of −106.167 represents the predicted value for GAF when AGE, PRETHERAPY, and N_THERAPY are all equal to 0.
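As an illustration only, here is a minimal sketch of mean centering AGE in SPSS syntax (the variable name AGE_C is our own; 26.8 is the sample mean of AGE from the descriptives above):

COMPUTE AGE_C = AGE - 26.8.
EXECUTE.

The centered variable AGE_C would then replace AGE in the Independent(s) box; centering all three predictors in this way would make the intercept the predicted GAF for a person average on every predictor.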

● The coefficient for AGE, equal to 1.305, is interpreted as follows: for a one‐unit increase in AGE, on average, we expect GAF to increase by 1.305 units, given the inclusion of all other predictors in the model.

● The coefficient for PRETHERAPY, equal to 1.831, is interpreted as follows: for a one‐unit increase in PRETHERAPY, on average, we expect GAF to increase by 1.831 units, given the inclusion of all other predictors in the model.

● The coefficient for N_THERAPY, equal to −0.086, is interpreted as follows: for a one‐unit increase in N_THERAPY, on average, we expect GAF to decrease by 0.086 units, given the inclusion of all other predictors in the model. It signifies a decrease because the coefficient is negative. Taken together, the estimates give the fitted equation sketched below.
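Assembling the estimates from the table, the fitted model is GAF' = −106.167 + 1.305(AGE) + 1.831(PRETHERAPY) − 0.086(N_THERAPY). A minimal sketch of applying it to the data in SPSS syntax (the variable name GAF_PRED is our own):

COMPUTE GAF_PRED = -106.167 + 1.305*AGE + 1.831*PRETHERAPY - 0.086*N_THERAPY.
EXECUTE.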

● The estimated standard errors in the next column are used in computing a t‐test for each coefficient, ultimately helping us decide whether or not to reject the null hypothesis that the partial regression coefficient is equal to 0. When we divide the Constant of −106.167 by the standard error of 45.578, we obtain the resulting t statistic of −2.329 (i.e. −106.167/45.578 = −2.329). The probability of such a t or more extreme is equal to 0.059 (Sig. for the Constant). Since it is not less than 0.05, we decide to not reject the null hypothesis. What this means for these data is that we have insufficient evidence to doubt that the Constant in the model is equal to a null hypothesis value of 0.

● The standard error for AGE is equal to 0.456. When we divide the coefficient for AGE of 1.305 by 0.456, we obtain the t statistic of 2.863, which is statistically significant (p = 0.029). That is, we have evidence to suggest that the population partial regression coefficient for AGE is not equal to 0.

● The standard errors for PRETHERAPY and N_THERAPY are used in analogous fashion. Neither PRETHERAPY nor N_THERAPY is statistically significant at p < 0.05. For more details on what these standard errors mean theoretically, see Fox (2016).

● The Standardized Coefficients (Beta) are partial regression coefficients that have been computed on z‐scores rather than raw scores. As such, their unit is that of the standard deviation. We interpret the coefficient for AGE of 0.638 as follows: for a one‐standard deviation increase in AGE, on average, we expect GAF to increase by 0.638 of a standard deviation. We interpret the other two Betas (for PRETHERAPY and N_THERAPY) in analogous fashion.
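As a check on the output, each Beta equals the unstandardized coefficient multiplied by the ratio of the predictor's standard deviation to that of the response. For AGE: 1.305 × (7.77174/15.89549) ≈ 0.638, matching the reported value.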

● Next, we see the 95% Confidence Interval for B with lower and upper bounds. We are not typically interested in the confidence interval for the intercept, so we move right on to interpreting the confidence interval for AGE. The lower bound is 0.190 and the upper bound is 2.421. We are 95% confident that the interval from 0.190 to 2.421 covers (or "captures") the true population regression coefficient. We interpret the confidence intervals for PRETHERAPY and N_THERAPY in analogous fashion.
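These bounds can be reproduced by hand as B ± t × SE, where t is the two‐tailed 0.05 critical value on 6 degrees of freedom (approximately 2.447). For AGE: 1.305 ± 2.447 × 0.456 ≈ (0.190, 2.421), matching the output.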

● Next are the zero‐order, partial, and part correlations. Zero‐order correlations are ordinary bivariate correlations between the given predictor and the response variable, not taking into account the other predictors in the model. For our data these are .797, .686, and .493, matching the Pearson correlations with GAF reported earlier in the correlation matrix.
