Comparing Slope Estimates with Stata

Part of the document Statistical modeling for medical researcher (pages 84 - 88)

The following log file and comments illustrate how to perform the calculations and draw the graphs from the preceding section using Stata.

. * 2.20.Framingham.log
. *
. * Regression of log systolic blood pressure against log body mass
. * index at baseline in men and women from the Framingham Heart Study.
. *

. use C:\WDDtext\2.20.Framingham.dta, clear {1}

. generate logsbp = log(sbp) {2}

. generate logbmi = log(bmi)

(9 missing values generated)

. codebook sex {3}

{Output omitted}

tabulation: Freq. Numeric Label

2049 1 Men

2650 2 Women

. regress logsbp logbmi if sex==1 {4}

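  Source |       SS       df       MS              Number of obs =    2047
---------+------------------------------           F(  1,  2045) =  137.93
   Model |  2.59012941     1  2.59012941           Prob > F      =  0.0000
Residual |  38.4025957  2045  .018778775           R-squared     =  0.0632
---------+------------------------------           Adj R-squared =  0.0627
   Total |  40.9927251  2046  .020035545           Root MSE      =  .13704

------------------------------------------------------------------------------
  logsbp |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  logbmi |    .272646   .0232152     11.744   0.000       .2271182    .3181739
   _cons |   3.988043   .0754584     52.851   0.000        3.84006    4.136026
------------------------------------------------------------------------------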

. predict yhatmen, xb {5}

(9 missing values generated)

. regress logsbp logbmi if sex==2 {6}

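  Source |       SS       df       MS              Number of obs =    2643
---------+------------------------------           F(  1,  2641) =  461.90
   Model |  12.0632111     1  12.0632111           Prob > F      =  0.0000
Residual |  68.9743032  2641  .026116737           R-squared     =  0.1489
---------+------------------------------           Adj R-squared =  0.1485
   Total |  81.0375143  2642  .030672791           Root MSE      =  .16161

------------------------------------------------------------------------------
  logsbp |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  logbmi |   .3985947   .0185464     21.492   0.000       .3622278    .4349616
   _cons |   3.593017   .0597887     60.095   0.000       3.475779    3.710254
------------------------------------------------------------------------------

. predict yhatwom, xb

(9 missing values generated)

. sort logbmi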

. graph logsbp yhatwom logbmi if sex==2, connect(.l[-#]) symbol(oi) {7}

>xlabel(2.71,3.4,3.81,4.09) xtick(3.0,3.22,3.56,3.69,3.91,4.01)

>ylabel(4.61,5.01,5.30,5.52) ytick(4.38,4.5,4.7,4.79,4.87,4.94,5.08,5.14,

>5.19,5.25,5.35,5.39,5.43,5.48) gap(4) yscale(4.382,5.599)

{Graph omitted. See Figure 2.14, right panel}

. graph logsbp yhatmen yhatwom logbmi if sex==1 , connect(.ll[-#]) symbol(oii)

>xlabel(2.71,3.4,3.81,4.09) xtick(3.0,3.22,3.56,3.69,3.91,4.01)

>ylabel(4.61,5.01,5.30,5.52) ytick(4.38,4.5,4.7,4.79,4.87,4.94,5.08,5.14,

>5.19,5.25,5.35,5.39,5.43,5.48) gap(4) yscale(4.382,5.599)

{Graph omitted. See Figure 2.14, left panel}

. generate s2 = (0.018778775*2045 + 0.026116737*2641)/(2047 + 2643 - 4) {8}

. generate varb_dif = s2*(0.0232152^2/0.018778775+0.0185464^2/0.026116737) {9}

. generate t = (0.272646 - 0.3985947)/sqrt(varb_dif) {10}

. generate ci95_lb = (0.272646 - 0.3985947) {11}

> - invttail(4686,.025)*sqrt(varb_dif)

. generate ci95_ub = (0.272646 - 0.3985947)

> + invttail(4686,.025)*sqrt(varb_dif)

. list s2 varb_dif t ci95_lb ci95_ub in 1/1 {12}

            s2    varb_dif           t     ci95_lb     ci95_ub
  1.  .0229144    .0009594   -4.066185   -.1866736   -.0652238

. display 2*ttail(4686,abs(t)) {13}

.00004857
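Commands {8} to {10} can be read as the following arithmetic. This is a sketch read off from the commands themselves, not a copy of equations (2.37) to (2.40), which are not reproduced in this section. The symbols are assumptions chosen for illustration: subscript 1 denotes men and subscript 2 women, $s_1^2$ and $s_2^2$ are the residual mean squares from the two regressions, $b_1$ and $b_2$ are the slope estimates, and $\operatorname{se}[b_1]$ and $\operatorname{se}[b_2]$ are their standard errors.

```latex
\begin{align*}
s^2 &= \frac{s_1^2\,(n_1 - 2) + s_2^2\,(n_2 - 2)}{n_1 + n_2 - 4}
     = \frac{0.018778775 \times 2045 + 0.026116737 \times 2641}{2047 + 2643 - 4}
     \approx 0.0229144 \\[4pt]
\operatorname{var}[b_1 - b_2]
    &= s^2 \left( \frac{\operatorname{se}[b_1]^2}{s_1^2}
                + \frac{\operatorname{se}[b_2]^2}{s_2^2} \right)
     = 0.0229144 \left( \frac{0.0232152^2}{0.018778775}
                      + \frac{0.0185464^2}{0.026116737} \right)
     \approx 0.0009594 \\[4pt]
t   &= \frac{b_1 - b_2}{\sqrt{\operatorname{var}[b_1 - b_2]}}
     = \frac{0.272646 - 0.3985947}{\sqrt{0.0009594}}
     \approx -4.066
\end{align*}
```

The three results match the values of s2, varb_dif and t listed in the log.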

Comment

1 This data set contains long-term follow-up on 4699 people from the town of Framingham. In this example, we focus on three variables collected at each patient's baseline exam: sbp, bmi and sex. The variable sbp records systolic blood pressure in mm Hg; bmi records body mass index in kg/m².

2 An exploratory data analysis (not shown here) indicates that the relationship between log[sbp] and log[bmi] comes closer to meeting the assumptions of a linear model than does the relationship between sbp and bmi.

3 There are 2049 men and 2650 women in this data set; sex is coded 1 or 2 for men or women, respectively.

4 This regression command is restricted to records where sex==1 is true; that is, to records of men. The statistics from this regression that are also in Table 2.1 are highlighted. Two of the 2049 men in this data set are missing values for either sbp or bmi, giving a total of 2047 observations in the analysis.

5 The variable yhatmen contains the expected value of each man's log[sbp] given his body mass index. These expected values are based on the regression of logsbp against logbmi among men. There are nine subjects with missing values of logbmi (two men and seven women). The variable yhatmen is missing for these people. Note that this predict command defines yhatmen for all subjects, including women. The command predict yhatmen if sex==1, xb would have defined yhatmen for men only.

6 This regression of logsbp against logbmi is restricted to women with non-missing values of sbp and bmi.

7 This graph is similar to the right panel of Figure 2.14. The axis labels are written in terms of logsbp and logbmi. In Figure 2.14 these labels have been replaced by the corresponding values of sbp and bmi using a graphics editor.

In Figure 2.14 we want the x- and y-axes to be drawn to the same scales in order to facilitate comparisons between men and women. By default, the range of the y-axis includes all ytick and ylabel values plus all values of the y-variable. In the panels of Figure 2.14 the range of values of logsbp is different for men and women and extends beyond the ytick and ylabel values. To force the scale of the y-axis to be the same for both men and women we use the yscale option, which specifies a minimum range for the y-axis. The y-axes for men and women will be identical as long as this range includes all logsbp values for both men and women. The xscale option works the same way for the x-axis, but is not needed here since the xlabel values span the range of observed logbmi values for both men and women.

In Figure 2.14 we distinguish between the regression lines for women and men by using a dashed line for women. Stata allows a wide variety of patterned lines. The desired pattern is specified by symbols placed in brackets following the connect symbol of the connect option. For example, in this command the connect(.l[-#]) option contains two connect symbols: "." and "l". These symbols dictate how the first and second variables are to be connected, which in this example are logsbp and yhatwom (see comment 12 of Section 2.12). The second of these symbols, "l", is followed by "[-#]", which specifies that a dashed line is to connect the observations of the second variable (yhatwom). Hence, the effect of this command is to use a dashed line for the regression of logsbp against logbmi among women. It is important when using a patterned line that the data be sorted by the x-axis variable prior to the graph command. See the Stata Graphics Manual for further details.

8 This command defines s2 to equal s² in equation (2.37); s2 is set equal to this constant for each record in the database.

9 This command defines varb_dif to equal var[b1 − b2] in equation (2.39).

10 This is the t statistic given in equation (2.40).

11 The next two lines calculate the lower and upper bounds of the 95% confidence interval given in equation (2.41).

12 This command lists the values of s2, varb_dif, t, ci95_lb and ci95_ub in the first record of the data file (all the other records contain identical values). Note that these values agree with those given for s², var[b1 − b2] and $(b_1 - b_2) \pm t_{n_1+n_2-4,\,0.025}\sqrt{\operatorname{var}[b_1 - b_2]}$ in the example from Section 2.19.1.

13 The function ttail(df,t) gives the probability that a t statistic with df degrees of freedom is greater than t. The function abs(t) gives the absolute value of t. Hence 2*ttail(4686,abs(t)) gives the two-sided P value associated with a t statistic with 4686 degrees of freedom. In this example, t = −4.07, giving P = 0.00005.
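Because s2, varb_dif and t are constants, the same calculation can also be sketched with Stata scalars, which hold a single number instead of filling a column in every record. The block below is an illustration and is not part of the original log. The scalar names (b_men, ms_wom, df_pool and so on) are made up for this sketch, and the numeric inputs are copied from the two regression outputs above. Stata resolves a name to a variable before a scalar, so the names were chosen to avoid the variables created in the log.

```stata
* Sketch: slope comparison using scalars instead of generated variables.
* All scalar names are illustrative; inputs are copied from the regress output.

* Slopes of logbmi and their standard errors (men, women)
scalar b_men  = 0.272646
scalar b_wom  = 0.3985947
scalar se_men = 0.0232152
scalar se_wom = 0.0185464

* Residual mean squares (men, women) and pooled degrees of freedom
scalar ms_men  = 0.018778775
scalar ms_wom  = 0.026116737
scalar df_pool = 2047 + 2643 - 4

* Pooled residual variance, variance of the slope difference, t statistic
scalar pooled_s2 = (ms_men*2045 + ms_wom*2641)/df_pool
scalar var_dif   = pooled_s2*(se_men^2/ms_men + se_wom^2/ms_wom)
scalar t_dif     = (b_men - b_wom)/sqrt(var_dif)

* 95% confidence interval for the difference in slopes
scalar lb95 = (b_men - b_wom) - invttail(df_pool, 0.025)*sqrt(var_dif)
scalar ub95 = (b_men - b_wom) + invttail(df_pool, 0.025)*sqrt(var_dif)

* Two-sided P value
scalar p_two = 2*ttail(df_pool, abs(t_dif))

display "pooled s2      = " pooled_s2
display "var(b1 - b2)   = " var_dif
display "t              = " t_dif
display "95% CI         = " lb95 " to " ub95
display "two-sided P    = " p_two
```

The displayed values should agree, up to rounding, with those produced by the list and display commands in the log. After a regress command the same inputs are also available as stored results (_b[logbmi], _se[logbmi], e(rmse)^2 and e(df_r)), so they could be saved into scalars after each regression rather than typed by hand.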
