1.3. The Stata Statistical Software Package

Part of the document Statistical modeling for medical researcher (pages 26 - 35)

The worked examples in this text are performed using Stata (2001). This software comes with excellent documentation. At a minimum, I suggest you read their Getting Started manual. This text is not intended to replicate the Stata documentation, although it does explain the use of those commands needed in this text. The Appendix provides a list of these commands and the section number where the command is first explained.

1.3.1. Downloading Data from My Web Site

An important feature of this text is the use of real data sets to illustrate methods in biostatistics. These data sets are located at http://www.mc.vanderbilt.edu/prevmed/wddtext/. In the examples, I assume that you have downloaded the data into a folder on your C drive called WDDtext. I suggest that you create such a folder now. (Of course, the location and name of the folder are up to you, but if you use a different name you will have to modify the file address in my examples.) Next, use your web browser to go to http://www.mc.vanderbilt.edu/prevmed/wddtext/ and click on the blue underlined text that says Data Sets. A page of data sets will appear. Click on 1.3.2.Sepsis. A dialog box will ask where you wish to download the sepsis data set. Enter C:/WDDtext and click the download button. A Stata data set called 1.3.2.Sepsis.dta will be copied to your WDDtext folder. Purchase a license for Intercooled Stata Release 7 for your computer and install it following the directions in the Getting Started manual. You are now ready to start analyzing data with Stata.

When you launch the Stata program you will see a screen with three windows. These are the Stata Command window where you will type your commands, the Stata Results window where output is written, and the Review window where previous commands are stored. A Stata command is executed when you press the Enter key at the end of a line in the Command window. Each command is echoed back in the Results window followed by the resulting output or error message. Graphic output appears in a separate Stata Graph window. In the examples given in this text, I have adopted the following conventions: all Stata commands and output are written in a typewriter font (all letters have the same width). Commands are written in bold face while output is written in regular type. On command lines, variable names and labels and other text chosen by the user are italicized; command names and options that must be entered as is are not. Highlighted output is discussed in the comments following each example. Numbers in braces on the right margin refer to comments that are given at the end of the example. Comments in the middle of an example are in braces and are written in a proportionally spaced font.


1.3.2. Creating Dot Plots with Stata

The following example shows the contents of the Results window after entering a series of commands in the Command window. Before replicating this example on your computer, you must first download 1.3.2.Sepsis.dta as described in the preceding section.

. * Examine the Stata data set 1.3.2.Sepsis.dta. Create a dot plot of       {1}
. * baseline APACHE scores in treated and untreated patients
. *
. use C:\WDDtext\1.3.2.Sepsis.dta                                           {2}

. describe                                                                  {3}

Contains data from C:\WDDtext\1.3.2.Sepsis.dta
  obs:           455
 vars:             2                          16 Apr 2002 15:36
 size:         5,460 (99.4% of memory free)
---------------------------------------------------------------------
  1. treat     float  %9.0g      treatment    Treatment
  2. apache    float  %9.0g                   Baseline APACHE Score
---------------------------------------------------------------------
Sorted by:

. list treat apache in 1/3                                                  {4}

           treat     apache
  1.     Placebo         27
  2.   Ibuprofen         14
  3.     Placebo         33                                                 {5}

. edit                                                                      {6}

. dotplot apache, by(treat) center                                          {7}

{Graph omitted. See Figure 1.8}

Comments

1 Command lines that start with an asterisk (*) are treated as comments and are ignored by Stata.

2 The use command specifies the name of a Stata data set that is to be used in subsequent Stata commands. This data set is loaded into memory where it may be analyzed or modified. In Section 4.21 we will illustrate how to create a new data set using Stata.


3 The describe command provides some basic information about the current data set. The 1.3.2.Sepsis data set contains 455 observations. There are two variables called treat and apache. The labels assigned to these variables are Treatment and Baseline APACHE Score.

4 The list command gives the values of the specified variables; in 1/3 restricts this listing to the first through third observations in the file.

5 At this point the Review, Variables, Results, and Command windows should look like those in Figure 1.6. (The size of these windows has been changed to fit in this figure.) Note that if you click on any command in the Review window it will appear in the Command window where you can edit and re-execute it. This is particularly useful for fixing command errors. When entering variables in a command you may either type them directly or click on the desired variable from the Variables window. The latter method avoids spelling mistakes.

Figure 1.6 The Stata Review, Variables, Results, and Command windows are shown immediately after the list command is given in Example 1.3.2. The shapes and sizes of these windows have been altered to fit in this figure.

Figure 1.7 The Stata Editor shows the individual values of the data set, with one row per patient and one column per variable.

6 Typing edit opens the Stata Editor window (there is a button on the toolbar that does this as well). This command permits you to review or edit the current data set. Figure 1.7 shows this window, which presents the data in a spreadsheet format with one row per patient and one column per variable.

7 This dotplot command generates the graph shown in Figure 1.8. This figure appears in its own Graph window. A separate dot plot of the APACHE variable is displayed for each value of the treat variable; center draws the dots centered over each treatment value. Stata graphs can either be saved as separate files or cut and pasted into a graphics editor for additional modification (see the File and Edit menus, respectively).
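If you would like to try describe and list before downloading the sepsis file, the following minimal sketch types a tiny data set directly into Stata. The three observations copy the first three rows listed in the example above. The 0/1 coding of treat is my assumption for illustration and may not match the coding used in 1.3.2.Sepsis.dta.

```stata
* Toy data set: the first three rows shown in Example 1.3.2.
* The numeric codes 0 and 1 are an assumption made for this sketch.
clear
input float treat float apache
0 27
1 14
0 33
end

* Attach the value label and variable labels reported by describe
label define treatment 0 "Placebo" 1 "Ibuprofen"
label values treat treatment
label variable treat "Treatment"
label variable apache "Baseline APACHE Score"

* Inspect the result in the same way as in the example
describe
list treat apache in 1/3
```

Because the data are typed in rather than loaded with use, nothing on disk is read or changed by this sketch.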

1.3.3. Stata Command Syntax

Stata requires that your commands comply with its grammatical rules. For the most part, Stata will provide helpful error messages when you type something wrong (see Section 1.3.4). There are, however, a few instances where you may be confused by its response to your input.


Figure 1.8 This figure shows the Stata Graph window after the dotplot command in Example 1.3.2. The dot plot in this window is similar to Figure 1.1. We will explain how to improve the appearance of such graphs in subsequent examples.

Punctuation The first thing to check if Stata gives a confusing error message is your punctuation. Stata commands are modified by qualifiers and options. Qualifiers precede options; there must be a comma between the last qualifier and the first option. For example, in the command

dotplot apache, by(treat) center

the variable apache is a qualifier while by(treat) and center are options. Without the comma, Stata will not recognize by(treat) or center as valid options to the dotplot command. In general, qualifiers apply to most commands while options are more specific to the individual command. A qualifier that precedes the command is called a command prefix. Most command prefixes must be separated from the subsequent command by a colon. See the Stata reference manuals or the Appendix for further details.
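The comma and colon rules can be tried on a few made-up observations. This is only a sketch: the data values are hypothetical, and I use summarize rather than dotplot so that no Graph window is needed.

```stata
* Hypothetical toy data, not the sepsis data set
clear
input float treat float apache
0 27
1 14
0 33
1 20
end

* The comma separates the variable list from the option
summarize apache, detail

* A command prefix is separated from the command by a colon
sort treat
by treat: summarize apache

* Without the comma, Stata tries to read "detail" as a variable name.
* capture suppresses the error so that the return code can be examined.
capture summarize apache detail
display _rc
```

A nonzero value of _rc after the captured command indicates that Stata rejected the line without the comma.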

Capitalization Stata variables and commands are case sensitive. That is, Stata considers age and Age to be two distinct variables. In general, I recommend that you always use lower case variables. Sometimes Stata will create variables for you that contain upper case letters. You must use the correct capitalization when referring to these variables.
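A short sketch of case sensitivity, using made-up values: the variables age and Age coexist in one data set, and a command name typed with the wrong capitalization is not recognized.

```stata
* Hypothetical values chosen only to show that age and Age differ
clear
set obs 3
generate age = 40 + _n
generate Age = 2*age
list age Age

* Command names are case sensitive too: "Describe" is not "describe".
* capture suppresses the error so that the return code can be examined.
capture Describe
display _rc
```

The two columns printed by list hold different numbers, which confirms that Stata keeps age and Age apart.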


Abbreviations Some commands and options may be abbreviated. The minimum acceptable abbreviation is underlined in the Stata reference manuals.
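As an illustration, summarize may be shortened to su and its detail option to d. The sketch below uses made-up data; check the underlining in the reference manuals before relying on any other abbreviation.

```stata
* Hypothetical data: the integers 1 through 5
clear
set obs 5
generate x = _n

* Full command name and option
summarize x, detail

* The same request using abbreviations
su x, d
```

Both commands print the same table, since the abbreviated line is read as the full one.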

1.3.4. Obtaining Interactive Help from Stata

Stata has an extensive interactive help facility that is fully described in the Getting Started and User’s Guide manuals (Stata, 2001). I have found the following features to be particularly useful.

• If you type help command in the Stata Command window, Stata will provide instructions on syntax for the specified command. For example, help dotplot will generate instructions on how to create a dot plot with Stata.

• Typing search word will provide a table of contents from the Stata database that relates to the word you have specified. You may then click on any command in this table to receive instructions on its use. For example, search plot will give a table of contents of all commands that provide plots, one of which is the dotplot command.

• When you make an error specifying a Stata command, Stata will provide a terse error message followed by the code r(#), where # is some error number. If you then type search r(#) you will get a more detailed description of your error. For example, the command dotplt apache generates the error message unrecognized command: dotplt followed by the error code r(199). Typing search r(199) generates a message suggesting that the most likely reason why Stata did not recognize this command was because of a typographical error (i.e. dotplt was misspelt).

1.3.5. Stata Log Files

You can keep a permanent record of your commands and Stata’s responses in a log file. This is a simple text file that you can edit with any word processor or text editor. You can cut and paste commands from a log file back into the Command window to replicate old analyses. In the next example we illustrate the creation of a log file. You will find log files from each example in this text at www.mc.vanderbilt.edu/prevmed/wddtext.

1.3.6. Displaying Other Descriptive Statistics with Stata

The following log file and comments demonstrate how to use Stata to obtain the other descriptive statistics discussed above.


. log using C:\WDDtext\1.3.6.Sepsis.log                                     {1}

. * 1.3.6.Sepsis.log
. *
. * Calculate the sample mean, median, variance and standard deviation
. * for the baseline APACHE score in each treatment group. Draw box plots
. * and histograms of APACHE score for treated and control patients.
. *
. use C:\WDDtext\1.3.2.Sepsis.dta

. sort treat                                                                {2}

. by treat: summarize apache, detail                                        {3}

-> treat= Placebo

                    Baseline APACHE Score
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              0
 5%            5              2
10%            7              3       Obs                 230
25%           10              4       Sum of Wgt.         230

50%         14.5                      Mean           15.18696
                        Largest       Std. Dev.      6.922831
75%           19             32
90%           24             33       Variance       47.92559
95%           28             35       Skewness       .6143051
99%           33             41       Kurtosis       3.383043

-> treat=Ibuprofen

                    Baseline APACHE Score
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              3
 5%            5              3
10%            7              3       Obs                 224
25%           10              4       Sum of Wgt.         224

50%           14                      Mean           15.47768
                        Largest       Std. Dev.      7.261882
75%           21             31
90%           25             34       Variance       52.73493
95%           29             36       Skewness       .5233335
99%           34             37       Kurtosis       2.664936

. graph apache, box by(treat)                                               {4}

{Graph omitted. See Figure 1.3}

. graph apache, bin(20)                                                     {5}

{Graph omitted. See Figure 1.4}

. by treat: graph apache, bin(20)                                           {6}

{Graph omitted.}
-> treat= Placebo
-> treat=Ibuprofen

. log close                                                                 {7}

Comments

1 The log using command creates a log file of the subsequent Stata session. This file, called 1.3.6.Sepsis.log, will be written in the WDDtext folder. There is also a button on the Stata toolbar that permits you to open, close and suspend log files.

2 The sort command sorts the data by the values of treat, thereby grouping all of the patients on each treatment together.

3 The summarize command provides some simple statistics on the apache variable calculated across the entire data set. With the detail option these include means, medians and other statistics. The command prefix by treat: subdivides the data set into as many subgroups as there are distinct values of treat, and then calculates the summary statistics for each subgroup. In this example, the two values of treat are Placebo and Ibuprofen. For patients on ibuprofen, the mean APACHE score is 15.48 with variance 52.73 and standard deviation 7.26; their interquartile range is from 10 to 21. The data must be sorted by treat prior to this command.

4 The graph command produces a wide variety of graphics. With the box option Stata draws box plots for the apache variable that are similar to those in Figure 1.3. The by(treat) option tells Stata that we want a box plot for each treatment drawn in a single graph. (The command by treat: graph apache, box would have produced two separate graphs: the first graph would have had a single box plot for the placebo patients while the second graph would be for the ibuprofen group.)

5 With the bin(20) option, the graph command produces a histogram of APACHE scores with the APACHE data grouped into 20 evenly spaced bins, and one bar per bin.

6 Adding the by treat: prefix to the preceding command causes two separate histograms to be produced which give the distribution of APACHE scores in patients receiving placebo and ibuprofen, respectively. The first of these graphs is similar to Figure 1.4.

7 This command closes the log file C:\WDDtext\1.3.6.Sepsis.log. You can also do this by clicking the Close/Suspend Log button and choosing Close log file.
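The commands in this example follow the Release 7 syntax used throughout the text. Readers working with a later release may find that the graph commands above are not accepted as written. The sketch below shows what I understand to be the corresponding commands in Stata 8 and later. It is an assumption about your installation rather than part of the original example, and the six observations are made up so that the sketch does not depend on the sepsis file.

```stata
* Hypothetical toy data, not the sepsis data set
clear
input float treat float apache
0 27
0 33
0 10
1 14
1 20
1 8
end
label define treatment 0 "Placebo" 1 "Ibuprofen"
label values treat treatment

* Sort and group in one step instead of a separate sort command
by treat, sort: summarize apache, detail

* Assumed Stata 8+ counterparts of the graph commands in the log file
graph box apache, over(treat)          // box plots side by side in one graph
histogram apache, bin(20)              // one histogram of all scores
histogram apache, bin(20) by(treat)    // one histogram panel per treatment
```

The by treat, sort: prefix gives the same subgroup summaries as sorting first and then typing by treat:, so the separate sort step becomes optional.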
