
Using Stata for Quantitative Analysis


DOCUMENT INFORMATION

Basic information

Title: Using Stata for Quantitative Analysis
Author: Kyle C. Longest
Institution: Furman University
Field: Social Sciences
Type: Book
Year of publication: 2012
City: Thousand Oaks
Pages: 239



for

Quantitative Analysis


SAGE Publications India Pvt. Ltd.

B 1/I 1 Mohan Cooperative Industrial Area

Mathura Road, New Delhi 110 044

Executive Editor: Jerry Westby

Production Editor: Brittany Bauhaus

Copy Editor: QuADS (P) Ltd.

Typesetter: C&M Digitals (P) Ltd

Proofreader: Eleni-Maria Georgiou

Cover Designer: Anupama Krishnan

Marketing Manager: Erica DeLuca

Permissions Editor: Karen Ehrmann

Copyright © 2012 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

1. Stata. 2. Social sciences--Graphic methods--Computer programs. 3. Social sciences--Statistical methods--Computer programs. I. Title.

HA32.L66 2012 005.5'5--dc23 2011041851

Certified Chain of Custody

Sustainable Forestry Initiative: Promoting Sustainable Forestry

www.sfiprogram.org

SFI label applies to text stock

11 12 13 14 15 10 9 8 7 6 5 4 3 2 1


Preface

Chapter 5: Relationships Between Nominal and Ordinal Variables

Chapter 6: Relationships Between Different Measurement Levels

Chapter 7: Relationships Between Interval-Ratio Variables


Detailed Contents

About the National Study of Youth and Religion

Opening and Saving Stata Data Files

Using Different Types of Data Files in Stata

Types of Variables in Data Files

Executing a Command Using the Command Window


Translation From the Command Window

Getting the Most Out of Do Files

Histograms, Bar Graphs, and Pie Charts

Measures of Central Tendency and Variability


Summary of Commands Used in This Chapter

Dichotomous (Dummy) Variables and Linear

Summary of Commands Used in This Chapter

Advanced Convenience Commands

Summary of Commands Used in This Chapter


Motivation and Purpose

The motivation for this book, as I assume is true for most, came from a series of personal experiences. First, as a graduate student, I remember literally lying awake at night dreading the idea of using a computer program to conduct statistical analyses. The first statistics course I took required Stata to complete the assignments and the final research project. This necessity was so overwhelming at the time, in part, because there did not seem to be any straightforward, concise texts explaining the basics of Stata. Over my time in graduate school, I came to be very familiar with Stata, even to the point that I developed a serious passion for both learning Stata and teaching it to students who were facing the same fears I once did. In a somewhat mirrored experience, I was hoping to use Stata as a significant portion of the classroom experience and requirements when I first began teaching a course on Quantitative Analysis. I soon realized that there still was not a manageable introductory text on the use of Stata for quantitative research.¹ Thus, I sought to contribute to filling this void by providing a straightforward, applied introduction to using Stata.

This book will be most beneficial to readers who are novices when it comes to Stata and are at least in the early stages of learning strategies for conducting quantitative analysis. It does assume that the reader has a working knowledge of basic statistical techniques and terminology. The organization and coverage of the book are guided by the content and ordering of topics found in most introductory social statistics textbooks. In this manner, it can serve as an excellent companion to such a textbook, either for a class or for a self-learner.

¹ Assuredly, there are several very good and effective texts on learning Stata. Virtually all of these, however, are aimed at experienced users or are so detailed and long that they are not helpful for a typical classroom in which teaching Stata is not the primary purpose.


USING STATA FOR QUANTITATIVE ANALYSIS

To be clear, this book should not be used to learn statistics or quantitative analysis. Some basic assumptions and explanations are provided, but these should not be used in place of a more thorough coverage of each of the analytic strategies. The statistical grounding for this book is based primarily on Frankfort-Nachmias and Leon-Guerrero's (2009) Social Statistics for a Diverse Society. The definitions and interpretations of the specific measures and tests are based on those presented in this text. Of course, any inaccuracies or mistakes are solely mine.

Also, this book does not attempt to cover every aspect of each Stata command that is introduced. More experienced users undoubtedly know shortcuts or alternative methods for the techniques that are presented. But the given description has been geared to introduce complete novice users to Stata. This targeted audience requires that the explanation start with the basics before jumping into the advanced features. The presented commands and procedures are discussed because they are the simplest strategies that effectively accomplish the pertinent goals.

About the National Study of Youth and Religion

The data for this book come from the National Study of Youth and Religion (NSYR). The NSYR is a longitudinal, nationally representative telephone survey of U.S. young adults. There are three waves of data, all of which are publicly available.

The variables that are used in the examples throughout this book come from the most recent follow-up survey of 2,532 young adults, completed in the fall of 2007. At the time of this survey, the respondents were all between the ages of 18 and 24. Each respondent completed a computer-assisted telephone interviewing (CATI) survey that lasted approximately an hour. This data set covers a broad array of topics, making it possible, across examples, to use variables pertinent to several disciplines. For example, it contains several standard self-esteem measures of interest to psychologists, a wide array of questions on religion useful for sociologists, numerous questions on finances (e.g., debt) applicable to economics, and measures of substance use behaviors that would be pertinent to social work or health researchers. The full data set and documentation can be downloaded from the Association of Religion Data Archives (http://www.thearda.com/Archive/Files/Descriptions/NSYRW3.asp).

The first wave of the survey sampled 3,290 U.S. English- and Spanish-speaking teenagers, ages 13 to 17. The sampling and survey were conducted from July 2002 to August 2003 using random-digit-dialing, drawing on a sample of randomly generated telephone numbers representative of all noncellular phone numbers in the United States. The overall response rate of 57% for the first survey is lower than desired, but it is similar to other current nationally based surveys using similar methodologies. Further comparisons of the NSYR data with 2002 U.S. Census data on households and with nationally representative surveys of adolescents (such as Monitoring the Future, the National Household Education Survey, and the National Longitudinal Study of Adolescent Health) confirm that the NSYR provides a nationally representative sample of U.S. teenagers aged 13 to 17 years and their parents without identifiable sampling or nonresponse biases (for details, see Smith & Denton, 2005). The follow-up sample that is used in the data sets comes from this initial sample of 3,290 teens. To obtain more information regarding the technical details and documentation of the NSYR, please visit http://www.youthandreligion.org/.

A Note on Versions

All the commands and examples for this book were produced using Stata 12.0 for Windows. The primary commands and options are similar for older versions, dating back to at least Stata 9. There were, however, a few changes between Stata 11 and Stata 12. Most of these changes do not affect the actual functionality but rather deal with convenience and appearance. In fact, most of the substantive differences that new users would encounter fall under the topics covered in Chapter 1.

Due to the very recent release of Stata 12 (July 2011), many readers may still be using Stata 11 or even Stata 10. To address this potential challenge, this book includes two versions of the introductory material (i.e., Getting to Know Stata). The vast majority of the material in both versions is extremely similar, but both were included to prevent any confusion over the small dissimilarities. Users of Stata 12 should start with Chapter 1: Getting to Know Stata 12. Users of Stata 11 (or older) should start with Appendix: Getting to Know Stata 11 and then rejoin the book at Chapter 2. From that point on, all of the commands and strategies are equivalent across versions (although the appearance of the screenshots may be slightly different).

The vast majority of the commands presented are similar for Stata for Mac as well. The appearance and wording of some icons, as well as the pathways for the point-and-click menus, may be slightly different on a Mac operating system.


variable names in a particular data set, such as gender or ids. It will also be used to show the display from the Stata Results window (if the actual screenshot is not shown).

This font will be used to denote a command that is entered into the Command window to perform a given operation. Additionally, if these commands are presented by themselves within a sentence, they will be set apart by a dash before and after (e.g., -replace-) so that they are not confused with a variable name.

The majority of this book discusses the syntax command interface (i.e., the Command window) aspect of Stata. But there will be times when the menu, point-and-click interface is described. Menus (e.g., File), clickable buttons (e.g., OK), or keys on the keyboard (e.g., Enter) will be denoted with the Arial font.

Finally, Stata is a case-sensitive program, meaning that all commands and variable names must be typed exactly as they are shown. For the purposes of this book, this sensitivity means that at times the capitalization may not follow typical grammatical conventions. For example, if a variable name starts a sentence and that variable name is lowercase, then that sentence will start with a lowercase letter.
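To see this case sensitivity in action, the sketch below builds a throwaway two-case data set. The variable gender and its values are hypothetical, not taken from the NSYR file. It then tests spellings with -capture-, which runs a command without stopping on an error and stores the return code in _rc (0 means success).

```stata
* Hypothetical two-case data set, used only to illustrate case sensitivity
clear
input gender
1
0
end

* "gender" exists exactly as typed, so the return code is 0
capture confirm variable gender
display _rc

* "Gender" (capital G) is treated as a different name, so this check fails
capture confirm variable Gender
display _rc

* Command names are case-sensitive too: "Summarize" is not a recognized command
capture Summarize gender
display _rc
```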

References

Frankfort-Nachmias, C., & Leon-Guerrero, A. (2009). Social statistics for a diverse society (5th ed.). Thousand Oaks, CA: Pine Forge Press.

Smith, C., & Denton, M. L. (2005). Soul searching: The religious and spiritual lives of American teenagers. New York, NY: Oxford University Press.


The author and SAGE gratefully acknowledge the contributions of the following reviewers:

Karen Y. Holmes, Norfolk State University, Norfolk

Sean Kelly, University of Notre Dame

David Peterson, Iowa State University, Ames

Raymond Sanchez Mayers, Rutgers University, New Brunswick


PART I

Foundations for Working With Stata


Getting to Know Stata 12

For many people, learning any new computer software can be an anxiety-producing task. When that computer program involves statistics, the stress level generally increases exponentially. If you have similar feelings as you begin your journey into becoming a Stata user, do not fear; you are not alone. This book is designed with this apprehension in mind. One of the primary goals of this book is to help alleviate, or at least minimize, this anxiety as we move toward becoming effective and proficient Stata users. Keep in mind that at one time you may have had similar feelings about using e-mail or the Internet. Just as many people now feel extremely comfortable using these programs, by the end of this book you will have a similar grasp of and comfort with Stata.

Before diving into all the details of using Stata, it is important to understand its various components. This chapter will serve as an introduction to the basic building blocks of the Stata program. Each of these aspects will be covered in much more detail throughout the book, but this chapter provides an overview of the basic functionality of the Stata program. The second section of the chapter explains how data are opened, imported, and entered.

What You See¹

When you open Stata for the first time by double-clicking on the Stata icon, you will see the following screen:

¹ If you are using Stata 11 (or Stata 10), please use Appendix: Getting to Know Stata 11 instead of this first chapter. All of the same features are covered, but Stata 12 has a slightly different appearance from these previous versions, which may make matching up what you see in the text and on your screen a bit confusing. Starting from Chapter 2, the vast majority of operations and commands are similar across versions, and the text specifically notes any particular features that are different for previous versions.

[Screenshot: the Stata 12 screen as it first appears, with the Results, Review, Variables, Properties, and Command windows]

There are five different windows on the screen:²

1. Results Window: The Results window is where everything that Stata "does" will be displayed. Anytime Stata executes some operation, it will display that operation and its results in this window. These results, however, are not automatically saved. How to save these results is covered in the Data Management: Saving Results section of Chapter 3.

2. Review Window: The Review window contains a running history of all the operations that have been performed during the current session of Stata. Whenever you enter and execute a command, it will appear both in the Results window and in the Review window. The most useful aspect of the Review window is that it can be used as a shortcut to work with a previously executed command. When you click on a command in the Review window, that command will appear in the Command window, from which you can alter the command or simply rerun the same command.

3. Variables Window: When you open a data file in Stata, the variables contained in that data set will be listed in the Variables window. This window can be used to scroll through and see all the variables that are contained in the active data. Whenever you click on a variable name listed in the Variables window, several characteristics of that variable are displayed in the Properties window. If you place your cursor over a variable, a small arrow will appear. Clicking on that arrow places the variable name in the Command window automatically. This window also lists the variable "Label," which presents more detailed information about the variable. Labels are discussed in more detail in the Data Management: Working With Labels section of Chapter 3.

4. Properties Window: The Properties window provides details about the data set that is currently being used and about any variable that has been selected (by clicking on it) in the Variables window. For the data, this window provides the file name, the number of variables that are included in the data, and the number of "observations" (e.g., survey respondents). For a given variable, the Properties window lists the variable name, its type, its format, and its value label. Details on each of these descriptors are discussed later in this chapter. By default, the Properties window is "locked," meaning you cannot change any of these characteristics directly from the Properties window. Clicking on the padlock icon (located in the upper left corner of the Properties window), however, unlocks the Properties window and allows you to change the aspects of the variable simply by clicking on that property (e.g., the variable name). More details on this process are provided later in this chapter.

5. Command Window: The Command window is where you will enter the operations that you want Stata to perform when using the "syntax" interface. Syntax, or code, is another term for Stata's command language: the words that tell Stata what procedures to execute. Commands are entered, one at a time, in this window. After you type a command into the Command window, pressing the Enter key on your keyboard makes Stata execute the procedure that is defined by the command. One helpful feature of the Command window is that you can scroll through previously executed commands by pressing the Page Up key. When you find the previous command you are interested in, you can either alter it or simply press Enter again to rerun the same command. The majority of this book will be devoted to explaining and describing the various commands that you will need to use to perform quantitative analyses.

² This layout is what you would see if Stata was opened "right out of the box." If you are working on a shared computer (or over a network), there is a chance that these windows have been moved, resized, or even closed by another user, making what you see slightly different from the screenshot presented. If any of these windows are missing, you can click on the Window menu and then on the desired window. You can also move these windows by simply clicking on them and dragging them with your mouse.


There also are several icons at the top of the screen. The purpose and use of these icons are covered throughout the book. Each of these basic windows will become familiar to you as we go through this book. For now, be sure that you feel comfortable identifying the main purpose of each of the windows.
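As a first taste of the Command window, each line below is one command of the kind you would type there and run by pressing Enter. It relies on auto, the example data set that ships with Stata, which is an assumption made for illustration and not part of this book's NSYR files. Each command also shows up in the Review window after it runs.

```stata
* Load Stata's bundled example data (1978 automobile data), replacing anything in memory
sysuse auto, clear

* List the variables in the data, as the Variables window does
describe

* Summary statistics for one variable, displayed in the Results window
summarize price
```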

Getting Started With Data Files

When working with Stata, you will be using what is referred to as a "data file." If you are familiar with typical database programs, then you already know what a data file basically is. These files contain information (often numerical) on a set of cases, such as respondents to a survey, a sample of schools, or each of the states in the United States. Generally, data files are organized so that the information about each case is contained in one row of the file, whereas each column represents a variable (i.e., information about that case), such as a person's gender, a school's total number of students, or a state's total area in square miles.
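The row-per-case, column-per-variable layout can be seen in a made-up data set typed in with -input-. The school names and student counts below are invented for illustration only. -list- prints the cases as rows, and _N holds the number of cases in memory.

```stata
* Hypothetical data: each row is one case (a school), each column a variable
clear
input str8 school students
"School A" 450
"School B" 1200
"School C" 830
end

* Print the rows (cases) and columns (variables)
list

* _N is the number of cases (rows) currently in memory
display _N
```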

Like most computer files, data files come in many different types. But just as a PDF file is very similar to a Word document, all data files are essentially variations on a similar structure. Each of these variations is denoted by a different file extension: the letters that come after the "." in a file name. The extension for Stata data files is .dta. Moving other types of data files into Stata (e.g., Microsoft Excel files) is covered in the Using Different Types of Data Files in Stata section of this chapter.

OPENING AND SAVING STATA DATA FILES

To open a data file that is in Stata format (i.e., one that has a .dta extension), select the File menu (in the upper left-hand corner), then choose Open. Alternatively, you can simply click on the Open icon. From here you will need to search through the disk drives and folders on your computer to find your saved data file. This chapter uses the data file named Chapter 1 Data.dta, available at www.sagepub.com/longest. Once you have found your data file, double-click the file. You will then notice that the Stata screen looks different from how it did initially.
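When a file is opened through the menu, Stata records an equivalent -use- command in the Results and Review windows. The sketch below shows that command. To stay runnable without the book's file, it first saves a small invented data set to a temporary file and then reopens it. The commented-out line shows the form the command would take with Chapter 1 Data.dta, assuming that file is in your current working directory.

```stata
* Build a small hypothetical data set and save it to a temporary file
clear
input ids agecats
101 23
102 19
end
tempfile demo
save `demo'

* Empty memory, then reopen the file; the clear option discards data in memory
clear
use `demo', clear
list

* With the book's file in the working directory (an assumption), you would type:
* use "Chapter 1 Data.dta", clear
```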

The first operation you performed is now displayed in both the Results and Review windows. Again, whenever we tell Stata to "do" something, whether through the point-and-click menus or by entering a command in the Command window, it will be displayed in the Results and Review windows. Because opening a data file does not have any "results," only the command is

[Screenshot: the Stata screen after opening the data file. The Results and Review windows show the -use- command with the full path to Chapter 1 Data.dta, and the Variables window lists ids, gender, agecats, employst, and religoth with their labels.]

most important aspect is the variable name. In this data set, the five variables are named ids, gender, agecats, employst, and religoth. These variable names should give you some indication of what type of information each variable contains. The variable gender, for example, records whether each respondent is male or female.
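The list of variables shown in the Variables window can also be printed with -describe-, a command this passage does not introduce. The stand-in below reuses the five variable names from the chapter's file, but its numeric values are invented. With the real file open, you would simply type -describe-.

```stata
* Hypothetical stand-in: same five variable names, made-up numeric values
clear
input ids gender agecats employst religoth
101 1 23 1 5
102 0 19 2 3
end

* Each variable with its storage type, display format, and labels
describe

* Names only
describe, simple
```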

It is good practice to always save a copy of your data files and only work with that duplicate version. When working with and analyzing data, you will often need to change aspects of the data files. For example, you may need to create a new variable or change something about an existing variable. But it is important to keep an original version of the data, just in case something undesired occurs. Don't worry too much; most alterations you perform can be undone or recovered. Working with a duplicate copy of the data is simply an added protection.

To save a duplicate copy of the data file you have just opened, open the File menu and click on Save As. You can then enter a new file name, such as Chapter 1 Data mycopy.dta, and click Save. This is the procedure you will use whenever you want to save a new version of your data file.
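The command counterpart of File > Save As is -save-. This sketch uses invented data in place of the opened file and writes the copy to the current working directory, which -pwd- displays. The replace option lets Stata overwrite an earlier copy of the same name. Leaving it off makes Stata refuse to overwrite, which is a useful safeguard for the original file.

```stata
* Hypothetical data standing in for the opened file
clear
input ids agecats
101 23
102 19
end

* Show where the file will be written
pwd

* Save a duplicate under a new name; replace overwrites an existing copy
save "Chapter 1 Data mycopy.dta", replace
```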


A Closer Look: Stata Data Files Across Versions

As was noted in the Preface, the vast majority of Stata features and commands are similar across versions (e.g., Stata 12, 11, 10, etc.). By and large, this is also true of Stata data files. All Stata data files that are created and/or saved in an older version can be read by a newer version (i.e., forward compatible). That means that if you are using Stata 12 but are working with colleagues who are using Stata 11, any files they send to you will open without a problem.

During certain upgrades, however, Stata data files cease to be "backward" compatible, meaning files saved in a newer version cannot be opened by older versions. Stata 12 happens to be one of those upgrades. If you are using Stata 12 and send a data set that you saved in Stata 12 to colleagues who are using Stata 11, they will not be able to open it. [Note: This is not a problem if you are moving files between Stata 11 and Stata 10, as these two versions are completely compatible with each other.]

Do not despair. Stata has a very simple built-in feature to overcome this problem. If you know that you want the data you are using in Stata 12 to be opened by older versions, you need to take one extra step beyond the process just explained.

First, click on the File menu and then click on Save As. Now, use the drop-down menu in the Save as Type box and select the Stata 9/10 Data (*.dta) option. The option is listed as "Stata 9/10" and not 11 because Stata Versions 8 and 9, as well as 10 and 11, are completely compatible with each other (both forward and backward), so using this option actually allows the data to be opened in any version of Stata from 8 through 12. Note that you do not need to change the file extension; it is still .dta. Once you have named your file, click Save. You will know that you have saved the data correctly when the output in the Results window starts with saveold, which tells you that the file has been saved in a way that makes it readable by the previous versions. Again, note that a file saved in this way can still be used in Stata 12.
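Because the Results window reports this step as -saveold-, the same thing can be typed directly. This is a minimal sketch with invented data and a hypothetical file name. The exact older format that -saveold- writes depends on the Stata release you are running (later releases also add a version() option), so check -help saveold- in your own copy.

```stata
* Hypothetical data standing in for the file you want to share
clear
input ids agecats
101 23
102 19
end

* Write the file in an older .dta format; the extension stays .dta
saveold "Chapter 1 Data old.dta", replace

* The saved file still opens in the current release
use "Chapter 1 Data old.dta", clear
```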

DATA BROWSER AND EDITOR

If this is the first time you are working with data, it may be helpful to actually "see" the data. Even if you have experience using data, it is often helpful to look at the data you are examining. To see the data file in Stata, you can click on the Data Browser icon in the middle of the top of the screen. When you do so, you will see a new window that appears as shown below:

[Screenshot: the Data Editor (Browse) window, showing the first rows of ids, gender, agecats, employst, and religoth. The first case is a 23-year-old male whose other religion is listed as Mormon.]

This new window, as its upper left-hand corner indicates, is the Data Editor (Browse) window. The "(Browse)" aspect indicates that you are only looking at the data, not actually changing them.

In this window, you see all five of the variables that were listed in the Variables window. As was mentioned earlier, each row is a different case (i.e., a National Study of Youth and Religion [NSYR] respondent), and each column is a different variable. Each cell then contains information on the given variable for that case. For example, the case in the first row is a "Male" respondent who mentioned that "Mormon" was his other religion. To close this window, click on the red "X" in the upper right-hand corner.
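The Data Browser can also be opened by typing -browse- in the Command window, optionally followed by the variables to show. This sketch uses invented data. -browse- needs Stata's graphical interface, so -list- is included as a text-only view that prints the same rows in the Results window.

```stata
* Hypothetical data to browse
clear
input ids agecats
101 23
102 19
end

* Open the Data Browser (view only; values cannot be changed here)
browse

* Open it showing only the listed variables
browse ids agecats

* Text alternative: print the rows in the Results window
list in 1/2
```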

There may be times when you want to change the value of a particular case on an individual variable. One way to do so is by using the Data Editor window. (A more efficient way to change the values of multiple cases is covered in The 5 Essential Commands: replace (if) section of Chapter 2.) To begin, click on the Data Editor icon, which is next to the Data Browser icon. You may notice that the Data Editor and Data Browser windows look very similar. The main difference is that in the upper left-hand corner of the window, after "Data Editor," the window now reads "(Edit)." It is important to know which window you have opened because you can change the values of the data when the Editor is open.

To prevent any accidental alterations, it is generally advisable to use only the Data Browser window unless you are certain you want to change a particular value.

After you have opened the Data Editor window, use the arrow keys (or the mouse) to highlight the cell you would like to change. For example, you may have realized that the first case's age was incorrectly entered in the data file. Instead of being 23 years old, this case should be 22 years old. To make this change, highlight the cell in the first row under agecats, type 22, and press Enter. This case's value for the variable agecats has now changed. When you close the Data Editor window, this operation is recorded and displayed in both the Review and Results windows.

A Closer Look: Your First Command

You may have noticed that when you changed the first case's value using the Data Editor window, the following text was displayed in the Results window:

. replace agecats = 22 in 1
(1 real change made)

Whenever you use the menus or another point-and-click method to perform an operation in Stata, Stata displays in the Results window the command that would be entered in the Command window to perform the same operation. In this Data Editor example, you can see that the command to change a value is -replace-. If you had entered this full command into the Command window and pressed Enter, the same change would have been made. At times, it may be helpful to perform an operation for the first time using the menus. But, as will be discussed in much more detail in Chapter 2, it is extremely beneficial to know and use the commands via the Command window for the majority of the operations you need to perform.

The rest of this book will discuss how to perform operations using the Command window. But to see the connection between the menu-based operation and the Command window, try this: Type (or copy and paste) into the Command window the full command (except the first ".") that was displayed in the Results window when you closed the Data Editor window. Now change the "22" to "23." The command should read

replace agecats = 23 in 1

Then press Enter. Open the Data Browser window again and notice the change to the first case's value under agecats.
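The exercise above can be run end to end as a short sequence of commands. This sketch uses an invented two-case data set in place of the NSYR file. The in 1 qualifier limits -replace- to the first observation, and -list- shows the value before and after each change.

```stata
* Hypothetical data: the first case's agecats value is 23
clear
input ids agecats
101 23
102 19
end

* Change agecats for observation 1 only (what the Data Editor edit produced)
replace agecats = 22 in 1
list

* Reverse the change, as in the exercise
replace agecats = 23 in 1
list
```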


10 PART I FOUNDATIONS FOR WORKING WITH STATA

ENTERING YOUR OWN DATA

Many data files that you will analyze will already be in Stata format or in a format that can be easily converted to Stata format (more on this topic below). Yet there may be times when you need to enter the data from a study yourself. For example, if you distributed a survey through the mail, you will need to input the responses to each question for each case so that you can analyze them in Stata.

The first step in entering your own data after you have opened Stata is to open the Data Editor window as above. From here you can simply enter the values for each case on each variable. Entering data in this way is very similar to entering values into a Microsoft Excel file. The Data Editor, however, does not have the equation (formula) functionality that an Excel file would.

When you begin entering values, each variable is automatically named var1, var2, and so on. Most often it is helpful to have the variable names be more descriptive of the values they contain. One way to change these generic names to something that more clearly identifies the variable is to click on the current name of the given variable you want to rename (e.g., var1) listed near the top of the Editor window. Doing so will bring up that variable's information in the Properties window (inside the Data Editor window). Then click on the current variable name listed in the Name blank in that Properties window. From there you can simply delete the current name and enter the desired name. Another option would be to close the Data Editor window when you have finished entering all of the data. Then you can click on the variable name (e.g., var2) in the Variables window, which will bring up that variable's information in the Properties window. To change the name in this Properties window, you will need to click on the padlock icon in the Properties window. Then click on the current variable name listed in the Name blank and simply type the new name in the blank.

Once you have finished entering all of your data, close the Data Editor and follow the steps described above to save a copy of your data file in Stata format.
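Each point-and-click step above also has a command. A sketch, assuming hypothetical survey answers and the example file name my_survey.dta: -input- plays the role of typing into the Data Editor, -rename- replaces the edit in the Properties window, and -save- writes the Stata-format copy.

```stata
* Hypothetical mail-survey responses typed in directly (three cases)
clear
input byte var1 byte var2
0 22
1 19
1 23
end

* Replace the generic names with descriptive ones
rename var1 gender
rename var2 agecats

* Save a copy in Stata format (the file name is only an example)
save "my_survey.dta", replace
```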

USING DIFFERENT TYPES OF DATA FILES IN STATA

Some data files may not be available in Stata format. Therefore, a few steps are needed to work with these files in Stata. It would be virtually impossible to cover every possible data file type and how each can be transferred to be usable in Stata. Instead, the most common type will be covered. Also note that there are other computer software programs that are specifically designed to convert data files into various formats (e.g., Stat/Transfer). If you have access to such a program, it is probably the most effective and efficient way to transfer files into a Stata format. Some statistical software packages also offer the option of saving a data file in a different format, which often includes the Stata .dta extension.

One of the most frequently encountered data file types that is not Stata-ready is a Microsoft Excel file. Usually these files are denoted with the .xls (or, in newer versions of Excel, .xlsx) extension, but other files that are generated or readable by Microsoft Excel (e.g., .csv files) can all be treated in a similar fashion.

This process requires that you have access to and some familiarity with Microsoft Excel. To start, open the data file in Microsoft Excel. Then highlight the entire worksheet that contains the data and copy it (either by right-clicking and choosing Copy or by using the copy function, Ctrl+C). Next, in Stata, open the Data Editor window, highlight the upper-left data cell, right-click, and choose Paste, or use the paste function (Ctrl+V). Once you have pasted in the data, you should be presented with a window that asks whether you want to Treat First Row as Data or Treat First Row as Variable Names. The option that you choose will depend on whether your Excel file contains variable names in the first row or whether it contains only data. The two formats are shown below.

First Row as Variable Names

[Figure: Excel worksheet in which the first row contains the variable names]

First Row as Data

[Figure: Excel worksheet in which the first row already contains data values]

... the data from within Stata as a Stata data file. Once you have saved your data as a Stata data file, you can simply open and use this version of your data.¹

Stata 12 (but not the previous versions) offers another method for bringing data from an Excel file into Stata that may be even slightly quicker. After opening Stata, click on the File menu, followed by Import. Select the Excel spreadsheet (*.xls;*.xlsx) option,² and the following window will appear:

¹This “copy and paste” method is the easiest way to transfer data from Microsoft Excel into a Stata format, especially for novice users. But there are some disadvantages to this strategy. More practiced users should save Excel worksheets as .csv files and then use the -insheet- command. The specifics of this command are beyond the scope of this introductory text, but the Stata Help Files section of Chapter 8 provides information on how Stata's Help files can be used to learn how to use this command.
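For the .csv route mentioned in this note, a short sketch may help. It assumes a hypothetical file named survey.csv, which it first creates with -outsheet- so that the example is self-contained. In Stata 13 and later, -import delimited- is the newer equivalent of -insheet-.

```stata
* Build a tiny data set and write it out as a .csv file
* (survey.csv is a hypothetical file name)
clear
input byte gender byte agecats
0 22
1 19
end
outsheet using "survey.csv", comma replace

* Read the .csv back in; names says the first row holds variable names
insheet using "survey.csv", comma names clear
describe
```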

²If you are using Stata 12, you will also notice that you could select several different data file formats from this window. The general procedure for each of these formats is very similar to the

[Figure: Stata 12 Import Excel dialog, with the options Import first row as variable names and Import all data as strings, a Preview pane, and OK and Cancel buttons]

Click on the Browse button to find the Excel data that you would like to turn into a Stata data set. Once you have selected the Excel file, you can pick a particular worksheet from that file or even a particular set of cells by using the corresponding boxes. Notice that you still need to decide and tell Stata whether the first row in the Excel file contains variable names or actual data. If the first row contains variable names, click the radio button next to Import first row as variable names (when you do this, notice that the data shown in the preview window will change). Then click OK. You can then follow the steps described earlier to save the data from within Stata as a Stata data file. Once you have saved your data as a Stata data file, you can simply open and use this version of your data.
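For readers who prefer commands, this dialog has a command equivalent, -import excel-, which also first appeared in Stata 12. A sketch under stated assumptions: survey.xlsx and Sheet1 are hypothetical names, and -export excel- is used only to create a workbook to read back in. The firstrow option corresponds to Import first row as variable names, and allstring (not used here) to Import all data as strings.

```stata
* Sketch assuming Stata 12 or later; survey.xlsx is a hypothetical file
clear
input byte gender byte agecats
0 22
1 19
end
* Write the data to an Excel workbook with variable names in row 1
export excel using "survey.xlsx", sheet("Sheet1") firstrow(variables) replace

* Command version of File > Import > Excel spreadsheet with
* "Import first row as variable names" checked
import excel using "survey.xlsx", sheet("Sheet1") firstrow clear

* Keep a Stata-format copy of the imported data
save "survey.dta", replace
```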

TYPES OF VARIABLES IN DATA FILES

At this point, you should feel comfortable with the basic structure of data files: each row holds the information for one case, and each column is a different variable. With this knowledge, you are almost ready to start analyzing your data. There is, however, one distinction in the types of variables included in data that is important to understand.


To help illustrate this difference, consider the NSYR variable gender in the Chapter 1 Data.dta file. This variable came from the following question asked of all respondents:

Are you

a. Male?

b. Female?

If you were entering the responses to this question into a Stata data set, you could record them in one of two ways. First, the actual answer “Male” or “Female” could be recorded for each case. Second, you could use a number to represent each answer. For example, you could choose to enter 0 for all respondents reporting “Male” and 1 for all respondents reporting “Female.”

If you record the responses in the first way, the result is what Stata refers to as a string variable. A string variable is a variable whose contents are text (words or other characters) rather than numbers. String variables can be very useful for many purposes. For example, you can enter verbatim answers to questions directly into Stata, as was done for the variable religoth in the Chapter 1 Data.dta file.

The drawback of storing a variable such as gender as a string variable is that some statistical operations require numbers. For example, if you wanted to calculate the mean (i.e., mathematical average) of a variable, each category must be assigned a numeric value. For this reason, it is generally advisable, when possible, to use the second method and enter variables as numeric variables. These are variables that have actual numbers attached to each response.
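A quick way to see why numeric coding is preferred: the sketch below stores made-up answers once as text and once as 0/1 codes and asks -summarize- for both. Only the numeric version yields a mean, and for a 0/1 variable that mean is simply the proportion coded 1. The variable names gender_str and gender_num are hypothetical.

```stata
* Made-up answers to "Are you Male or Female?" recorded both ways
clear
input str6 gender_str byte gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
"Female" 1
end

* Numeric coding: the mean is the proportion coded 1 (here 3/5 = .6)
summarize gender_num

* String coding: there are no numbers to average, so summarize
* reports 0 observations for this variable
summarize gender_str
```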

Fortunately, many of the Stata commands that will be discussed in this book operate similarly with numeric or string variables. The commands that work only with numeric variables are those that perform statistical operations that require numbers to calculate, for example, the mean or a linear regression. Because numeric variables are typically more applicable to the vast majority of data analyses, the commands discussed in this book focus on their use with numeric variables (keeping in mind that many operate identically for string variables). The primary commands that are used (and are different) for string variables, including methods for changing a string variable to a numeric variable, are addressed in the Data Management: Using String Variables section in Chapter 3.

As has been discussed, often you may be using data that you did not enter, so you may not have a choice or even be certain about the way in which variables were entered. There are several ways to determine whether a variable is a numeric or string variable. The most straightforward way is to open the Data Browser window. In Stata 10 and later versions, string variables are shown in a red font, whereas numeric variables are shown in either black or blue font. In the Chapter 1 Data.dta file, you will see that only the variable religoth is a string variable.


Another option in Stata 12 to see which variables are string variables is to click on a particular variable in the Variables window. In the Properties window, you will see an entry for Type. When the variable type starts with the letters “str,” the variable is stored as a string variable.
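The same check can be made with commands. A minimal sketch with a hypothetical two-case data set (the denominations are made up): -describe- lists each variable's storage type, much as the Properties window does, and -confirm- tests a claim about a variable's type.

```stata
* Hypothetical data: one numeric and one string variable
clear
input byte gender str31 religoth
0 "Methodist"
1 "Nondenominational"
end

* describe lists the storage type of every variable (byte, str31, ...)
describe

* confirm is silent when the statement is true and stops with an
* error when it is false
confirm string variable religoth
confirm numeric variable gender
```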

A Closer Look: Variable Types

You may have noticed that more information about the variable type is listed in the Properties window. For example, gender is shown to be a byte variable, ids is a long variable, and religoth is a str31 variable.

These distinctions further demarcate variables within the general categories of numeric and string. They also are related to how much file space is allotted to storing the variable.

All string variables have the “str” prefix, and the number indicates the maximum number of characters that can be stored in that string variable. So the maximum length a denomination could be in the variable religoth is 31 characters. As you will see, this constraint can be altered, but it is advisable to use only the minimum number of characters that are needed. Otherwise you are using memory to store empty spaces.

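Similarly, the various subtypes of numeric variables indicate how large (or how precise) a number each variable can hold. In order of smallest to largest, the numeric variable types are byte, int, long, float, and double. The first three store only whole numbers, whereas float and double can also store decimal values.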

Generally, Stata will store variables in the most efficient and effective way when you create them. Moreover, most users of Stata will conduct countless analyses without ever having to worry about or manipulate these specific distinctions.
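The storage types can be seen with -describe- and tightened with -compress-. One exception worth knowing: -generate- stores new numeric variables as float by default (assuming the usual setting has not been changed), even when byte would do. In this sketch with made-up values, -compress- should shrink religoth to str9 (the length of “Methodist”) and female to byte.

```stata
* Hypothetical data: a str31 variable holding short answers
clear
input str31 religoth
"Methodist"
"Baptist"
end

* generate creates a float by default, even for 0/1 codes
generate female = mod(_n, 2)

* Compare storage types before and after compress
describe religoth female
compress
describe religoth female
```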

When you have the Data Browser open, you probably notice, however, that the variables gender and employst look different from the variables ids and agecats. This difference is due to the fact that gender and employst have what are called value labels attached to them. Value labels will be covered in much more detail later, but they are labels that can be applied to the numeric codes used to represent responses. Remember that you could decide to use the number 1 to represent the answer “Female.” This choice may be difficult to remember (i.e., whether 1 was Male or whether 1 was Female); therefore, you can use value labels as a shortcut to help remember this coding strategy. The variables ids and agecats were numerical responses, so they do not have any value labels that could be attached to them. You can see the actual numerical codes for each variable using the Data Browser window by clicking on the Tools menu, selecting Value Labels, and clicking Hide All Value Labels. When you do so, you will see that the cases that were “Male” now display “0” and the cases that were “Female” now display “1.” Or you can highlight (either using the direction keys or the mouse) a particular cell (e.g., “Male”). When you do so, the actual value is listed in a pane just underneath the icons.
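Value labels can also be attached and hidden with commands. A sketch assuming the book's coding (0 = Male, 1 = Female); the label name genderlbl is hypothetical. The nolabel option of -tabulate- shows the underlying codes, much like Hide All Value Labels in the Data Browser.

```stata
* Made-up data coded 0 = Male, 1 = Female
clear
input byte gender
0
1
1
end

* Define the mapping once, then attach it to the variable
label define genderlbl 0 "Male" 1 "Female"
label values gender genderlbl

* Labels are shown by default
tabulate gender
* nolabel shows the underlying numeric codes instead
tabulate gender, nolabel
* List the code-to-label mapping
label list genderlbl
```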

Exercises

1. Open the “Chapter 1 Exercise Data.dta” data file.

2. Save a copy of the open data named “Chapter 1 Ex mycopy.dta”.

3. Using the Data Browser, determine how many cases and variables are in the data set.

4. Which of the variables is a string variable?

5. Use the Data Editor to change the agefstdt value of the last case from 14


The Essentials

Now that you are familiar with the basic components of Stata and data files, it is time to begin performing statistical analyses. The thought of conducting statistical operations on top of learning a new computer program can be doubly daunting, often bringing with it a considerable amount of anxiety. This chapter is explicitly designed to help alleviate this common and natural emotional reaction to learning Stata to conduct statistical operations.

This chapter has three primary goals. First, it presents a conceptual approach to learning Stata commands that has been shown not only to help users learn the necessary operations but also to assuage the fears of “memorizing” seemingly endless commands. Second, it covers the basic structure or format of Stata commands. Regardless of whether the actual operation that a command performs is straightforward or complex, all Stata commands follow a very similar structure. Knowing this underlying format will help you process each newly presented operation more easily. Finally, this chapter discusses the 5 essential commands of Stata. These 5 commands form the foundation of statistical and data management operations for the vast majority of research projects. Therefore, once you have completed this chapter, you will have mastered a significant portion of using Stata to accomplish your research. Doing so will hopefully minimize anxiety and increase confidence when approaching the more nuanced topics covered in the subsequent chapters.

Intuition and Stata Commands

Perhaps one of the more intimidating aspects of Stata is that it operates primarily, and most effectively, using a syntax, command-driven interface. As most readers have become accustomed to a Windows, point-and-click interface, this more “DOSesque” system may be unfamiliar and unusual. Furthermore, many users may be disheartened by the thought of trying to memorize numerous, odd-sounding commands.

These very valid concerns are why this book uses a new approach for teaching Stata commands. This method is founded on the idea that instead of viewing Stata as some black box that only spits out the right results when told exactly what to do, it is more beneficial to see Stata as an extremely smart colleague whom you are asking to produce some calculations very quickly. The latter perspective will help you remember that although Stata is a statistical computer program, it is designed by people. When these people thought about what to call particular commands, they did their best to give them names that made sense.

Taking the latter approach helps facilitate a more intuitive approach to Stata. Rather than considering the numerous command names that need to be “memorized,” it is more effective to think, “What would I call a command that would tell a computer to do a cross-tabulation?” Or alternatively, you can think, “If my colleague and I had been working together for a long time, how might I tell him or her in a shorthand way that I needed a cross-tabulation?” Generally, thinking in this manner leads you to the correct answer: -tabulate-, or -tab- for short. This intuitive approach should help you learn and retain Stata commands more easily and effectively. It should also help minimize worries about the prospect of using Stata.

Finally, there are times when this type of thinking may not lead to exactly the right command. For example, if you thought, “What would I tell my colleague if I wanted him or her to erase a variable from the data set?” you may think “erase” or “delete.” The actual command for this operation is -drop-. But approaching a new command in this way should lead you more quickly to the correct command and will help the actual command make more sense and be easier to remember.

Thus, as we embark on learning all the wonderful things Stata can do, keep this intuitive approach in mind. Remember you are simply working with a really smart colleague. Sometimes communication may become strained, but with a bit of dialogue and understanding, you will be able to conduct very effective analyses.

A Closer Look: Commands versus Point-and-Click

Often new Stata users are apprehensive about using Stata because of its command-driven interface, rather than a Windows, point-and-click-based system. Sometimes this concern may tempt users to disregard learning Stata commands and instead rely solely on its menus and point-and-click operation. Although this path may seem appealing, there are several reasons to fight the urge.

First, using the point-and-click method is not any easier in terms of the amount of information you need to know. That is, even when using a Windows-based program, you still need to learn which menus to open, which button to use for a particular operation, and the correct options to choose. This method may seem easier than learning the commands, but not because it requires less information. The distinction between the two methods rests mainly on familiarity with using menus and windows to perform operations. But at one time this method was probably intimidating as well. Just as many people have come to feel very comfortable using Windows-based computer programs, with a little practice, Stata's syntax, command-based interface will seem just as straightforward.

Second, and perhaps even more important, there are real advantages to knowing the command-based aspect of Stata. For the majority of operations, the command-based interface is much quicker than the menus. What can take several points and clicks to work through the necessary layers of options can usually be typed in a few short words. Furthermore, although similar operations can be performed using either method, the command-based format makes it much easier to save and replicate your data manipulation and analyses. Often, you need to make adjustments to previously conducted procedures and run them again. As will be shown in the What Is a Do File? section of Chapter 3, using the commands along with “do files” makes this process much more straightforward. Additionally, if you continue to use Stata, many of the more advanced abilities of the program rely on the command format.

The Structure of Stata Commands

This section provides an overview of the components of Stata commands. Much more detail and specific examples are covered throughout the chapter, which will help clarify each aspect. Every command that is performed in Stata has the same basic structure, which can be written in generic terms as follows:

command varname(s) [if varname == value] [, options]
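To make the generic form concrete, here is one line with every part filled in. A sketch with made-up data; the 0 = Male, 1 = Female coding follows the book's example, while the employment codes 1 to 3 are hypothetical. The square brackets in the generic form mark optional parts and are not typed.

```stata
* Made-up data: gender (0 = Male, 1 = Female) and three employment codes
clear
input byte gender byte employst
0 1
0 2
1 1
1 1
1 3
end

* command = tabulate, varname = employst,
* if statement = if gender == 1, option = sort
tabulate employst if gender == 1, sort
```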


COMMAND

Any statistical or data operation you want to perform in Stata has a name. For example, if you would like to delete an entire variable from the data, the command would be -drop-. These commands are generally the first item that is typed in the Command window (or “do” file, covered in Chapter 3).

Most commands have two forms: a full command name and an abbreviated command name. The abbreviated command name contains the minimum number of characters required to uniquely specify that command. If a command has an abbreviation, you can type as many of the characters of the full name as you would like, as long as you type at least the minimum abbreviation. For example, the full command to perform a linear regression is -regress-, but the abbreviated command name is -reg-. Therefore, you could type -regress-, -reg-, -regr-, -regre-, or -regres-, and the same operation would be performed. This book always introduces a command using the full command name, but often an abbreviated command name is used for the sake of simplicity after this first use.
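Abbreviations can be checked directly. In this sketch, with made-up values of two hypothetical variables y and x, -regress-, -reg-, and -regres- are run in turn and the estimated slope is stored after each; all three should agree (1.8 for these values).

```stata
* Made-up data: y and x are hypothetical variables
clear
input double y double x
2 1
4 2
5 3
8 4
9 5
end

* The full name and two abbreviations run the same regression
regress y x
scalar b_full = _b[x]
reg y x
scalar b_reg = _b[x]
regres y x
scalar b_regres = _b[x]

* Show the three stored slopes side by side
display b_full, b_reg, b_regres
```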

VARIABLES

After the command, you must specify the variable or variables on which you want to perform that operation. For example, if you wanted to delete a variable named gender, you would type drop gender into the Command window. Particular commands, as will be discussed in more detail, either accommodate multiple variables or even require multiple variables to be specified. If, for instance, you wanted to create a cross-tabulation, you would specify two variables after the appropriate command.
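Both points, one variable for -drop- and two variables for a cross-tabulation, look like this in practice. A sketch with made-up data:

```stata
* Made-up data with two variables
clear
input byte gender byte employst
0 1
1 2
1 1
end

* Two variable names after tabulate give a cross-tabulation
tabulate gender employst

* drop removes gender from the data in memory
* (any saved copy on disk is unchanged until you save again)
drop gender
describe
```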

IF STATEMENTS

There may be times when you want to perform an operation only on certain types of cases. As an example, you may want to produce a cross-tabulation table that includes only the males in your data set. To do so, you would type an if statement after you have entered the command and variables. Generally, these if statements take the form of a particular variable or variables equaling some value; note that equality is written with a double equals sign (==), as in the generic form above.

The if statements are completely optional, meaning you do not have to enter them when performing a command, which is why they are shown in brackets above. You need to type an if statement only when you wish to perform the operation on a selected set of cases in the data.
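An if statement in practice, sketched with made-up data and the book's coding of 0 = Male (the employment codes are hypothetical). A one-way table keeps the output small; a cross-tabulation would simply add a second variable name before if. A single = is reserved for assignment, as in -replace-, so an if test uses ==.

```stata
* Made-up data: gender (0 = Male, 1 = Female) and hypothetical employment codes
clear
input byte gender byte employst
0 1
0 2
1 1
1 3
0 1
end

* Table restricted to males
tabulate employst if gender == 0

* The same if statement works with other commands, such as count
count if gender == 0
```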


OPTIONS

Most Stata commands include options that can be invoked with them. As the name suggests, option statements are optional. Options perform some extension or modification of the basic command, such as requesting additional statistical measures or a different formatting of the output. They are typed after a comma that follows the variable names and any if statement. When a Stata command does not produce exactly what you would like by default, you often can obtain what you are looking for through the use of options. When each command is covered throughout the book, the most helpful options will be detailed as well. Furthermore, the Stata Help Files section of Chapter 8 shows how to learn all the possible options for each command.
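As an example of an option that requests additional statistics, -summarize- (one of the essential commands introduced below) accepts detail. A sketch with five made-up ages; agecats is used here only as a stand-in variable name:

```stata
* Five made-up ages
clear
input byte agecats
22
23
24
25
26
end

* Default output: number of observations, mean, std. dev., min, max
summarize agecats

* The detail option adds percentiles, variance, skewness, and kurtosis
summarize agecats, detail
```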

EXECUTING A COMMAND USING THE COMMAND WINDOW

Once you have determined which command you need to use, which variables you want to perform it on, and whether you would like to use an if statement or options, you are ready to execute the command.

First, be sure that you have the Command window selected by clicking the mouse when the cursor is anywhere in the Command window. Next, type the command, variable name(s), and any desired if statements or options. Instead of actually typing a variable name, you can also place your cursor over a particular variable in the Variables window, and when you click on the little arrow that appears, its name will appear in the Command window. If you are using Stata 11 (or earlier), you can simply click on the variable name in the Variables window, and that variable's name will appear in the Command window. Once you have finished entering all the information, press Enter.

Pressing Enter tells Stata to perform the operation and causes output to be displayed in the Results window. Note that some commands may wrap onto more than one line in the Command window. This scenario is completely acceptable: Stata treats everything typed before you press Enter as a single command. Thus, for each command you wish to perform, you need to type all the required information and press Enter (i.e., you cannot type several commands one after another in the Command window and run them all with a single Enter). A method for performing multiple commands at once is covered in the next chapter.

The 5 Essential Commands

The following section provides a closer look at the foundation commands of Stata. These 5 commands accomplish a significant portion of the analyses and data management that is needed for many research projects. This section should be seen, however, as an introduction to these commands. It explains the basics of each command, which for many users may be all that is needed. More of the specifics and nuances for each command are covered in the chapter devoted to that particular statistical operation. The goal of this section is threefold. First, it provides essential commands that perform some of the most frequently used operations. Second, it gives you a framework on which to place all of the more advanced topics to come in later chapters. Third, it should give you confidence to tackle those more advanced topics. When you grasp these core concepts, you are in a great position to become an effective Stata user.

All the examples that follow use the Chapter 2 Data.dta file, available at www.sagepub.com/longest. This data set contains 7 variables for 25 cases from the National Study of Youth and Religion (NSYR) data (see the Preface for more information on how these data were collected). This subsample of the full data was selected so that it would be possible for you to double-check the following analyses by producing them by hand if it is helpful. As mentioned in Chapter 1, it is a good idea to save a copy of the data file you are working with so that you always have a backup of the original data.

tabulate

The first two essential commands, -tabulate- and -summarize-, both produce basic descriptive information about variables, which is why they generally are the first analytic operations performed for the vast majority of research studies. Again, this section will provide more of an overview of how to use these commands, whereas Chapter 4 presents much greater detail on the specifics of using these commands, as well as more detail on extensions to each.

One of the first analytic processes taught in statistics courses is how to construct a frequency distribution table. Notice, if you were asking a really smart colleague to produce a distribution that tabulates the values of each of the cases, you might tell him or her to “tabulate” the data. The abbreviated command name for -tabulate- is technically -ta-, but it will probably be easier to remember -tab- as a shortened version.

To see what the -tab- command does, select the Command window (i.e., click the mouse while the cursor is over the Command window), and type tab employst (or alternatively, type tab, place your cursor over employst in the Variables window, and click the small arrow that appears). The employst variable comes from a question asking about the respondents' current employment status. When you have typed the command, the screen should look similar to the screenshot presented below:

Now simply press Enter, and your screen should look like this:

[Figure: Stata after running tab employst on the Chapter 2 Data.dta file; the Variables window lists ids, gender, agecats, employst, religoth, datnum, and numfrien]


Before addressing the unique aspects of the information provided by the -tab- command, examine some of the general output that is produced by all Stata commands. First, you see what the command “did” displayed in the Results window. In this case, this is a distribution table showing the frequency, percentage, and cumulative percentage for each category of the variable employst. Next, just above this output, Stata presents the exact command that was executed to produce the output. This same information is also stored in the Review window. These three components are produced for every command you enter in the Command window.

Turning to the output of the -tab- command, as noted, it produces a table showing the frequency, percentage, and cumulative percentage of the given variable. For the employst variable, six cases are Employed, which is 24% of the sample (6 of 25). Also notice that the table displays the total number of cases that fall into one of the categories of the variable. In this case, 25 cases were coded into one of the presented categories.¹ The top-left corner of the table lists what is called the "variable label." These labels usually provide a brief description of the variable, such as "Employment Status." Working with these labels is covered in more detail in Chapter 3.
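If you want to try -tabulate- without the chapter's data file, the sketch below uses auto.dta, the example dataset that ships with Stata, in place of the employment data; the variable foreign is therefore an assumption of this example, not part of the chapter. It runs the same one-way frequency table three times to show that the full command name and both abbreviations call the same command.

```stata
* Load Stata's bundled example dataset (a stand-in for the chapter's data)
sysuse auto, clear

* Full command name, its minimal abbreviation, and the easier-to-remember one
tabulate foreign
ta foreign
tab foreign
```

Each run should print the same table: frequency, percentage, and cumulative percentage for Domestic (52 cars) and Foreign (22 cars), with a Total of 74.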

As with all commands, the -tab- command has several options that can be invoked to perform different or additional operations beyond this default procedure. One of the more useful options for the -tab- command is -sort-. The -sort- option tells Stata, as it sounds, to rearrange (i.e., sort) the table so that it lists the categories in descending order of frequency. Remember, options are always typed after a comma, meaning you would type tab employst, sort in the Command window and press Enter. When you do this, the following output is presented:

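            Employment Status |      Freq.     Percent        Cum.
------------------------------+-----------------------------------
                          ... |
No school or work but looking |          2        8.00       92.00
          Active armed forces |          2        8.00      100.00
------------------------------+-----------------------------------
                        Total |         25      100.00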

¹For these introductory examples, none of the variables has any missing data, meaning all of the cases have valid answers for all of the variables. Of course, this will not always be the case with real data. Handling missing data is covered in more detail in the later chapters that more thoroughly discuss using the commands to complete statistical analyses. Specifically, see the Data


As you can see, the same basic information (frequency, percentage, and cumulative percentage) is displayed, but now the categories are ordered so that you can easily see which one contains the most respondents and which contains the fewest. For employment status, the most common category is Employed and school, whereas Out of labor force, No school or work but looking (for a job), and Active armed forces are tied as the least common responses.

There are several other options you can use with -tab-, and most of them are covered in the Frequency Distributions section of Chapter 4. For now, it is only necessary to understand the basic form of how options are invoked, as it is similar for all other commands. Also note that, as with commands, options have full names and abbreviated names. The full name will always be presented when an option is first introduced, and the most common abbreviation will be used in all following instances.
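The sketch below, again using Stata's bundled auto.dta rather than the chapter's data, contrasts the default table with the -sort- option. The variable rep78 (repair record, coded 1 to 5) is an illustrative choice because its categories differ in size: by default the rows follow the numeric codes, and with -sort- the most frequent category (3) should be listed first.

```stata
sysuse auto, clear

* Default: categories listed in the order of their codes (1 through 5)
tab rep78

* Options follow a comma; -sort- orders the rows by descending frequency
tab rep78, sort
```

Both tables should report a Total of 69 rather than 74, because five cars have no rep78 value recorded and -tab- leaves missing values out by default.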

In addition to producing a distribution of one variable, the -tab- command can generate a cross-tabulation between two variables. For instance, you may be interested in whether the respondents' employment status differs by gender. To make this comparison, you would want to see the distribution of employment status for males and the distribution of employment status for females. One method for displaying this information is to invoke the -tab- command and list both variables instead of just one. Type tab employst gender in the Command window and press Enter. Doing so produces the following results:

                  | (gender_w3) Respondent
  (employstat_w3) |        gender
Employment Status |      Male     Female |     Total

know the percentage of females in each category compared with the percentage of males in each category.
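A parallel cross-tabulation using Stata's bundled auto.dta (an illustrative substitute for employst and gender) lists both variables after -tab-: the first becomes the rows and the second the columns. Here rep78 plays the role of employment status and foreign (Domestic vs. Foreign) plays the role of gender.

```stata
sysuse auto, clear

* Row variable first, column variable second
tab rep78 foreign
```

Without further options the cells contain only frequencies, which is why percentages must be requested separately before groups of different sizes can be compared.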


To produce the necessary figures, you can again think intuitively. You need to ask Stata to produce a set of percentages based either on the rows or on the columns. Because you believe that gender is the causal variable (i.e., the independent variable), you would want the percentages in the columns. That is, you want to be able to compare the proportion of all females who are employed with the proportion of all males who are employed. Therefore, the percentages need to be calculated within the columns. To have your smart colleague make this calculation, you might consider telling him or her, for shorthand, "columns." Following this logic, the option to present these percentages is -column-. [If you wanted the percentages in the rows, as you might have guessed, the option would be -row-.] Type tab employst gender, col in the Command window.

Employment Status |      Male     Female |     Total
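Continuing the auto.dta illustration (not the chapter's data), foreign is treated here as the independent variable, so percentages are computed within its columns. The sketch shows the full option name, the -col- abbreviation used in the text, and the -row- alternative for comparison.

```stata
sysuse auto, clear

* Column percentages: each column (Domestic, Foreign) sums to 100%,
* so repair records can be compared between the two groups of cars
tab rep78 foreign, column

* Same request with the abbreviated option name
tab rep78 foreign, col

* Row percentages instead: each repair-record row sums to 100%
tab rep78 foreign, row
```

Choosing between -column- and -row- does not change the cell frequencies; it only changes which total each percentage is divided by.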
