Using Stata for Quantitative Analysis
SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044

Executive Editor: Jerry Westby
Production Editor: Brittany Bauhaus
Copy Editor: QuADS (P) Ltd.
Typesetter: C&M Digitals (P) Ltd.
Proofreader: Eleni-Maria Georgiou
Cover Designer: Anupama Krishnan
Marketing Manager: Erica DeLuca
Permissions Editor: Karen Ehrmann

Copyright © 2012 by SAGE Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
Printed in the United States of America

Library of Congress Cataloging-in-Publication Data
1. Stata. 2. Social sciences--Graphic methods--Computer programs. 3. Social sciences--Statistical methods--Computer programs. I. Title.
HA32.L66 2012 005.5'5--dc23 2011041851
Preface
Chapter 5: Relationships Between Nominal and Ordinal Variables
Chapter 6: Relationships Between Different Measurement Levels
Chapter 7: Relationships Between Interval-Ratio Variables

Detailed Contents
About the National Study of Youth and Religion
Opening and Saving Stata Data Files
Using Different Types of Data Files in Stata
Types of Variables in Data Files
Executing a Command Using the Command Window
Translation From the Command Window
Getting the Most Out of Do Files
Histograms, Bar Graphs, and Pie Charts
Measures of Central Tendency and Variability
Summary of Commands Used in This Chapter
Dichotomous (Dummy) Variables and Linear
Summary of Commands Used in This Chapter
Advanced Convenience Commands
Summary of Commands Used in This Chapter
Motivation and Purpose

The motivation for this book, as I assume is true for most, came from a series of personal experiences. First, as a graduate student, I remember literally lying awake at night dreading the idea of using a computer program to conduct statistical analyses. The first statistics course I took required Stata to complete the assignments and the final research project. This necessity was so overwhelming at the time, in part, because there did not seem to be any straightforward, concise texts explaining the basics of Stata. Over my time in graduate school, I came to be very familiar with Stata, even to the point that I developed a serious passion for both learning Stata and teaching it to students who were facing the same fears I once did. In a somewhat mirrored experience, I was hoping to use Stata as a significant portion of the classroom experience and requirements when I first began teaching a course on Quantitative Analysis. I soon realized that there still was not a manageable introductory text on the use of Stata for quantitative research.¹ Thus, I sought to contribute to filling this void by providing a straightforward, applied introduction to using Stata.

This book will be most beneficial to readers who are novices when it comes to Stata and are at least in the early stages of learning strategies for conducting quantitative analysis. It does assume that the reader has a working knowledge of basic statistical techniques and terminology. The organization and coverage of the book are guided by the content and ordering of topics found in most introductory social statistics textbooks. In this manner, it can serve as an excellent companion, either for a class or self-learner, to such a textbook.

¹ Assuredly, there are several very good and effective texts on learning Stata. Virtually all of these, however, are aimed at experienced users or are so detailed and long that they are not helpful for a typical classroom in which teaching Stata is not the primary purpose.
To be clear, this book should not be used to learn statistics or quantitative analysis. Some basic assumptions and explanations are provided, but these should not be used in place of a more thorough coverage of each of the analytic strategies. The statistical grounding for this book is based primarily on Frankfort-Nachmias and Leon-Guerrero's (2009) Social Statistics for a Diverse Society. The definitions and interpretations of the specific measures and tests are based on those presented in this text. Of course, any inaccuracies or mistakes are solely mine.

Also, this book does not attempt to cover every aspect of each Stata command that is introduced. More experienced users undoubtedly know shortcuts or alternative methods for the techniques that are presented. But the given description has been geared to introduce complete novice users to Stata. This targeted audience requires that the explanation start with the basics before jumping into the advanced features. The presented commands and procedures are discussed because they are the most simplified strategies that effectively accomplish the pertinent goals.
About the National Study of Youth and Religion
The data for this book come from the National Study of Youth and Religion (NSYR). The NSYR is a longitudinal, nationally representative telephone survey of U.S. young adults. There are three waves of data, all of which are publicly available.

The variables that are used in the examples throughout this book come from the most recent follow-up survey of 2,532 young adults, completed in the fall of 2007. At the time of this survey, the respondents were all between the ages of 18 and 24. Each respondent completed a computer-assisted telephone interviewing (CATI) survey that lasted approximately an hour. This data set covers a broad array of topics, making it possible, across examples, to use variables pertinent to several disciplines. For example, it contains several standard self-esteem measures of interest to psychologists, a wide array of questions on religion useful for sociologists, numerous questions on finances (e.g., debt) applicable to economics, and measures of substance use behaviors that would be pertinent to social work or health researchers. The full data set and documentation can be downloaded from the Association of Religion Data Archives (http://www.thearda.com/Archive/Files/Descriptions/NSYRW3.asp).

The first wave of the survey sampled 3,290 U.S. English- and Spanish-speaking teenagers, ages 13 to 17. The sampling and survey were conducted from July 2002 to August 2003 using random-digit-dialing, drawing on a sample of randomly generated telephone numbers representative of all noncellular phone numbers in the United States. The overall response rate of 57% for the first survey is lower than desired, but it is similar to other current nationally based surveys using similar methodologies. Further comparisons of the NSYR data with 2002 U.S. Census data on households and with nationally representative surveys of adolescents (such as Monitoring the Future, the National Household Education Survey, and the National Longitudinal Study of Adolescent Health) confirm that the NSYR provides a nationally representative sample of U.S. teenagers aged 13 to 17 years and their parents without identifiable sampling or nonresponse biases (for details, see Smith & Denton, 2005). The follow-up sample that is used in the data sets comes from this initial sample of 3,290 teens. To obtain more information regarding the technical details and documentation of the NSYR, please visit http://www.youthandreligion.org/.
A Note on Versions
All the commands and examples for this book were produced using Stata 12.0 for Windows. The primary commands and options are similar for older versions, dating back until at least Stata 9. There were, however, a few changes between Stata 11 and Stata 12. Most of these changes do not affect the actual functionality but rather deal with convenience and appearance. In fact, most of the substantive differences that new users would encounter fall under the topics covered in Chapter 1.

Due to the very recent release of Stata 12 (July 2011), many readers may still be using Stata 11 or even Stata 10. To address this potential challenge, this book includes two versions of the introductory material (i.e., Getting to Know Stata). The vast majority of the material in both versions is extremely similar, but both were included to prevent any confusion over the small dissimilarities. For users of Stata 12, please start with Chapter 1: Getting to Know Stata 12. For users of Stata 11 (or older), please start with Appendix: Getting to Know Stata 11, and then rejoin the book at Chapter 2. From that point on, all of the commands and strategies are equivalent across versions (although the appearance of the screenshots may be slightly different).

The vast majority of the commands presented are similar for Stata for Mac as well. The appearance and wording of some icons as well as the pathways for the point-and-click menus may be slightly different for a Mac operating system.
variable names in a particular data set, such as gender or ids. It will also be used to show the display from the Stata Results window (if the actual screenshot is not shown).

This font will be used to denote a command that is entered into the Command window to perform a given operation. Additionally, if these commands are presented by themselves within a sentence, they will be set apart by a dash pre and post (e.g., -replace-) so that they are not confused with a variable name.

The majority of this book discusses the syntax command interface (i.e., the Command window) aspect of Stata. But there will be times when the menu, point-and-click interface is described. Menus (e.g., File), clickable buttons (e.g., OK), or keys on the keyboard (e.g., Enter) will be denoted with the Arial font.
Finally, Stata is a case-sensitive program, meaning that all commands and variable names must be typed exactly as they are shown. For the purposes of this book, this sensitivity means that at times the capitalization may not follow typical grammatical conventions. For example, if a variable name starts a sentence and that variable name is lowercase, then that sentence will start with a lowercase letter.
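To make the point about case sensitivity concrete, here is a minimal sketch. It assumes nothing beyond a running copy of Stata; the two variables, gender and Gender, are created only for this illustration and are not part of the book's data.

```stata
* Illustration only: a two-row data set made up for this example.
clear
set obs 2

* Stata treats these as two different variables because the case differs.
generate gender = 1
generate Gender = 2

* Commands are case sensitive too: -describe- works, but "Describe" is
* not recognized. -capture- keeps the error from stopping the run and
* stores the return code in _rc (nonzero means the command failed).
capture Describe
display "Return code from Describe: " _rc

* The correctly typed command lists both variables.
describe
```

Typing every command and variable name in exactly the case shown avoids this kind of error.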
References
Frankfort-Nachmias, C., & Leon-Guerrero, A. (2009). Social statistics for a diverse society (5th ed.). Thousand Oaks, CA: Pine Forge Press.
Smith, C., & Denton, M. L. (2005). Soul searching: The religious and spiritual lives of American teenagers. New York, NY: Oxford University Press.
The author and SAGE gratefully acknowledge the contributions of the following reviewers:
Karen Y. Holmes, Norfolk State University, Norfolk
Sean Kelly, University of Notre Dame
David Peterson, Iowa State University, Ames
Raymond Sanchez Mayers, Rutgers University, New Brunswick
PART I
Foundations for Working With Stata

Getting to Know Stata 12
For many people, learning any new computer software can be an anxiety-producing task. When that computer program involves statistics, the stress level generally increases exponentially. If you have similar feelings as you begin your journey into becoming a Stata user, do not fear; you are not alone. This book is designed with this apprehension in mind. One of the primary goals of this book is to help alleviate, or at least minimize, this anxiety as you move toward becoming an effective and proficient Stata user. Keep in mind that at one time you may have had similar feelings about using e-mail or the Internet, and just as many people now feel extremely comfortable using these programs, by the end of this book you will have a similar grasp of and comfort with Stata.

Before diving into all the details of using Stata, it is important to have an understanding of its various components. This chapter will serve as an introduction to the basic building blocks of the Stata program. Each of these aspects will be covered in much more detail throughout the book, but this chapter provides an overview of the basic functionality of the Stata program. The second section of the chapter explains how data are opened, imported, and entered.
What You See¹

When you open Stata, by double clicking on the Stata icon, for the first time, you will see the following screen:

¹ If you are using Stata 11 (or Stata 10), please use Appendix: Getting to Know Stata 11 instead of this first chapter. All of the same features are covered, but Stata 12 has a slightly different appearance from these previous versions, which may make matching up what you see in the text and on your screen a bit confusing. Starting from Chapter 2, the vast majority of operations and commands are similar across versions, and the text specifically notes any particular features that are different for previous versions.
[Screenshot: the Stata 12 main screen as it appears when first opened]
There are five different windows on the screen.²
² This layout is what you would see if Stata was opened "right out of the box." If you are working on a shared computer (or over a network), there is a chance that these windows have been moved, resized, or even deleted by another user, making what you see slightly different from the screenshot presented. If any of these windows are missing, you can click on the Windows tab and click on the desired window. You can also move these windows by simply clicking on them with your mouse.

1. Results Window: The Results window is where everything that Stata "does" will be displayed. Anytime Stata executes some operation, it will display that operation and its results in this window. These results, however, are not automatically saved. How to save these results is covered in the Data Management: Saving Results section of Chapter 3.

2. Review Window: The Review window contains a running history of all the operations that have been performed during the current session of Stata. Whenever you enter and execute a command, it will appear both in the Results window and in the Review window. The most useful aspect of the Review window is that it can be used as a shortcut to work with a previously executed command. When you click on a command in the Review window, that command will appear in the Command window, from which you can alter the command or simply rerun the same command.
3. Variables Window: When you open a data file in Stata, the variables contained in that data set will be listed in the Variables window. This window can be used to scroll through and see all the variables that are contained in the active data. Whenever you click on a variable name listed in the Variables window, several characteristics of that variable are displayed in the Properties window. If you place your cursor over a variable, a small arrow will appear. By clicking on that arrow, the variable name will automatically appear in the Command window. This window also lists the variable "Label," which presents more detailed information about the variable. Labels are discussed in more detail in the Data Management: Working With Labels section of Chapter 3.
4. Properties Window: The Properties window provides details about the data set that is currently being used and any variable that has been selected (by clicking on it) from the Variables window. For the data, this window provides the file name, the number of variables that are included in the data, and the number of "observations" (e.g., survey respondents). For a given variable, the Properties window lists the variable name, its type, format, and value label. Details on each of these descriptors are discussed later in this chapter. By default, the Properties window is "locked," meaning you cannot change any of these characteristics directly from the Properties window. Clicking on the padlock icon (located in the upper left corner of the Properties window), however, unlocks the Properties window and allows you to change the aspects of the variable simply by clicking on that property (e.g., the variable name). More details on this process are provided later in this chapter.
5. Command Window: The Command window is where you will enter the operations that you want Stata to perform when using the "syntax" interface. Syntax, or code, is another term for Stata's command language. These are the words that tell Stata what procedures to execute. Commands are entered, one at a time, in this window. After you type a command into the Command window, pressing the Enter key on your keyboard makes Stata execute the procedure that is defined by the command. One helpful feature of the Command window is that you can scroll through previously executed commands by pressing the Page Up key. When you find the previous command you are interested in, you can either alter it or simply press Enter again to rerun the same command. The majority of this book will be devoted to explaining and describing the various commands that you will need to use to perform quantitative analyses.
There also are several icons at the top of the screen. The purpose and use of these icons are covered throughout the book. Each of these basic windows will become familiar to you as we go through this book. For now, be sure that you feel comfortable identifying the main purpose of each of the windows.
Getting Started With Data Files
When working with Stata, you will be using what is referred to as a "data file." If you are familiar with typical database programs, then you already know what a data file basically is. These files contain information (often numerical) on a set of cases, such as respondents to a survey, a sample of schools, or each of the states in the United States. Generally, data files are organized such that information regarding each case is contained in one row in the file, whereas each column represents a variable (i.e., information about that case), such as a person's gender, a school's total number of students, or a state's total square miles.

Similar to most computer files, data files come in many different types. But just as a PDF file is very similar to a Word document, so too are all data files essentially derivations of a similar structure. Each of these derivations is denoted by a different file extension: the letters that come after the "." in a file name. The primary file extension for Stata data files is .dta. Moving other types of data files into Stata (e.g., Microsoft Excel files) is covered in the Using Different Types of Data Files in Stata section of this chapter.
OPENING AND SAVING STATA DATA FILES
To open a data file that is in Stata format (i.e., one that has a .dta extension), select the File menu (in the upper left-hand corner), then choose Open. Or alternatively, you can simply click on the Open icon. From here you will need to search through the disk drives and folders on your computer to find your saved data file. This chapter uses the data file, available at www.sagepub.com/longest, named Chapter 1 Data.dta. Once you have found your data file, double click the file. Having done this, you will notice that the Stata screen looks different from how it did initially.

The first operation you performed is now displayed in both the Results and Review windows. Again, whenever we tell Stata to "do" something, whether through the point-and-click menus or by entering a command in the Command window, it will be displayed in the Results and Review windows. Because opening a data file does not have any "results," only the command is
[Screenshot: the Stata screen after opening the data file. The Results window shows the command use "C:\Documents and Settings\ldongest\My Documents\stata\Data\Chapter 1\Chapter 1 Data.dta", and the Variables window lists ids, gender, agecats, employst, and religoth with their labels.]
most important aspect is the variable name. In this data set, the five variables are named ids, gender, agecats, employst, and religoth. These variable names should give you some indication of what type of information the variable contains. The variable gender, for example, says whether each respondent is a male or a female.
It is a good practice to always save a copy of your data files and only work with that duplicated version. When working with and analyzing data, you will often be forced to change aspects of the data files. For example, you may need to create a new variable or change something about an existing variable. But it is important to have an original version of the data, just in case something undesired occurs. Don't worry too much; most alterations you perform can be undone or recovered. Working with a duplicate copy of the data is simply an added protection.

To save a duplicate copy of the data file you have just opened, open the File menu and click on Save As. You can then enter a new file name, such as Chapter 1 Data mycopy.dta, and click Save. This is the procedure you will use whenever you want to save a new version of your data file.
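The same open-and-save-a-copy routine can also be typed as commands. The sketch below is only an illustration. Because the book's data file may not be on your computer, it first builds a small stand-in data set with made-up example rows and stores it under temporary file names; the names original and mycopy are hypothetical labels chosen for this example.

```stata
* Illustration only: build a tiny stand-in data set so the commands
* below can run without the book's file.
clear
input ids agecats
4184 23
9534 22
10281 19
end

* Store it under a temporary name, then open it with -use-,
* which is the command form of File > Open.
tempfile original
save "`original'"
use "`original'", clear

* -describe- lists the variables that appear in the Variables window.
describe

* Save a duplicate to work with, leaving the original untouched.
tempfile mycopy
save "`mycopy'"
```

In your own session, the equivalent commands would be use "Chapter 1 Data.dta", clear followed by save "Chapter 1 Data mycopy.dta", assuming the file sits in Stata's current working directory.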
A Closer Look: Stata Data Files Across Versions

As was noted in the Preface, the vast majority of Stata features and commands are similar across versions (e.g., Stata 12, 11, 10, etc.). This is true of Stata data files, by and large. All Stata data files that are created and/or saved in an older version can be read by a newer version (i.e., forward compatible). That means that if you are using Stata 12 but are working with colleagues who are using Stata 11, any files they send to you will open without a problem.

During certain upgrades, however, Stata data files cease to be "backward" compatible, meaning files saved in a newer version cannot be opened by older versions. Stata 12 happens to be one of those upgrades. If you are using Stata 12 and send a data set that you saved in Stata 12 to your colleagues who are using Stata 11, they will not be able to open it. [Note: This is not a problem if you are moving files between Stata 11 and Stata 10, as these two versions are completely compatible with each other.]

Do not despair. Stata has built in a very simple feature to overcome this problem. If you know that you want the data you are using in Stata 12 to be opened by older versions, you need to take one extra step (from the process just explained).

First, click on the File menu and then click on Save As. Now, use the drop-down menu in the Save as Type box and select the Stata 9/10 Data (*.dta) option. The option is listed as "Stata 9/10" and not 11 because Stata Versions 8 and 9 as well as 10 and 11 are completely compatible with each other (both forward and backward), so using this option actually allows the data to be opened in any version of Stata from 8 through 12. Note that you do not need to change the file extension; it is still .dta. Once you have named your file, click Save. You will know that you have saved the data correctly when the output in the Results window starts with saveold, which is telling you that the file has been saved in a way that makes it readable by the previous versions. Again, note that when you save a file in this way, it can still be used in Stata 12.
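The saveold command that appears in the Results window can also be typed directly. What follows is a minimal sketch, assuming a small made-up data set and a temporary file name (oldformat, a hypothetical name used only here). Which older format saveold writes depends on the Stata release you are running, so treat this as an illustration of the command's form rather than a guarantee about any particular version.

```stata
* Illustration only: a one-row stand-in data set.
clear
input ids agecats
4184 23
end

* -saveold- is the command form of choosing the older-format option
* in the Save As dialog.
tempfile oldformat
saveold "`oldformat'"

* A file saved this way can still be opened in the current version.
use "`oldformat'", clear
describe
```

With your own data, the command would look like saveold "Chapter 1 Data mycopy.dta", replace, where the replace option is needed only if a file of that name already exists.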
DATA BROWSER AND EDITOR
If this is the first time you are working with data, it may be helpful to actually "see" the data. Even if you have experience using data, it may often be helpful to look at the data you are examining. To see the data file in Stata, you can click on the Data Browser icon, in the middle of the top of the screen. When you do so, you will see a new window that appears as shown below:
[Screenshot: the Data Editor (Browse) window. The rows that are legible are listed below; row 2 is not legible, and the employst values are cut off by the column width.]

      ids     gender   agecats   employst       religoth
 1    4184    Male     23        No school or   MORMON
 3    9534    Male     22        Active armed   PENTECOSTAL
 4    10281   Female   19        Employed       NONDENOMINATIONAL
 5    13530   Female   18        Employed and   BAPTIST
 6    11079   Male     19        In school on   NONDENOMINATIONAL CHRISTIAN
 7    3135    Female   18        Employed and   MORMON
 8    4331    Female   21        In school on   PROTESTANT
 9    4929    Female   21        Employed and   DOVER FIRST CHRISTIAN CHURCH
10    5228    Male     19        Out of labor   EPISCOPALIAN
This new window, as is denoted in its upper left-hand corner, is the Data Editor (Browse) window. The "(Browse)" aspect indicates that you are only looking at the data, not actually changing them.

In this window, you see all five of the variables that were listed in the Variables window. As was mentioned earlier, each row is a different case (i.e., a National Study of Youth and Religion [NSYR] respondent), and each column is a different variable. Each cell then contains information on the given variable for that case. For example, the case in the first row is a "Male" respondent who mentioned that "Mormon" was his other religion. To close this window, click on the red "X" in the upper right-hand corner.
There may be times when you want to change the value of a particular case on an individual variable. One way to do so is by using the Data Editor window. (A more efficient way to change the values of multiple cases is covered in The 5 Essential Commands: replace (if) section of Chapter 2.) To begin, click on the Data Editor icon, which is next to the Data Browser icon. You may notice that the Data Editor and Data Browser windows look very similar. The main difference is that in the upper left-hand corner of the window, after "Data Editor," the window now reads "(Edit)." It is important to know which window you have opened because you can change the values of the data when the Editor is open. To prevent any accidental alterations, it is generally advised only to use the Data Browser window unless you are certain you want to change a particular value.

After you have opened the Data Editor window, use the direction keys (or mouse) to highlight the cell you would like to change. For example, you may have realized that the first case's age was incorrectly entered in the data file. Instead of being 23 years old, this case should only be 22 years old. To make this change, once you have the cell in the first row listed under agecats highlighted, type 22 and press Enter. This case's value for the variable agecats has now changed. When you close the Data Editor window, you will see that this operation has been recorded and displayed in both the Review and Results windows.
A Closer Look: Your First Command
You may have noticed that when you changed the first case's value using the Data Editor window, the following text was displayed in the Results window:

replace agecats = 22 in 1
(1 real change made)

Whenever you use the menus or a point-and-click method for performing an operation in Stata, it displays the command that would be entered in the Command window to perform the same operation in the Results window. In this Data Editor example, you can see that the command to change a value is -replace-. If you had entered this full command into the Command window and pressed Enter, the same change would have been made. At times, it may be helpful to perform an operation for the first time using the menus, but, as will be discussed in much more detail in Chapter 2, it is extremely beneficial to know and use the commands via the Command window for the majority of the operations you need to perform.

The rest of this book will discuss how to perform operations using the Command window. But to see the connection between the menu-based operation and the Command window, try this: Type (or copy and paste) the full command (except the first ".") that was displayed in the Results window when you closed the Data Editor window into the Command window. Now change the "22" to "23." The command should read

replace agecats = 23 in 1

Then press Enter. Open the Data Browser window again and notice the change to the first case's value under agecats.
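The two -replace- commands above can be tried end to end without the book's data file. The following is a minimal sketch that assumes a small stand-in data set with made-up example rows; only the variable names ids and agecats come from the chapter.

```stata
* Illustration only: a three-row stand-in data set.
clear
input ids agecats
4184 23
9534 22
10281 19
end

* Change the first case's value, as the Data Editor did.
replace agecats = 22 in 1

* Show the first case in the Results window to check the change.
list ids agecats in 1

* Change it back, as in the exercise above.
replace agecats = 23 in 1
list ids agecats in 1
```

The "in 1" part restricts the change to the first row, which is why each -replace- reports only one real change.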
ENTERING YOUR OWN DATA
M any d a ta files that you w ill analyze will already b e in Stata form at or in
a fo rm a t th a t can be easily co n v e rted to Stata fo rm at (m o re o n this topic below ) Yet th e re may be tim es w h en y o u need to enter th e data from a study
F or exam ple, if you d istrib u te d a survey through the m a il, you will need to
in p u t th e responses to each q u e s tio n for each case so th a t y o u can analyze them
in S tata
T h e first step in en terin g y o u r ow n data after you h av e o p e n e d Stata is to
o p e n th e D ata Editor w in d o w as above From here you can sim ply enter the values for each case on each variable E ntering data in th is way is very similar
to en te rin g values into a M ic ro so ft Excel file The D ata E ditor, however, does
n o t have the equation fu n ctio n alities th a t an Excel file w ou ld
When you begin entering values, each variable is automatically named var1, var2, and so on. Most often it is helpful to have the variable names be more descriptive of the values they contain. One way to change these generic names to something that more clearly identifies the variable is to click on the current name of the given variable you want to rename (e.g., var1) listed near the top of the Editor window. Doing so will bring up that variable’s information in the Properties window (inside the Data Editor window). Then click on the current variable name listed in the Name blank in that Properties window. From there you can simply delete the current name and enter the desired name. Another option would be to close the Data Editor window when you have finished entering all of the data. Then you can click on the variable name (e.g., var2) in the Variables window, which will bring up that variable’s information in the Properties window. To change the name in this Properties window, you will need to click on the padlock icon in the Properties window. Then you click on the current variable name listed in the Name blank and simply type the new name in the blank.
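The renaming described above can also be done by command. The sketch below uses Stata's -rename- command, which is not introduced in this section, and the new names gender and agecats are purely illustrative choices.

```stata
* Rename the automatically generated variables
* (gender and agecats are illustrative target names)
rename var1 gender
rename var2 agecats
```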
Once you have finished entering all of your data, close the Data Editor and follow the steps described above to save a copy of your data file in Stata format.
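For readers who prefer the Command window, saving can be sketched with the -save- command. The file name below is hypothetical, and the commands assume the data you entered are currently open in Stata.

```stata
* Save the data currently in memory as a Stata (.dta) file
* ("my survey data.dta" is a hypothetical file name)
save "my survey data.dta"
* To overwrite an existing file of the same name, add the replace option
save "my survey data.dta", replace
```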
USING DIFFERENT TYPES OF DATA FILES IN STATA
Some data files may not be available in Stata format. Therefore, a few steps are needed to work with these files in Stata. It would be virtually impossible to cover every possible data file type and how each can be transferred to be usable in Stata. Instead, the most common type will be covered. Also note that there are other computer software programs that are specifically designed to convert data files into various formats (e.g., Stat/Transfer). If you have access to such a program, it is probably the most effective and efficient way to transfer files into a Stata format. Some statistical software packages also offer the option of saving a data file in a different format, which often includes the Stata .dta extension.
One of the most frequently encountered data file types that is not Stata-ready is a Microsoft Excel file. Usually these files are denoted with the .xls extension, but other extensions (e.g., .csv) that are generated or readable by Microsoft Excel can all be treated in a similar fashion.
This process requires that you have access to and some familiarity with Microsoft Excel. To start, open the data file in Microsoft Excel. Then highlight the entire worksheet that contains the data and copy it (either by right clicking and choosing Copy or using the copy function (Ctrl+C)). Next, in Stata open the Data Editor window, highlight the upper left data cell, right click and choose Paste, or use the paste function (Ctrl+V). Once you have pasted in the data, you should be presented with a window that asks whether you want to Treat First Row as Data or Treat First Row as Variable Names. The option that you choose will depend on whether your Excel file contains variable names in the first row or whether it contains only data. The two formats are shown below.
First Row as Variable Names
[Figure: Excel worksheet whose first row contains the variable names]
First Row as Data
[Figure: Excel worksheet whose first row contains data]
You can then follow the previously described steps to save the data from within Stata as a Stata data file. Once you have saved your data as a Stata data file, you can simply open and use this version of your data.1
Stata 12 (but not the previous versions) offers another method for bringing data from an Excel file into Stata that may be even slightly quicker. After opening Stata, click on the File menu, followed by Import. Select the Excel spreadsheet (*.xls; *.xlsx) option,4 and the following window will appear:
This “copy and paste” method is the easiest way to transfer data from Microsoft Excel into a Stata format, especially for novice users. But there are some disadvantages to this strategy. More practiced users should transform Excel worksheets into .csv files and then implement the -insheet- command. The specifics of this command are beyond the scope of this introductory text, but the Stata Help Files section of Chapter 8 provides information on how Stata’s Help files can be used to learn how to use this command.
4 If you are using Stata 12, you will also notice that you could select several different data file formats from this window. The general procedure for each of these formats is very similar to the
[Figure: Stata's Excel import window, showing the options "Import first row as variable names" and "Import all data as strings" and a Preview pane]
Click on the Browse button to find the Excel data that you would like to turn into a Stata data set. Once you have selected the Excel file, you can pick a particular worksheet from that file or even a particular set of cells by using the corresponding boxes. Notice that you still need to decide and tell Stata whether the first row in the Excel file contains variable names or actual data. If the first row contains variable names, click the radio button next to Import first row as variable names (when you do this, notice that the data shown in the preview window will change). Then click OK. As described above, you can follow the previously described steps to save the data from within Stata as a Stata data file. Once you have saved your data as a Stata data file, you can simply open and use this version of your data.
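The menu steps above have command equivalents. The following is only a sketch: the file names survey.xls, survey.csv, and survey.dta and the worksheet name Sheet1 are hypothetical, and -import excel- is available only in Stata 12 or later.

```stata
* Requires Stata 12 or later. "survey.xls" and "Sheet1" are hypothetical names.
* firstrow treats the first row of the worksheet as variable names;
* omit it if the first row contains data.
import excel using "survey.xls", sheet("Sheet1") firstrow clear
* Save the imported data in Stata format
save "survey.dta", replace

* Alternative for a comma-separated file ("survey.csv" is hypothetical):
* names tells Stata that the first line holds the variable names
insheet using "survey.csv", comma names clear
```

Typing the import as a command has the advantage that it can be saved and rerun if the Excel file is later updated.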
TYPES OF VARIABLES IN DATA FILES
At this point, you should feel comfortable with the basic structure of data files. Each row holds the information for one case and each column is a different variable. With this knowledge, you are almost ready to start analyzing your data. There is, however, one distinction in the types of variables included in data that is important to understand.
To help illustrate this difference, consider the NSYR variable gender in the Chapter 1 Data.dta file. This variable came from the following question asked of all respondents:
Are you
a. Male?
b. Female?
If you were entering the responses to this question into a Stata data set, you could record them in one of two ways. First, the actual answer “Male” or “Female” could be recorded for each case. Second, you could use a number to represent each answer. For example, you could choose to enter 0 for all respondents reporting “Male” and 1 for all respondents reporting “Female.”
If you record the responses in the first way, it would be what Stata refers to as a string variable. A string variable is a variable in which the contents are actual words. String variables can be very useful for many purposes. For example, you can enter verbatim answers to questions directly into Stata, as was done for the variable religoth in the Chapter 1 Data.dta.
The drawback of storing a variable such as gender as a string variable is that some statistical operations require numbers. For example, if you wanted to calculate the mean (i.e., mathematical average) of a variable, each category must be assigned a numeric value. For this reason, it is generally advisable, when possible, to use the second method and enter variables as numeric variables. These are variables that have actual numbers attached to each response.
Fortunately, many of the Stata commands that will be discussed in this book operate similarly with numeric or string variables. The commands that work only with numeric variables are those that perform statistical operations that require numbers to calculate, for example, the mean or a linear regression. Because numeric variables, typically, are more applicable to the vast majority of data analyses, the commands discussed in this book focus on their use with numeric variables (keeping in mind that many operate identically for string variables). The primary commands that are used (and are different) for string variables, including methods for changing a string variable to a numeric variable, are addressed in the Data Management: Using String Variables section in Chapter 3.
As has been discussed, often you may be using data that you did not enter, so you may not have a choice or even be certain about the way in which variables were entered. There are several ways to determine whether a variable is a numeric or string variable. The most straightforward way is to open the Data Browser window. In Stata 10 or later, string variables are shown in a red font, whereas numeric variables are shown in either black or blue font. In the Chapter 1 Data.dta file, you will see that only the variable religoth is a string variable.
Another option in Stata 12 to see which variables are string variables is to click on a particular variable in the Variables window. In the Properties window, you will see an entry for Type. When the variable type starts with the letters “str,” the variable is stored as a string variable.
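The variable type can also be checked by command. The sketch below uses Stata's -describe- command, which is not introduced in this section, and assumes that Chapter 1 Data.dta is in the current working directory.

```stata
* Assumes "Chapter 1 Data.dta" is in the current working directory
use "Chapter 1 Data.dta", clear
* The "storage type" column shows str# for string variables
* and byte, int, long, float, or double for numeric variables
describe gender ids religoth
```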
A Closer Look: Variable Types
You may have noticed that more information about the variable type is listed in the Properties window. For example, gender is shown to be a byte variable, ids is a long variable, and religoth is a str31 variable. These distinctions further demarcate variables within the general categories of numeric and string. They also are related to how much file space is allotted to storing the variable.
All string variables have the "str" prefix, and the number indicates the maximum characters that can be used for that string variable. So the maximum length a denomination could be in the variable religoth is 31 characters. As you will see, this constraint can be altered, but it is advisable to use only the minimum number of characters that are needed. Otherwise you are using memory to store empty spaces.
Similarly, the various subtypes of numeric variables indicate the number of digits that each variable can hold. In order of smallest to largest, the numeric variable types are byte, int, long, float, and double.
Generally, Stata will store variables in the most efficient and effective way when you create them. Moreover, most users of Stata will conduct countless analyses without ever having to worry about or manipulate these specific distinctions.
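If you do want Stata to trim storage types for you, one option is the -compress- command, sketched below. The command is not covered in this section, and the example assumes a data set is already open.

```stata
* Assumes a data set is already open in Stata
* compress stores each variable in the smallest type that holds its values
* without losing information (e.g., shortening a str31 variable whose
* longest value is 20 characters to str20)
compress
* Check the resulting storage types
describe
```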
When you have the Data Browser open, you probably notice, however, that the variables gender and employst look different from the variables ids and agecats. This difference is due to the fact that gender and employst have what are called value labels attached to them. Value labels will be covered in much more detail later, but they are labels that can be applied to the numeric codes used to represent responses. Remember that you could decide to use the number 1 to represent the answer “Female.” This choice may be difficult to remember (i.e., whether 1 was Male or whether 1 was Female); therefore, you can use value labels as a shortcut to help remember this coding strategy. The variables ids and agecats were numerical responses, so they do not have any value labels that could be attached to them. You can see the actual numerical codes for each variable using the Data Browser window by clicking on the Tools menu, selecting Value Labels, and clicking Hide All Value Labels. When you do so, you will see the cases that were “Male” now display “0” and the cases that were “Female” now display “1.” Or you can highlight (either using the direction keys or the mouse) a particular cell (e.g., “Male”). When you do so, the actual value is listed in a pane just underneath the icons.
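The relationship between value labels and numeric codes can also be inspected by command. This sketch assumes Chapter 1 Data.dta is open; -tabulate- is introduced in the next chapter, and the nolabel option and -label list- are additions for illustration.

```stata
* Assumes "Chapter 1 Data.dta" is open
* Frequency table showing the value labels (e.g., Male, Female)
tabulate gender
* The same table showing the underlying numeric codes (e.g., 0, 1)
tabulate gender, nolabel
* List every value label definition stored with the data
label list
```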
Exercises
1. Open the “Chapter 1 Exercise Data.dta” data file.
2. Save a copy of the open data named “Chapter 1 Ex mycopy.dta”
3. Using the Data Browser, determine how many cases and variables are in the data set.
4. Which of the variables is a string variable?
5. Use the Data Editor to change the agefstdt value of the last case from 14
The Essentials
Now that you are familiar with the basic components of Stata and data files, it is time to begin performing statistical analyses. The thought of conducting statistical operations on top of learning a new computer program can be a doubly daunting task, often bringing with it a considerable amount of anxiety. This chapter is explicitly designed to help alleviate this common and natural emotional reaction to learning Stata to conduct statistical operations. This chapter has three primary goals. First, it presents a conceptual approach to learning Stata commands that has been shown to not only help learn the necessary operations but also assuage the fears of “memorizing” seemingly endless commands. Second, the basic structure or format of Stata commands is covered. Regardless of whether the actual operation that a command performs is straightforward or complex, all Stata commands follow a very similar structure. Knowing this underlying format will help you process each newly presented operation more easily. Finally, this chapter discusses the 5 essential commands of Stata. These 5 commands form the foundation of statistical and data management operations for the vast majority of research projects. Therefore, once you have completed this chapter, you will have mastered a significant portion of using Stata to accomplish your research. Doing so will hopefully minimize anxiety and increase confidence when approaching the more nuanced topics covered in the subsequent chapters.
Intuition and Stata Commands
Perhaps one of the more intimidating aspects of Stata is that it operates primarily, and most effectively, using a syntax, command-driven interface. As most readers have become accustomed to a Windows, point-and-click interface, this more “DOSesque” system may be unfamiliar and unusual. Furthermore, many users may be disheartened by the thought of trying to memorize numerous, odd-sounding commands.
These very valid concerns are why this book uses a new approach for teaching Stata commands. This method is founded on the idea that instead of viewing Stata as some black box that only spits out the right results when told exactly what to do, it is more beneficial to see Stata as an extremely smart colleague whom you are asking to produce some calculations very quickly. The latter perspective will help you remember that although Stata is a statistical computer program, it is designed by people. When these people thought about what to call particular commands, they did their best to give them names that made sense.
Taking the latter approach helps facilitate a more intuitive approach to Stata. Rather than considering the numerous command names that need to be “memorized,” it is more effective to think “what would I call a command that would tell a computer to do a cross-tabulation?” Or alternatively, you can think “if my colleague and I had been working together for a long time, how might I tell him or her that I needed a cross-tabulation in a shorthand way?” Generally, thinking in this manner leads you to the correct answer: -tabulate-, or -tab- for short. This intuitive approach should help you learn and retain Stata commands more easily and effectively. It should also help minimize worries about the prospect of using Stata.
Finally, there are times when this type of thinking may not lead to exactly the right command. For example, if you thought “what would I tell my colleague if I wanted him or her to erase a variable from the data set,” you may think “erase” or “delete.” The actual command for this operation is -drop-. But approaching the new command in this way should lead you more quickly to the correct command and will help the actual command make more sense and be more easily remembered.
Thus, as we embark on learning all the wonderful things Stata can do, keep this intuitive approach in mind. Remember you are simply working with a really smart colleague. Sometimes communication may become strained, but with a bit of dialogue and understanding, you will be able to conduct very effective analyses.
A Closer Look: Commands versus Point-and-Click
Often new Stata users are apprehensive about using Stata because of its command-driven interface, rather than a Windows, point-and-click-based system. Sometimes this concern may tempt users to disregard learning Stata commands and instead rely solely on its menus and point-and-click operation. Although this path may seem appealing, there are several reasons to fight the urge.
First, using the point-and-click method is not any easier, in terms of the amount of information you need to know. That is, even when using a Windows-based program, you still need to learn which menus to open, which button to use for a particular operation, and the correct options to choose. This method may seem easier than learning the commands, but that impression is not due to a difference in the quantity of information to be learned. The distinction in the two methods rests mainly in the familiarity with using menus and Windows to perform operations. But at one time this method was probably intimidating as well. Just as many people have come to feel very comfortable using Windows-based computer programs, with a little practice, the Stata syntax, command-based interface will seem just as straightforward.
Second, and perhaps even more important, there are real advantages to knowing the command-based aspect of Stata. For the majority of operations, the command-based interface is much quicker than the menus. What can take several points and clicks to process through the necessary layers of options usually can be typed in a few short words. Furthermore, although similar operations can be performed using either method, the command-based format makes it much easier to save and replicate your data manipulation and analyses. Often, you need to make adjustments to previously conducted procedures and run them again. As will be shown in the What Is a Do File? section of Chapter 3, using the commands along with "do files" makes this process much more straightforward. Additionally, if you continue to use Stata, many of the more advanced abilities of the program rely on the command format.
The Structure of Stata Commands
This section provides an overview of the components of Stata commands. Much more detail and specific examples are covered throughout the chapter, which will help clarify each aspect. Every command that is performed in Stata has the same basic structure, which can be written in generic terms as follows:
    command varname(s) [if varname==value] [, options]
COMMAND
Any statistical or data operation you want to perform in Stata has a name. For example, if you would like to delete an entire variable from the data, the command would be -drop-. These commands are generally the first item that is typed in the Command window (or “do” file, covered in Chapter 3).
Most commands have two forms: a full command name and an abbreviated command name. The abbreviated command name contains the minimum number of characters required to uniquely specify that command. If a command has an abbreviation, you can type as many of the characters as you would like, as long as it contains the minimum abbreviation. For example, the full command to perform a linear regression is -regress-, but the abbreviated command name is -reg-. Therefore, you could type -regress-, -reg-, -regr-, -regre-, or -regres-, and the same operation would be performed. This book always introduces a command using the full command name, but often an abbreviated command name is used for the sake of simplicity after this first use.
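To see that the full and abbreviated names behave identically, consider the short sketch below. It uses auto.dta, an example data set that ships with Stata and is not part of this book's data; the variables price and mpg come from that file and were chosen only for illustration.

```stata
* Load an example data set shipped with Stata (not this book's data)
sysuse auto, clear
* Full command name
regress price mpg
* The abbreviated name runs the identical regression
reg price mpg
```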
VARIABLES
After the command, you must specify the variable or variables on which you want to perform that operation. For example, if you wanted to delete a variable named gender, you would type drop gender into the Command window. Particular commands, as will be discussed in more detail, either accommodate multiple variables or even require multiple variables to be specified. If, for instance, you wanted to create a cross-tabulation, you would specify two variables after the appropriate command.
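The two cases just described can be sketched as follows, assuming Chapter 2 Data.dta is open. Running -drop- on a copy of the data is advisable, since the variable is removed from the data in memory.

```stata
* Assumes "Chapter 2 Data.dta" is open
* Two variables after the command: a cross-tabulation of gender by employst
tabulate gender employst
* One variable after the command: delete gender from the data in memory
* (the saved file on disk is unchanged unless you save again)
drop gender
```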
IF STATEMENTS
There may be times when you want to perform an operation only on certain types of cases. As an example, you may want to produce a cross-tabulation table that includes only the males in your data set. To do so, you would type an if statement after you have entered the command and variables. Generally, these if statements take the form of a particular variable or variables equaling some value.
The if statements are completely optional, meaning you do not have to enter them when performing a command, which is why they are shown in brackets above. You need to type an if statement only when you wish to perform the operation on a selected set of cases in the data.
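As a hedged sketch of an if statement, the command below restricts a frequency table to males. It assumes Chapter 2 Data.dta is open and that gender uses the 0 = Male, 1 = Female coding described for the Chapter 1 data; check the actual codes in your file before relying on it.

```stata
* Assumes "Chapter 2 Data.dta" is open and that gender is coded
* 0 = Male, 1 = Female (the coding described for the Chapter 1 data)
* Note the double equals sign (==) used to test equality
tabulate employst if gender == 0
```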
OPTIONS
Most Stata commands include options that can be invoked with them. As the name suggests, option statements are optional. Options perform some extension or modification of the basic command, such as requesting additional statistical measures or a different formatting of the output. When a Stata command does not produce exactly what you would like by default, you often can obtain what you are looking for through the use of options. When each command is covered throughout the book, the most helpful options will be detailed as well. Furthermore, the Stata Help Files section of Chapter 8 shows how to learn all the possible options for each command.
EXECUTING A COMMAND USING THE COMMAND WINDOW
Once you have determined which command you need to use, which variables you want to perform it on, and whether you would like to use an if statement or options, you are ready to execute the command.
First, be sure that you have the Command window selected by clicking the mouse when the cursor is anywhere in the Command window. Next, you will type the command, variable name(s), and any desired if statements or options. Instead of actually typing a variable name, you can also place your cursor over a particular variable in the Variables window, and when you click on the little arrow that appears, its name will appear in the Command window. If you are using Stata 11 (or earlier), you can simply click on the variable name in the Variables window, and that variable’s name will appear in the Command window. Once you have finished entering all the information, press Enter.
Pressing Enter tells Stata to perform the operation and causes output to be displayed in the Results window. Note that some commands may wrap onto more than one line in the Command window. This scenario is completely acceptable. Stata treats everything typed before you press Enter as a single command. Thus, for each command you wish to perform, you need to type all the required information and press Enter (i.e., you cannot type several commands in succession in the Command window). A method for performing multiple commands at once is covered in the next chapter.
The 5 Essential Commands
The following section provides a closer look at the foundation commands of Stata. These 5 commands accomplish a significant portion of the analyses and data management that is needed for many research projects. This section should be seen, however, as an introduction to these commands. It explains the basics of each command, which for many users may be all that is needed. More of the specifics and nuances for each command are covered in the chapter devoted to that particular statistical operation. Therefore, the goal of this section is threefold. First, it provides essential commands that perform some of the most frequently used operations. Second, it gives you a framework on which to place all of the more advanced topics to come in later chapters. Third, it should give you confidence to tackle those more advanced topics. When you grasp these core concepts, you are in a great position to become an effective Stata user.
All the examples that follow use the Chapter 2 Data.dta, available at www.sagepub.com/longest. This data set contains 7 variables for 25 cases from the National Study of Youth and Religion (NSYR) data (see the Preface for more information on how these data were collected). This subsample of the full data was selected so that it would be possible for you to double-check the following analyses by producing them by hand if it is helpful. As mentioned in Chapter 1, it is a good idea to save a copy of the data file you are working with so that you always have a backup of the original data.
tabulate
The first two essential commands, -tabulate- and -summarize-, both produce basic descriptive information about variables, which is why they generally are the first analytic operations performed for the vast majority of research studies. Again, this section will provide more of an overview on how to use these commands, whereas Chapter 4 presents much greater detail on the specifics of using these commands, as well as more detail on extensions to each.
One of the first analytic processes taught in statistics courses is how to construct a frequency distribution table. Notice, if you were asking a really smart colleague to produce a distribution that tabulates the values of each of the cases, you might tell him or her to “tabulate” the data. The abbreviated command name for -tabulate- is technically -ta-, but it will probably be easier to remember -tab- as a shortened version.
To see what the -tab- command does, select the Command window (i.e., click the mouse while the cursor is over the Command window), and type tab employst (or alternatively, type tab and place your cursor over employst in the Variables window and click the small arrow that appears). The employst variable comes from a question asking about the respondents’ current employment status. When you have typed the command, the screen should look similar to the screenshot presented below:
Now simply press Enter, and your screen should look like this:
[Screenshot: Stata after running tab employst; the Variables window lists ids, gender, agecats, employst, religoth, datnum, and numfrien]
Before addressing the unique aspects of the information provided by the -tab- command, examine some of the general output that is produced by all Stata commands. First, you see what the command “did” displayed in the Results window. In this case, this is a distribution table showing the frequency, percentage, and cumulative percentage for each category of the variable employst. Next, just above this output, Stata presents the exact command that was executed to produce the output. This same information is also stored in the Review window. These three components are produced for every command you enter in the Command window.
Turning to the output of the -tab- command, as noted, it produces a table showing the frequency, percentage, and cumulative distribution of the given variable. For the employst variable, six cases are Employed, which is 24% of the sample. Also notice that it displays the total number of cases that fall into at least one of these categories of the variable. In this case, there are 25 cases that were coded into one of the presented categories.1 The top left corner of the table lists what is called the “variable label.” These labels usually provide a brief description of the variable, such as “Employment Status.” Working with these labels is covered in more detail in Chapter 3.
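Because the data set has only 25 cases, the percentages are easy to verify. The sketch below assumes Chapter 2 Data.dta is open; the -display- command, used here as a simple calculator, is an addition for illustration, and its result should match the 24% reported for the Employed category.

```stata
* Assumes "Chapter 2 Data.dta" is open
tabulate employst
* Check the reported percentage by hand: 6 of the 25 cases are Employed
display 6/25*100
```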
As with all commands, the -tab- command contains several options that can be invoked to perform different or additional operations beyond this default procedure. One of the more useful options for the -tab- command is -sort-. The -sort- option tells Stata, as it sounds, to rearrange (i.e., sort) the table so that it lists the categories in descending order of frequency. Remember, options are always typed after entering a comma, meaning you would type tab employst, sort in the Command window and press Enter. When you do this, the following output is presented:
No school or work but looking |          2        8.00       92.00
          Active armed forces |          2        8.00      100.00
                        Total |         25      100.00
1 For these introductory examples, none of the variables have any missing data, meaning all of the cases have valid answers for all of the variables. Clearly this situation may not always be the case with real data. Handling such missing data will be covered in more detail in the later chapters that more thoroughly discuss using the commands to complete statistical analyses. Specifically, see the Data
As you can see, the same basic information (frequency, percentage, and cumulative percentage) is displayed, but now the categories are ordered so that
you can easily see which one contains the most respondents and which
contains the least. For employment status, the most common category is Employed
and school, whereas being Out of labor force, No school or work but looking (for a job), and Active armed forces are all tied for the least common responses.
There are several other options you can use with -tab- and most of
them are covered in the Frequency Distributions section of Chapter 4. But for now, it is only necessary to understand the basic form of how options are invoked, as it is similar for all other commands. Also, it should be noted that as
with commands, options have full names and abbreviated names. The full
name will always be presented when it is first introduced, with the most
common abbreviation being used in all the following instances.
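The comma-then-option form can be tried on any data file. As a hedged sketch, the lines below assume the auto example data set that ships with Stata, with rep78 standing in for employst (neither is part of the data used in this book). The command name is written once in full and once abbreviated; -sort- is typed in full both times.

```stata
* Load the auto example data set shipped with Stata (illustrative assumption)
sysuse auto, clear

* Full command name: the option follows the comma and
* lists the categories in descending order of frequency
tabulate rep78, sort

* Abbreviated command name: produces the same table
tab rep78, sort
```

Both lines should produce identical tables, which is a quick way to confirm that an abbreviation does what you expect before relying on it.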
In addition to producing a distribution of one variable, the -tab-
command can generate a cross-tabulation between two variables. For instance,
you may be interested to know if there is a difference in the respondents'
employment status by gender. To make this comparison, you would want to see
the distribution of employment status for males and the distribution of
employment status for females. One method for displaying this information is
to invoke the -tab- command and list both variables instead of just one. Type
tab employst gender in the Command window and press Enter. Doing
so produces the following results:
  (employstat_w3) | (gender_w3) Respondent gender
Employment Status |      Male     Female |     Total
know the percentage of females in each category compared with the percentage
of males in each category.
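To see the default two-variable table on a file you already have, the following minimal sketch assumes the auto example data set that ships with Stata; rep78 and foreign are stand-ins for employst and gender and are not part of the data used in this book. The first variable listed forms the rows and the second forms the columns, and with no options the cells contain frequencies only.

```stata
* Load the auto example data set shipped with Stata (illustrative assumption)
sysuse auto, clear

* Two-way table: rep78 in the rows, foreign in the columns;
* with no options, each cell shows a frequency, not a percentage
tab rep78 foreign
```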
To produce the necessary figures, you can again think intuitively. You
need to ask Stata to produce a set of percentages based either on the rows
or on the columns. Because you believe that gender is the causal variable (i.e., the independent variable), you would want the percentages in the
columns. That is, you want to be able to compare the proportion of all
females who are employed with the proportion of all males who are
employed. Therefore, the percentages need to be calculated within the
columns. To have your smart colleague make this calculation, you might
consider telling him or her, for shorthand, "columns." Following this logic, the
option to present these percentages is -column-. [If you wanted the
percentages in the row, as you might have guessed, the option would be
-row-.] Type tab employst gender, col in the Command win
Employment Status |      Male     Female |     Total