CONTENT
INTRODUCTION
PART 1: DATA DESCRIPTION
I GENERAL DATA DESCRIPTION
II DATA DESCRIPTION IN DETAIL
1 Time worked per week in 1975
2 Age in 1975
3 Educational level in 1975
4 Health status in 1975
5 Gender
6 Marital status in 1975
7 Time of sleeping per week in 1975
PART 2: REGRESSION ANALYSIS
I THE RELATIONSHIP BETWEEN VARIABLES – STATISTICAL CORRELATION
II ESTIMATE THE REGRESSION MODEL BY OLS METHOD
1 Population regression function
2 Sample regression function
3 Analysis of parameters in the sample regression model
III MISTAKE TESTS OF THE MODEL
1 Testing multicollinearity
2 Testing heteroskedasticity
3 Cure for heteroskedasticity
IV HYPOTHESES TESTS
1 Testing overall significance of the regression
2 Testing significance of the regression coefficients
3 Testing exclusion restrictions
PART 3: CONSTRUCTING FINAL REGRESSION MODEL
I ESTIMATE THE REGRESSION MODEL BY OLS METHOD
1 Population regression function
2 Sample regression function
3 Analysis of parameters in the sample regression model
II MISTAKE TESTS OF THE MODEL
1 Testing multicollinearity
2 Testing heteroskedasticity
3 Cure for heteroskedasticity
III HYPOTHESES TESTS
1 Testing the overall significance of regression
2 Testing the significance of the regression coefficients
CONCLUSION
APPENDIX
1 Result of using command ‘tab totwrk75’
2 Result of using command ‘tab slpnap75’
TABLE OF FIGURES
Figure 1: The result of using command 'des'
Figure 2: The result of using command 'des' for variables chosen
Figure 3: The result of using command 'sum'
Figure 4: The result of using command 'tab totwrk75' (full version in appendix)
Figure 5: The result of using command 'tab age75'
Figure 6: The result of using command 'tab educ75'
Figure 7: The result of using command 'tab gdhlth75'
Figure 8: The result of using command 'tab male'
Figure 9: The result of using command 'tab marr75'
Figure 10: The result of using command 'tab slpnap75' (full version in appendix)
Figure 11: The result of using command 'corr' in STATA
Figure 12: The result of using command 'reg' in STATA (6 variables)
Figure 13: The result of using command 'vif' after using 'reg' in STATA
Figure 14: The result of using 'imtest, white' in STATA
Figure 15: The result of using command robust in STATA
Figure 16: The result of command 'test' (after using robust)
Figure 17: The result of using command 'reg' (2 variables)
Figure 18: The result of using command 'test' for 4 variables above - after robust
Figure 19: The result of using command 'reg' after omitting 4 variables
Figure 20: The result of using 'corr' with 3 variables
Figure 21: The result of using command 'vif' after 'reg totwrk75 male slpnap75'
Figure 22: The result of using command 'imtest, white' for new function
Figure 23: The result of using 'reg robust'
Figure 24: The result of using command 'test male slpnap75'
The success and final outcome of this assignment required a lot of support from others, and we are extremely fortunate to have received it throughout our work. We would like to express our gratitude to Mrs. Dinh Thi Thanh Binh, our Econometrics lecturer, for the excellent expertise and supportive guidance she provided us throughout the process. Without such help, we might not have been able to complete this assignment.
We are also grateful that we managed to complete the assignment on time, which could not have been done without the effort and co-operation of our group members. Last but not least, we would like to thank all of our friends for their kind support and willingness to spend time helping us finish the documents.
Group 11
INTRODUCTION
Research has shown that various factors influence the working time of labor. For instance, older workers tend to work less than younger ones. The same applies to female workers who are married and have a family to take care of. The influence of these factors also differs from person to person.
Therefore, after taking everything into consideration, we decided to study the topic “The factors affecting weekly working time in 1975”. In this project, we use econometric methods to analyze the factors that have a major impact on the working time of labor in 1975. Econometrics is a social science in which the tools of economic, mathematical, and statistical theory are used to estimate economic relationships, test economic theories, and evaluate and implement government and business policy. It is based upon the development of statistical methods to forecast economic issues.
In this paper, we consider six factors that may affect workers’ weekly working time: age, educational level, health status (good or poor), gender (male or female), marital status (married or single), and time of sleeping.
Throughout the project, we used STATA as the tool for econometric analysis of the data set “11.DTA”.
We hope that the arguments and statistics in this project will be helpful to anyone who is interested in the topic.
PART 1: DATA DESCRIPTION
I GENERAL DATA DESCRIPTION
1 Chosen Variables for Research
We obtained the following result by using command ‘des’
Figure 1: The result of using command 'des'
The data set was created on August 18, 1999, and contains 20 variables and 239 observations.
After considering the meaning of the variables in the file 11.dta, our group decided to use the following variables in the regression model:
Dependent variable: totwrk75
Independent variables: age75, educ75, gdhlth75, male, marr75, slpnap75.
2 General Description of Chosen Data
We obtained the following result by using command ‘des’ for variables analyzed:
Figure 2: The result of using command 'des' for variables chosen
From the above result, we can see that age75, educ75, slpnap75, and totwrk75 are quantitative variables, while gdhlth75, male, and marr75 are qualitative variables.
Here is the detailed explanation of the variables:
Variables Display Format Meaning Unit
Using the command ‘sum totwrk75 age75 educ75 gdhlth75 male marr75 slpnap75’, we obtain the number of observations, mean, standard deviation, minimum, and maximum of each variable:
Figure 3: The result of using command 'sum'
II DATA DESCRIPTION IN DETAIL
To describe the variables in detail, we used the command ‘tab’ for each variable:
1 Time worked per week in 1975
Figure 4: The result of using command 'tab totwrk75' (full version in appendix)
Working time per week ranges from 0 to 4805 minutes. The most frequent value is 0 minutes, with 10 observations (4.18%), followed by 2325 minutes, with 4 observations (1.67%).
2 Age in 1975
Figure 5: The result of using command 'tab age75'
The age of workers in 1975 ranges from 23 to 65 years. The most frequent age is 33, with 14 observations (5.86%). The least frequent ages are 49, 63, and 64, with only 1 observation each (0.42%).
3 Educational level in 1975
Years of education range from 1 to 17. Twelve years of education has the highest number of observations (98 observations, 41%), while 1 year of education has the lowest (1 observation, 0.42%).
Figure 6: The result of using command 'tab educ75'
4 Health status in 1975
Figure 7: The result of using command 'tab gdhlth75'
- gdhlth75 = 1 (good health in 1975): 211 observations, accounting for 88.28%
- gdhlth75 = 0 (poor health in 1975): 28 observations, accounting for 11.72%
5 Gender
- male = 1 (male): 144 observations, accounting for 60.25%
- male = 0 (female): 95 observations, accounting for 39.75%
Figure 8: The result of using command 'tab male'
6 Marital status in 1975
Figure 9: The result of using command 'tab marr75'
- marr75 = 1 (married in 1975): 179 observations, accounting for 74.9%
- marr75 = 0 (single in 1975): 60 observations, accounting for 25.1%
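The percentages reported for the three binary variables are each category's count divided by the 239 observations. A minimal Python sketch of that arithmetic, using the counts quoted above (the helper name `share_percent` is ours and is not part of the STATA output):

```python
# Counts copied from the 'tab' results for the three binary variables.
N_OBS = 239
counts = {
    "gdhlth75 = 1": 211,
    "gdhlth75 = 0": 28,
    "male = 1": 144,
    "male = 0": 95,
    "marr75 = 1": 179,
    "marr75 = 0": 60,
}


def share_percent(count, total=N_OBS):
    """Return the share of `count` in `total` as a percentage, rounded to 2 decimals."""
    return round(100 * count / total, 2)


shares = {category: share_percent(count) for category, count in counts.items()}

for category, share in shares.items():
    print(f"{category}: {share}%")
```

Each pair of categories sums to 239 observations, which is a quick check that no observation is missing for these variables.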
7 Time of sleeping per week in 1975
Sleeping time per week, including naps, ranges from 2053 to 6110 minutes. The most frequent values are 3195, 3353, and 3518 minutes, with 3 observations each (1.26%).
Figure 10: The result of using command 'tab slpnap75' (full version in appendix)
PART 2: REGRESSION ANALYSIS
I THE RELATIONSHIP BETWEEN VARIABLES – STATISTICAL CORRELATION
Figure 11: The result of using command ‘corr’ in STATA
The correlations between the dependent variable totwrk75 and the independent variables (age75, educ75, gdhlth75, male, marr75, slpnap75) differ in strength. In absolute value, they range from |r(totwrk75, educ75)| = 0.0813 to |r(totwrk75, male)| = 0.3822.
r(totwrk75, age75) = -0.1327: totwrk75 and age75 are negatively correlated. The sign is expected to be negative.
r(totwrk75, educ75) = 0.0813: totwrk75 and educ75 are positively correlated. The sign is expected to be positive.
r(totwrk75, gdhlth75) = 0.1555: totwrk75 and gdhlth75 are positively correlated. The sign is expected to be positive.
r(totwrk75, male) = 0.3822: totwrk75 and male are positively correlated. The sign is expected to be positive.
r(totwrk75, marr75) = 0.1042: totwrk75 and marr75 are positively correlated. However, the sign was expected to be negative.
r(totwrk75, slpnap75) = -0.3538: totwrk75 and slpnap75 are negatively correlated. The sign is expected to be negative.
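The comparison of the six correlations can be summarized in a few lines of Python. This is only a sketch over the values quoted from Figure 11; the dictionary of expected signs restates the expectations given in the text, and the variable names `weakest`, `strongest`, and `unexpected` are ours.

```python
# Correlations with totwrk75 as reported in the 'corr' output (Figure 11).
reported = {
    "age75": -0.1327,
    "educ75": 0.0813,
    "gdhlth75": 0.1555,
    "male": 0.3822,
    "marr75": 0.1042,
    "slpnap75": -0.3538,
}

# Expected signs stated in the text (+1 positive, -1 negative).
expected_sign = {
    "age75": -1,
    "educ75": 1,
    "gdhlth75": 1,
    "male": 1,
    "marr75": -1,
    "slpnap75": -1,
}

# Weakest and strongest linear association in absolute value.
weakest = min(reported, key=lambda name: abs(reported[name]))
strongest = max(reported, key=lambda name: abs(reported[name]))

# Variables whose observed sign differs from the expected sign.
unexpected = [
    name for name, r in reported.items()
    if (r > 0) != (expected_sign[name] > 0)
]

print(weakest, strongest, unexpected)
```

Only marr75 shows a sign that differs from the expectation, and its correlation with totwrk75 is weak.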
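II ESTIMATE THE REGRESSION MODEL BY OLS METHOD
1 Population regression function
PRF: $\text{totwrk75} = \beta_0 + \beta_1\,\text{age75} + \beta_2\,\text{educ75} + \beta_3\,\text{gdhlth75} + \beta_4\,\text{male} + \beta_5\,\text{marr75} + \beta_6\,\text{slpnap75} + u$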
The variable u, called error term or disturbance in the relationship, represents
factors other than age75, educ75, gdhlth75, male, marr75, slpnap75 that affect
totwrk75.
2 Sample regression function
By using STATA, we have the following result:
Figure 12: The result of using command 'reg' in STATA (6 variables)
From the above result, we obtain the estimated regression function:
(SRF): $\widehat{\text{totwrk75}} = 4172.318 - 8.061648\,\text{age75} - 19.7368\,\text{educ75} + 231.5114\,\text{gdhlth75} + 670.8464\,\text{male} - 25.161\,\text{marr75} - 0.5949014\,\text{slpnap75}$
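To show how the estimated function is used, the sketch below computes fitted weekly working minutes in Python. The coefficients are copied from the estimated regression function; the function name `predict_totwrk75` and the two example workers are hypothetical illustrations, not observations from 11.dta.

```python
# Coefficients copied from the estimated regression function (Figure 12).
COEFFICIENTS = {
    "_cons": 4172.318,
    "age75": -8.061648,
    "educ75": -19.7368,
    "gdhlth75": 231.5114,
    "male": 670.8464,
    "marr75": -25.161,
    "slpnap75": -0.5949014,
}


def predict_totwrk75(person):
    """Return fitted minutes worked per week for a dict of regressor values."""
    fitted = COEFFICIENTS["_cons"]
    for name, value in person.items():
        fitted += COEFFICIENTS[name] * value
    return fitted


# Hypothetical worker: 40 years old, 12 years of education, good health,
# male, married, sleeping 3300 minutes per week.
man = {"age75": 40, "educ75": 12, "gdhlth75": 1,
       "male": 1, "marr75": 1, "slpnap75": 3300}
# Same characteristics, but female.
woman = dict(man, male=0)

fitted_man = predict_totwrk75(man)
gap = fitted_man - predict_totwrk75(woman)
print(round(fitted_man, 2), round(gap, 4))
```

Holding everything else fixed, the fitted values of the two workers differ by exactly the coefficient on male.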
3 Analysis of Parameters in the Sample Regression Model
F(6, 232) = 14.32 and Prob > F = 0.0000 are evidence that at least one of the independent variables (age75, educ75, gdhlth75, male, marr75, slpnap75) helps to explain the dependent variable (totwrk75).
Coefficient of determination (R-squared = 0.2702) is interpreted as the fraction of the sample variation in y that is explained by x. In this model, age75, educ75, gdhlth75, male, marr75, and slpnap75 explain 27.02% of the variation in totwrk75.
Adjusted R-squared ($\bar{R}^2$ = 0.2513) increases when a group of variables is added to a regression if, and only if, the F statistic for joint significance of the new variables is greater than unity.
Residual sum of squares (RSS = 147852932) measures the sample variation in the residuals $\hat{u}_i$.
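The adjusted R-squared can be reproduced from R-squared, the number of observations, and the number of regressors. A minimal Python sketch, assuming the usual degrees-of-freedom correction (the function name `adjusted_r_squared` is ours):

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared: penalizes R-squared for the number of regressors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Six-variable model: R-squared = 0.2702, n = 239, k = 6.
adj_r2 = adjusted_r_squared(0.2702, n_obs=239, n_regressors=6)
print(round(adj_r2, 4))
```

The result agrees with the adjusted R-squared of 0.2513 reported by STATA.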
III MISTAKE TESTS OF THE MODEL
1 Testing multicollinearity
1.1 Correlation matrix
The correlation matrix (Figure 11) shows that no pairwise correlation $|r_{ij}|$ (i, j = 1, ..., 6; i ≠ j) is greater than 0.8; therefore, there is no evidence of serious multicollinearity.
1.2 Variance Inflation factors (VIF) method
Figure 13: The result of using command 'vif' after using 'reg' in STATA
As VIF(i) < 10 (i = 1, ..., 6), we can conclude that multicollinearity is not a problem in this model.
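For reference, the VIF of a regressor is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that regressor on all the other regressors. The Python sketch below only illustrates this definition and the two rules of thumb used above; the input values are illustrative, not taken from Figure 13, and the function name is ours.

```python
def vif_from_r_squared(r_squared_j):
    """Variance inflation factor given the auxiliary R-squared of regressor j."""
    return 1.0 / (1.0 - r_squared_j)


# The rule of thumb VIF < 10 is equivalent to an auxiliary R-squared below 0.9.
vif_at_threshold = vif_from_r_squared(0.9)

# With only two regressors, the auxiliary R-squared is the squared pairwise
# correlation, so |r| = 0.8 corresponds to a VIF of about 2.78.
vif_two_regressors = vif_from_r_squared(0.8 ** 2)

print(round(vif_at_threshold, 2), round(vif_two_regressors, 2))
```

The second value shows that the 0.8 correlation rule is much stricter than the VIF rule when there are only two regressors; with more regressors the two checks are not directly comparable.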
2 Testing heteroskedasticity
Figure 14: The result of using 'imtest, white' in STATA
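The p-value of White's test reported in Figure 14 is smaller than α = 0.05, so we reject the null hypothesis of homoskedasticity, which means heteroskedasticity exists in this model.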
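For readers unfamiliar with the test, a sketch of the idea behind White's test follows. This is the textbook form of the test, not a transcript of the STATA output: the squared OLS residuals are regressed on the regressors, their squares, and their cross products, and the statistic is compared with a chi-squared distribution whose degrees of freedom q equal the number of regressors in this auxiliary regression.

```latex
\hat{u}_i^2 = \delta_0 + \sum_{j} \delta_j x_{ij}
            + \sum_{j \le l} \gamma_{jl}\, x_{ij} x_{il} + v_i ,
\qquad
H_0 : \delta_j = \gamma_{jl} = 0 \ \text{for all } j, l ,
\qquad
LM = n \cdot R^2_{\hat{u}^2} \ \overset{a}{\sim}\ \chi^2_q \ \text{under } H_0
```

A small p-value means the squared residuals are systematically related to the regressors, which is evidence against a constant error variance.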
3 Cure for heteroskedasticity
To deal with heteroskedasticity, we re-estimate the model with heteroskedasticity-robust standard errors, using the 'robust' option in STATA:
Figure 15: The result of using command robust in STATA
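The robust option leaves the OLS coefficient estimates unchanged and replaces only the estimated variance of the coefficients, and therefore the standard errors, t statistics, and p-values. In its basic textbook form (software usually adds a small-sample scaling factor), the heteroskedasticity-robust variance estimator is:

```latex
\widehat{\operatorname{Var}}(\hat{\beta})
  = (X'X)^{-1} \left( \sum_{i=1}^{n} \hat{u}_i^2 \, x_i x_i' \right) (X'X)^{-1}
```

Here $x_i$ is the column vector of regressors (including the constant) for observation i and $\hat{u}_i$ is the OLS residual.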
IV HYPOTHESES TESTS
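1 Testing overall significance of the regression
Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$ against $H_1$: at least one $\beta_j \neq 0$ (j = 1, ..., 6)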
Figure 16: The result of command 'test' (after using robust)
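2 Testing significance of the regression coefficients
For each coefficient we test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ at the significance level α = 0.05, to see whether the corresponding variable has a statistically significant effect on the time of working per week. The p-values (P > |t|) are based on Figure 15 (the result of using robust in STATA).
- Intercept: reject H0, accept H1; the intercept is statistically significant.
- age75: $\hat{\beta}_1$ = -8.061648, p-value 0.137 > α = 0.05. Cannot reject H0; age75 does not have a statistically significant effect on totwrk75.
- educ75: $\hat{\beta}_2$ = -19.7368, p-value 0.314 > α = 0.05. Cannot reject H0; educ75 does not have a statistically significant effect on totwrk75.
- gdhlth75: $\hat{\beta}_3$ = 231.5114, p-value 0.233 > α = 0.05. Cannot reject H0; gdhlth75 does not have a statistically significant effect on totwrk75.
- male: $\hat{\beta}_4$ = 670.8464, p-value 0.000 < α = 0.05. Reject H0; male has a statistically significant effect on totwrk75. $\hat{\beta}_4$ = 670.8464 means that men's working time is on average 670.8464 minutes per week higher than women's, ceteris paribus.
- marr75: $\hat{\beta}_5$ = -25.161, p-value 0.838 > α = 0.05. Cannot reject H0; marr75 does not have a statistically significant effect on totwrk75.
- slpnap75: $\hat{\beta}_6$ = -0.5949014, p-value 0.000 < α = 0.05. Reject H0; slpnap75 has a statistically significant effect on totwrk75. $\hat{\beta}_6$ = -0.5949014 means that one additional minute of sleep per week corresponds to a decrease in working time per week of 0.5949014 minutes, ceteris paribus.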
In conclusion, only male and slpnap75 have a statistically significant effect on totwrk75 at the 5% level.
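The decision rule applied to each coefficient (reject H0 when the p-value is below α) can be written out as a short Python sketch. The p-values are the ones quoted above from the robust regression; values shown as 0.000 are rounded by STATA and are entered here as 0.0.

```python
ALPHA = 0.05

# P > |t| values quoted from the robust regression output.
p_values = {
    "age75": 0.137,
    "educ75": 0.314,
    "gdhlth75": 0.233,
    "male": 0.000,      # reported as 0.000 (rounded)
    "marr75": 0.838,
    "slpnap75": 0.000,  # reported as 0.000 (rounded)
}

# Reject H0: beta_j = 0 when the p-value is smaller than alpha.
significant = [name for name, p in p_values.items() if p < ALPHA]
insignificant = [name for name, p in p_values.items() if p >= ALPHA]

print("significant:", significant)
print("not significant:", insignificant)
```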
3 Testing exclusion restrictions
From the above analysis, age75, educ75, gdhlth75, and marr75 can potentially be omitted. In this step, we test multiple linear restrictions on those variables (q = 4), that is, the null hypothesis H0 that the coefficients on age75, educ75, gdhlth75, and marr75 are all zero. Under H0, the regression function contains only two variables: slpnap75 and male.
Figure 17: The result of using command 'reg' (2 variables)
Figure 18: The result of using command 'test' for 4 variables above - after robust
Since F = 1.02 < $F_{0.05}(4, 232)$ = 2.41, we cannot reject H0. Therefore, age75, educ75, gdhlth75, and marr75 are jointly insignificant once male and slpnap75 have been controlled for, and they can be excluded from the model.
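For comparison, the same restrictions can be checked with the usual (non-robust) R-squared form of the F statistic, using R-squared = 0.2702 for the six-variable model and R-squared = 0.2546 for the two-variable model reported in Part 3. This Python sketch is not expected to reproduce F = 1.02 exactly, because Figure 18 is computed after the robust regression; it only shows that the conclusion does not depend on that choice. The function name is ours.

```python
def exclusion_f_statistic(r2_unrestricted, r2_restricted, n_obs, k_unrestricted, q):
    """R-squared form of the F statistic for q exclusion restrictions."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n_obs - k_unrestricted - 1)
    return numerator / denominator


# Unrestricted model: 6 regressors; restricted model drops q = 4 of them.
f_stat = exclusion_f_statistic(0.2702, 0.2546, n_obs=239, k_unrestricted=6, q=4)

# 5% critical value of F(4, 232) as quoted in the text.
CRITICAL_VALUE = 2.41
reject_h0 = f_stat > CRITICAL_VALUE

print(round(f_stat, 2), reject_h0)
```

The non-robust statistic is about 1.24, which is also below the critical value of 2.41, so H0 is not rejected in either version.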
PART 3: CONSTRUCTING FINAL REGRESSION MODEL
I ESTIMATE THE REGRESSION MODEL BY OLS METHOD
1 Population regression function
PRF: $\text{totwrk75} = \beta_0 + \beta_1\,\text{male} + \beta_2\,\text{slpnap75} + u$
The variable u, called error term or disturbance in the relationship, represents
factors other than male, slpnap75 that affect totwrk75.
2 Sample regression function
By using STATA, we have the following result:
Figure 19: The result of using command 'reg' after omitting 4 variables
From the above result, we obtain the estimated regression function:
3 Analysis of parameters in the sample regression model
F(2, 236) = 40.31 and Prob > F = 0.0000 are evidence that at least one of the independent variables (male, slpnap75) helps to explain the dependent variable (totwrk75).
Coefficient of determination (R-squared = 0.2546) is interpreted as the fraction of the sample variation in y that is explained by x. In this model, male and slpnap75 explain 25.46% of the variation in totwrk75. The new regression model’s R-squared is smaller than the previous model’s (0.2702), as expected after dropping regressors.
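The degrees of freedom and the size of the F statistic for the two-variable model can be checked from n = 239, k = 2, and R-squared = 0.2546. A minimal Python sketch using the R-squared form of the overall F statistic (the function name is ours; the small gap to STATA's value comes from rounding R-squared to four decimals):

```python
def overall_f(r_squared, n_obs, n_regressors):
    """Return (numerator df, denominator df, F) for overall significance."""
    df_num = n_regressors
    df_den = n_obs - n_regressors - 1
    f_value = (r_squared / df_num) / ((1 - r_squared) / df_den)
    return df_num, df_den, f_value


# Final model: regressors male and slpnap75.
df_num, df_den, f_value = overall_f(0.2546, n_obs=239, n_regressors=2)
print(df_num, df_den, round(f_value, 2))
```

With two regressors and 239 observations, the statistic has (2, 236) degrees of freedom, and its value is close to 40.3.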
Adjusted R-squared ($\bar{R}^2$ = 0.2483) increases when a group of variables is added to a regression if, and only if, the F statistic for joint significance of the new variables is greater than unity. We use $\bar{R}^2$ to decide whether a certain independent variable (or set of variables) belongs in a model.
Total sum of squares (TSS = 202597441) is a measure of the total sample variation in the dependent variable.
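Because both models have the same dependent variable and the same 239 observations, they share this TSS. As a consistency check, combining it with the residual sum of squares of the six-variable model from Part 2 (RSS = 147852932) reproduces that model's R-squared:

```latex
R^2 = 1 - \frac{RSS}{TSS}
    = 1 - \frac{147852932}{202597441}
    \approx 1 - 0.7298
    = 0.2702
```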
II MISTAKE TESTS OF THE MODEL
1 Testing multicollinearity
1.1 Correlation matrix
Figure 20: The result of using 'corr' with 3 variables
1.2 Variance Inflation factors (VIF) method
Figure 21: The result of using command 'vif' after 'reg totwrk75 male slpnap75'
As the VIF of each independent variable (male, slpnap75) is less than 10, we can conclude that multicollinearity is not a problem in this model.