DETERMINING THE CONTRIBUTING FACTORS TO TRAFFIC ACCIDENTS IN HO CHI MINH CITY USING A BINARY LOGIT MODEL
APPLYING A BINARY LOGIT REGRESSION MODEL TO DETERMINE THE FACTORS AFFECTING TRAFFIC ACCIDENTS IN HO CHI MINH CITY
Tran Quang Vuong
University of Transport and Communications Campus in Ho Chi Minh City
Abstract: This research investigates traffic accident patterns, severity levels, and the factors contributing to accidents. The results may help in proposing effective measures to improve traffic safety at signalized intersections. A case study is conducted in Ho Chi Minh City (HCMC), Vietnam, using historical traffic accident data collected over five years (2011-2015). Binary logit models are used to identify the factors contributing to serious traffic accidents. The results show that intersection type, land use, and road type are contributing factors to accident severity. Based on the findings, strategies and measures for safety improvement are formulated and discussed.
Keywords: Road traffic accident, signalized intersection, logit model, traffic safety measures,
factor analysis
Abstract (Vietnamese): This study focuses on analyzing accident characteristics, severity, and the factors affecting accidents. The results provide a useful basis for proposing effective measures to improve traffic safety at signalized intersections. The study is conducted for Ho Chi Minh City, Vietnam, based on statistical traffic accident data over five years (2011-2015). A binary logit regression model is used to identify the factors affecting serious traffic accidents. The analysis shows that intersection type, intersection location, and road type are factors affecting accident severity. Based on these results, policies and measures to improve traffic safety are proposed.
Keywords: Road traffic accident, signalized intersection, logit model, traffic safety measures, factor analysis
1 Introduction
Nearly 25 percent of all fatal crashes occur at intersections, and about 30 percent of those occur at intersections controlled by signals. In 2015, HCMC recorded 3,694 traffic accidents, 693 fatalities, and 3,301 injuries. Compared with 2014, the numbers of traffic accidents, fatalities, and injuries in 2015 fell by 14.51%, 4.15%, and 18.07%, respectively; however, accidents at signalized intersections in HCMC increased, accounting for 41% of all accidents at intersections (9.7% of all traffic accidents in HCMC). To date, there has been a lack of empirical research on traffic safety at signalized intersections under mixed traffic conditions, since most previous research on this topic has focused on traffic dominated by four-wheeled vehicles. To address road traffic accident problems, it is necessary to understand in depth the factors contributing to traffic accidents. The objectives of this research are to investigate traffic accident patterns, severity levels, and the factors contributing to traffic accidents. However, owing to substantial limitations in the data obtained from accident reports, this study does not aim to explore all contributing factors. Logistic regression is used to estimate the effects of the significant contributing factors on accident severity.
This paper is divided into five parts: the introduction is the first, the literature review is the second, the descriptive analysis and the modelling are the third and fourth, respectively, and the conclusions are the last.
2 Literature review
Many models have been developed to determine the factors contributing to accident severity in both developed and developing cities, such as Poisson and negative binomial models (Hoong et al., 2001; Lin et al., 2003; Yinhai et al., 2004; Huang et al., 2008), ordered probit models (Abdel-Aty et al., 2003, 2005; Yu Jin et al., 2010), logistic regression models (Hilakivi et al., 1989; James and Kim, 1996; Mercier et al., 1997; Al-Ghamdi, 2002; Kelvin, 2004), and multiple logistic regression (Shankar and Mannering, 1996; Carson and Mannering, 2001; Yan et al., 2005); binary logistic models have also been developed by many researchers. The logistic modelling technique is often preferred by researchers because the logistic function always lies between 0 and 1, which is not usually the case with other possible functions (Kleinbaum and Klein, 2002).
In summary, numerous studies have determined the effects of contributing factors on accident severity by developing logistic regression models. Nevertheless, only a limited number of studies have explored crash injury severity at signalized intersections (Abdel-Aty, 2003; Abdel-Aty and Keller, 2005; Yan et al., 2005; Huang et al., 2008; Yu Jin et al., 2010). The literature identifies time of day, intersection type, nature of lane, street lighting, presence of a red-light camera, pedestrian involvement, vehicle type, driver age, and accident type as major contributing factors to accident severity. Moreover, no study has investigated the factors contributing to accident severity at signalized intersections using logistic models in Vietnam in general or in HCMC in particular. Based on the literature review, combined with the historical traffic accident data available under Vietnamese conditions, a binary logistic model is well suited to this case.
3 Descriptive analysis of traffic accidents at signalized intersections in HCMC
3.1 Overview of HCMC
According to the master plan, HCMC is divided into three zones (Fig. 1). The city centre (zone 1) includes 13 urban districts: 1, 3, 4, 5, 6, 8, 10, 11, Go Vap, Tan Binh, Tan Phu, Binh Thanh, and Phu Nhuan. The newly developed areas (zone 2) include 6 newly developed districts: 2, 7, 9, 12, Binh Tan, and Thu Duc. The rural areas (zone 3) include 5 rural districts: Hoc Mon, Nha Be, Can Gio, Cu Chi, and Binh Chanh.
Figure 1 Zone classification in HCMC
3.2 Data collection
This research is based on a historical accident database covering five years (2011-2015), obtained from the Rail-Road Traffic Police Bureau in HCMC. Traffic accident information is recorded on form No. 02/TNDB, which contains nearly 60 information categories. In practice, however, only 17 categories could be recorded, and these were analyzed to determine the significant factors contributing to accident severity.
3.3 Analysis of the patterns
There were 375 traffic accidents at signalized intersections in HCMC during the five years (2011-2015). The accidents were distributed unevenly among the three zones: 212 (56.5%) occurred in zone 1, 126 (33.6%) in zone 2, and 37 (9.9%) in zone 3. However, the ratio of the number of traffic accidents to the total number of signalized intersections is highest in zone 2 (0.79), followed by zone 1 (0.44), and lowest in zone 3 (0.36).
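The zone shares quoted above follow directly from the accident counts. A minimal Python check (the names `accidents_by_zone` and `zone_shares` are illustrative only) recomputes them; the accident-to-intersection ratios are not recomputed because the number of signalized intersections in each zone is not given here.

```python
# Accident counts at signalized intersections, 2011-2015, as reported in Section 3.3
accidents_by_zone = {"zone 1": 212, "zone 2": 126, "zone 3": 37}


def zone_shares(counts):
    """Return each zone's share of all accidents, in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {zone: round(100 * n / total, 1) for zone, n in counts.items()}


shares = zone_shares(accidents_by_zone)
print(sum(accidents_by_zone.values()))  # 375
print(shares)  # {'zone 1': 56.5, 'zone 2': 33.6, 'zone 3': 9.9}
```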
3.3.1 Distribution by time
Traffic accidents tend to increase slightly on holidays, Tet holidays, at weekends, and at the end of each month. The time of accident occurrence differs among the three zones: in zone 1, most traffic accidents happened during the night off-peak period from 8 PM to 4 AM, while in zones 2 and 3 accidents tend to increase slightly during the morning, noon, and evening peak hours (6 AM-8 AM, 12 PM-2 PM, and 6 PM-8 PM).
3.3.2 Distribution by road users involved in accidents
The age group most frequently involved in traffic accidents is 19-24 years old (24%), followed by the 25-30 age group (19%). This age group accounted for 32%, 46%, and 22% in zone 1, zone 2, and zone 3, respectively. Road users in this age group are not yet fully mature and are easily influenced by alcohol. Male road users are the main group involved in traffic accidents in all three zones, accounting for 77%, 88%, and 78% in zone 1, zone 2, and zone 3, respectively. The most common collision configurations are motorcycle-motorcycle in zone 1 (38%) and motorcycle-truck in zones 2 and 3 (47%).
Red-light running, failure to yield, driving in the wrong lane, illegal turning, and illegal overtaking are significant causes of traffic accidents at signalized intersections. In particular, red-light running and failure to yield are the most significant accident causes in zone 1 and zone 2, accounting for 26% and 29%, respectively. Red-light running and driving in the wrong lane are the main causes in zone 3, accounting for 35%.
4 Modelling of accidents at signalized intersections
4.1 Theoretical background of logistic regression
In this research, accident severity is a dichotomous variable. Note that a non-fatal accident is defined as any accident in which no one dies within 24 hours of its occurrence; otherwise, the accident is classified as fatal. Each accident in the time-series road accident data was categorized as either non-fatal or fatal. The logistic model used is

$$ P(\text{non-fatal accident}) = p(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} \tag{6} $$

and thus

$$ P(\text{fatal accident}) = 1 - P(\text{non-fatal accident}) = 1 - p(x) = \frac{1}{1 + e^{g(x)}} \tag{7} $$

where g(x) is a function of the independent variables:

$$ g(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{8} $$

Logistic regression determines the coefficients that make the observed outcomes (non-fatal or fatal accident) most likely, using maximum-likelihood estimation. Classification is based on a probability cut-off of P = 0.3: if the predicted probability is greater than or equal to 0.3, the accident is classified as fatal; otherwise, it is classified as non-fatal.
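To make the estimation step concrete, the sketch below fits a logistic model with a single 0/1 predictor by maximum likelihood (Newton-Raphson) in plain Python and then applies the 0.3 cut-off. It is not the SPSS procedure used in the study: the data are made up, `fit_logit_1var` and `classify` are hypothetical names, and it assumes the outcome is coded y = 1 for a fatal accident, so the fitted e^g/(1 + e^g) is read as a probability of a fatal accident (the reading that the odds ratios in Section 4.4 rely on). With one binary predictor the maximum-likelihood estimates have a closed form (the log-odds in the x = 0 group and the log odds ratio), which gives an easy check.

```python
import math


def sigmoid(g):
    """Logistic function e^g / (1 + e^g), always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-g))


def fit_logit_1var(x, y, iters=50, tol=1e-10):
    """Maximum-likelihood fit of g = b0 + b1*x via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and entries of the negative Hessian
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] * step = gradient
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def classify(p_fatal, cutoff=0.3):
    """Label an accident fatal (1) when its predicted probability reaches the cut-off."""
    return 1 if p_fatal >= cutoff else 0


# Hypothetical data: x = 1 for one location type, y = 1 for a fatal accident
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
b0, b1 = fit_logit_1var(x, y)
print(round(b0, 4), round(b1, 4))  # -1.3863 1.3863, i.e. log(2/8) and log(1/0.25)
print([classify(sigmoid(b0 + b1 * xi)) for xi in (0, 1)])  # [0, 1]
```

Here the predicted probabilities are 0.2 for x = 0 and 0.5 for x = 1, so only the second group crosses the 0.3 cut-off.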
4.2 List of variables
Since the research goal was to determine the factors that might affect accident severity (i.e., whether an accident was fatal or non-fatal), 37 variables were derived from the time-series data and accident patterns and coded as 0 or 1 for model development. Because the variables are discrete, correlation analysis (Kendall's tau-b test) was also used to reduce the number of variables based on the level of correlation and the p-value.
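Kendall's tau-b adjusts Kendall's tau for ties, which matter a great deal when every variable is coded 0/1. A minimal pure-Python sketch of the coefficient by direct pair counting follows; `kendall_tau_b` is a hypothetical name, the O(n²) loop is chosen for clarity rather than speed, and it is not the SPSS routine.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (O(n^2)); suitable for 0/1 codes."""
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs that differ on x, and pairs that differ on y
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Two hypothetical 0/1 variables over four accidents
x = [1, 1, 0, 0]
y = [1, 0, 0, 0]
print(kendall_tau_b(x, y))  # 2 / sqrt(12), about 0.577
```

For two 0/1 variables, tau-b reduces to the phi coefficient of their 2 × 2 cross-tabulation, (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)).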
Table 1 Matrix of correlation coefficients
Variables in the table header: time of day of accident, month of accident, location, urban road, province road, commune road, helmet, don't accept priority, zone 2, pavement width < 3 m, severity of accident
Lower triangle, rows and columns (1)-(12):
(1)  1.000
(2)  -.318**  1.000
(3)  1.000**  -.318**  1.000
(4)  -.318**  1.000**  -.318**  1.000
(5)  .637**  .264**  .637**  .264**  1.000
(6)  .450**  .207**  .450**  .207**  .442**  1.000
(7)  .540**  .388**  .540**  .388**  .662**  .421**  1.000
(8)  -.039  .123*  -.039  .123*  .063  -.031  -.042  1.000
(9)  .591**  .357**  .591**  .357**  .643**  .460**  .670**  .061  1.000
(10) .117*  .182**  .117*  .182**  .233**  .262**  .188**  -.013  .179**  1.000
(11) .345**  .294**  .345**  .294**  .484**  .263**  .679**  -.029  .391**  .181**  1.000
(12) .155**  .213**  .155**  .213**  .261**  .262**  .282**  .166**  .252**  .203**  .159**  1.000
Number of samples N = 375; r = correlation coefficient.
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
4.3 Development of logistic model
The logistic regression was estimated with the enter method in SPSS version 21. The omnibus test of model coefficients for traffic accident severity is used to assess whether the data fit the model, as illustrated in Table 2.
Table 2 Omnibus tests of model coefficients
The specified model is significant (Sig. < 0.05), which indicates that the independent variables improve on the predictive power of the null model.
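The omnibus test is a likelihood-ratio test: the drop in -2 log-likelihood (-2LL) from the null (constant-only) model to the fitted model is compared with a chi-square distribution whose degrees of freedom equal the number of added predictors (five here: loc, Uroad, Proad, Croad, Zone 2). Table 2's values are not reproduced in this text, so the sketch below uses hypothetical -2LL values; `omnibus_test` and `chi2_sf` are illustrative names, and the chi-square tail probability is computed from the series for the regularized incomplete gamma function so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, via the gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Lower regularized gamma: P(a, z) = z^a e^-z / Gamma(a+1) * sum_n z^n / ((a+1)...(a+n))
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


def omnibus_test(neg2ll_null, neg2ll_model, n_predictors):
    """Likelihood-ratio chi-square, degrees of freedom and p-value."""
    chi2 = neg2ll_null - neg2ll_model
    return chi2, n_predictors, chi2_sf(chi2, n_predictors)


# Hypothetical -2LL values for illustration only (not the study's results)
chi2, df, p = omnibus_test(neg2ll_null=220.0, neg2ll_model=175.0, n_predictors=5)
print(chi2, df, p < 0.05)  # 45.0 5 True
```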
Table 3 contains the two pseudo R² measures, Cox and Snell's and Nagelkerke's. Cox and Snell's R² attempts to imitate the multiple R² on the basis of the likelihood, but its maximum can be (and usually is) less than 1.0, which makes it difficult to interpret. Here it indicates that about 11.8% of the variation is explained by the logistic model.
The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke's R² is normally higher than the Cox and Snell measure. In this case it is 0.263, indicating a 26.3% relationship between the predictors and the prediction. In addition, the Hosmer-Lemeshow (H-L) test in Table 3 indicates an adequate fit of the developed logistic regression model (Sig. > 0.05).
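Both pseudo R² measures are simple functions of the -2 log-likelihoods of the null model (D₀) and the fitted model (D₁) and of the sample size n: Cox and Snell's R² is 1 - exp(-(D₀ - D₁)/n), and Nagelkerke's R² divides it by its maximum attainable value, 1 - exp(-D₀/n). A minimal sketch using these standard formulas follows; `pseudo_r2` is an illustrative name, and the -2LL inputs are hypothetical because they are not listed in this text, so the printed values are not the study's 0.118 and 0.263.

```python
import math


def pseudo_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell and Nagelkerke pseudo R-squared from -2 log-likelihoods."""
    cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# Hypothetical -2LL values with n = 375 (the study's sample size)
cs, nk = pseudo_r2(neg2ll_null=220.0, neg2ll_model=175.0, n=375)
print(round(cs, 3), round(nk, 3))
```

Because the Nagelkerke measure divides by a number below 1, it is always at least as large as Cox and Snell's, which matches the ordering 0.263 > 0.118.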
Table 3 Goodness of fit (pseudo R² and H-L test)
Pseudo R² test, Step 1: Cox & Snell R Square = 0.118; Nagelkerke R Square = 0.263 (a)
Hosmer and Lemeshow test, Step 1: Sig. = 0.22
a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
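The Hosmer-Lemeshow statistic sorts cases by predicted probability, splits them into groups, and compares observed with expected event counts in each group. The sketch below is a simplified version (equal-sized groups after sorting, g = 10 by default, g - 2 degrees of freedom); SPSS forms its groups somewhat differently, so values may differ slightly. `hosmer_lemeshow` is an illustrative name and the inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total)


def hosmer_lemeshow(y, p, groups=10):
    """Return (HL statistic, degrees of freedom, p-value) for 0/1 outcomes y.

    Assumes predicted probabilities p lie strictly between 0 and 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    hl = 0.0
    for k in range(groups):
        idx = order[k * n // groups:(k + 1) * n // groups]
        if not idx:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        size = len(idx)
        # (O - E)^2 / (E * (1 - E / size)), i.e. size * pbar * (1 - pbar) below
        hl += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    df = groups - 2
    return hl, df, chi2_sf(hl, df)


# Hypothetical, perfectly calibrated example: every group of two has one event
print(hosmer_lemeshow([0, 1] * 10, [0.5] * 20))  # (0.0, 8, 1.0)
```

A large p-value (such as the 0.22 reported for this study) means the observed and expected counts do not differ significantly.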
The H-L statistic has a significance of 0.22, which means that it is not statistically significant and therefore the model fits reasonably well. Rather than relying only on a goodness-of-fit statistic, we often want to look at the proportion of cases that the model classifies correctly. In a perfect model, the overall percentage correct would be 100%. In this study, 88.3% of cases overall were correctly classified. Nevertheless, the prediction is skewed toward non-fatal accidents (95% correctly classified), while only 18.2% of fatal accidents are correctly classified. From the Wald test results in Table 4, the variables loc, Uroad, Proad, Croad, and Zone 2 show some significant effect (loc, Uroad, Proad, and Croad are borderline significant).
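The three percentages quoted above come from a 2 × 2 classification table at the 0.3 cut-off. That table is not reproduced in this text, so the counts in the sketch below are an assumption: 33 fatal and 342 non-fatal accidents, with 6 and 325 of them correctly classified, are hypothetical counts chosen to be consistent with the reported 18.2%, 95% and 88.3% for n = 375. `classification_rates` is an illustrative name.

```python
def classification_rates(tp, fn, tn, fp):
    """Percent correct for fatal (positive) cases, non-fatal cases, and overall."""
    fatal_correct = 100.0 * tp / (tp + fn)
    nonfatal_correct = 100.0 * tn / (tn + fp)
    overall = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return round(fatal_correct, 1), round(nonfatal_correct, 1), round(overall, 1)


# Hypothetical counts consistent with the reported percentages (n = 375):
# 33 fatal accidents (6 classified fatal), 342 non-fatal (325 classified non-fatal)
print(classification_rates(tp=6, fn=27, tn=325, fp=17))  # (18.2, 95.0, 88.3)
```

With far more non-fatal than fatal cases, a high overall rate can coexist with poor detection of fatal accidents, which is the skew noted above.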
Table 4 The result of the Wald test
Step 1 (a)   B        S.E.   Wald     df   Sig.   Exp(B)
loc          .770     .426   3.258    1    .049   2.159
Uroad        1.008    .522   3.727    1    .049   2.740
Proad        .929     .415   5.020    1    .025   2.533
Croad        1.188    .563   4.451    1    .035   3.280
Zone 2       .792     .541   2.143    1    .043   2.207
Constant     -4.422   .558   62.782   1    .000   .012
a. Variable(s) entered on step 1: loc, Uroad, Proad, Croad, Zone2.
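Each Wald statistic in Table 4 is the squared ratio of the coefficient to its standard error, (B/S.E.)², referred to a chi-square distribution with 1 degree of freedom, and Exp(B) is the odds ratio. Recomputing both from the printed B and S.E. columns is a quick consistency check; small differences (for example 3.27 versus 3.258 for loc) come from rounding of the printed values. The dictionary name `table4` is only illustrative.

```python
import math

# B and S.E. as printed in Table 4
table4 = {
    "loc": (0.770, 0.426),
    "Uroad": (1.008, 0.522),
    "Proad": (0.929, 0.415),
    "Croad": (1.188, 0.563),
    "Zone 2": (0.792, 0.541),
    "Constant": (-4.422, 0.558),
}

for name, (b, se) in table4.items():
    wald = (b / se) ** 2  # Wald chi-square with 1 degree of freedom
    odds_ratio = math.exp(b)  # Exp(B)
    print(f"{name:8s} Wald = {wald:6.2f}  Exp(B) = {odds_ratio:.3f}")
```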
According to the previous analysis, the logit model with the significant variables is as follows:

$$ g(x) = -4.422 + 0.77\,\mathrm{loc} + 1.008\,\mathrm{Uroad} + 0.929\,\mathrm{Proad} + 1.188\,\mathrm{Croad} + 0.792\,\mathrm{Zone2} \tag{9} $$

Hence the logistic regression model developed in this study is

$$ \pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} $$

where g(x) is given by Eq. (9).
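Eq. (9) can be evaluated directly for any combination of the 0/1 indicators. In the minimal sketch below, the names `COEF`, `g` and `pi_x` and the example profile are hypothetical, and the road-type indicators are assumed to be mutually exclusive (at most one of Uroad, Proad and Croad equal to 1).

```python
import math

# Coefficients of Eq. (9)
COEF = {"const": -4.422, "loc": 0.770, "Uroad": 1.008,
        "Proad": 0.929, "Croad": 1.188, "Zone2": 0.792}


def g(x):
    """Linear predictor g(x) of Eq. (9); x maps indicator names to 0 or 1."""
    return COEF["const"] + sum(COEF[k] * x.get(k, 0) for k in COEF if k != "const")


def pi_x(x):
    """pi(x) = e^g(x) / (1 + e^g(x))."""
    return math.exp(g(x)) / (1.0 + math.exp(g(x)))


baseline = {}  # all indicators equal to 0
profile = {"loc": 1, "Croad": 1, "Zone2": 1}  # hypothetical accident profile
print(round(pi_x(baseline), 4), round(pi_x(profile), 4))
```

Because every coefficient in Eq. (9) is positive, switching any indicator from 0 to 1 raises π(x).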
4.4 Model interpretation
Interpreting a model means drawing practical inferences from its estimated coefficients. In a logistic model, the estimated coefficient of an independent variable represents the change in the logit (log-odds) of the dependent variable per unit change in that independent variable. The interpretation of the model developed in this study is presented in detail as follows.
4.4.1 Impact of location on accident severity
Note that 'loc' has two levels:
loc = 1 (accident occurring at a junction or other location)
loc = 0 (accident occurring at an intersection)
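According to this coding, the model includes loc with a coefficient of 0.77. To interpret this parameter, the logit difference is computed as follows (the terms of the other variables are identical in both cases and cancel):

$$ \begin{aligned} \mathrm{logit}(\text{fatal accident} \mid \text{junction \& other}) &= \beta_0 + \beta_1 \times 1 = \beta_0 + \beta_1 \\ \mathrm{logit}(\text{fatal accident} \mid \text{intersection}) &= \beta_0 + \beta_1 \times 0 = \beta_0 \\ \text{logit difference} &= (\beta_0 + \beta_1) - \beta_0 = \beta_1 = 0.77 \end{aligned} $$

Hence the odds ratio is $e^{\beta_1} = e^{0.77} = 2.16$.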
This value shows that the odds of a fatal accident at a junction or other location are 2.16 times those at an intersection. Using the same method, the effect of the zone 2 factor on accident severity can be interpreted easily: the odds of a fatal accident in zone 2 are 2.2 ($e^{0.792}$) times those in zone 1 and zone 3.
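The logit-difference argument can be checked numerically: for two profiles that differ only in loc, the difference in g(x) from Eq. (9) is exactly the loc coefficient whatever the other indicators are, so the ratio of the odds e^g(x) equals e^0.77. In the minimal sketch below, `odds` and `odds_ratio` are illustrative names and the profiles are hypothetical.

```python
import math

# Coefficients of Eq. (9)
COEF = {"loc": 0.770, "Uroad": 1.008, "Proad": 0.929, "Croad": 1.188, "Zone2": 0.792}
CONST = -4.422


def odds(x):
    """Odds e^g(x) implied by Eq. (9) for a dict of 0/1 indicators."""
    return math.exp(CONST + sum(b * x.get(name, 0) for name, b in COEF.items()))


def odds_ratio(factor, other):
    """Odds with factor = 1 divided by odds with factor = 0, other indicators fixed."""
    return odds({**other, factor: 1}) / odds({**other, factor: 0})


for other in ({}, {"Croad": 1}, {"Uroad": 1, "Zone2": 1}):  # hypothetical profiles
    print(round(odds_ratio("loc", other), 2))  # 2.16 each time
print(round(odds_ratio("Zone2", {"loc": 1}), 2))  # 2.21
```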
4.4.2 Impact of Uroad on accident severity
$\beta_2$ (1.008) measures the differential effect on the logit between two cases: whether or not the accident occurs on an urban road. To interpret this parameter, the logit difference between logit(fatal | Uroad) and, for any other type of road, logit(fatal | not Uroad) is computed first, which gives an odds ratio of $e^{-1.109} = 0.33$. Thus, the odds that an accident occurring on an urban road is fatal are 0.33 times the odds for other types of road.
A similar method was used to compute the odds ratios for Proad and Croad, which are 0.28 and 0.47, respectively.
The similar method was used to compute
the odds for Proad and Croad, which account
for 0.28 and 0.47, respectively
5 Conclusions
A logit model was developed in this study to determine the significant factors contributing to accident severity in HCMC, based on a binary response variable (i.e., with two categories, fatal or non-fatal) and three groups of variables, namely road type, location, and land use. The model fits the data reasonably well, with 88.3% of cases correctly classified overall, although its predictions are skewed toward non-fatal accidents (only 18.2% of fatal accidents were correctly classified).
The findings may help the authorities in HCMC focus their strategies on improving safety at junctions in zone 2 that involve commune roads. They also suggest that the authorities should develop separate safety policies for each zone instead of a single policy for the whole of HCMC, as has been done previously. This may make safety policies more cost-effective.
The odds presented in this paper can be used to help prioritize solutions for reducing serious accidents. For example, the odds of being involved in a fatal accident at junctions and other locations on commune roads in zone 2, where there are few police officers to control traffic, a lack of traffic signs, and drivers with low safety awareness, are relatively higher than those for other cases.
It is important to note that some potentially significant variables, such as road surface, traffic signal pattern, light condition, collision type, and license status, are unavailable or difficult to obtain in HCMC, so they are not included in this research. Nevertheless, the approach of this study can serve as guidance for future studies when these variables become available.
References
[1] Yau, K.K.W. (2004). Risk factors affecting the severity of single vehicle traffic accidents in Hong Kong. Accident Analysis & Prevention.
[2] Abdel-Aty et al. (2005). Exploring the overall and specific crash severity levels at signalized intersections. Accident Analysis & Prevention.
[3] Yan, X. et al. (2005). Characteristics of rear-end accidents at signalized intersections using multiple logistic regression model. Accident Analysis & Prevention.
[4] Huang, H. et al. (2008). Severity of driver injury and vehicle damage in traffic crashes at intersections: A Bayesian hierarchical analysis. Accident Analysis & Prevention.
[5] Jin, Y., Wang, X., and Chen, X. (2010). Right-angle crash injury severity analysis using ordered probability models. Intelligent Computation Technology and Automation (ICICTA), IEEE.
Received: 26/9/2016; Sent for review: 30/9/2016; Revision completed: 21/10/2016; Accepted for publication: 28/10/2016