VIETNAM - KOREA UNIVERSITY OF INFORMATION &
First of all, the team would like to sincerely thank PhD Nguyen Thu Huong (Lecturer of Data Warehouse) for helping the group acquire the basic knowledge needed as the foundation to carry out this thesis. She guided the group directly and enthusiastically, corrected mistakes, and contributed many valuable comments that helped the group complete this subject report. During one semester of project implementation, the group applied its accumulated background knowledge and combined it with learning and researching new material, then used what it had collected to produce the best report it could. However, shortcomings in the implementation were unavoidable. The group therefore looks forward to receiving suggestions from teachers in order to improve its knowledge and prepare to tackle other topics in the future.
Sincerely, thank you!
6.3 Number of casualties by age group over the years 42
6.4 Number of casualties by gender over the years 43
6.5 Number of people injured and dead over the years 45
6.7 Top 5 provinces/cities with the most accidents 47
6.8 Top 5 provinces/cities with the highest number of deaths in the adult age
6.9 Top 5 provinces/cities with the largest number of casualties 50
6.10 Top 5 provinces/cities with the most property damage 52
6.11 Top 5 provinces/cities with the most damage to vehicles 53
LIST OF IMAGES
Figure 1-1 Visual Studio 11
Figure 1-2 SQL Server Management Studio 12
Figure 2-1 Time Dimension 13
Figure 2-2 Location Dimension 14
Figure 2-3 Participant Dimension 14
Figure 2-4 Cause Dimension 14
Figure 2-5 Vehicle Dimension 14
Figure 2-6 Conceptual modeling diagram 15
Figure 2-7 Star Schema 18
Figure 3-1 Cause Dim 20
Figure 3-2 Location Dim 20
Figure 3-3 Participant Dim 20
Figure 3-4 Time Dim 20
Figure 3-5 Vehicle Dim 20
Figure 3-6 Accidents Fact 21
Figure 3-7 Casualties Fact 21
Figure 3-8 Damages Fact 21
Figure 4-1 Conceptual ETL design 22
Figure 4-2 Time Dim Data flow 22
Figure 4-3 Time Dim Dataset 23
Figure 4-4 Time Dim ETL result 23
Figure 4-5 Location Dim Data flow 23
Figure 4-6 Location Dim Dataset 24
Figure 4-7 Location Dim ETL result 24
Figure 4-8 Cause Dim Dataflow 24
Figure 4-9 Cause Dim Dataset 25
Figure 4-10 Cause Dim ETL result 25
Figure 4-11 Participant Dim Data flow 25
Figure 4-12 Participant Dim Dataset 26
Figure 4-13 Participant Dim Dataset 26
Figure 4-14 Vehicle Dim Data flow 27
Figure 4-15 Vehicle Dim Dataset 27
Figure 4-16 Vehicle Dim ETL Result 27
Figure 4-17 Accident Fact Data flow 28
Figure 4-18 Accident Fact Dataset 28
Figure 4-19 Accident Fact ETL result 28
Figure 4-20 Casualties Fact Data flow 29
Figure 4-21 Casualties Fact Dataset 29
Figure 4-22 Casualties Fact Dataset 30
Figure 4-23 Casualties Fact ETL result 30
Figure 4-24 Damages Fact Data flow 31
Figure 4-25 Damages Fact Dataset 31
Figure 4-26 Damages Fact Dataset 32
Figure 4-27 Damages Fact ETL result 32
Figure 5-1 Cube 33
Figure 5-2 MDX Question 1 33
Figure 5-3 MDX Query 2 34
Figure 5-4 MDX Query 3 34
Figure 5-5 MDX Query 4 35
Figure 5-6 MDX Query 5 35
Figure 5-7 MDX Query 7 36
Figure 5-8 MDX Query 8 36
Figure 5-9 MDX Query 9 37
Figure 5-10 MDX Query 10 37
Figure 5-11 MDX Query 11 38
Figure 5-12 MDX Query 12 38
Figure 6-1 Question 1 query 39
Figure 6-2 Answer 1 report format 39
Figure 6-3 Answer 1 40
Figure 6-4 Question 2 query 40
Figure 6-5 Answer 2 report format 40
Figure 6-6 Answer 2 41
Figure 6-7 Question 3 query 41
Figure 6-8 Answer 3 report format 41
Figure 6-9 Answer 3 42
Figure 6-10 Question 4 query 42
Figure 6-11 Answer 4 report format 43
Figure 6-12 Answer 4 43
Figure 6-13 Question 5 query 43
Figure 6-14 Answer 5 report format 44
Figure 6-15 Answer 5 44
Figure 6-16 Question 6 query 45
Figure 6-17 Answer 6 report format 45
Figure 6-18 Answer 6 46
Figure 6-19 Question 7 query 46
Figure 6-20 Answer 7 report format 47
Figure 6-21 Answer 7 47
Figure 6-22 Question 8 query 47
Figure 6-23 Answer 8 report format 48
Figure 6-24 Answer 8 48
Figure 6-25 Question 9 query 49
Figure 6-26 Answer 9 report format 49
Figure 6-27 Answer 9 50
Figure 6-28 Question 10 query 50
Figure 6-29 Answer 10 report format 51
Figure 6-30 Answer 10 51
Figure 6-31 Question 11 query 52
Figure 6-32 Answer 11 report format 52
Figure 6-33 Answer 11 53
Figure 6-34 Question 12 query 53
Figure 6-35 Answer 12 report format 54
Figure 6-36 Answer 12 54
5 SQL Server Integration Services SSIS
Chapter 1 Introduction
1.1 The goal of the project
Traffic accidents are a major public health problem that causes death, injury, and disability for millions of people around the world. According to the World Health Organization (WHO), road traffic injuries are the leading cause of death for children and young adults aged 5-29 years, and the eighth leading cause of death for all age groups. Road traffic injuries also have a significant economic impact, costing countries on average 3% of their gross domestic product.
The traffic accident data warehouse provides a comprehensive and reliable source of data for analyzing and predicting traffic accidents. A data warehouse is a centralized repository of integrated data from various sources, such as police reports, road sensors, vehicle registrations, and weather stations. A data warehouse enables the application of data mining techniques to discover patterns and trends in the data, such as the causes, effects, and risk factors of traffic accidents. Data mining is the process of extracting useful information from large and complex data sets using statistical and machine learning methods.
A traffic accident data warehouse can support various objectives and stakeholders in the field of road safety. For example, it can help policymakers and planners design and evaluate effective interventions and regulations to reduce traffic accidents and fatalities. It can also help researchers and analysts identify and understand the underlying factors and mechanisms of traffic accidents, such as human behavior, road conditions, and vehicle characteristics. Furthermore, it can help drivers and travelers make informed decisions and avoid potential hazards on the road.
The data warehouse should store historical and current data on traffic accidents, such as location, date, time, causes, vehicles involved, injuries, and fatalities. It should support analytical queries and reports on this data, such as the frequency and distribution of accidents by location (ward, district, province) and time (date, month, quarter, year), the correlation and causation of accidents with various factors, and the impact and cost of accidents on society and the economy.
The data warehouse should be scalable, reliable, and efficient enough to handle large volumes of data and many concurrent users.
Statistics on the number of traffic accidents by location over years
Statistics on the number of traffic accidents by cause
Statistics on the number of vehicles damaged by cause
The largest and the smallest number of vehicles damaged, by cause
Sort the number of casualties in ascending order, by years
Top 3 months with the most accidents
Top 3 months with the least number of accidents
Statistics of the total number of casualties in each province
Statistics of casualties by month of 2022
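As a rough illustration of the two "top 3 months" requirements above, the ranking can be sketched in plain Python. The records and the helper name `months_ranked` are hypothetical and are not taken from the project's dataset; the report itself answers these questions with MDX queries on the cube.

```python
from collections import Counter

# Hypothetical accident records as (year, month) pairs; not data from the report.
accidents = [
    (2022, 1), (2022, 1), (2022, 1), (2022, 1),
    (2022, 2), (2022, 2),
    (2022, 3), (2022, 3), (2022, 3),
    (2022, 4),
    (2022, 5), (2022, 5), (2022, 5), (2022, 5), (2022, 5),
]


def months_ranked(records):
    """Return (month, accident_count) pairs sorted by count, highest first."""
    counts = Counter(month for _year, month in records)
    # Sort by descending count, then by month so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = months_ranked(accidents)
top_3_most = ranked[:3]
# Re-sort ascending by count to obtain the months with the fewest accidents.
top_3_least = sorted(ranked, key=lambda item: (item[1], item[0]))[:3]
```

Months with no recorded accident do not appear in the counter, so a real implementation would probably need to join against the full list of months first.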
1.3 Concepts in Data warehouse
1.3.1 Dimension
A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are products, place, and time. A dimension is composed of either one level or one or more hierarchies (for example, Time Dimension: Year → Month → Week → Day).
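To make the idea of a hierarchy concrete, the following minimal sketch rolls a measure up along the Time Dimension levels named above. The sample rows, the `roll_up` helper, and the week-of-month rule are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date

# Each level maps a day to the key of the member it belongs to.
# Assumption: "week" means week-of-month (days 1-7 are week 1, and so on).
LEVELS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "week": lambda d: (d.year, d.month, (d.day - 1) // 7 + 1),
    "day": lambda d: (d.year, d.month, d.day),
}


def roll_up(rows, level):
    """Sum the measure of (day, value) rows at the requested hierarchy level."""
    key_of = LEVELS[level]
    totals = defaultdict(int)
    for day, value in rows:
        totals[key_of(day)] += value
    return dict(totals)


# Hypothetical (day, number of accidents) rows.
rows = [
    (date(2022, 1, 3), 2),
    (date(2022, 1, 9), 1),
    (date(2022, 2, 14), 4),
    (date(2023, 2, 14), 3),
]

by_year = roll_up(rows, "year")
by_month = roll_up(rows, "month")
by_week = roll_up(rows, "week")
```

Moving from "day" to "year" only changes the grouping key, which is why the same measure can be reported at any level of the hierarchy.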
1.3.2 Fact
A fact in data warehousing describes quantitative transactional data such as measurements, metrics, or other values ready for analysis. These include header numbers, order numbers, ticket numbers, transaction numbers, transaction currency, etc. The amount sold is a fact measure or a key performance indicator (KPI).
1.4.1 Visual Studio
Figure 1-1 Visual Studio
The Visual Studio IDE is a creative launching pad that you can use to edit, debug, and build code and then publish an app. Over and above the standard editor and debugger that most IDEs provide, Visual Studio includes compilers, code completion tools, graphic designers, and many more features to enhance the software development process.
1.4.2 SQL Server Integration Services
SQL Server Integration Services is a platform for building enterprise-level data integration and data transformation solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.
Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.
Integration Services includes:
- A rich set of built-in tasks and transformations
- Graphical tools for building packages
- An SSIS Catalog database to store, run, and manage packages
1.4.3 SQL Server Management Studio
Figure 1-2 SQL Server Management Studio
SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure. Use SSMS to access, configure, manage, administer, and develop all components of SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VM, and Azure Synapse Analytics. SSMS provides a single comprehensive utility that combines a broad group of graphical tools with many rich script editors to provide access to SQL Server for developers and database administrators of all skill levels.
Chapter 2 Data warehouse analysis and design
2.1 Conceptual modeling
2.1.1 Measure and dimension entities
Measure:
- NoAccidents: The number of accidents
- NoVehiclesDamaged: The number of vehicles damaged
- EstimatedDamage: Estimated monetary damages from the accident
- NoCasualties: The number of casualties
The hierarchies: Year → Quarter → Month → Date
Figure 2-1 Time Dimension
2.1.2.2 Location Dimension
The hierarchies: Province → District → Ward → Street
2.1.2.3 Participant Dimension
The hierarchies: Gender, Age, Age Group, Status
Figure 2-3 Participant Dimension
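The Participant Dimension carries both Age and Age Group, which suggests that the group is derived from the age during loading. A minimal sketch of such a derivation follows; the bucket boundaries, the labels, and the sample status value are hypothetical, since the report does not state its own.

```python
from bisect import bisect_right

# Hypothetical boundaries and labels; the report does not list its own groups.
AGE_BOUNDS = [18, 60]  # lower bounds of the second and third groups
AGE_LABELS = ["Child", "Adult", "Senior"]


def age_group(age):
    """Map an age in years to its (assumed) age-group label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age)]


def participant_row(participant_id, gender, age, status):
    """Build one Participant Dimension row with the derived Age Group."""
    return {
        "ParticipantID": participant_id,
        "Gender": gender,
        "Age": age,
        "AgeGroup": age_group(age),
        "Status": status,
    }


row = participant_row(1, "Female", 34, "Injured")
```

Storing the derived group in the dimension lets reports such as "casualties by age group" group on a single column instead of recomputing ranges in every query.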
2.1.2.4 Cause Dimension
The hierarchies: Cause Name
Figure 2-4 Cause Dimension
2.1.2.5 Vehicle Dimension
The hierarchies: Vehicle Name
2.1.3 Conceptual modeling diagram
Figure 2-6 Conceptual modeling diagram
2.2 Logical modeling
2.2.1 Fact and dimension tables
Fact tables:
- Table Damages_Fact
Table 2-1 Damages Fact
Field Name          Description                                  Type
NoVehiclesDamaged   The number of vehicles damaged               int
EstimatedDamage     Estimated monetary damages of the accident   int
- Table Accidents_Fact
Field Name   Description               Type
Quantity     The number of accidents   int
- Table Casualties_Fact
Table 2-3 Casualties Fact
Field Name      Description                Type
LocationID      ID of the location         int
ParticipantID   ID of the participant      int
NoCasualties    The number of casualties   int
Dim tables:
- Table Time_Dim
Table 2-4 Time Dim
Date The day the accident
Table 2-5 Location Dim
Province   The province where the accident occurs   String
District   The district where the
Table 2-6 Cause Dim
CauseName The description of cause String
- Table Vehicle_Dim
Table 2-7 Vehicle Dim
Vehicle Name Name of vehicle String
- Table Participants_Dim
Table 2-8 Participants Dim
ParticipantID   ID of participant              int
Gender          Gender of participant          String
Status          Injury status of participant   String
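The relationship between a fact table and its dimension tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the project itself uses SQL Server, the key columns CauseID and VehicleID are assumed because the extracted tables do not show them, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Assumption: CauseID and VehicleID are the surrogate keys of the two dimensions.
conn.executescript("""
CREATE TABLE Cause_Dim   (CauseID   INTEGER PRIMARY KEY, CauseName   TEXT NOT NULL);
CREATE TABLE Vehicle_Dim (VehicleID INTEGER PRIMARY KEY, VehicleName TEXT NOT NULL);
CREATE TABLE Damages_Fact (
    CauseID           INTEGER NOT NULL REFERENCES Cause_Dim(CauseID),
    VehicleID         INTEGER NOT NULL REFERENCES Vehicle_Dim(VehicleID),
    NoVehiclesDamaged INTEGER NOT NULL,
    EstimatedDamage   INTEGER NOT NULL
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Cause_Dim VALUES (1, 'Speeding')")
conn.execute("INSERT INTO Vehicle_Dim VALUES (1, 'Motorbike')")
conn.execute("INSERT INTO Damages_Fact VALUES (1, 1, 2, 15000000)")


def try_insert_orphan():
    """Try to load a fact row whose CauseID has no matching dimension row."""
    try:
        conn.execute("INSERT INTO Damages_Fact VALUES (99, 1, 1, 500)")
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = try_insert_orphan()
total_damaged = conn.execute(
    "SELECT SUM(NoVehiclesDamaged) FROM Damages_Fact"
).fetchone()[0]
```

The rejected row shows why each fact row must reference dimension rows that already exist.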
Question 2: Number of accidents by cause
Question 3: Number of accidents, number of casualties, number of vehicles damaged over the years
Question 4: Number of casualties by age group over the years
Question 5: Number of casualties by gender over the years
Question 6: Number of people injured and dead over the years
Question 7: Number of vehicles damaged by vehicle type
Question 8: Top 5 provinces/cities with the most accidents
Question 9: Top 5 provinces/cities with the highest number of deaths in the adult age group
Question 10: Top 5 provinces/cities with the largest number of casualties
Question 11: Top 5 provinces/cities with the most property damage
Question 12: Top 5 provinces/cities with the most damage to vehicles
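As an illustration of how a "top 5" question maps onto the star schema, Question 10 can be sketched as a join between the fact table and the Location dimension followed by a grouped sum. The report answers these questions with MDX on the cube; the SQL below, run through Python's sqlite3 module, is a stand-in, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location_Dim (LocationID INTEGER PRIMARY KEY, Province TEXT NOT NULL);
CREATE TABLE Casualties_Fact (LocationID INTEGER NOT NULL, NoCasualties INTEGER NOT NULL);
""")

# Hypothetical sample data, one location row per province for brevity.
provinces = ["Da Nang", "Ha Noi", "Hue", "Quang Nam", "Can Tho", "Hai Phong"]
conn.executemany(
    "INSERT INTO Location_Dim VALUES (?, ?)",
    list(enumerate(provinces, start=1)),
)
conn.executemany(
    "INSERT INTO Casualties_Fact VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 20), (3, 3), (4, 9), (5, 1), (6, 4), (2, 2)],
)

# Sum the measure per province, then keep the five largest totals.
TOP_5_QUERY = """
SELECT l.Province, SUM(f.NoCasualties) AS TotalCasualties
FROM Casualties_Fact AS f
JOIN Location_Dim AS l ON l.LocationID = f.LocationID
GROUP BY l.Province
ORDER BY TotalCasualties DESC, l.Province
LIMIT 5
"""

top_5 = conn.execute(TOP_5_QUERY).fetchall()
```

Grouping on Province rather than LocationID matters once the dimension holds several wards or districts per province.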
Chapter 3 Data Warehouse Development
Chapter 4 ETL Process
4.1 Conceptual ETL design
Figure 4-1 Conceptual ETL design
Control flow: Cause_Dim, Participant_Dim, Time_Dim, Vehicle_Dim, and Location_Dim are loaded first, followed by Accidents_Fact, Damages_Fact, and Casualties_Fact.
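The "dimensions first, facts second" rule of the control flow can be expressed as a dependency graph and checked with a topological sort. The sketch below assumes, for simplicity, that every fact table depends on every dimension table; the actual SSIS package may wire fewer precedence constraints. Table names follow the section headings of this chapter.

```python
from graphlib import TopologicalSorter

DIMENSIONS = ["Cause_Dim", "Participant_Dim", "Time_Dim", "Vehicle_Dim", "Location_Dim"]
FACTS = ["Accidents_Fact", "Damages_Fact", "Casualties_Fact"]

# Assumption: each fact load must wait for all dimension loads to finish.
dependencies = {fact: set(DIMENSIONS) for fact in FACTS}


def load_order(deps):
    """Return one valid load order in which prerequisites come first."""
    return list(TopologicalSorter(deps).static_order())


order = load_order(dependencies)
last_dimension = max(order.index(name) for name in DIMENSIONS)
first_fact = min(order.index(name) for name in FACTS)
```

Any order produced this way loads every dimension before the first fact, so the foreign keys of the fact rows can always be resolved.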
4.2 ETL development using SSIS
4.2.1 Time_Dim
Data flow:
Figure 4-2 Time Dim Data flow
Dataset:
Figure 4-3 Time Dim Dataset
Results after performing the ETL process:
4.2.2 Location_Dim
Data flow:
Figure 4-5 Location Dim Data flow
Dataset:
Figure 4-6 Location Dim Dataset
Results after performing the ETL process:
4.2.3 Cause_Dim
Data flow:
Figure 4-8 Cause Dim Dataflow
Dataset:
Figure 4-9 Cause Dim Dataset
Results after performing the ETL process:
4.2.4 Participant_Dim
Data flow:
Figure 4-11 Participant Dim Data flow
Dataset:
Results after performing the ETL process:
Figure 4-13 Participant Dim Dataset
4.2.5 Vehicle_Dim
Figure 4-14 Vehicle Dim Data flow
Dataset:
Figure 4-15 Vehicle Dim Dataset
Results after performing the ETL process:
Figure 4-16 Vehicle Dim ETL Result
4.2.6 Accidents_Fact
Figure 4-17 Accident Fact Data flow
Dataset:
Figure 4-18 Accident Fact Dataset
Results after performing the ETL process:
Results after performing the ETL process:
Figure 4-23 Casualties Fact ETL result
Results after performing the ETL process:
Chapter 5 OLAP Analysis
Figure 5-1 Cube
Figure 5-2 MDX Question 1
Figure 5-4 MDX Query 3
Figure 5-6 MDX Query 5
Figure 5-8 MDX Query 8
Figure 5-10 MDX Query 10
Figure 5-12 MDX Query 12
Chapter 6 SSRS
6.1 Number of accidents, number of casualties, number of vehicles damaged by month, quarter.
Figure 6-1 Question 1 query
Figure 6-2 Answer 1 report format
6.2 Number of accidents by cause
Figure 6-4 Question 2 query
Figure 6-5 Answer 2 report format
Number of accidents, number of casualties, number of vehicles damaged over the years
Figure 6-7 Question 3 query
Figure 6-8 Answer 3 report format
6.3 Number of casualties by age group over the years
Figure 6-10 Question 4 query