Data warehouse building a data warehouse for traffic accident

DOCUMENT INFORMATION

Basic information

Title: Data Warehouse Building a Data Warehouse for Traffic Accident
Supervisor: PhD. Nguyễn Thu Hương
University: Vietnam - Korea University of Information & Communication Technology
Major: Data Warehouse Building
Type: Graduation project
Year of publication: 2023
City: Da Nang
Number of pages: 58
File size: 5.67 MB

VIETNAM - KOREA UNIVERSITY OF INFORMATION &

First of all, the team would like to express its sincere thanks to PhD Nguyen Thu Huong (Lecturer of Data Warehouse) for helping the group acquire the basic knowledge needed as the foundation to carry out this thesis. She guided the group directly and enthusiastically, corrected mistakes, and contributed many valuable comments that helped the group complete this report. During one semester of project implementation, the group applied its accumulated background knowledge, combined it with learning and researching new knowledge, and used what it had collected to complete the project report as well as it could. However, shortcomings in the implementation process could not be avoided. Therefore, the group looks forward to receiving suggestions from teachers in order to improve the knowledge it has acquired and to prepare for other topics in the future.

Sincerely, thank you!

6.3 Number of casualties by age group over the years 42
6.4 Number of casualties by gender over the years 43
6.5 Number of people injured and dead over the years 45
6.7 Top 5 provinces/cities with the most accidents 47
6.8 Top 5 provinces/cities with the highest number of deaths in the adult age
6.9 Top 5 provinces/cities with the largest number of casualties 50
6.10 Top 5 provinces/cities with the most property damage 52
6.11 Top 5 provinces/cities with the most damage to vehicles 53

LIST OF IMAGES

Figure 1-1 Visual Studio 11

Figure 1-2 SQL Server Management Studio 12

Figure 2-1 Time Dimension 13

Figure 2-2 Location Dimension 14

Figure 2-3 Participant Dimension 14

Figure 2-4 Cause Dimension 14

Figure 2-5 Vehicle Dimension 14

Figure 2-6 Conceptual modeling diagram 15

Figure 2-7 Star Schema 18

Figure 3-1 Cause Dim 20

Figure 3-2 Location Dim 20

Figure 3-3 Participant Dim 20

Figure 3-4 Time Dim 20

Figure 3-5 Vehicle Dim 20

Figure 3-6 Accidents Fact 21

Figure 3-7 Casualties Fact 21

Figure 3-8 Damages Fact 21

Figure 4-1 Conceptual ETL design 22

Figure 4-2 Time Dim Data flow 22

Figure 4-3 Time Dim Dataset 23

Figure 4-4 Time Dim ETL result 23

Figure 4-5 Location Dim Data flow 23

Figure 4-6 Location Dim Dataset 24

Figure 4-7 Location Dim ETL result 24

Figure 4-8 Cause Dim Dataflow 24

Figure 4-9 Cause Dim Dataset 25

Figure 4-10 Cause Dim ETL result 25

Figure 4-11 Participant Dim Data flow 25

Figure 4-12 Participant Dim Dataset 26

Figure 4-13 Participant Dim Dataset 26

Figure 4-14 Vehicle Dim Data flow 27

Figure 4-15 Vehicle Dim Dataset 27

Figure 4-16 Vehicle Dim ETL Result 27

Figure 4-17 Accident Fact Data flow 28

Figure 4-18 Accident Fact Dataset 28

Figure 4-19 Accident Fact ETL result 28

Figure 4-20 Casualties Fact Data flow 29

Figure 4-21 Casualties Fact Dataset 29

Figure 4-22 Casualties Fact Dataset 30

Figure 4-23 Casualties Fact ETL result 30

Figure 4-24 Damages Fact Data flow 31

Figure 4-25 Damages Fact Dataset 31

Figure 4-26 Damages Fact Dataset 32

Figure 4-27 Damages Fact ETL result 32

Figure 5-1 Cube 33

Figure 5-2 MDX Question 1 33

Figure 5-3 MDX Query 2 34

Figure 5-4 MDX Query 3 34

Figure 5-5 MDX Query 4 35

Figure 5-6 MDX Query 5 35

Figure 5-7 MDX Query 7 36

Figure 5-8 MDX Query 8 36

Figure 5-9 MDX Query 9 37

Figure 5-10 MDX Query 10 37

Figure 5-11 MDX Query 11 38

Figure 5-12 MDX Query 12 38

Figure 6-1 Question 1 query 39

Figure 6-2 Answer 1 report format 39

Figure 6-3 Answer 1 40

Figure 6-4 Question 2 query 40

Figure 6-5 Answer 2 report format 40

Figure 6-6 Answer 2 41

Figure 6-7 Question 3 query 41

Figure 6-8 Answer 3 report format 41

Figure 6-9 Answer 3 42

Figure 6-10 Question 4 query 42

Figure 6-11 Answer 4 report format 43

Figure 6-12 Answer 4 43

Figure 6-13 Question 5 query 43

Figure 6-14 Answer 5 report format 44

Figure 6-15 Answer 5 44

Figure 6-16 Question 6 query 45

Figure 6-17 Answer 6 report format 45

Figure 6-18 Answer 6 46

Figure 6-19 Question 7 query 46

Figure 6-20 Answer 7 report format 47

Figure 6-21 Answer 7 47

Figure 6-22 Question 8 query 47

Figure 6-23 Answer 8 report format 48

Figure 6-24 Answer 8 48

Figure 6-25 Question 9 query 49

Figure 6-26 Answer 9 report format 49

Figure 6-27 Answer 9 50

Figure 6-28 Question 10 query 50

Figure 6-29 Answer 10 report format 51

Figure 6-30 Answer 10 51

Figure 6-31 Question 11 query 52

Figure 6-32 Answer 11 report format 52

Figure 6-33 Answer 11 53

Figure 6-34 Question 12 query 53

Figure 6-35 Answer 12 report format 54

Figure 6-36 Answer 12 54

5 SQL Server Integration Services SSIS

Chapter 1 Introduction

1.1 The goal of the project

Traffic accidents are a major public health problem that causes death, injury, and disability for millions of people around the world. According to the World Health Organization (WHO), road traffic injuries are the leading cause of death for children and young adults aged 5-29 years, and the eighth leading cause of death for all age groups. Road traffic injuries also have a significant economic impact, costing countries on average 3% of their gross domestic product.

The traffic accident data warehouse provides a comprehensive and reliable source of data for analyzing and predicting traffic accidents. A data warehouse is a centralized repository of integrated data from various sources, such as police reports, road sensors, vehicle registrations, and weather stations. A data warehouse enables the application of data mining techniques to discover patterns and trends in the data, such as the causes, effects, and risk factors of traffic accidents. Data mining is the process of extracting useful information from large and complex data sets using statistical and machine learning methods.

A traffic accident data warehouse can support various objectives and stakeholders in the field of road safety. For example, it can help policymakers and planners design and evaluate effective interventions and regulations to reduce traffic accidents and fatalities. It can also help researchers and analysts identify and understand the underlying factors and mechanisms of traffic accidents, such as human behavior, road conditions, and vehicle characteristics. Furthermore, it can help drivers and travelers make informed decisions and avoid potential hazards on the road.

The data warehouse should store historical and current data on traffic accidents, such as location, date, time, causes, vehicles involved, injuries, and fatalities. The data warehouse should support various analytical queries and reports on traffic accident data, such as the frequency and distribution of accidents by location (ward, district, province) and time (date, month, quarter, year), the correlation and causation of accidents with various factors, and the impact and cost of accidents on society and the economy.

The data warehouse should be scalable, reliable, and efficient to handle large volumes of data and high concurrency of users.

Statistics on the number of traffic accidents by location over years

Statistics on the number of traffic accidents by cause

Statistics on the number of vehicles damaged by cause

The largest and the smallest number of vehicles damaged, by cause

Sort the number of casualties in ascending order, by years

Top 3 months with the most accidents

Top 3 months with the least number of accidents

Statistics of the total number of casualties in each province

Statistics of casualties by month of 2022
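As a rough illustration of two of the statistics listed above (the top 3 months with the most and the fewest accidents), the following Python sketch counts accidents per month from a small in-memory list. The records and the function name `rank_months` are invented for illustration and are not taken from the project's dataset or code.

```python
from collections import Counter

# Hypothetical accident records: (year, month) of each accident.
# These values are invented for illustration only.
accident_months = [
    (2022, 1), (2022, 1), (2022, 1), (2022, 1),
    (2022, 2),
    (2022, 3), (2022, 3),
    (2022, 4), (2022, 4), (2022, 4),
    (2022, 5), (2022, 5), (2022, 5), (2022, 5), (2022, 5),
]


def rank_months(records):
    """Return (month, count) pairs sorted from most to fewest accidents."""
    counts = Counter(month for _year, month in records)
    # Sort by count descending, then by month so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = rank_months(accident_months)
top_3_most = ranked[:3]
# Re-sort ascending by count to find the months with the fewest accidents.
top_3_least = sorted(ranked, key=lambda kv: (kv[1], kv[0]))[:3]
```

Note that a month with no recorded accidents never appears in the counter, so in this sketch "fewest" only ranks months that have at least one record.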

1.3 Concepts in Data Warehouse

1.3.1 Dimension

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are products, place, and time. A dimension is composed of either one level or one or more hierarchies (for example, Time Dimension: Year → Month → Week → Day).
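To make the idea of a hierarchy more concrete, the sketch below rolls a measure up along a time hierarchy: the same daily counts are aggregated at the day, month, or year level. The data and the `roll_up` helper are hypothetical, and the week level is left out to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily accident counts (invented values).
daily_counts = {
    date(2022, 1, 3): 2,
    date(2022, 1, 4): 1,
    date(2022, 2, 10): 4,
    date(2023, 1, 5): 3,
}

# One key function per hierarchy level; coarser levels use shorter keys.
LEVEL_KEYS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}


def roll_up(counts, level):
    """Aggregate daily counts to the given hierarchy level."""
    totals = defaultdict(int)
    for day, n in counts.items():
        totals[LEVEL_KEYS[level](day)] += n
    return dict(totals)


by_year = roll_up(daily_counts, "year")
by_month = roll_up(daily_counts, "month")
```

Moving from `by_month` to `by_year` is a roll-up; the reverse direction (year to month) corresponds to a drill-down.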

1.3.2 Fact

A fact in data warehousing describes quantitative transactional data like measurements, metrics, or the values ready for analysis. These include header numbers, order numbers, ticket numbers, transaction numbers, transaction currency, etc. For example, the amount sold is a fact measure or a key performance indicator (KPI).

1.4.1 Visual Studio

Figure 1-1 Visual Studio

The Visual Studio IDE is a creative launching pad that you can use to edit, debug, and build code and then publish an app. Over and above the standard editor and debugger that most IDEs provide, Visual Studio includes compilers, code completion tools, graphic designers, and many more features to enhance the software development process.

1.4.2 SQL Server Integration Services

SQL Server Integration Services is a platform for building enterprise-level data integration and data transformation solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

Integration Services includes:

- A rich set of built-in tasks and transformations

- Graphical tools for building packages

- An SSIS Catalog database to store, run, and manage packages
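The extract, transform, and load pattern described above can be sketched in plain Python. This is not SSIS and not the project's package; it only mirrors the three stages on a tiny in-memory CSV. The column name `cause_name`, the `CauseID` key, and the sample values are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; the column name is an assumption.
RAW_CSV = """cause_name
 Speeding
drunk driving
Speeding
"""


def extract(text):
    """Extract: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize case, drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        name = row["cause_name"].strip().title()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


def load(conn, names):
    """Load: insert the cleaned values into a destination table."""
    conn.execute(
        "CREATE TABLE Cause_Dim (CauseID INTEGER PRIMARY KEY, CauseName TEXT)"
    )
    conn.executemany(
        "INSERT INTO Cause_Dim (CauseName) VALUES (?)", [(n,) for n in names]
    )


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT CauseID, CauseName FROM Cause_Dim ORDER BY CauseID"
).fetchall()
```

In SSIS, each of these stages would typically be a component in a data flow (a source, one or more transformations, and a destination) rather than a function.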

1.4.3 SQL Server Management Studio

Figure 1-2 SQL Server Management Studio

SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure. Use SSMS to access, configure, manage, administer, and develop all components of SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VM, and Azure Synapse Analytics. SSMS provides a single comprehensive utility that combines a broad group of graphical tools with many rich script editors to provide access to SQL Server for developers and database administrators of all skill levels.

Chapter 2 Data warehouse analysis and design

2.1 Conceptual modeling

2.1.1 Measure and dimension entities

Measure:

- NoAccidents: The number of accidents

- NoVehiclesDamaged: The number of vehicles damaged

- EstimatedDamage: Estimated monetary damages from the accident

- NoCasualties: The number of casualties

The hierarchies: Year → Quarter → Month → Date

Figure 2-3 Time Dimension
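One possible way to populate the levels of this hierarchy is to derive them from a calendar date. The sketch below is only an illustration: the attribute names follow the hierarchy levels above, while the function names and the date range are hypothetical.

```python
from datetime import date, timedelta


def time_dim_row(d):
    """Derive the Year -> Quarter -> Month -> Date levels for one date."""
    return {
        "Year": d.year,
        "Quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "Month": d.month,
        "Date": d.isoformat(),
    }


def build_time_dim(start, end):
    """Build one row per calendar day in the closed range [start, end]."""
    days = (end - start).days + 1
    return [time_dim_row(start + timedelta(days=i)) for i in range(days)]


rows = build_time_dim(date(2022, 1, 1), date(2022, 12, 31))
```

Because every level is computed from the date itself, each day belongs to exactly one month, quarter, and year, which is what allows measures to be rolled up along the hierarchy.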

2.1.2.2 Location Dimension

The hierarchies: Province → District → Ward → Street

2.1.2.3 Participant Dimension

The hierarchies: Gender, Age, Age Group, Status

Figure 2-5 Participant Dimension
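The Age Group attribute can be derived from Age during loading. The following sketch shows one way to do this; the group labels and the boundaries (18 and 60) are assumptions chosen for illustration and are not stated in this report.

```python
def age_group(age):
    """Map an age in years to a coarse age group.

    The labels and boundaries are illustrative assumptions.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "Child"
    if age < 60:
        return "Adult"
    return "Elderly"


groups = [age_group(a) for a in (5, 18, 45, 60)]
```

Storing the derived group in the dimension lets reports group casualties by age group without recomputing the bins in every query.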

2.1.2.4 Cause Dimension

The hierarchies: Cause Name

Figure 2-6 Cause Dimension

2.1.2.5 Vehicle Dimension

The hierarchies: Vehicle Name

2.1.3 Conceptual modeling diagram

Figure 2-8 Conceptual modeling diagram

2.2 Logical modeling

2.2.1 Fact and dimension tables

Fact tables:

- Table Damages_Fact

Table 2-1 Table Damages Fact

Field Name          Description                                  Type
NoVehiclesDamaged   The number of vehicles damaged               int
EstimatedDamage     Estimated monetary damages of the accident   int

- Table Accidents_Fact

Field Name   Description               Type
Quantity     The number of accidents   int

- Table Casualties_Fact

Table 2-3 Casualties Fact

Field Name      Description                Type
LocationID      ID of the location         int
ParticipantID   ID of the participant      int
NoCasualties    The number of casualties   int

Dim tables:

- Table Time_Dim

Table 2-4 Time Dim

Date   The day the accident

- Table Location_Dim

Table 2-5 Location Dim

Province   The province where the accident occurs   String
District   The district where the

- Table Cause_Dim

Table 2-6 Cause Dim

CauseName   The description of cause   String

- Table Vehicle_Dim

Table 2-7 Vehicle Dim

Vehicle Name Name of vehicle String

- Table Participants_Dim

Table 2-8 Participants Dim

Field Name      Description                    Type
ParticipantID   ID of participant              int
Gender          Gender of participant          String
Status          Injury status of participant   String
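To show how a fact table and its dimension tables fit together, here is a minimal sketch of part of the schema using Python's built-in `sqlite3` module rather than SQL Server. Only the fields listed in the tables above are taken from this report; the `TimeID` key, the SQLite types, and all sample rows are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Time_Dim (TimeID INTEGER PRIMARY KEY, Date TEXT);
CREATE TABLE Location_Dim (
    LocationID INTEGER PRIMARY KEY, Province TEXT, District TEXT
);
CREATE TABLE Participants_Dim (
    ParticipantID INTEGER PRIMARY KEY, Gender TEXT, Status TEXT
);
CREATE TABLE Casualties_Fact (
    TimeID INTEGER REFERENCES Time_Dim(TimeID),  -- assumed key name
    LocationID INTEGER REFERENCES Location_Dim(LocationID),
    ParticipantID INTEGER REFERENCES Participants_Dim(ParticipantID),
    NoCasualties INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Invented sample rows: dimensions first, then the fact rows that use them.
conn.execute("INSERT INTO Time_Dim VALUES (1, '2022-01-03')")
conn.execute("INSERT INTO Location_Dim VALUES (1, 'Da Nang', 'Hai Chau')")
conn.executemany(
    "INSERT INTO Participants_Dim VALUES (?, ?, ?)",
    [(1, "Male", "Injured"), (2, "Female", "Injured")],
)
conn.executemany(
    "INSERT INTO Casualties_Fact VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 2), (1, 1, 2, 1), (1, 1, 1, 3)],
)

# Join the fact table to one dimension and aggregate the measure.
by_gender = conn.execute("""
    SELECT p.Gender, SUM(f.NoCasualties)
    FROM Casualties_Fact AS f
    JOIN Participants_Dim AS p ON p.ParticipantID = f.ParticipantID
    GROUP BY p.Gender
    ORDER BY p.Gender
""").fetchall()
```

The fact table holds only keys and the numeric measure, while descriptive attributes such as Gender live in the dimension, which is the basic shape of a star schema.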

Question 2: Number of accidents by cause

Question 3: Number of accidents, number of casualties, number of vehicles damaged over the years

Question 4: Number of casualties by age group over the years

Question 5: Number of casualties by gender over the years

Question 6: Number of people injured and dead over the years

Question 7: Number of vehicles damaged by vehicle type

Question 8: Top 5 provinces/cities with the most accidents

Question 9: Top 5 provinces/cities with the highest number of deaths in the adult age group

Question 10: Top 5 provinces/cities with the largest number of casualties

Question 11: Top 5 provinces/cities with the most property damage

Question 12: Top 5 provinces/cities with the most damage to vehicles
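The "Top 5" questions share one query shape: join a fact table to the location dimension, sum the measure per province, sort in descending order, and keep the first five rows. The sketch below illustrates this for Question 8 using `sqlite3`. It assumes that Accidents_Fact carries a `LocationID` key, and the province names and counts are placeholders. In SQL Server the same idea would typically use `TOP 5` instead of `LIMIT 5`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location_Dim (LocationID INTEGER PRIMARY KEY, Province TEXT);
CREATE TABLE Accidents_Fact (LocationID INTEGER, Quantity INTEGER);
""")

provinces = ["A", "B", "C", "D", "E", "F"]  # placeholder province names
conn.executemany(
    "INSERT INTO Location_Dim VALUES (?, ?)",
    list(enumerate(provinces, start=1)),
)
# Invented fact rows: (LocationID, Quantity).
conn.executemany(
    "INSERT INTO Accidents_Fact VALUES (?, ?)",
    [(1, 10), (2, 40), (3, 25), (4, 5), (5, 30), (6, 15), (2, 5)],
)

top_5 = conn.execute("""
    SELECT l.Province, SUM(f.Quantity) AS NoAccidents
    FROM Accidents_Fact AS f
    JOIN Location_Dim AS l ON l.LocationID = f.LocationID
    GROUP BY l.Province
    ORDER BY NoAccidents DESC, l.Province
    LIMIT 5
""").fetchall()
```

The secondary sort on the province name only makes the result deterministic when two provinces have the same total.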

Chapter 3 Data Warehouse Development

Chapter 4 ETL Process

4.1 Conceptual ETL design

Figure 4-18 Conceptual ETL design

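Control flow: Cause_Dim, Participant_Dim, Time_Dim, Vehicle_Dim, and Location_Dim are loaded first, followed by Accidents_Fact, Damages_Fact, and Casualties_Fact. The dimension tables must be loaded before the fact tables because each fact row refers to dimension keys.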
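The ordering constraint in this control flow can be expressed as a dependency graph and resolved with a topological sort. The sketch below uses `graphlib` from the Python standard library (Python 3.9 or later). It assumes, conservatively, that every fact load depends on every dimension load; the actual precedence constraints in the SSIS package may be narrower.

```python
from graphlib import TopologicalSorter

DIMENSIONS = [
    "Cause_Dim", "Participant_Dim", "Time_Dim", "Vehicle_Dim", "Location_Dim",
]
FACTS = ["Accidents_Fact", "Damages_Fact", "Casualties_Fact"]

# Map each fact load to the set of loads that must finish before it.
# Assumption: every fact depends on every dimension.
dependencies = {fact: set(DIMENSIONS) for fact in FACTS}

# static_order() yields each task only after all of its predecessors.
load_order = list(TopologicalSorter(dependencies).static_order())
```

Any order produced this way places all five dimension loads before the first fact load, which matches the control flow described above.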

4.2 ETL development by using SSIS

4.2.1 Time_Dim

Data flow:

Figure 4-19 Time Dim Data flow

Dataset:

Figure 4-20 Time Dim Dataset

Results after performing the ETL process:

4.2.2 Location_Dim

Data flow:

Figure 4-22 Location Dim Data flow

Dataset:

Figure 4-23 Location Dim Dataset

Results after performing the ETL process:

4.2.3 Cause_Dim

Data flow:

Figure 4-25 Cause Dim Dataflow

Dataset:

Figure 4-26 Cause Dim Dataset

Results after performing the ETL process:

4.2.4 Participant_Dim

Data flow:

Figure 4-28 Participant Dim Data flow

Dataset:

Results after performing the ETL process:

Figure 4-30 Participant Dim Dataset

4.2.5 Vehicle_Dim

Figure 4-31 Vehicle Dim Data flow

Dataset:

Figure 4-32 Vehicle Dim Dataset

Results after performing the ETL process:

Figure 4-33 Vehicle Dim ETL Result

4.2.6 Accidents_Fact

Figure 4-34 Accident Fact Data flow

Dataset:

Figure 4-35 Accident Fact Dataset

Results after performing the ETL process:

Results after performing the ETL process:

Figure 4-40 Casualties Fact ETL result

Results after performing the ETL process:

Chapter 5 OLAP Analysis

Figure 5-45 Cube

Figure 5-46 MDX Question 1

Figure 5-48 MDX Query 3

Figure 5-50 MDX Query 5

Figure 5-52 MDX Query 8

Figure 5-54 MDX Query 10

Figure 5-56 MDX Query 12

Chapter 6 SSRS

6.1 Number of accidents, number of casualties, number of vehicles damaged by month, quarter.

Figure 6-57 Question 1 query

Figure 6-58 Answer 1 report format

6.2 Number of accidents by cause

Figure 6-60 Question 2 query

Figure 6-61 Answer 2 report format

Number of accidents, number of casualties, number of vehicles damaged over the years

Figure 6-63 Question 3 query

Figure 6-64 Answer 3 report format

6.3 Number of casualties by age group over the years

Figure 6-66 Question 4 query

Date posted: 24/08/2023, 10:23
