
(Essay) Data Warehouse Building A Data Warehouse For Traffic Accident


DOCUMENT INFORMATION

Title: Data Warehouse Building A Data Warehouse For Traffic Accident
Authors: Lê Phú Quốc, Phạm Toàn Phúc, Lê Thị Hồng Quý, Lê Việt Thắng
Supervisor: PhD. Nguyễn Thu Hương
Institution: Vietnam - Korea University of Information & Communication Technology
Major: Computer Science
Type: Essay
Year of publication: 2023
City: Da Nang
Pages: 58
File size: 1.44 MB


VIETNAM - KOREA UNIVERSITY OF INFORMATION & COMMUNICATION TECHNOLOGY

FACULTY OF SCIENCE DEPARTMENT

DATA WAREHOUSE BUILDING A DATA WAREHOUSE FOR TRAFFIC ACCIDENT


First of all, the team would like to express its sincere thanks to PhD Nguyen Thu Huong (Lecturer of Data Warehouse) for helping the group acquire the basic knowledge needed as the foundation to carry out this thesis. She guided the group directly and enthusiastically, corrected mistakes, and contributed many valuable comments to help the group complete its subject report well. During one semester of project implementation, the group applied its accumulated background knowledge and combined it with learning and researching new knowledge. The team then applied what it had collected to complete the best project report it could. However, the team could not avoid shortcomings in the implementation process. Therefore, the group looks forward to receiving suggestions from teachers in order to improve the knowledge it has acquired and to prepare the group to tackle other topics in the future.

Sincerely, thank you!


6.3 Number of casualties by age group over the years
6.4 Number of casualties by gender over the years
6.5 Number of people injured and dead over the years
6.6 Number of vehicles damaged by vehicle type
6.7 Top 5 provinces/cities with the most accidents
6.8 Top 5 provinces/cities with the highest number of deaths in the adult age group
6.9 Top 5 provinces/cities with the largest number of casualties
6.10 Top 5 provinces/cities with the most property damage
6.11 Top 5 provinces/cities with the most damage to vehicles

LIST OF IMAGES

Figure 1-1 Visual Studio
Figure 1-2 SQL Server Management Studio
Figure 2-1 Time Dimension
Figure 2-2 Location Dimension
Figure 2-3 Participant Dimension
Figure 2-4 Cause Dimension
Figure 2-5 Vehicle Dimension
Figure 2-6 Conceptual modeling diagram
Figure 2-7 Star Schema
Figure 3-1 Cause Dim
Figure 3-2 Location Dim
Figure 3-3 Participant Dim
Figure 3-4 Time Dim
Figure 3-5 Vehicle Dim
Figure 3-6 Accidents Fact
Figure 3-7 Casualties Fact
Figure 3-8 Damages Fact
Figure 4-1 Conceptual ETL design
Figure 4-2 Time Dim Data flow
Figure 4-3 Time Dim Dataset
Figure 4-4 Time Dim ETL result
Figure 4-5 Location Dim Data flow
Figure 4-6 Location Dim Dataset
Figure 4-7 Location Dim ETL result
Figure 4-8 Cause Dim Dataflow
Figure 4-9 Cause Dim Dataset
Figure 4-10 Cause Dim ETL result
Figure 4-11 Participant Dim Data flow
Figure 4-12 Participant Dim Dataset
Figure 4-13 Participant Dim Dataset
Figure 4-14 Vehicle Dim Data flow
Figure 4-15 Vehicle Dim Dataset
Figure 4-16 Vehicle Dim ETL Result
Figure 4-17 Accident Fact Data flow
Figure 4-18 Accident Fact Dataset
Figure 4-19 Accident Fact ETL result
Figure 4-20 Casualties Fact Data flow
Figure 4-21 Casualties Fact Dataset
Figure 4-22 Casualties Fact Dataset
Figure 4-23 Casualties Fact ETL result
Figure 4-24 Damages Fact Data flow
Figure 4-25 Damages Fact Dataset
Figure 4-26 Damages Fact Dataset
Figure 4-27 Damages Fact ETL result
Figure 5-1 Cube
Figure 5-2 MDX Question 1
Figure 5-3 MDX Query 2
Figure 5-4 MDX Query 3
Figure 5-5 MDX Query 4
Figure 5-6 MDX Query 5
Figure 5-7 MDX Query 7
Figure 5-8 MDX Query 8
Figure 5-9 MDX Query 9
Figure 5-10 MDX Query 10
Figure 5-11 MDX Query 11
Figure 5-12 MDX Query 12
Figure 6-1 Question 1 query
Figure 6-2 Answer 1 report format
Figure 6-3 Answer 1
Figure 6-4 Question 2 query
Figure 6-5 Answer 2 report format
Figure 6-6 Answer 2
Figure 6-7 Question 3 query
Figure 6-8 Answer 3 report format
Figure 6-9 Answer 3
Figure 6-10 Question 4 query
Figure 6-11 Answer 4 report format
Figure 6-12 Answer 4
Figure 6-13 Question 5 query
Figure 6-14 Answer 5 report format
Figure 6-15 Answer 5
Figure 6-16 Question 6 query
Figure 6-17 Answer 6 report format
Figure 6-18 Answer 6
Figure 6-19 Question 7 query
Figure 6-20 Answer 7 report format
Figure 6-21 Answer 7
Figure 6-22 Question 8 query
Figure 6-23 Answer 8 report format
Figure 6-24 Answer 8
Figure 6-25 Question 9 query
Figure 6-26 Answer 9 report format
Figure 6-27 Answer 9
Figure 6-28 Question 10 query
Figure 6-29 Answer 10 report format
Figure 6-30 Answer 10
Figure 6-31 Question 11 query
Figure 6-32 Answer 11 report format
Figure 6-33 Answer 11
Figure 6-34 Question 12 query
Figure 6-35 Answer 12 report format
Figure 6-36 Answer 12


LIST OF ACRONYMS

environment


Chapter 1 Introduction

1.1 The goal of the project

Traffic accidents are a major public health problem that causes death, injury, and disability for millions of people around the world. According to the World Health Organization (WHO), road traffic injuries are the leading cause of death for children and young adults aged 5-29 years, and the eighth leading cause of death for all age groups. Road traffic injuries also have a significant economic impact, costing countries on average 3% of their gross domestic product.

The traffic accident data warehouse provides a comprehensive and reliable source of data for analyzing and predicting traffic accidents. A data warehouse is a centralized repository of integrated data from various sources, such as police reports, road sensors, vehicle registrations, and weather stations. A data warehouse enables the application of data mining techniques to discover patterns and trends in the data, such as the causes, effects, and risk factors of traffic accidents. Data mining is the process of extracting useful information from large and complex data sets using statistical and machine learning methods.

A traffic accident data warehouse can support various objectives and stakeholders in the field of road safety. For example, it can help policymakers and planners design and evaluate effective interventions and regulations to reduce traffic accidents and fatalities. It can also help researchers and analysts identify and understand the underlying factors and mechanisms of traffic accidents, such as human behavior, road conditions, and vehicle characteristics. Furthermore, it can help drivers and travelers make informed decisions and avoid potential hazards on the road.

1.2 Requirements

The data warehouse should store historical and current data on traffic accidents, such as location, date, time, causes, vehicles involved, injuries, and fatalities.

The data warehouse should support various analytical queries and reports on traffic accident data, such as the frequency and distribution of accidents by location (ward, district, province) and time (date, month, quarter, year), the correlation and causation of accidents with various factors, and the impact and cost of accidents on society and the economy.

The data warehouse should be scalable, reliable, and efficient to handle large volumes of data and high concurrency of users.

- Statistics on the number of traffic accidents by location over the years
- Statistics on the number of traffic accidents by cause
- Statistics on the number of vehicles damaged by cause
- The largest and the smallest number of vehicles damaged, by cause
- The number of casualties by year, sorted in ascending order
- Top 3 months with the most accidents
- Top 3 months with the fewest accidents
- Statistics on the total number of casualties in each province
- Statistics on casualties by month of 2022
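
The monthly ranking requirements can be illustrated with a minimal Python sketch. The sample dates and the function names are assumptions made for illustration only; they are not data or code from the project, where these figures would come from the fact tables.

```python
from collections import Counter
from datetime import date

# Hypothetical accident dates; a real warehouse would read these from the fact table.
accident_dates = [
    date(2022, 1, 5), date(2022, 1, 19), date(2022, 1, 30),
    date(2022, 2, 11),
    date(2022, 3, 2), date(2022, 3, 9),
    date(2022, 4, 21), date(2022, 4, 22), date(2022, 4, 23), date(2022, 4, 28),
]


def accidents_per_month(dates):
    """Count accidents per (year, month) pair."""
    return Counter((d.year, d.month) for d in dates)


def top_months(dates, n=3, most=True):
    """Return the n months with the most (or the fewest) accidents."""
    counts = accidents_per_month(dates)
    # Sort by count, then by month, so ties are resolved deterministically.
    ranked = sorted(
        counts.items(),
        key=lambda kv: (-kv[1] if most else kv[1], kv[0]),
    )
    return ranked[:n]


print(top_months(accident_dates))              # months with the most accidents
print(top_months(accident_dates, most=False))  # months with the fewest accidents
```

Months without any accident never appear in the counter, so a "fewest accidents" ranking over real data would need the complete list of months, for example from the time dimension.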


1.3 Concepts in Data Warehouse

1.3.1 Dimension

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are product, place, and time. A dimension is composed of either a single level or one or more hierarchies (for example, Time Dimension: Year → Month → Week → Day).

1.3.2 Fact

A fact in data warehousing describes quantitative transactional data such as measurements, metrics, or other values ready for analysis. Fact rows typically also carry transaction identifiers, such as header numbers, order numbers, ticket numbers, and transaction numbers, together with attributes such as the transaction currency. The amount sold, for example, is a fact measure or a key performance indicator (KPI).

1.4 Tools

1.4.1 Visual Studio

Figure 1-1 Visual Studio

The Visual Studio IDE is a creative launching pad that you can use to edit, debug, and build code and then publish an app. Over and above the standard editor and debugger that most IDEs provide, Visual Studio includes compilers, code completion tools, graphic designers, and many more features to enhance the software development process.

1.4.2 SQL Server Integration Services

SQL Server Integration Services is a platform for building enterprise-level data integration and data transformation solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

Integration Services includes:

- A rich set of built-in tasks and transformations

- Graphical tools for building packages

- An SSIS Catalog database to store, run, and manage packages
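
The extract, transform, and load steps described above can be sketched in a few lines of Python. This is not an SSIS package and does not use any SSIS API; it is a stand-in that uses the standard csv and sqlite3 modules, and the source file content, the column name cause_name, and the CauseKey column are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the column name is an assumption for illustration.
SOURCE_CSV = """cause_name
 Speeding
drunk driving
Speeding
"""


def extract(text):
    """Extract: read rows from a flat file (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise case, drop blanks and duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        name = row["cause_name"].strip().title()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


def load(conn, names):
    """Load: insert the cleaned values into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Cause_Dim ("
        "CauseKey INTEGER PRIMARY KEY, CauseName TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO Cause_Dim (CauseName) VALUES (?)",
        [(n,) for n in names],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = [r[0] for r in conn.execute("SELECT CauseName FROM Cause_Dim ORDER BY CauseKey")]
print(loaded)
```

Keeping the three steps as separate functions mirrors the way an SSIS data flow separates a source, its transformations, and a destination.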


1.4.3 SQL Server Management Studio

Figure 1-2 SQL Server Management Studio

SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure. Use SSMS to access, configure, manage, administer, and develop all components of SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VM, and Azure Synapse Analytics. SSMS provides a single comprehensive utility that combines a broad group of graphical tools with many rich script editors to provide access to SQL Server for developers and database administrators of all skill levels.


Chapter 2 Data warehouse analysis and design

2.1 Conceptual modeling

2.1.1 Measure and dimension entities

Measure:

- NoAccidents: The number of accidents

- NoVehiclesDamaged: The number of vehicles damaged

- EstimatedDamage: Estimated monetary damages from the accident

- NoCasualties: The number of casualties

2.1.2.1 Time Dimension

The hierarchy: Year → Quarter → Month → Date

Figure 2-1 Time Dimension
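
As a minimal sketch of how the levels of this hierarchy can be derived from a calendar date, the following Python code builds one time-dimension row per day. The DateKey column and its YYYYMMDD format are assumptions made for illustration; the report does not state how the key is defined.

```python
from datetime import date, timedelta


def time_dim_row(d):
    """Derive the Year -> Quarter -> Month -> Date levels for one calendar date."""
    return {
        "DateKey": d.year * 10000 + d.month * 100 + d.day,  # assumed key format, e.g. 20220315
        "Year": d.year,
        "Quarter": (d.month - 1) // 3 + 1,
        "Month": d.month,
        "Date": d.isoformat(),
    }


def build_time_dim(start, end):
    """Generate one row per day from start to end, inclusive."""
    days = (end - start).days + 1
    return [time_dim_row(start + timedelta(days=i)) for i in range(days)]


rows = build_time_dim(date(2022, 1, 1), date(2022, 12, 31))
print(len(rows), rows[0], rows[-1])
```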

2.1.2.2 Location Dimension

The hierarchy: Province → District → Ward → Street
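
A hierarchy of this kind is what allows a count to be rolled up from a fine level to a coarser one. The sketch below, in Python, counts accidents at any chosen level by truncating each location path. The sample locations and the roll_up helper are illustrative assumptions, not project data.

```python
from collections import Counter

LEVELS = ("Province", "District", "Ward", "Street")

# Hypothetical accident locations, one tuple per accident, ordered as in LEVELS.
accidents = [
    ("Da Nang", "Hai Chau", "Thach Thang", "Le Duan"),
    ("Da Nang", "Hai Chau", "Thach Thang", "Bach Dang"),
    ("Da Nang", "Ngu Hanh Son", "My An", "Ngu Hanh Son"),
    ("Quang Nam", "Hoi An", "Minh An", "Tran Phu"),
]


def roll_up(rows, level):
    """Count accidents at the given hierarchy level by truncating each path."""
    depth = LEVELS.index(level) + 1
    return Counter(row[:depth] for row in rows)


print(roll_up(accidents, "Province"))
print(roll_up(accidents, "District"))
```

Keeping the full path as the grouping key, rather than the level name alone, avoids merging two wards or streets that share a name but belong to different districts.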

2.1.2.3 Participant Dimension

The attributes: Gender, Age, Age Group, Status

Figure 2-3 Participant Dimension
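
Age Group is normally derived from Age during the transformation step. The following Python sketch shows one way to do this; the group names and cut-off ages are assumptions made for illustration, since the report does not state the boundaries it uses.

```python
def age_group(age):
    """Map an age in years to a coarse age group (cut-offs are illustrative assumptions)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "Child"
    if age < 60:
        return "Adult"
    return "Senior"


for sample_age in (7, 35, 72):
    print(sample_age, age_group(sample_age))
```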

2.1.2.4 Cause Dimension

The hierarchy: Cause Name

Figure 2-4 Cause Dimension

2.1.2.5 Vehicle Dimension

The hierarchy: Vehicle Name

2.1.3 Conceptual modeling diagram

Figure 2-6 Conceptual modeling diagram

2.2 Logical modeling

2.2.1 Fact and dimension tables

Fact tables:

- Table Damages_Fact

Table 2-1 Damages Fact

Field Name | Description | Type
NoVehiclesDamaged | The number of vehicles damaged | int
EstimatedDamage | Estimated monetary damages of the accident | int

- Table Accidents_Fact

Field Name | Description | Type

- Table Casualties_Fact

Table 2-3 Casualties Fact

Dim tables:

Table 2-4 Time Dim

Table 2-5 Location Dim

Field Name | Description | Type
District | The district where the accident occurs | String

Table 2-6 Cause Dim

- Table Vehicle_Dim

Table 2-7 Vehicle Dim

- Table Participants_Dim

Table 2-8 Participants Dim

participant
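
To show how one fact table and its dimension tables fit together in a star schema, the sketch below creates a reduced version of the model in an in-memory SQLite database from Python. The table names and the measures NoVehiclesDamaged and EstimatedDamage come from the report; the key columns (TimeKey, LocationKey, CauseKey, VehicleKey), the set of dimensions linked to Damages_Fact, and all sample rows are assumptions made for illustration. The project itself uses SQL Server, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Key columns and the dimensions linked to Damages_Fact are assumptions
# made for this sketch; the sample rows are illustrative only.
conn.executescript("""
CREATE TABLE Time_Dim     (TimeKey INTEGER PRIMARY KEY, Year INT, Quarter INT, Month INT, Date TEXT);
CREATE TABLE Location_Dim (LocationKey INTEGER PRIMARY KEY, Province TEXT, District TEXT, Ward TEXT, Street TEXT);
CREATE TABLE Cause_Dim    (CauseKey INTEGER PRIMARY KEY, CauseName TEXT);
CREATE TABLE Vehicle_Dim  (VehicleKey INTEGER PRIMARY KEY, VehicleName TEXT);
CREATE TABLE Damages_Fact (
    TimeKey           INT REFERENCES Time_Dim(TimeKey),
    LocationKey       INT REFERENCES Location_Dim(LocationKey),
    CauseKey          INT REFERENCES Cause_Dim(CauseKey),
    VehicleKey        INT REFERENCES Vehicle_Dim(VehicleKey),
    NoVehiclesDamaged INT,
    EstimatedDamage   INT
);

INSERT INTO Time_Dim     VALUES (20220105, 2022, 1, 1, '2022-01-05');
INSERT INTO Location_Dim VALUES (1, 'Da Nang', 'Hai Chau', 'Thach Thang', 'Le Duan');
INSERT INTO Cause_Dim    VALUES (1, 'Speeding'), (2, 'Drunk driving');
INSERT INTO Vehicle_Dim  VALUES (1, 'Motorbike'), (2, 'Car');
INSERT INTO Damages_Fact VALUES
    (20220105, 1, 1, 1, 2, 1500),
    (20220105, 1, 1, 2, 1, 4000),
    (20220105, 1, 2, 1, 4, 900);
""")

# Number of vehicles damaged by cause: join the fact to one dimension and aggregate.
vehicles_by_cause = conn.execute("""
    SELECT c.CauseName, SUM(f.NoVehiclesDamaged)
    FROM Damages_Fact AS f
    JOIN Cause_Dim AS c ON c.CauseKey = f.CauseKey
    GROUP BY c.CauseName
    ORDER BY c.CauseName
""").fetchall()
print(vehicles_by_cause)
```

Every analytical question in the report follows the same shape: join the fact table to the dimensions of interest, group by a dimension attribute, and aggregate a measure.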

Question 1: Number of accidents, number of casualties, number of vehicles damaged by month, quarter

Question 2: Number of accidents by cause

Question 3: Number of accidents, number of casualties, number of vehicles damaged over the years

Question 4: Number of casualties by age group over the years

Question 5: Number of casualties by gender over the years

Question 6: Number of people injured and dead over the years

Question 7: Number of vehicles damaged by vehicle type

Question 8: Top 5 provinces/cities with the most accidents

Question 9: Top 5 provinces/cities with the highest number of deaths in the adult age group

Question 10: Top 5 provinces/cities with the largest number of casualties

Question 11: Top 5 provinces/cities with the most property damage

Question 12: Top 5 provinces/cities with the most damage to vehicles

Table Accidents_Fact


Chapter 4 ETL Process

4.1 Conceptual ETL design

Figure 4-1 Conceptual ETL design

Control flow: Cause_Dim, Participant_Dim, Time_Dim, Vehicle_Dim, and Location_Dim are loaded first, followed by Accidents_Fact, Damages_Fact, and Casualties_Fact.
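
The reason for this order is that each fact row refers to dimension rows by key, so the dimension rows must exist before the fact rows are inserted. The Python sketch below demonstrates this with an in-memory SQLite database and an enforced foreign key. The reduced table layout and the CauseKey column are assumptions made for illustration; this is not the SSIS control flow itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Cause_Dim (CauseKey INTEGER PRIMARY KEY, CauseName TEXT);
CREATE TABLE Accidents_Fact (
    CauseKey    INTEGER NOT NULL REFERENCES Cause_Dim(CauseKey),
    NoAccidents INTEGER
);
""")


def load_fact(connection):
    """Insert one fact row that refers to CauseKey 1."""
    connection.execute("INSERT INTO Accidents_Fact VALUES (1, 5)")


# Loading the fact first fails because CauseKey 1 does not exist yet.
try:
    load_fact(conn)
    fact_first_ok = True
except sqlite3.IntegrityError:
    fact_first_ok = False

# Loading the dimension first makes the same fact insert succeed.
conn.execute("INSERT INTO Cause_Dim VALUES (1, 'Speeding')")
load_fact(conn)
fact_rows = conn.execute("SELECT COUNT(*) FROM Accidents_Fact").fetchone()[0]
print(fact_first_ok, fact_rows)
```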

4.2 ETL development using SSIS

4.2.1 Time_Dim

Data flow:

Figure 4-2 Time Dim Data flow

Dataset:

Figure 4-3 Time Dim Dataset

Results after performing the ETL process:

4.2.2 Location_Dim

Data flow:

Figure 4-5 Location Dim Data flow

Dataset:

Figure 4-6 Location Dim Dataset

Results after performing the ETL process:

4.2.3 Cause_Dim

Data flow:

Figure 4-8 Cause Dim Dataflow

Dataset:

Figure 4-9 Cause Dim Dataset

Results after performing the ETL process:

4.2.4 Participant_Dim

Data flow:

Figure 4-11 Participant Dim Data flow

Dataset:

Results after performing the ETL process:

Figure 4-13 Participant Dim Dataset

4.2.5 Vehicle_Dim

Data flow:

Figure 4-14 Vehicle Dim Data flow

Dataset:

Figure 4-15 Vehicle Dim Dataset

Results after performing the ETL process:

Figure 4-16 Vehicle Dim ETL Result

4.2.6 Accidents_Fact

Data flow:

Figure 4-17 Accident Fact Data flow

Dataset:

Figure 4-18 Accident Fact Dataset

Results after performing the ETL process:

4.2.7 Casualties_Fact

Results after performing the ETL process:

Figure 4-23 Casualties Fact ETL result

4.2.8 Damages_Fact

Results after performing the ETL process:

Chapter 5 OLAP Analysis

Figure 5-1 Cube

Figure 5-2 MDX Question 1

Figure 5-4 MDX Query 3

Figure 5-6 MDX Query 5

Figure 5-8 MDX Query 8

Figure 5-10 MDX Query 10

Figure 5-12 MDX Query 12

Chapter 6 SSRS

6.1 Number of accidents, number of casualties, number of vehicles damaged by month, quarter

Figure 6-1 Question 1 query

Figure 6-2 Answer 1 report format

6.2 Number of accidents by cause

Figure 6-4 Question 2 query

Figure 6-5 Answer 2 report format

Number of accidents, number of casualties, number of vehicles damaged over the years

Figure 6-7 Question 3 query

Figure 6-8 Answer 3 report format

6.3 Number of casualties by age group over the years

Figure 6-10 Question 4 query

Figure 6-12 Answer 4

6.4 Number of casualties by gender over the years

Figure 6-15 Answer 5

6.5 Number of people injured and dead over the years

Figure 6-16 Question 6 query

Figure 6-17 Answer 6 report format

6.6 Number of vehicles damaged by vehicle type

Figure 6-19 Question 7 query

Figure 6-21 Answer 7

6.7 Top 5 provinces/cities with the most accidents

Figure 6-24 Answer 8

6.8 Top 5 provinces/cities with the highest number of deaths in the adult age group

Figure 6-25 Question 9 query

Figure 6-26 Answer 9 report format

6.9 Top 5 provinces/cities with the largest number of casualties

Figure 6-28 Question 10 query

Posted: 20/09/2023, 15:03
