Data Warehouse: Building a Data Warehouse of Building Violations in the City of Chicago

DOCUMENT INFORMATION

Basic information

Title: Data Warehouse: Building a Data Warehouse of Building Violations in the City of Chicago
Supervisor: Dinh Vuong Gia Huy, Tran Thi My Duyen, Tran Thanh Liem
University: Vietnam Korea University of Information and Communication Technology
Major: Computer Science
Type: Graduation project
Year of publication: 2023
City: Da Nang
Number of pages: 18
File size: 1.12 MB

VIETNAM KOREA UNIVERSITY OF INFORMATION AND COMMUNICATION TECHNOLOGY

COMPUTER SCIENCE

DATA WAREHOUSE

BUILDING A DATA WAREHOUSE OF BUILDING VIOLATIONS IN THE CITY OF CHICAGO

Student: DINH VUONG GIA HUY, 20SE5

Instructor: M.S. TRAN THANH LIEM

COMMENT (For Instructor)

Da Nang, May 2023 Instructor

ACKNOWLEDGEMENTS

We would like to express our sincere thanks to the teachers of the Department of Computer Science and to everyone who took the time to help us during the implementation of this project. In particular, we would like to thank M.S. Tran Thanh Liem, who agreed to supervise our topic and was dedicated to helping us with project information. Thanks to that support, we have completed our project and, most importantly, we have gained experience while carrying it out.

Although we have prepared the report very carefully, some errors are inevitable. We look forward to receiving your understanding and suggestions.

We sincerely thank you!

CHAPTER 1: INTRODUCTION

1.1 Topic introduction

The dataset covers violations issued by the Department of Buildings from 2006 to the present. Lenders and title companies, please note: these data are historical in nature and should not be relied upon for real estate transactions. For transactional purposes such as closings, please consult the title commitment for outstanding enforcement actions in the Circuit Court of Cook County or the Chicago Department of Administrative Hearings. Violations are always associated with an inspection, and there can be multiple violation records for one inspection record. Related Applications: Building Data Warehouse.
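The one-to-many relationship between inspections and violations can be illustrated with a short Python sketch. The field names ID, INSPECTION NUMBER and VIOLATION CODE come from the dataset; the rows, the placeholder codes and the helper name `group_by_inspection` are invented for illustration only.

```python
from collections import defaultdict

# Invented sample rows; only the field names come from the dataset.
violations = [
    {"ID": 101, "INSPECTION NUMBER": 9001, "VIOLATION CODE": "CODE_A"},
    {"ID": 102, "INSPECTION NUMBER": 9001, "VIOLATION CODE": "CODE_B"},
    {"ID": 103, "INSPECTION NUMBER": 9002, "VIOLATION CODE": "CODE_A"},
]


def group_by_inspection(records):
    """Map each inspection number to the IDs of its violation records."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["INSPECTION NUMBER"]].append(record["ID"])
    return dict(grouped)


by_inspection = group_by_inspection(violations)
print(by_inspection)
```

In this sample, inspection 9001 carries two violation records and inspection 9002 carries one, which is the shape a warehouse design has to allow for.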

1.2 Introduction to the Chicago Building Violations dataset

The Chicago Building Violations dataset provides information about building code violations that have occurred in the city of Chicago. This dataset offers valuable insights into the condition of buildings, the enforcement of building regulations, and the efforts made to ensure public safety and compliance with building codes.

Here are some key details about the Chicago Building Violations dataset:

Content: The dataset includes detailed information about building code violations in Chicago. It covers a wide range of violations, such as structural issues, plumbing problems, electrical hazards, lack of permits, and other violations related to building safety and maintenance.

Data Fields: The dataset typically includes information such as ID, VIOLATION LAST MODIFIED DATE, VIOLATION DATE, VIOLATION CODE, VIOLATION STATUS (whether it has been resolved or is still open), VIOLATION STATUS DATE, VIOLATION DESCRIPTION, VIOLATION LOCATION, VIOLATION INSPECTOR COMMENTS, VIOLATION ORDINANCE, INSPECTOR ID, INSPECTION NUMBER, INSPECTION STATUS, INSPECTION WAIVED, INSPECTION CATEGORY, DEPARTMENT BUREAU, ADDRESS, STREET NUMBER, STREET DIRECTION, STREET NAME, STREET TYPE, PROPERTY GROUP, SSA, LATITUDE, LONGITUDE, LOCATION, Community Areas, Zip Codes, Boundaries - ZIP Codes, Census Tracts, Wards, and Historical Wards 2003-2015.

Sources: The dataset is derived from official records maintained by the City of Chicago, including the Department of Buildings and other relevant authorities responsible for enforcing building codes and regulations.

Purpose: The Chicago Building Violations dataset serves multiple purposes. It helps city officials and inspectors monitor and enforce compliance with building codes, ensuring the safety and habitability of buildings within the city. The dataset also provides valuable information to researchers, analysts, and the general public interested in understanding building conditions, patterns of violations, and trends over time.

Analysis and Applications: The dataset can be analyzed to identify areas or types of buildings with a higher frequency of violations, allowing for targeted enforcement efforts. It can also be used to assess the effectiveness of building code regulations, identify areas in need of improvement, and evaluate the impact of enforcement actions. Researchers and analysts can use the dataset to study correlations between building violations and factors such as neighborhood characteristics, property ownership, or economic indicators.

Accessibility: The availability and accessibility of the dataset may vary. It may be accessible through the official website of the City of Chicago or other government data portals. Additionally, there might be different versions or subsets of the dataset, each containing specific time frames or types of violations.

It is important to note that the specific details and availability of the dataset may change over time. Therefore, it is recommended to refer to the official sources or the City of Chicago's data portal for the most up-to-date and accurate information regarding the Chicago Building Violations dataset.
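As a small illustration of the frequency analysis mentioned under Analysis and Applications, the sketch below counts violations per ward in Python. The field names Wards and VIOLATION STATUS come from the dataset; the rows, the status values and the helper name `count_by` are assumptions made for this example and are not real data.

```python
from collections import Counter

# Invented sample rows; only the field names come from the dataset.
sample_violations = [
    {"Wards": 1, "VIOLATION STATUS": "OPEN"},
    {"Wards": 1, "VIOLATION STATUS": "COMPLIED"},
    {"Wards": 2, "VIOLATION STATUS": "OPEN"},
    {"Wards": 1, "VIOLATION STATUS": "OPEN"},
]


def count_by(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


violations_per_ward = count_by(sample_violations, "Wards")
open_per_ward = count_by(
    [r for r in sample_violations if r["VIOLATION STATUS"] == "OPEN"], "Wards"
)

# Wards ordered from most to fewest violations in the sample.
print(violations_per_ward.most_common())
print(open_per_ward.most_common())
```

The same counting step could be applied to any other field, for example VIOLATION CODE or Community Areas, to find where violations concentrate.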

1.3 Tools Used

1.3.1 SQL Server

SQL Server is a relational database management system (RDBMS) developed by Microsoft. It is primarily designed and developed to compete with MySQL and Oracle Database. SQL Server supports ANSI SQL, which is the standard SQL (Structured Query Language) language. However, SQL Server comes with its own implementation of the SQL language, T-SQL (Transact-SQL).

MS SQL Server as Client-Server Architecture

The client-server idea can be illustrated with an early morning conversation between a mother and her son, Tom.

Fig 1. MS SQL Server as Client-Server Architecture

Key Components and Services of SQL Server

Below are the main components and services of SQL Server:

Database Engine: This component handles storage, rapid transaction processing, and securing data.

SQL Server: This service starts, stops, pauses, and continues an instance of Microsoft SQL Server. Executable name is sqlservr.exe.

SQL Server Agent: It performs the role of task scheduler. It can be triggered by any event or on demand. Executable name is sqlagent.exe.

SQL Server Browser: This listens to incoming requests and connects them to the desired SQL Server instance. Executable name is sqlbrowser.exe.

SQL Server Full-Text Search: This lets users run full-text queries against character data in SQL tables. Executable name is fdlauncher.exe.

SQL Server VSS Writer: This allows backup and restoration of data files through the Windows Volume Shadow Copy Service (VSS). Executable name is sqlwriter.exe.

SQL Server Analysis Services (SSAS): Provides data analysis, data mining and machine learning capabilities. SQL Server is integrated with the R and Python languages for advanced analytics. Executable name is msmdsrv.exe.

SQL Server Reporting Services (SSRS): Provides reporting features and decision-making capabilities. It includes integration with Hadoop. Executable name is ReportingServicesService.exe.

SQL Server Integration Services (SSIS): Provides extract, transform and load capabilities for moving different types of data from one source to another. It can be viewed as converting raw information into useful information. Executable name is MsDtsSrvr.exe.

SQL SERVER INSTANCE

SQL Server allows you to run multiple instances at once, each with its own logins, ports, databases, etc. Instances are divided into two types:

Primary (default) instances

Named instances

There are two ways to access the primary instance. First, we can use the server name. Secondly, we can use its IP address. Named instances are accessed by appending a backslash and the instance name.

For example, to connect to an instance named xyz on the local server, you should use 127.0.0.1\xyz. From SQL Server 2005 onwards, you are allowed to run up to 50 instances simultaneously on a server.

Note that even though you can have multiple instances on the same server, only one of them can be the default instance, while the rest must be named instances. All the instances can run concurrently, and each instance runs independently of the others.
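The addressing rule for default and named instances can be sketched in a few lines of Python. The helper name `instance_address` is hypothetical and only builds the server string; it does not open a connection or use any SQL Server driver.

```python
from typing import Optional


def instance_address(host: str, instance: Optional[str] = None) -> str:
    """Return the server string for a default or named instance.

    A default instance is addressed by host name or IP address alone;
    a named instance appends a backslash and the instance name.
    """
    if instance:
        return f"{host}\\{instance}"
    return host


print(instance_address("127.0.0.1"))         # default instance
print(instance_address("127.0.0.1", "xyz"))  # named instance
```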

IMPORTANCE OF SQL SERVER INSTANCES

The following are the advantages of SQL Server instances:

1. For installing different versions on one machine

You can have different versions of SQL Server on a single machine. Each installation works independently of the other installations.

2. For cost reduction

Instances can help us reduce the costs of operating SQL Server, especially in purchasing the SQL Server license. You can get different services from different instances, hence there is no need to purchase one license for all services.

3. For maintaining development, production and test environments separately

This is the main benefit of having many SQL Server instances on a single machine. You can use different instances for development, production and test purposes.

4. For reducing temporary database problems

When all services run on a single SQL Server instance, there is a high chance of running into problems, especially problems that keep recurring. When such services run on different instances, you can avoid these problems.

5. For separating security privileges

When different services are running on different SQL Server instances, you can focus on securing the instance running the most sensitive service.

6. For maintaining a standby server

A SQL Server instance can fail, leading to an outage of services. This explains the importance of having a standby server that can be brought in if the current server fails. This can easily be achieved using SQL Server instances.

SQL Server Management Studio (SSMS)

SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure. Use SSMS to access, configure, manage, administer, and develop all components of SQL Server, Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VM, and Azure Synapse Analytics. SSMS provides a single comprehensive utility that combines a broad group of graphical tools with many rich script editors to provide access to SQL Server for developers and database administrators of all skill levels.

1.3.2 Visual Studio 2022

Visual Studio 2022 is the latest major release of Microsoft's integrated development environment (IDE) for building software applications. It introduces several new features and improvements aimed at enhancing developer productivity, collaboration, and overall development experience. Here is an overview of the key highlights of Visual Studio 2022:

64-bit Architecture: Visual Studio 2022 is now available as a native 64-bit application, providing improved performance and stability. With the 64-bit architecture, the IDE can handle larger projects and utilize more system resources, resulting in faster builds and smoother operations.

Enhanced Performance: Visual Studio 2022 introduces various performance improvements to make the IDE more responsive and efficient. These optimizations include faster startup times, improved load times for large solutions, quicker code navigation, and reduced memory usage.

Updated User Interface: The IDE's user interface has undergone a refresh in Visual Studio 2022. It features a cleaner and more modern look with redesigned icons, updated themes, and improved layout management options. The new UI provides a refreshed coding experience and improves readability.

Productivity Enhancements: Visual Studio 2022 brings several productivity enhancements to help developers write code faster and with fewer distractions. Some notable features include improved IntelliSense with AI-driven suggestions, enhanced code search capabilities, customizable code formatting, and improved Git integration for easier version control.

Collaboration and Live Share: Visual Studio 2022 expands on the collaboration features introduced in previous versions. It includes enhancements to Visual Studio Live Share, allowing developers to collaborate in real time with teammates, regardless of the programming language or platform. Live Share enables shared editing, debugging, and code reviews, making it easier to work together on projects.

.NET 6 and MAUI Support: Visual Studio 2022 provides comprehensive support for the latest .NET 6 framework and the Multi-platform App UI (MAUI) framework. It includes templates, tools, and debugging capabilities to streamline the development of cross-platform applications targeting Windows, macOS, iOS, and Android.

Improved Web Development: Visual Studio 2022 offers enhanced web development capabilities with improved support for front-end frameworks like React, Angular, and Vue.js. It includes a new Hot Reload feature that allows developers to instantly view code changes in running applications without restarting or losing application state.

Cloud Development: The IDE has strengthened support for cloud development scenarios with improved integration with Azure services. Visual Studio 2022 provides streamlined workflows for building, deploying, and debugging cloud-native applications, including support for containers and serverless development.

These are just a few of the key highlights of Visual Studio 2022. The new release aims to provide a more efficient, modern, and collaborative development environment for developers across various platforms and programming languages. Visual Studio 2022 offers a wide range of tools and features to support the development of desktop applications, web applications, mobile apps, and cloud-based solutions.

1.3.3 What is ETL?

ETL, which stands for extract, transform and load, is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system.

As databases grew in popularity in the 1970s, ETL was introduced as a process for integrating and loading data for computation and analysis, eventually becoming the primary method to process data for data warehousing projects.

ETL provides the foundation for data analytics and machine learning workstreams. Through a series of business rules, ETL cleanses and organizes data in a way that addresses specific business intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end user experiences. ETL is often used by an organization to:

• Extract data from legacy systems

• Cleanse the data to improve data quality and establish consistency

• Load data into a target database
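The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the pipeline used in this project: an in-memory SQLite database stands in for the target database, the raw rows and placeholder codes are invented, and the table name `violation` and the function names `extract`, `transform` and `load` are assumptions made for the example.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Invented raw extract; only the column names come from the dataset.
RAW_CSV = """ID,VIOLATION DATE,VIOLATION CODE,VIOLATION STATUS
1,01/15/2020,CODE_A,open
2,02/03/2020,CODE_A, COMPLIED
3,,CODE_B,OPEN
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a date, normalise dates and status."""
    clean = []
    for row in rows:
        if not row["VIOLATION DATE"]:
            continue  # cleansing rule assumed for this sketch
        date = datetime.strptime(row["VIOLATION DATE"], "%m/%d/%Y").date()
        clean.append((
            int(row["ID"]),
            date.isoformat(),
            row["VIOLATION CODE"].strip(),
            row["VIOLATION STATUS"].strip().upper(),
        ))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS violation ("
        "id INTEGER PRIMARY KEY, violation_date TEXT, code TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO violation VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real target database
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT id, violation_date, status FROM violation ORDER BY id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each cleansing rule easy to test on its own before data reaches the warehouse.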

1.3.4 What is OLAP?

OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.

Most business data have multiple dimensions, that is, multiple categories into which the data are broken down for presentation, tracking, or analysis. For example, sales figures might have several dimensions related to location (region, country, state/province, store), time (year, month, week, day), product (clothing, men/women/children, brand, type), and more.

But in a data warehouse, data sets are stored in tables, each of which can organize data into just two of these dimensions at a time. OLAP extracts data from multiple relational data sets and reorganizes it into a multidimensional format that enables very fast processing and very insightful analysis.
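To make the idea of dimensions concrete, the following Python sketch aggregates a tiny set of fact rows along chosen dimensions, which corresponds to the roll-up and slice operations of OLAP. It is a toy model and not how an OLAP engine such as SSAS is implemented; the fact rows, the dimension names and the helper name `aggregate` are invented for illustration.

```python
from collections import defaultdict

# Invented fact rows: (year, ward, code, number_of_violations).
DIMENSIONS = ("year", "ward", "code")
FACTS = [
    (2021, "Ward 1", "CODE_A", 3),
    (2021, "Ward 1", "CODE_B", 1),
    (2021, "Ward 2", "CODE_A", 2),
    (2022, "Ward 1", "CODE_A", 4),
    (2022, "Ward 2", "CODE_B", 5),
]


def aggregate(facts, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


# Roll-up: from (year, ward, code) to (year, ward), then to year only.
by_year_ward = aggregate(FACTS, ("year", "ward"))
by_year = aggregate(FACTS, ("year",))

# Slice: fix one dimension value (year 2021) and keep the rest.
slice_2021 = {key: value for key, value in by_year_ward.items() if key[0] == 2021}

print(by_year_ward)
print(by_year)
print(slice_2021)
```

An OLAP system precomputes and indexes aggregates of this kind so that such questions can be answered quickly on large volumes of data.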

Uploaded: 24/08/2023, 10:23
