MIDTERM ESSAY: APPLIED PROBABILITY AND STATISTICS FOR INFORMATION TECHNOLOGY


MIDTERM ESSAY FOR THE COURSE:

PROBABILITY AND STATISTICS

APPLIED TO INFORMATION TECHNOLOGY

Supervisor: Dr. NGUYỄN QUỐC BÌNH

Author: LÂM QUANG HUY

Class: 21H50201    Cohort: K25

HO CHI MINH CITY, 2022


VIETNAM GENERAL CONFEDERATION OF LABOUR

TON DUC THANG UNIVERSITY

FACULTY OF INFORMATION TECHNOLOGY


ACKNOWLEDGEMENTS

I thank my instructor, Nguyễn Quốc Bình, for teaching me how to program applications of probability and statistics, and for guiding me through this midterm essay.


In addition, this work uses a number of remarks, assessments, and figures from other authors and organizations, all of which are cited with their sources.

If any fraud is discovered, I take full responsibility for the content of my work. Ton Duc Thang University is not involved in any copyright infringement that I may have committed while carrying it out (if any).

Ho Chi Minh City, 26 October 2022

Author


ABSTRACT

This essay summarizes what the student learned up to the middle of semester 1: applying the probability and statistics theory from the lectures, together with the Python programming methods taught in the lab sessions, to solve a number of problems. It covers in detail the groups of functions in Python's statistics module. The student carried out two parts: writing code for the histogram equalization algorithm for image processing, and writing the report (3 chapters). The essay closes with the list of references the student consulted.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

WORK COMPLETED AT TON DUC THANG UNIVERSITY

ABSTRACT

TABLE OF CONTENTS

CHAPTER 1 – OPENING

1.1 Statistics library in Python

1.1.1 Overview of the statistics library in Python

1.1.2 Some functions of the statistics library

1.1.2.1 statistics.mean(data)

1.1.2.2 statistics.fmean(data)

1.1.2.3 statistics.geometric_mean(data)

1.1.2.4 statistics.harmonic_mean(data, weights=None)

1.1.2.5 statistics.median(data)

1.1.2.6 statistics.median_low(data)

1.1.2.7 statistics.median_high(data)

1.1.2.8 statistics.median_grouped(data)

1.1.2.9 statistics.mode(data)

1.1.2.10 statistics.multimode(data)

1.1.2.11 statistics.quantiles(data)

1.1.2.12 statistics.pstdev(data, mu=None)

1.1.2.13 statistics.pvariance(data, mu=None)

1.1.2.14 statistics.stdev(data, xbar=None)

1.1.2.15 statistics.variance(data, xbar=None)

1.1.2.16 statistics.covariance(x, y, /)

1.1.2.17 statistics.correlation(x, y, /)

1.1.2.18 statistics.correlation(x, y, /)

CHAPTER 2 – HISTOGRAM EQUALIZATION ALGORITHM

2.1 Histogram equalization algorithm

2.2 Example of the histogram equalization algorithm

2.3 Comments, analysis, and evaluation

CHAPTER 3 – IMPLEMENTATION

REFERENCES

APPENDIX

CHAPTER 1 – OPENING

1.1 Statistics library in Python

1.1.1 Overview of the statistics library in Python

In the era of big data and artificial intelligence, data science and machine learning have become essential in many fields of science and technology. A necessary aspect of working with data is the ability to describe, summarize, and represent data visually. Python statistics libraries are comprehensive, popular, and widely used tools that assist in working with data.

The statistics module provides functions for calculating mathematical statistics of numeric (Real-valued) data.

The module is not intended to be a competitor to third-party libraries such as NumPy and SciPy, or to proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS, and Matlab. It is aimed at the level of graphing and scientific calculators.

Descriptive statistics is about describing and summarizing data. It uses two main approaches:

- The quantitative approach describes and summarizes data numerically.

- The visual approach illustrates data with charts, plots, histograms, and other graphs.

1.1.2 Some functions of the statistics library

Averages and measures of central location:

- statistics.mean(data)

- statistics.fmean(data)

1.1.2.1 statistics.mean(data)

- Syntax: mean([data-set])

- Parameters:

- [data-set]: a list or tuple of numbers

- Returns: the sample arithmetic mean of the provided data-set
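As a minimal sketch of how mean() behaves, the snippet below averages a short list of made-up scores. The variable names and sample values are illustrative assumptions, not the essay's original example.

```python
from fractions import Fraction
from statistics import StatisticsError, mean

# Illustrative data (assumed values, not from the essay).
scores = [1, 2, 3, 4, 4]
average = mean(scores)  # (1 + 2 + 3 + 4 + 4) / 5
print(average)

# mean() keeps exact types such as Fraction instead of converting to float.
exact = mean([Fraction(1, 2), Fraction(3, 2)])
print(exact)

# An empty data-set has no mean, so StatisticsError is raised.
try:
    mean([])
    empty_raises = False
except StatisticsError:
    empty_raises = True
print(empty_raises)
```

The try/except at the end is only there to show the error case; ordinary code would usually check that the data is non-empty first.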

1.1.2.2 statistics.fmean(data)

The main difference between mean() and fmean() is that fmean() converts the data to floats and always returns a float, whereas mean() does not convert the data. fmean() also runs faster than mean().

- Syntax: fmean([data-set])

- Parameters: [data-set]: a list or tuple of numbers

- Returns: the floating-point arithmetic mean of the provided data

- Example:
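The following sketch compares mean() and fmean() on the same input, assuming two small integers chosen only for illustration. It shows the type difference described above; it does not measure the speed difference.

```python
from statistics import fmean, mean

values = [2, 4]  # illustrative integers (assumed values)

int_result = mean(values)     # stays an int because the result is a whole number
float_result = fmean(values)  # always converted to float

print(type(int_result).__name__, int_result)
print(type(float_result).__name__, float_result)
```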

1.1.2.3 statistics.geometric_mean(data)

- Parameters:

- [data-set]: a list or tuple of numbers

- Returns: the geometric mean of the provided data-set
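A minimal sketch of geometric_mean() on assumed sample values. The numbers are chosen so that their product is a perfect cube, which makes the result easy to check by hand.

```python
from statistics import geometric_mean

factors = [54, 24, 36]  # illustrative positive values (assumed)

# The geometric mean is the n-th root of the product of n values:
# 54 * 24 * 36 = 46656 = 36 ** 3.
gm = geometric_mean(factors)
print(round(gm, 6))
```

Because the result is computed in floating point, it is safer to compare it with a tolerance than with exact equality.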

1.1.2.4 statistics.harmonic_mean(data, weights=None)

- The harmonic mean (also known as the subcontrary mean) is one of several kinds of average, and in particular one of the Pythagorean means. It is typically used when an average of rates is desired. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a given set of observations.

- In Python 3 the harmonic mean is available through the harmonic_mean() function of the statistics module.

- Syntax: harmonic_mean([data-set])

- Parameters: [data-set]: a list, tuple, or iterator of real-valued numbers

- Return type: the harmonic mean of the given set of data

- Errors and exceptions:

StatisticsError when an empty data-set is passed or when the data-set contains negative values

TypeError for a data-set of non-numeric values

- Example:


1.1.2.5 statistics.median(data)

- The statistics.median() function calculates the median (middle value) of the given data set. It sorts a copy of the data in ascending order before calculating the median, so the list does not need to be sorted before it is passed to median().

- Python is a very popular language for data analysis and statistics. Python 3 provides the statistics module, which comes with useful functions such as mean(), median(), and mode().

- The median is the value that separates the higher half of a data sample or probability distribution from the lower half. For a dataset, it may be thought of as the middle value.

- The median is a measure of the central tendency of a data-set in statistics and probability theory. Its big advantage over the mean is that it is not skewed much by extremely large or small values. The median value is either contained in the data-set provided or does not stray far from it.

- For an odd number of elements, the median is the middle one.

- For an even number of elements, the median is the mean of the two middle elements.

- With the n values sorted in ascending order as x[1], ..., x[n], the median can be represented by the following formula:

median = x[(n + 1) / 2] if n is odd

median = (x[n / 2] + x[n / 2 + 1]) / 2 if n is even

- Syntax: median([data-set])

- Parameters: [data-set]: a list, tuple, or other iterable of numeric values

- Returns: the median (middle value) of the data

- Exceptions: StatisticsError is raised when the data passed is empty

Example:
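A minimal sketch of median() on unsorted input, covering the odd and even cases and the effect of an outlier. All values are assumed for illustration.

```python
from statistics import mean, median

odd_count = median([5, 1, 3])      # sorted: 1, 3, 5 -> the middle value
even_count = median([4, 1, 3, 2])  # sorted: 1, 2, 3, 4 -> (2 + 3) / 2

# One extreme value shifts the mean far more than the median.
salaries = [1, 2, 3, 4, 1000]
robust = median(salaries)
skewed = mean(salaries)

print(odd_count, even_count, robust, skewed)
```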

1.1.2.6 statistics.median_low(data)

- The median is often referred to as a robust measure of central location because it is less affected by the presence of outliers in the data. The statistics module in Python offers three functions for the median (middle) elements of a data set: median(), median_low(), and median_high(). The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.

- Syntax: median_low([data-set])

- Parameters: [data-set]: a list, tuple, or other iterable of numeric data

- Return type: the low median of the numeric data, which is always a member of the actual data-set

Example:
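The sketch below contrasts median_low() with median() on an even-sized list. The values are assumed, and the point is only that the low median is always one of the data points.

```python
from statistics import median, median_low

data_even = [1, 3, 5, 7]       # two middle values: 3 and 5
low = median_low(data_even)    # the smaller of the two middle values
plain = median(data_even)      # the mean of the two middle values

low_odd = median_low([1, 3, 5])  # odd count: simply the middle value

print(low, plain, low_odd)
```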

1.1.2.7 statistics.median_high(data)

- Like median_low(), median_high() is one of the three median functions of the statistics module, alongside median(). The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.

- Syntax: median_high([data-set])

- Parameters: [data-set]: a list or other iterable of numeric data

- Return type: the high median of the numeric data, which is always a member of the actual data-set

Example:
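A short sketch of median_high() on assumed values, shown next to median_low() so that the difference on an even-sized list is visible.

```python
from statistics import median_high, median_low

data_even = [1, 3, 5, 7]       # two middle values: 3 and 5
high = median_high(data_even)  # the larger of the two middle values
low = median_low(data_even)    # the smaller of the two middle values

# With an odd number of points both functions return the same middle value.
high_odd = median_high([1, 3, 5])

print(high, low, high_odd)
```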

1.1.2.8 statistics.median_grouped(data, interval=1)

- The median_grouped() function of the statistics module calculates the median of grouped continuous data.

- The data are assumed to be grouped into intervals of width interval. Each data point in the array is the midpoint of the interval containing the true value. The median is calculated by interpolation within the median interval (the interval containing the median value), assuming that the true values within that interval are distributed uniformly:

median = L + interval * (N / 2 - CF) / F

L = lower limit of the median interval

N = total number of data points

CF = number of data points below the median interval

F = number of data points in the median interval

- Syntax: median_grouped([data-set], interval)

- Parameters:

[data-set]: a list, tuple, or other iterable of numeric values

interval (1 by default): the width of the grouping intervals; changing it changes the interpolated median

- Return type: the median of grouped continuous data, calculated as the 50th percentile

- Exceptions: StatisticsError is raised when the data passed is empty

-Example:
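The interpolation formula can be checked by hand against median_grouped(). This is a minimal sketch on assumed data; the manual values of L, CF, and F are read off the sorted list for this particular example only.

```python
from statistics import median_grouped

data = [1, 3, 3, 5, 7]  # assumed interval midpoints; the middle value is 3

grouped_1 = median_grouped(data, interval=1)

# Manual calculation for interval = 1: the median interval is 2.5 .. 3.5.
interval = 1
L = 3 - interval / 2  # lower limit of the median interval
N = len(data)         # total number of data points
CF = 1                # data points below the median interval (the value 1)
F = 2                 # data points in the median interval (the two 3s)
manual = L + interval * (N / 2 - CF) / F

# A wider interval changes the interpolated result.
grouped_2 = median_grouped(data, interval=2)

print(grouped_1, manual, grouped_2)
```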

1.1.2.9 statistics.mode(data)

- Syntax: mode([data-set])

- Parameters:

[data-set]: a tuple, list, or iterator of real-valued numbers or strings

- Return type:

the most common data point from discrete or nominal data

- Errors and exceptions:

StatisticsError is raised when the data set is empty

- Example:
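A minimal sketch of mode() on numeric and nominal (string) data, with assumed sample values, followed by the empty-data error case.

```python
from statistics import StatisticsError, mode

numeric_mode = mode([1, 1, 2, 3, 3, 3, 3, 4])  # 3 occurs most often
nominal_mode = mode(["red", "blue", "blue", "red", "green", "red", "red"])

# An empty data set has no mode, so StatisticsError is raised.
try:
    mode([])
    empty_raises = False
except StatisticsError:
    empty_raises = True

print(numeric_mode, nominal_mode, empty_raises)
```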

1.1.2.10 statistics.multimode(data)

- When working with Python lists, we sometimes need to find the mode of a list, that is, the most frequently occurring value. A data set can have more than one mode; such data is called multimodal, and multimode() handles this case.

- The function returns a list of the most frequently occurring values in the order they were first encountered in the data. It returns more than one result if there are multiple modes, or an empty list if the data is empty.

Example:
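The sketch below shows the three cases described above on assumed inputs: a single mode, several modes, and empty data.

```python
from statistics import multimode

single = multimode([1, 2, 2, 3])              # one mode, still returned as a list
several = multimode("aabbbbccddddeeffffgg")   # b, d and f each occur four times
empty = multimode([])                         # no data -> empty list, no error

print(single, several, empty)
```

Unlike mode(), multimode() does not raise an error on empty data, which can simplify code that processes possibly empty inputs.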

1.1.2.11 statistics.quantiles(data)

- In the statistics module the function is named quantiles(): statistics.quantiles(data, *, n=4, method='exclusive') divides the data into n intervals of equal probability and returns the n - 1 cut points between them. Quantiles play an important role in statistics, for example when working with the normal distribution.

- The description and parameters below match NumPy's quantile() function rather than the statistics module. That function computes the qth quantile of the given data (array elements) along the specified axis.

- Parameters:

arr: [array_like] input array

q: quantile value, between 0 and 1

axis: [int or tuple of ints] axis along which to calculate the quantile. If it is omitted, arr is treated as flattened (the quantile is taken over all axes). axis = 0 means along the columns and axis = 1 means along the rows.

out: [ndarray, optional] alternative array in which to place the result. It must have the same dimensions as the expected output.

- Returns: the qth quantile of the array (a scalar if axis is None), or an array of quantile values along the specified axis

-Example:
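As a sketch that stays within the standard library, the snippet below uses statistics.quantiles() (available since Python 3.8) rather than the NumPy-style call. The data values are assumed for illustration.

```python
from statistics import quantiles

data = [7, 1, 4, 2, 6, 3, 5]  # assumed values; quantiles() sorts them internally

# n=4 returns the three quartile cut points.
# The default 'exclusive' method treats the data as a sample of a population.
quartiles_exclusive = quantiles(data, n=4)

# 'inclusive' treats the data as the whole population (min and max included).
quartiles_inclusive = quantiles(data, n=4, method="inclusive")

# n=2 returns a single cut point, which is the median.
halves = quantiles(data, n=2)

print(quartiles_exclusive, quartiles_inclusive, halves)
```

The two methods agree on the median here but differ on the outer quartiles, so the choice of method should be stated whenever quartiles are reported.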

1.1.2.12 statistics.pstdev(data, mu=None)

- Standard deviation is a measure of how spread out the numbers are.

- A large standard deviation indicates that the data is spread out; a small standard deviation indicates that the data is clustered closely around the mean.
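A minimal sketch of the population standard deviation using pstdev(). The data set is assumed, chosen so that the result is a round number that is easy to verify by hand.

```python
from statistics import pstdev

population = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed values; the mean is 5

# Squared deviations from the mean sum to 32; 32 / 8 = 4; sqrt(4) = 2.
sigma = pstdev(population)

# Passing the known mean as mu avoids recomputing it.
sigma_with_mu = pstdev(population, mu=5)

# Identical values have no spread at all.
no_spread = pstdev([5, 5, 5, 5])

print(sigma, sigma_with_mu, no_spread)
```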

1.1.2.13 statistics.pvariance(data, mu=None)

- The statistics.pvariance() method calculates the variance of an entire population.

- A large variance indicates that the data is spread out; a small variance indicates that the data is clustered closely around the mean.
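The population variance divides the sum of squared deviations by the number of data points. The sketch below checks pvariance() against that definition on assumed data.

```python
from statistics import pvariance

population = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed values; the mean is 5

pop_var = pvariance(population)

# Same quantity written out from the definition: mean of squared deviations.
mu = sum(population) / len(population)
by_hand = sum((x - mu) ** 2 for x in population) / len(population)

# The optional mu argument accepts an already-computed population mean.
pop_var_with_mu = pvariance(population, mu=5)

print(pop_var, by_hand, pop_var_with_mu)
```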

-Note: To calculate the variance from a sample of data, look at the


1.1.2.14 statistics.stdev(data, xbar=None)

- The statistics module in Python provides the stdev() function, which calculates the standard deviation from a sample of data rather than from an entire population.

- To calculate the standard deviation of an entire population, use pstdev() instead.

- Standard deviation is a measure of spread in statistics: it quantifies the variation of a set of data values. It is closely related to the variance: the standard deviation is the square root of the variance.

- A low standard deviation indicates that the data are less spread out, whereas a high standard deviation shows that the data are spread far from their mean. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.

- Parameters:

+ [data]: an iterable of real-valued numbers

+ xbar (optional): the actual mean of the data-set

- Return type: the sample standard deviation of the values passed

- Exceptions:

StatisticsError is raised when fewer than 2 values are passed.

Impossible or imprecise results can occur when the value provided as xbar does not match the actual mean of the data-set.

- Example:
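A minimal sketch of stdev() on an assumed sample, including the xbar argument and the error raised for a single data point. The sample standard deviation divides by n - 1 rather than n, so it is slightly larger than pstdev() on the same values.

```python
import math
from statistics import StatisticsError, pstdev, stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed values; the mean is 5

s = stdev(sample)                  # sqrt(32 / 7): divides by n - 1
s_with_xbar = stdev(sample, xbar=5)
sigma = pstdev(sample)             # sqrt(32 / 8): divides by n

# A single value gives no information about spread in a sample.
try:
    stdev([3])
    too_few_raises = False
except StatisticsError:
    too_few_raises = True

print(s, s_with_xbar, sigma, too_few_raises)
```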

1.1.2.15 statistics.variance(data, xbar=None)

- The statistics module provides the variance() function, which calculates the variance from a sample of data (a sample is a subset of the population).

- variance() should be used only when the variance of a sample is needed. To calculate the variance of an entire population, use pvariance().

- In statistics, variance is the expected squared deviation of a variable from its mean. It measures how far the data in a set are spread from their mean. A low variance indicates that the data are clustered together and not spread widely, whereas a high variance indicates that the data are spread much further from the average value.

- Variance is an important tool in the sciences, where statistical analysis of data is common. It is the square of the standard deviation of the given data-set and is also known as the second central moment of a distribution.

- Syntax: variance([data], xbar)

- Parameters:

+ [data]: an iterable of real-valued numbers

+ xbar (optional): the actual mean of the data-set

- Return type: the sample variance of the values passed

- Exceptions :
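As a hedged sketch on assumed data, the snippet below shows that variance() is the square of stdev(), that it differs from pvariance() by the n - 1 divisor, and that it raises StatisticsError when fewer than two values are passed.

```python
import math
from statistics import StatisticsError, pvariance, stdev, variance

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed values; the mean is 5

sample_var = variance(sample)           # 32 / (8 - 1)
population_var = pvariance(sample)      # 32 / 8
sd_squared = stdev(sample) ** 2         # should match sample_var

# The optional xbar argument accepts an already-computed sample mean.
sample_var_with_xbar = variance(sample, xbar=5)

# At least two data points are required for a sample variance.
try:
    variance([3])
    too_few_raises = False
except StatisticsError:
    too_few_raises = True

print(sample_var, population_var, sd_squared, too_few_raises)
```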

Title	Midterm Essay: Applied Probability and Statistics for Information Technology
Author	Lâm Quang Huy
Supervisor	Dr. Nguyễn Quốc Bình
School	Ton Duc Thang University
Major	Probability and Statistics
Type	Midterm essay
Year of publication	2022
City	Ho Chi Minh City
