MIDTERM ESSAY FOR THE COURSE APPLIED PROBABILITY AND STATISTICS FOR INFORMATION TECHNOLOGY


OPENING

Statistics library in Python

1.1.1 Overview of the statistics library in Python

In the era of big data and artificial intelligence, data science and machine learning are pivotal across many scientific and technological fields. Working with data effectively requires skills in describing, summarizing, and visually representing it to extract meaningful insights. Python's statistics libraries are powerful, widely adopted tools for data analysis and visualization, which makes them essential for modern data-driven projects.

The statistics module of the Python standard library provides functions for calculating mathematical statistics of numeric (Real-valued) data.

The module is not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators.

Descriptive statistics is about describing and summarizing data. It uses two main approaches:

- The quantitative approach describes and summarizes data numerically.

- The visual approach illustrates data with charts, plots, histograms, and other graphs.

1.1.2 Some functions in the statistics library

Averages and measures of central location:

- statistics.harmonic_mean(data, weights=None)

- statistics.median_grouped(data, interval=1)

- statistics.multimode(data)

Measures of spread:

- statistics.pstdev(data, mu=None)

- statistics.pvariance(data, mu=None)

- statistics.stdev(data, xbar=None)

- statistics.variance(data, xbar=None)

Statistics for relations between two inputs:

- statistics.covariance(x, y, /)

- statistics.correlation(x, y, /)

- statistics.linear_regression(x, y, /, *, proportional=False)

- The mean() function calculates the mean (average) of a given list of numbers. It returns the mean of the data set passed as its argument.

The arithmetic mean is a fundamental statistical measure that represents the central location of a dataset: the sum of all data points divided by the total number of values. It summarizes a set of numbers with a single typical value. In Python, calculating the arithmetic mean amounts to summing the dataset and dividing by the count of data points.

-Parameters : [data-set] : list or tuple of a set of numbers.

-Returns : sample arithmetic mean of the provided data-set.

-Errors and Exceptions : TypeError when anything other than numeric values is passed as a parameter.
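
As a brief illustration of the description above, the sketch below calls statistics.mean() on a small list of scores and shows the TypeError raised for non-numeric input. The variable names and sample values are assumptions made up for this example.

```python
import statistics

# Hypothetical sample data, used only for illustration.
scores = [4, 8, 6, 5, 3]

# Sum of the values (26) divided by the number of values (5).
average = statistics.mean(scores)
print(average)  # 5.2

# Non-numeric values are rejected with a TypeError.
caught = None
try:
    statistics.mean(["a", "b", "c"])
except TypeError as error:
    caught = type(error).__name__
    print("mean() rejected the input:", caught)
```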

The fmean() function converts all input data to the float type before calculating the arithmetic mean (average) of a given sequence or iterable. Because every value is standardized to float, the result is always returned as a float. This function is well suited to obtaining the mean of numeric datasets quickly.

The primary difference between mean() and fmean() is that fmean() converts the data to floats before computation, whereas mean() does not perform this conversion and preserves the input type where it can. As a result, fmean() typically runs faster than mean(). The syntax is: fmean([data-set]).

-Parameters : [data-set] : list or tuple of a set of numbers.

-Returns : floating-point arithmetic mean of the provided data.
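
The difference in return types can be seen in a minimal sketch such as the following. The sample lists are assumptions chosen so that every result is exact; fmean() requires Python 3.8 or later.

```python
import statistics
from fractions import Fraction

# Hypothetical sample data, used only for illustration.
whole_numbers = [2, 4, 6]
quarters = [Fraction(1, 4), Fraction(3, 4)]

# mean() keeps the input type where it can; fmean() always returns a float.
mean_of_ints = statistics.mean(whole_numbers)        # 4 (int)
fmean_of_ints = statistics.fmean(whole_numbers)      # 4.0 (float)
mean_of_fractions = statistics.mean(quarters)        # Fraction(1, 2)
fmean_of_fractions = statistics.fmean(quarters)      # 0.5 (float)

print(mean_of_ints, fmean_of_ints, mean_of_fractions, fmean_of_fractions)
```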

-The geometric_mean() function converts data to floats and computes the geometric mean.

-The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean which uses their sum).

-Raises a StatisticsError if the input dataset is empty, if it contains a zero, or if it contains a negative value. The data may be a sequence or iterable.

-No special efforts are made to achieve exact results. (However, this may change in the future.)

-Parameters : [data-set] : list or tuple of a set of numbers.

-Returns : the geometric mean of the provided data-set.
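
To make the "product instead of sum" idea concrete, here is a small sketch that compares statistics.geometric_mean() with the n-th root of the product computed by hand. The growth factors are made-up values, and the error check covers only empty and negative input, since those are the cases this example relies on.

```python
import math
import statistics

# Hypothetical yearly growth factors (1.10 means +10 %), for illustration only.
growth_factors = [1.10, 1.25, 0.90]

# Library call.
library_result = statistics.geometric_mean(growth_factors)

# The same quantity from the definition: n-th root of the product of n values.
manual_result = math.prod(growth_factors) ** (1 / len(growth_factors))

print(round(library_result, 4), round(manual_result, 4))

# Empty or negative input raises StatisticsError.
rejected = []
for bad_data in ([], [4, -1, 9]):
    try:
        statistics.geometric_mean(bad_data)
    except statistics.StatisticsError:
        rejected.append(bad_data)
```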

1.1.2.4 statistics.harmonic_mean(data, weights=None)

The harmonic mean, also called the subcontrary mean, is a type of average and one of the Pythagorean means. It is especially useful for averaging rates or ratios. The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the data, which makes it suitable for averaging quantities such as speeds, densities, or other rate-based measurements.

The harmonic mean can be computed in Python 3 with the harmonic_mean() function from the statistics module.

-Syntax : harmonic_mean([data-set])

-Parameters : [data-set] : a list, tuple, or iterator of real-valued numbers.

-Returntype : returns the harmonic mean of the given set of data.

-Errors and Exceptions : StatisticsError when an empty data-set is passed or when the data-set contains negative values; TypeError for a data-set of non-numeric values.
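
A typical use is averaging speeds over equal distances. The sketch below uses an invented two-leg trip and checks harmonic_mean() against the definition (reciprocal of the mean of the reciprocals); the variable names are assumptions for this example only.

```python
import statistics

# Hypothetical trip: the same distance driven once at 40 km/h and once at 60 km/h.
speeds = [40, 60]

# Definition: number of values divided by the sum of their reciprocals.
manual_result = len(speeds) / sum(1 / s for s in speeds)
library_result = statistics.harmonic_mean(speeds)

# About 48 km/h, lower than the arithmetic mean of 50 km/h.
print(library_result, statistics.mean(speeds))

# Negative values are rejected with StatisticsError.
negative_rejected = False
try:
    statistics.harmonic_mean([40, -60])
except statistics.StatisticsError:
    negative_rejected = True
```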

- The statistics.median() method calculates the median (middle value) of the given data set. The method sorts the data in ascending order before calculating the median.

- Python is a very popular language when it comes to data analysis and statistics.

Python 3's statistics module offers essential functions such as mean(), median(), and mode() to simplify data analysis. The median() function calculates the median from an unsorted list, so the data does not need to be sorted beforehand.

The median is the value that divides a dataset or probability distribution into two equal halves, with half of the data points falling above it and half below. It is commonly regarded as the middle value of a dataset and is a reliable measure of central tendency, especially for skewed distributions.

Unlike the mean, the median is little affected by outliers or extreme values, which makes it a more robust statistic. The median value is either contained in the data-set or does not stray far from the values provided, so it reflects the center of the data regardless of skewness.

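-For an odd number of elements, the median is the middle element.

-For an even number of elements, the median is the mean of the two middle elements.

-With the n values sorted in ascending order as x[1] <= x[2] <= ... <= x[n], the median can be represented by the following formula: median = x[(n + 1) / 2] when n is odd, and median = (x[n / 2] + x[n / 2 + 1]) / 2 when n is even.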

The function takes a data-set, which is a list, tuple, or other iterable of numeric values, and returns its median (middle value). If the iterable is empty, a StatisticsError is raised.
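
The odd and even cases, and the robustness to outliers, can be sketched as follows. All sample lists are invented for illustration.

```python
import statistics

# Hypothetical unsorted data; median() sorts internally.
odd_count = [7, 1, 5]        # sorted: 1, 5, 7
even_count = [7, 1, 5, 3]    # sorted: 1, 3, 5, 7

median_odd = statistics.median(odd_count)     # 5, the middle element
median_even = statistics.median(even_count)   # 4.0, the mean of 3 and 5

# One extreme value moves the mean far more than the median.
with_outlier = [1, 2, 3, 4, 100]
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 22 3

# An empty data set raises StatisticsError.
empty_rejected = False
try:
    statistics.median([])
except statistics.StatisticsError:
    empty_rejected = True
```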

The median is a robust measure of central tendency that is less affected by outliers in a data set. Python's statistics module has three functions to compute it: median(), median_low(), and median_high(). The median_low() function always returns a member of the data set: when the number of data points is even, it returns the smaller of the two middle values. When the number of data points is odd, all three functions return the middle value.

- Syntax : median_low( [data-set] )

The 'data-set' parameter accepts a list, tuple, or any iterable of numeric data. The function returns the low median, which is always a value from the original data. It is useful when the data are discrete and the result should be an actual data point rather than an interpolated one.
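
A short sketch of that situation, assuming an invented list of room counts where a fractional median would not be a meaningful data point:

```python
import statistics

# Hypothetical discrete data: number of rooms in six apartments.
rooms = [3, 1, 4, 2, 2, 5]   # sorted: 1, 2, 2, 3, 4, 5

plain = statistics.median(rooms)      # 2.5, interpolated between 2 and 3
low = statistics.median_low(rooms)    # 2, the smaller of the two middle values

print(plain, low)
```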

The median_high() function is the counterpart of median_low(). The median() function returns the middle value when the number of data points is odd and the average of the two middle values when it is even. In contrast, median_low() returns the lower of the two middle values and median_high() returns the higher one, so the result of median_high() is always a member of the data set.

-Syntax : median_high( [data-set] )

-Parameters : [data-set] : Takes in a list, or an iterable set of numeric data.

-Returntype : Returns the high median of the numeric data (always a value in the actual data-set).
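
The three median functions can be compared side by side. The data below are made up; the point of the sketch is that the functions differ only when the number of values is even.

```python
import statistics

# Hypothetical sample data, used only for illustration.
even_data = [10, 40, 20, 30]   # sorted: 10, 20, 30, 40
odd_data = [10, 30, 20]        # sorted: 10, 20, 30

results_even = (
    statistics.median_low(even_data),    # 20
    statistics.median(even_data),        # 25.0
    statistics.median_high(even_data),   # 30
)
results_odd = (
    statistics.median_low(odd_data),     # 20
    statistics.median(odd_data),         # 20
    statistics.median_high(odd_data),    # 20
)

print(results_even, results_odd)
```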

-The median_grouped() function in the statistics module calculates the median of grouped continuous data.

The data are grouped into intervals of a fixed width, and each data point represents the midpoint of its interval, which approximates the true values in that interval. Calculating the median of grouped data involves identifying the interval that contains the median position and interpolating within that interval. Grouping is an efficient way to summarize large datasets.

The median is calculated by interpolating within the median interval (the interval that contains the median value), assuming the true values are uniformly distributed within that interval. The formula used is:

median = L + interval * (N / 2 - CF) / F

L = lower limit of the median interval

N = total number of data points

CF = number of data points below the median interval

F = number of data points in the median interval

-Syntax : median_grouped( [data-set], interval)

The data-set is a list, tuple, or any iterable of numeric values. The 'interval' parameter, which defaults to 1, is the width of each group. Changing the interval changes the grouping and therefore the interpolated median.

-Returntype : returns the median of grouped continuous data, calculated as the 50th percentile.

-Exceptions : StatisticsError is raised when the iterable passed is empty.
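
The interpolation can be checked by hand. The sketch below is one possible reading of the formula for median_grouped(), written as a hypothetical helper named grouped_median() (not part of the statistics module) and compared with the library function on invented data. It assumes each value is the midpoint of an interval of the given width.

```python
import statistics


def grouped_median(data, interval=1):
    """Hypothetical helper: L + interval * (N / 2 - CF) / F."""
    values = sorted(data)
    n = len(values)                                   # N: total number of data points
    midpoint = values[n // 2]                         # midpoint of the median interval
    lower_limit = midpoint - interval / 2             # L: lower limit of that interval
    below = sum(1 for v in values if v < midpoint)    # CF: points below the interval
    inside = values.count(midpoint)                   # F: points inside the interval
    return lower_limit + interval * (n / 2 - below) / inside


# Hypothetical sample data, used only for illustration.
ages = [1, 3, 3, 5, 7]

by_hand = grouped_median(ages, interval=1)                   # 2.5 + 1 * (2.5 - 1) / 2
from_library = statistics.median_grouped(ages, interval=1)
wider = statistics.median_grouped(ages, interval=2)          # 2.0 + 2 * (2.5 - 1) / 2

print(by_hand, from_library, wider)
```

With interval=1 the median interval is 2.5 to 3.5; widening it to 2 moves the lower limit to 2.0 and shifts the interpolated result.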

HISTOGRAM EQUALIZATION ALGORITHM

Histogram equalization algorithm

Histogram equalization is a fundamental image processing technique that enhances global contrast by modifying the distribution of pixel intensities based on the image histogram. It redistributes brightness levels to improve visual clarity, especially in low-contrast areas. In image processing, a histogram records how often each intensity level occurs in an image, which is the data that contrast adjustment techniques such as histogram equalization work from.

Essentially, histogram equalization works by:

Computing a histogram of image pixel intensities

Evenly spreading out and distributing the most frequent pixel values (i.e., the ones with the largest counts in the histogram)

Giving a linear trend to the cumulative distribution function (CDF)
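
The three steps above can be sketched in plain Python, without OpenCV, on an image stored as a list of rows. This is a minimal illustration under stated assumptions, not the essay's own implementation: the function name equalize_histogram(), the tiny sample image, and the mapping round((levels - 1) * cdf / total) are choices made for this example, and library implementations may normalize the CDF slightly differently.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels - 1]."""
    # Step 1: histogram of pixel intensities.
    hist = [0] * levels
    for row in image:
        for pixel in row:
            hist[pixel] += 1

    # Step 2: cumulative distribution function (running sum of the histogram).
    total = sum(hist)
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Step 3: lookup table that stretches the CDF toward a straight line.
    table = [round((levels - 1) * c / total) for c in cdf]
    return [[table[pixel] for pixel in row] for row in image]


# Hypothetical low-contrast 2 x 2 image: all intensities lie between 100 and 103.
low_contrast = [[100, 101],
                [102, 103]]

equalized = equalize_histogram(low_contrast)
print(equalized)  # [[64, 128], [191, 255]]
```

The four nearly identical gray levels end up spread over most of the 0 to 255 range, which is the contrast gain the technique aims for.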

-The result of applying histogram equalization is an image with higher global contrast.

-We can further improve histogram equalization by applying an algorithm called Contrast Limited Adaptive Histogram Equalization (CLAHE), resulting in higher quality output images.

-Other than photographers using histogram equalization to correct under/over-exposed images, the most widely used histogram equalization application can be found in the medical field.

Histogram equalization is commonly applied to X-ray and CT scans to enhance their contrast. It improves the visibility of details within radiographs, helping doctors and radiologists interpret scans more accurately. By increasing contrast, histogram equalization can support more precise diagnosis.

By the end of this tutorial, you will learn how to apply both basic histogram equalization and adaptive histogram equalization to images using OpenCV. These techniques enhance image contrast and detail, which helps improve image quality in computer vision projects.

Example about Histogram equalization algorithm

-Applying histogram equalization starts by computing the histogram of pixel intensities in an input grayscale/single-channel image:

Our histogram features multiple peaks, showing that pixels are concentrated in specific intensity ranges. Histogram equalization aims to redistribute these pixels more evenly across all intensity levels, which enhances image contrast. By spreading pixel intensities from densely populated bins to sparser ones, the technique improves overall image quality and the visibility of detail.

-Mathematically, what this means is that we’re attempting to apply a linear trend to our cumulative distribution function (CDF):
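
One common way to write this down is given here as a sketch; the symbols h, N, L, and T are notation introduced for this example and do not come from the essay. The transform maps each intensity through the normalized CDF, so the cumulative count of the output grows roughly in proportion to the output level.

```latex
% h(i): number of pixels with intensity i
% N:    total number of pixels
% L:    number of intensity levels (256 for an 8-bit image)
\[
  \mathrm{cdf}(k) = \sum_{i=0}^{k} h(i), \qquad
  T(k) = \mathrm{round}\!\left( (L - 1)\, \frac{\mathrm{cdf}(k)}{N} \right)
\]
% After every pixel of intensity k is replaced by T(k), the cumulative
% count up to level T(k) is approximately cdf(k), so the new CDF is
% approximately a straight line in the output level:
\[
  \mathrm{cdf}_{\mathrm{new}}\bigl(T(k)\bigr) \approx \mathrm{cdf}(k)
  \approx \frac{N}{L - 1}\, T(k)
\]
```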

The before and after histogram equalization application can be seen in Figure 3:

-With adaptive histogram equalization, we divide an input image into an M x N grid. We then apply equalization to each cell in the grid, resulting in a higher quality output image:

-The downside is that adaptive histogram equalization is by definition more computationally complex (but given modern hardware, both implementations are still quite speedy).
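
The grid idea can be sketched in plain Python as follows. This is a deliberately simplified illustration: each cell is equalized independently, whereas CLAHE additionally limits the contrast of each cell's histogram and blends neighbouring cells to hide the cell borders, and neither of those steps is shown. The function names and the sample image are assumptions made for this example.

```python
def equalization_table(pixels, levels=256):
    """Lookup table that equalizes the histogram of one flat list of pixels."""
    hist = [0] * levels
    for pixel in pixels:
        hist[pixel] += 1
    table = []
    running = 0
    for count in hist:
        running += count
        table.append(round((levels - 1) * running / len(pixels)))
    return table


def adaptive_equalize(image, grid_rows, grid_cols, levels=256):
    """Equalize each cell of a grid_rows x grid_cols grid independently."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for i in range(grid_rows):
        top, bottom = i * height // grid_rows, (i + 1) * height // grid_rows
        for j in range(grid_cols):
            left, right = j * width // grid_cols, (j + 1) * width // grid_cols
            cell = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
            if not cell:
                continue  # grid finer than the image: nothing to equalize here
            table = equalization_table(cell, levels)
            for r in range(top, bottom):
                for c in range(left, right):
                    result[r][c] = table[image[r][c]]
    return result


# Hypothetical image: a dark left half and a bright right half.
image = [[10, 11, 200, 201],
         [12, 13, 202, 203]]

tiles = adaptive_equalize(image, grid_rows=1, grid_cols=2)
print(tiles)  # [[64, 128, 64, 128], [191, 255, 191, 255]]
```

Each half is stretched over the full range on its own, so local detail in both the dark and the bright region becomes visible, at the cost of one histogram per cell instead of one per image.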

-In this tutorial, you learned how to perform both basic histogram equalization and adaptive histogram equalization with OpenCV.

-Basic histogram equalization aims to improve the global contrast of an image by “spreading out” pixel intensities often used in the image.

IMPLEMENTATION

Implementation

Instructions for building and running my source code:

Import the library and the image

Build the function that calculates the histogram

My equalization of the binary image

The chart shows the histogram of the binary image and of the equalized image.

Vietnamese: https://docs.python.org/3/library/statistics.html

English: https://www.geeksforgeeks.org

Title	Midterm essay for the course Applied Probability and Statistics for Information Technology
Author	Lâm Quang Huy
Advisor	Dr. Nguyễn Quốc Bình
University	Tôn Đức Thắng University
Major	Information Technology
Type	Midterm essay
Year of publication	2022
City	Ho Chi Minh City
