Statistics in Geophysics: Descriptive Statistics II
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
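The numerical summaries presented in this section can be subdivided into measures of location, spread and shape.
Measures of location describe where the data are centered, measures of spread describe how dispersed the data are around that center, and measures of shape describe the departure from symmetry and how tall and sharp the central peak of the data is.
Let $X$ be the variable of interest. Suppose a sample of size $n$ is given with observed values $x_1, \dots, x_n$.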
The mode of the sample {1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17} is 6.
Given the list of data {1, 1, 2, 4, 4}, the mode is not unique: both 1 and 4 occur twice.
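The two examples above can be checked with a short Python sketch. It relies on `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value that attains the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

sample_a = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
sample_b = [1, 1, 2, 4, 4]

# multimode returns all most-frequent values, in order of first appearance.
modes_a = multimode(sample_a)  # a single mode
modes_b = multimode(sample_b)  # two values tie, so the mode is not unique

print(modes_a, modes_b)
```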
The mean is a meaningful measure for metric data.
It is not a robust statistic, meaning that it is strongly affected by outliers.
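Equal proportions of the data fall above and below the median, $x_{med}$. Formally, with $x_{(1)} \le \dots \le x_{(n)}$ denoting the ordered observations,
\[
x_{med} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd}, \\
\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even}.
\end{cases}
\]
The median is meaningful for variables that possess at least an ordinal scale of measurement.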
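A minimal sketch of the difference in robustness between the mean and the median. The data are hypothetical: the second list is the first with its largest value replaced by an artificial outlier.

```python
from statistics import mean, median

clean = [2.0, 3.0, 4.0, 5.0, 6.0]
contaminated = [2.0, 3.0, 4.0, 5.0, 60.0]  # hypothetical outlier replaces 6.0

mean_clean, mean_cont = mean(clean), mean(contaminated)
median_clean, median_cont = median(clean), median(contaminated)

# The mean shifts strongly with the outlier; the median does not move.
print(mean_clean, mean_cont)
print(median_clean, median_cont)
```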
A sample quantile, $x_p$, is a number having the same units as the data, which exceeds that proportion of the data given by the subscript $p$, with $0 < p < 1$.
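Several conventions exist for computing a sample quantile from a finite sample. The sketch below assumes one of them, linear interpolation between neighbouring order statistics; this choice and the function name `sample_quantile` are assumptions, not taken from the slides.

```python
import math

def sample_quantile(values, p):
    """Sample quantile x_p for 0 < p < 1.

    Assumed convention: linear interpolation between the two order
    statistics surrounding position (n - 1) * p in the sorted data.
    """
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    h = (len(ordered) - 1) * p          # fractional index into the sorted data
    lower = math.floor(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])

data = [5, 1, 4, 2, 3]
lower_quartile = sample_quantile(data, 0.25)
median_value = sample_quantile(data, 0.5)
upper_quartile = sample_quantile(data, 0.75)
print(lower_quartile, median_value, upper_quartile)
```

Other conventions can give slightly different values for small samples, so the convention should be stated whenever quantiles are reported.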
The empirical variance of $x_1, \dots, x_n$ is
\[
\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\]
where $\bar{x}$ denotes the sample mean.
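A short sketch of the empirical variance with divisor $n$. For comparison it also computes the variant with divisor $n - 1$, which is included here as an assumption about what is usually contrasted with it; the function names and data are illustrative.

```python
def empirical_variance(values):
    """Empirical variance with divisor n."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / n

def sample_variance(values):
    """Variant with divisor n - 1 (shown for comparison only)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
var_n = empirical_variance(data)
var_n_minus_1 = sample_variance(data)
print(var_n, var_n_minus_1)
```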
Interquartile range
The most common resistant measure of dispersion is the interquartile range (IQR).
The IQR is defined as
\[
\mathrm{IQR} = x_{0.75} - x_{0.25}.
\]
The IQR is a good index of the spread in the central part of a data set, since it simply specifies the range of the central 50% of the data.
Median absolute deviation
The IQR does not make use of a substantial fraction of the data.
The median absolute deviation (MAD) is easiest to understand by imagining the transformation $y_i = |x_i - x_{0.5}|$. The MAD is then the median of these transformed values:
\[
\mathrm{MAD} = \mathrm{median}(y_i) = \mathrm{median}\,|x_i - x_{0.5}|.
\]
The MAD is analogous to the standard deviation, but is computed using operations that do not emphasize outlying data.
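The resistance of the IQR and the MAD can be illustrated with a small sketch. The quartiles are computed by linear interpolation between order statistics, which is an assumed convention; the helper names and the data are hypothetical.

```python
import math
from statistics import median

def quantile(values, p):
    """Sample quantile by linear interpolation (assumed convention)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])

def iqr(values):
    """Interquartile range x_0.75 - x_0.25."""
    return quantile(values, 0.75) - quantile(values, 0.25)

def mad(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(x - center) for x in values)

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
contaminated = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # hypothetical outlier

# Both measures are unchanged by the single extreme value.
print(iqr(clean), iqr(contaminated))
print(mad(clean), mad(contaminated))
```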
Skewness and kurtosis
Skewness and kurtosis measures are often used to describe shape characteristics of a distribution.
Skewness tells you whether the distribution is symmetric or skewed to one side.
If the bulk of the data is at the left (right) and the right (left) tail is longer, we say that the distribution is skewed right (left) or positively (negatively) skewed.
The height and sharpness of the peak relative to the rest of the data are measured by the kurtosis. Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak.
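Skewness and kurtosis II
The moment coefficients of skewness, $g_1$, and kurtosis, $g_2$, are typically defined as
\[
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad g_2 = \frac{m_4}{m_2^{2}} - 3,
\]
where the $r$th sample central moment of a sample of size $n$ is defined as
\[
m_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r.
\]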
Skewness and kurtosis III
To remove the bias in $g_1$ and $g_2$, corrections need to be applied; the corrected coefficients are denoted $G_1$ and $G_2$.
$G_1 = 0$ for symmetric distributions; $G_1 > 0$ ($G_1 < 0$) for distributions that are right-skewed (left-skewed).
$G_2 = 0$ for mesokurtic distributions; $G_2 > 0$ ($G_2 < 0$) for distributions that are leptokurtic (platykurtic).
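A sketch of the moment coefficients and one possible bias correction. The corrected forms used here, $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1$ and $G_2 = \frac{n-1}{(n-2)(n-3)}\,[(n+1)\, g_2 + 6]$, are a commonly used choice and are an assumption; the slides may use a different correction. Function names and data are illustrative.

```python
import math

def central_moment(values, r):
    """r-th sample central moment m_r with divisor n."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** r for x in values) / n

def moment_skewness(values):
    """g1 = m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """g2 = m4 / m2^2 - 3 (excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def corrected_coefficients(values):
    """Assumed bias-corrected G1 and G2; requires n > 3."""
    n = len(values)
    if n <= 3:
        raise ValueError("need more than three observations")
    g1 = moment_skewness(values)
    g2 = moment_kurtosis(values)
    big_g1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    big_g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return big_g1, big_g2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # bulk on the left, long right tail

print(moment_skewness(symmetric), moment_kurtosis(symmetric))
print(corrected_coefficients(symmetric))
print(corrected_coefficients(right_skewed))
```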
Graphical summary of location measures
The boxplot, or box-and-whisker plot, is a very widely used graphical tool.
It is a simple plot of five numbers: the minimum, $x_{(1)}$, the lower quartile, $x_{0.25}$, the median, $x_{0.5}$, the upper quartile, $x_{0.75}$, and the maximum, $x_{(n)}$.
Together, these five numbers give a quick sketch of the distribution of the underlying data.
[Figure: temperature in degrees Fahrenheit; (right) maximum temperature data (n = 31).]
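Boxplot: modified version
The following quantities (called fences) can be used for identifying extreme values in the tails of the distribution:
Lower inner fence: $x_{0.25} - 1.5\,\mathrm{IQR}$; upper inner fence: $x_{0.75} + 1.5\,\mathrm{IQR}$.
Lower outer fence: $x_{0.25} - 3\,\mathrm{IQR}$; upper outer fence: $x_{0.75} + 3\,\mathrm{IQR}$.
Outlier detection criteria: A point beyond an inner fence on either side is considered a mild outlier. A point beyond an outer fence is considered an extreme outlier.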
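The outlier criteria can be sketched as follows. The fences are assumed to lie 1.5 and 3 interquartile ranges beyond the quartiles, which is the conventional choice, and the quartiles are computed by linear interpolation; both are assumptions, and the helper names and data are hypothetical.

```python
import math

def quantile(values, p):
    """Sample quantile by linear interpolation (assumed convention)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])

def fences(values):
    """Inner and outer fences, assumed at 1.5 and 3 IQR beyond the quartiles."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    spread = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * spread,
        "lower_inner": q1 - 1.5 * spread,
        "upper_inner": q3 + 1.5 * spread,
        "upper_outer": q3 + 3.0 * spread,
    }

def classify(x, f):
    """Label a point as 'extreme', 'mild' or 'none' relative to the fences."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "mild"
    return "none"

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
f = fences(data)
labels = {x: classify(x, f) for x in data}
print(f)
print(labels)
```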
Boxplots for variables by group
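Pearson correlation
Often a single-valued measure of association between two variables is needed.
A commonly used measure is the Pearson product-moment coefficient of linear correlation between two variables $X$ and $Y$. Formally,
\[
r_{XY} = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}}.
\]
The heart of the Pearson correlation is the covariance between $X$ and $Y$ in the numerator. The denominator is in effect just a scaling constant.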
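Pearson correlation II
Interpretation:
Computational form:
\[
r_{XY} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n \bar{x}^2\right) \left(\sum_{i=1}^{n} y_i^2 - n \bar{y}^2\right)}}
\]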
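A minimal sketch comparing the definition of the Pearson coefficient (covariance scaled by the two standard deviations) with the computational form based on raw sums. Both should agree up to rounding; the function names and data are hypothetical.

```python
import math

def pearson_definition(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def pearson_computational(x, y):
    """Equivalent form using sums of products and squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    den_x = sum(a * a for a in x) - n * xbar ** 2
    den_y = sum(b * b for b in y) - n * ybar ** 2
    return num / math.sqrt(den_x * den_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_def = pearson_definition(x, y)
r_comp = pearson_computational(x, y)
print(r_def, r_comp)
```

The computational form is convenient by hand but can lose precision when the means are large relative to the spread, so the definition is usually the safer choice in code.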
Spearman rank correlation
Spearman rank correlation II
In the case of ties (a particular data value appearing more than once), all of these equal values are assigned their average rank.
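The average-rank rule for ties can be sketched as below. The sketch assumes the usual definition of the Spearman coefficient as the Pearson correlation of the ranks; the function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values all receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (the common factor 1/(n-1) cancels)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Assumed definition: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

tied_ranks = average_ranks([10, 20, 20, 30])
r_monotone = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(tied_ranks, r_monotone)
```

Because only ranks enter the computation, any strictly increasing relationship yields a coefficient of 1, even when it is not linear.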
Interpretation:
Association between categorical variables
Suppose two variables $X$ and $Y$ with observed tuples $(x_1, y_1), \dots, (x_n, y_n)$ are given.
The $k$ ($k \le n$) distinct categories of $X$ are denoted by $a_1, \dots, a_k$. The $m$ ($m \le n$) distinct categories of $Y$ are denoted by $b_1, \dots, b_m$.
categorical variables X and Y
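One natural way to organise such data is to count how often each combination of categories occurs. The sketch below cross-tabulates the observed tuples into a table with one row per category of X and one column per category of Y; the tabulation itself, the function name and the data are assumptions made for illustration.

```python
from collections import Counter

def contingency_table(pairs):
    """Count occurrences of each (a_i, b_j) combination.

    Returns the sorted categories of X, the sorted categories of Y,
    and a k x m table of absolute frequencies.
    """
    counts = Counter(pairs)
    a_levels = sorted({x for x, _ in pairs})
    b_levels = sorted({y for _, y in pairs})
    table = [[counts[(a, b)] for b in b_levels] for a in a_levels]
    return a_levels, b_levels, table

# Hypothetical observations (x_i, y_i) of two categorical variables.
pairs = [
    ("rain", "cold"),
    ("rain", "warm"),
    ("dry", "warm"),
    ("dry", "warm"),
    ("rain", "cold"),
]
a_levels, b_levels, table = contingency_table(pairs)
print(a_levels, b_levels)
for row in table:
    print(row)
```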
Association between categorical variables II
Association between categorical variables III