Statistics in Geophysics: Descriptive Statistics II

Steffen Unkel

Department of Statistics Ludwig-Maximilians-University Munich, Germany

The numerical summaries presented in this section can be subdivided into measures of location, spread and shape. Location describes the center of the data, spread describes how widely the data are scattered around that center, and shape describes the departure from symmetry and how tall and sharp the central peak of the data is.

Let $X$ be the variable of interest. Suppose a sample of size $n$ is given with observed values $x_1, \dots, x_n$.

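The mode is the most frequently occurring value in the data. For example, the mode of the sample {1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17} is 6, which occurs four times.

Given the list of data {1, 1, 2, 4, 4}, the mode is not unique: the values 1 and 4 both occur twice.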
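A minimal sketch in plain Python (not part of the original slides): count how often each value occurs and return every value that attains the highest count, so the non-unique case is handled naturally. The function name `modes` is illustrative only.

```python
from collections import Counter

def modes(data):
    """Return every value attaining the highest frequency, in order of first appearance."""
    counts = Counter(data)          # value -> absolute frequency
    top = max(counts.values())      # highest frequency in the sample
    return [value for value, c in counts.items() if c == top]

print(modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # [6]
print(modes([1, 1, 2, 4, 4]))                       # [1, 4]
```

Returning a list rather than a single value makes the "mode is not unique" situation explicit instead of silently picking one of the tied values.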

The arithmetic mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, is a meaningful measure for metric data.

It is not a robust statistic, meaning that it is strongly affected by outliers.
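To make the lack of robustness concrete, the following sketch (hypothetical toy data, using Python's standard `statistics` module) replaces one value of a small sample by a gross outlier and compares mean and median.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # last value replaced by a gross outlier

print(mean(clean), median(clean))                # mean 3, median 3
print(mean(with_outlier), median(with_outlier))  # mean 22, median 3
```

A single extreme value moves the mean from 3 to 22, while the median does not change at all.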

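Equal proportions of the data fall above and below the median, $x_{\text{med}}$ (also written $x_{0.5}$). Formally, with the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$,

$$
x_{\text{med}} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd},\\
\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even}.
\end{cases}
$$

The median is meaningful for variables that possess at least an ordinal scale of measurement.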
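The case distinction above translates directly into code. This is a sketch in plain Python; the only subtlety is shifting the 1-based order-statistic indices $x_{(i)}$ to Python's 0-based list indices.

```python
def median(data):
    """Median via the order statistics x_(1) <= ... <= x_(n)."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]            # x_((n+1)/2), shifted to a 0-based index
    return 0.5 * (x[n // 2 - 1] + x[n // 2])  # average of x_(n/2) and x_(n/2+1)

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```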

A sample quantile, $x_p$, is a number having the same units as the data, which exceeds that proportion of the data given by the subscript $p$, with $0 < p < 1$. The median is the special case $x_{0.5}$.
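The verbal definition leaves the exact computation open, and software packages use several slightly different rules. The sketch below assumes one common textbook convention (not stated on the slides): $x_p = x_{(\lfloor np \rfloor + 1)}$ if $np$ is not an integer, and $x_p = \frac{1}{2}(x_{(np)} + x_{(np+1)})$ if it is. For $p = 0.5$ this reproduces the median formula.

```python
import math

def quantile(data, p):
    """Sample p-quantile under an assumed convention:
    x_(floor(np)+1) if np is not an integer, else (x_(np) + x_(np+1)) / 2."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    x = sorted(data)
    np_ = len(x) * p
    k = round(np_)
    # Treat np as an integer if it is one up to floating-point error (e.g. 10 * 0.3).
    if k >= 1 and math.isclose(np_, k, abs_tol=1e-9):
        return 0.5 * (x[k - 1] + x[k])   # (x_(np) + x_(np+1)) / 2
    return x[math.floor(np_)]            # x_(floor(np)+1), 0-based index floor(np)

print(quantile([1, 2, 3, 4, 5, 6, 7, 8], 0.25))  # 2.5
print(quantile([1, 2, 3, 4, 5], 0.5))            # 3
```

Other rules (for example linear interpolation between order statistics) can give slightly different values for small samples.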

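The empirical variance of $x_1, \dots, x_n$ is

$$
\tilde{s}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 .
$$

Equivalently, it can be computed as

$$
\tilde{s}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 .
$$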
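Both forms of $\tilde{s}^2$ can be checked against each other in a few lines. This sketch uses hypothetical toy data; note the divisor $n$ (not $n-1$), which matches `statistics.pvariance` in Python's standard library.

```python
from statistics import pvariance

def empirical_variance(data):
    """tilde s^2 = (1/n) * sum (x_i - mean)^2, divisor n."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / n

def empirical_variance_shortcut(data):
    """Equivalent form (1/n) * sum x_i^2 - mean^2. Algebraically identical, but it can
    lose precision in floating point when the mean is large relative to the spread."""
    n = len(data)
    xbar = sum(data) / n
    return sum(x * x for x in data) / n - xbar ** 2

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(empirical_variance(sample))           # 4.0
print(empirical_variance_shortcut(sample))  # 4.0
print(pvariance(sample))                    # same value from the standard library
```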

Interquartile range

The most common resistant measure of dispersion is the interquartile range (IQR).

The IQR is defined as

$$
\mathrm{IQR} = x_{0.75} - x_{0.25} .
$$

The IQR is a good index of the spread in the central part of a data set, since it simply specifies the range of the central 50% of the data.
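A sketch of the IQR in plain Python. Because the slides do not fix a quantile rule, the helper assumes the convention $x_{(\lfloor np \rfloor + 1)}$ for non-integer $np$ and $\frac{1}{2}(x_{(np)} + x_{(np+1)})$ otherwise; other rules may shift the result slightly. The toy data are hypothetical.

```python
import math

def quantile(data, p):
    # Assumed quantile convention (not fixed on the slides):
    # x_(floor(np)+1) if np is not an integer, else (x_(np) + x_(np+1)) / 2.
    x = sorted(data)
    np_ = len(x) * p
    k = round(np_)
    if k >= 1 and math.isclose(np_, k, abs_tol=1e-9):
        return 0.5 * (x[k - 1] + x[k])
    return x[math.floor(np_)]

def iqr(data):
    """Interquartile range x_0.75 - x_0.25: width of the central 50% of the data."""
    return quantile(data, 0.75) - quantile(data, 0.25)

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(iqr(data))               # 4.0
print(iqr(data[:-1] + [800]))  # 4.0, an extreme maximum leaves the IQR unchanged
```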

Median absolute deviation

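The IQR does not make use of a substantial fraction of the data: everything outside the central 50% is ignored.

The median absolute deviation (MAD) is easiest to understand by imagining the transformation $y_i = |x_i - x_{0.5}|$, i.e. each transformed value is the absolute distance of $x_i$ from the median. The MAD is then the median of the $y_i$ values:

$$
\mathrm{MAD} = \operatorname{median}(y_i) = \operatorname{median}\,|x_i - x_{0.5}| .
$$

The MAD is analogous to the computation of the standard deviation, but it uses operations (medians and absolute values instead of means and squares) that do not emphasize outlying data.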
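The two-step definition (transform, then take the median) is short to write down. This sketch uses Python's `statistics` module and hypothetical data containing one outlier, and contrasts the MAD with the standard deviation.

```python
from statistics import median, pstdev

def mad(data):
    """Median absolute deviation: median of y_i = |x_i - x_0.5| (no scaling constant)."""
    m = median(data)
    return median(abs(x - m) for x in data)

data = [1, 2, 3, 4, 100]
print(mad(data))     # 1
print(pstdev(data))  # roughly 39, dominated by the single outlier
```

Some software multiplies the MAD by a constant (about 1.4826) so that it estimates the standard deviation for normally distributed data; the sketch follows the unscaled definition given above.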

Skewness and kurtosis

Skewness and kurtosis measures are often used to describe shape characteristics of a distribution.

Skewness tells you whether the distribution is symmetric or skewed to one side.

If the bulk of the data is at the left (right) and the right (left) tail is longer, we say that the distribution is skewed right (left), or positively (negatively) skewed.

The height and sharpness of the peak relative to the rest of the data are measured by the kurtosis. Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak.

Skewness and kurtosis II

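The moment coefficients of skewness, $g_1$, and kurtosis, $g_2$, are typically defined as

$$
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad g_2 = \frac{m_4}{m_2^{2}} - 3,
$$

where the $r$th sample central moment of a sample of size $n$ is defined as

$$
m_r = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^r .
$$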
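A direct transcription of $m_r$, $g_1$ and $g_2$ into plain Python, as a sketch with hypothetical data: a symmetric sample gives $g_1 = 0$, and a sample with a long right tail gives $g_1 > 0$.

```python
def central_moment(data, r):
    """r-th sample central moment m_r = (1/n) * sum (x_i - mean)^r."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** r for x in data) / n

def g1(data):
    """Moment coefficient of skewness m_3 / m_2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def g2(data):
    """Moment coefficient of kurtosis m_4 / m_2^2 - 3 (0 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
print(g1(symmetric), g2(symmetric))  # 0.0 and about -1.3
print(g1(right_skewed))              # positive
```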

Skewness and kurtosis III

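To remove the bias in $g_1$ and $g_2$, corrections need to be applied; the resulting adjusted coefficients are denoted $G_1$ and $G_2$.

$G_1 = 0$ for symmetric distributions; $G_1 > 0$ ($G_1 < 0$) for distributions that are right-skewed (left-skewed).

$G_2 = 0$ for mesokurtic distributions (such as the normal distribution); $G_2 > 0$ ($G_2 < 0$) for distributions that are leptokurtic (platykurtic).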
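The correction formulas themselves did not survive in this text. The sketch below assumes the commonly used adjusted estimators $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1$ (requires $n \ge 3$) and $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\, g_2 + 6\right]$ (requires $n \ge 4$); treat them as an assumption rather than the slides' exact definition. The helper `_m` is a hypothetical name for the central moment $m_r$.

```python
import math

def _m(data, r):
    """r-th sample central moment m_r."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** r for x in data) / n

def G1(data):
    """Adjusted skewness sqrt(n(n-1)) / (n-2) * g1 (assumed formula, n >= 3)."""
    n = len(data)
    g1 = _m(data, 3) / _m(data, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def G2(data):
    """Adjusted kurtosis (n-1) / ((n-2)(n-3)) * ((n+1) g2 + 6) (assumed formula, n >= 4)."""
    n = len(data)
    g2 = _m(data, 4) / _m(data, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(G1([1, 2, 3, 4, 5]))  # 0.0, symmetric sample
print(G2([1, 2, 3, 4, 5]))  # about -1.2, flatter than a normal peak
```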

Graphical summary of location measures

The boxplot, or box-and-whisker plot, is a very widely used graphical tool.

It is a simple plot of five numbers: the minimum, $x_{(1)}$, the lower quartile, $x_{0.25}$, the median, $x_{0.5}$, the upper quartile, $x_{0.75}$, and the maximum, $x_{(n)}$.

This five-number summary gives a quick impression of the distribution of the underlying data.
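The five numbers behind a basic boxplot can be computed as follows. This is a sketch: the quantile helper assumes the convention $x_{(\lfloor np \rfloor + 1)}$ for non-integer $np$ and $\frac{1}{2}(x_{(np)} + x_{(np+1)})$ otherwise, since the slides do not specify one, and the example reuses the mode sample from above.

```python
import math

def quantile(data, p):
    # Assumed quantile convention (not fixed on the slides):
    # x_(floor(np)+1) if np is not an integer, else (x_(np) + x_(np+1)) / 2.
    x = sorted(data)
    np_ = len(x) * p
    k = round(np_)
    if k >= 1 and math.isclose(np_, k, abs_tol=1e-9):
        return 0.5 * (x[k - 1] + x[k])
    return x[math.floor(np_)]

def five_number_summary(data):
    """(x_(1), x_0.25, x_0.5, x_0.75, x_(n)): the numbers drawn in a basic boxplot."""
    return (min(data), quantile(data, 0.25), quantile(data, 0.5),
            quantile(data, 0.75), max(data))

print(five_number_summary([7, 1, 12, 3, 6, 6, 17, 6, 12, 7, 6]))  # (1, 6, 6, 12, 17)
```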

[Figure: boxplots of temperature data in degrees Fahrenheit; right panel: maximum temperature data (n = 31)]

Boxplot: modified version

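The following quantities (called fences) can be used for identifying extreme values in the tails of the distribution:

$$
\text{inner fences: } x_{0.25} - 1.5\,\mathrm{IQR} \quad \text{and} \quad x_{0.75} + 1.5\,\mathrm{IQR},
$$

$$
\text{outer fences: } x_{0.25} - 3\,\mathrm{IQR} \quad \text{and} \quad x_{0.75} + 3\,\mathrm{IQR}.
$$

Outlier detection criteria: a point beyond an inner fence on either side is considered a mild outlier; a point beyond an outer fence is considered an extreme outlier.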
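The fence rule can be sketched as a small classifier. Assumptions: the same quantile convention as in the earlier sketches, the multipliers 1.5 and 3 for inner and outer fences, and hypothetical example data. The function name `classify_outliers` is illustrative only.

```python
import math

def quantile(data, p):
    # Assumed quantile convention (not fixed on the slides):
    # x_(floor(np)+1) if np is not an integer, else (x_(np) + x_(np+1)) / 2.
    x = sorted(data)
    np_ = len(x) * p
    k = round(np_)
    if k >= 1 and math.isclose(np_, k, abs_tol=1e-9):
        return 0.5 * (x[k - 1] + x[k])
    return x[math.floor(np_)]

def classify_outliers(data):
    """Return (value, label) pairs for points beyond the fences, in data order."""
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    flagged = []
    for x in data:
        if x < outer[0] or x > outer[1]:
            flagged.append((x, "extreme"))   # beyond an outer fence
        elif x < inner[0] or x > inner[1]:
            flagged.append((x, "mild"))      # beyond an inner fence only
    return flagged

print(classify_outliers([1, 2, 3, 4, 5, 6, 7, 8, 20, 30]))
# [(20, 'mild'), (30, 'extreme')]
```

Here $x_{0.25} = 3$, $x_{0.75} = 8$ and IQR $= 5$, so the upper inner fence is 15.5 and the upper outer fence is 23.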

Boxplots for variables by group

Pearson correlation

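The Pearson correlation is used when a measure of the linear association between two variables is needed.

It is the product-moment coefficient of linear correlation between two variables $X$ and $Y$. Formally,

$$
r_{XY} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}} .
$$

The heart of the Pearson correlation is the covariance between $X$ and $Y$ in the numerator. The denominator is in effect just a scaling constant: the product of the standard deviations of $X$ and $Y$.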
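The definition above computes a covariance and divides it by the two standard deviations. A plain-Python sketch with hypothetical data (it assumes neither variable is constant, otherwise the denominator is zero):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation r_XY: covariance / (s_x * s_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)  # covariance
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))          # std. dev. of X
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))          # std. dev. of Y
    return sxy / (sx * sy)

print(pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 up to rounding
print(pearson([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 up to rounding
```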

Pearson correlation II

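Interpretation: $-1 \le r_{XY} \le 1$. Values close to $+1$ ($-1$) indicate a strong positive (negative) linear relationship; values close to 0 indicate little or no linear relationship.

Since the factors $\frac{1}{n-1}$ cancel, $r_{XY}$ can equivalently be computed as

$$
r_{XY} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}} .
$$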
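The computational form needs only the sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum y_i^2$ and $\sum x_i y_i$. A sketch with hypothetical data, worked by hand: $\sum x_i y_i - n\bar{x}\bar{y} = 53 - 45 = 8$ and both bracketed terms equal 10, so $r_{XY} = 8/10 = 0.8$.

```python
import math

def pearson_shortcut(x, y):
    """r_XY via sums of squares and cross-products (algebraically equal to the definition)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    den = math.sqrt((sum(a * a for a in x) - n * xbar ** 2) *
                    (sum(b * b for b in y) - n * ybar ** 2))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_shortcut(x, y))  # 0.8
```

The shortcut is convenient for hand calculation, but in floating point it can suffer from cancellation when the means are large compared with the spread, so the centered definition is usually preferred in code.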

Spearman rank correlation

Spearman rank correlation II

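If ties occur (i.e. a value appears more than once), all of these equal values are assigned their average rank.

Interpretation: the Spearman rank correlation lies between $-1$ and $1$; values close to $+1$ ($-1$) indicate a strongly increasing (decreasing) monotone relationship, which need not be linear.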
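A sketch of Spearman's coefficient computed as the Pearson correlation of the ranks, with tied values receiving their average rank as described above. This rank-based formulation is an assumption about how the coefficient is computed here; the shortcut formula based on squared rank differences agrees with it only when there are no ties. Function names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values all receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the block of tied values
        avg = (i + j) / 2 + 1           # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den

print(average_ranks([10, 20, 20, 30]))         # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0, monotone but not linear
```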

Association between categorical variables

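Suppose two variables $X$ and $Y$ with observed tuples $(x_1, y_1), \dots, (x_n, y_n)$ are given.

The $k$ ($k \le n$) different values (categories) of $X$ are denoted by $a_1, \dots, a_k$. The $m$ ($m \le n$) different values of $Y$ are denoted by $b_1, \dots, b_m$.

Counting how often each combination $(a_i, b_j)$ occurs describes the joint distribution of the categorical variables $X$ and $Y$.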
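A minimal sketch of counting the combinations $(a_i, b_j)$ into a $k \times m$ table of joint absolute frequencies. The notation `h` for the table entries, the function name and the weather-style data are hypothetical illustrations, not taken from the slides.

```python
from collections import Counter

def contingency_table(pairs):
    """k x m table of joint absolute frequencies: h[i][j] = #{l : (x_l, y_l) == (a_i, b_j)}."""
    a = sorted({x for x, _ in pairs})   # distinct values a_1, ..., a_k of X
    b = sorted({y for _, y in pairs})   # distinct values b_1, ..., b_m of Y
    counts = Counter(pairs)             # frequency of each observed tuple
    table = [[counts[(ai, bj)] for bj in b] for ai in a]
    return a, b, table

obs = [("rain", "cold"), ("rain", "cold"), ("dry", "warm"), ("rain", "warm"), ("dry", "warm")]
a, b, h = contingency_table(obs)
print(a, b)  # ['dry', 'rain'] ['cold', 'warm']
print(h)     # [[0, 2], [2, 1]]
```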

Association between categorical variables II

Association between categorical variables III

Posted: 04/12/2015, 17:07
