Statistics in Geophysics: Descriptive Statistics

Steffen Unkel

Department of Statistics Ludwig-Maximilians-University Munich, Germany

Setting the scene

Frequency distributions Variables

Types of measurement scales

Background

Observing systems and computer models in geophysical sciences produce torrents of numerical data.

One important application of statistical ideas is in making sense of such data.

The goal is to extract insights about the processes underlying the data.

Descriptive statistics is concerned with describing the main features of a collection of data (sample).

More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis (EDA).

Elementary unit and population

Definition: Elementary unit

Objects for which a statistical analysis is desired

Example: Households in Germany

ωi: a household in Germany

Ω: all households in Germany

Population size N: about 40.1 million (as of 2008)

Example: Fish in a lake

ωi: a fish in a lake

Ω: all fish in a lake

Population size: ?

Sample

Definition: Sample

A sample is a subset of the elementary units, drawn from the population by means of a sampling method (e.g. random sample).

Sampling theory is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.

Sample size: n (n < N)

Statistical analysis of the sample allows us to draw conclusions about the population of interest (inferential statistics).
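As a minimal sketch of drawing a sample, the snippet below takes a simple random sample without replacement from a small artificial population. The population labels, the sizes N and n, and the seed are all illustrative assumptions, not values from the lecture.

```python
import random

# Hypothetical population: N elementary units, identified by integer labels.
N = 1000
population = list(range(1, N + 1))

# Draw a simple random sample of size n < N without replacement.
n = 50
rng = random.Random(42)  # fixed seed so the draw is reproducible
sample = rng.sample(population, n)

# Every sampled unit is distinct and belongs to the population.
print(len(sample), len(set(sample)))
```

Sampling without replacement guarantees that no elementary unit appears twice, which matches the idea of the sample being a subset of the population.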

Variable and values of a variable

Definition: Variable or statistical variable

Properties, characteristics or attributes of an elementary unit

Definition: Variable values

The different values a variable can take. The values can be

qualitative: variable values are not numbers, but may be coded by numerical values. Such variables are often called categorical.

quantitative: variable values are numbers (numerical values)

discrete: finite or countable set of different values

continuous: uncountable set of different values

quasi-continuous: data are continuous but measured in a discrete way

Examples

Gender: qualitative. Coding: 1=male, 2=female

Hair colour: qualitative. Coding: 1=red, 2=brown, et cetera

Temperature: quantitative, (quasi-)continuous

Number of car accidents in 2012 in Germany: quantitative, discrete

School grades: qualitative. Values: 1, 2, 3, 4, 5, 6

Level of measurement

The level at which a variable is measured determines

the choice of numerical summary measures to describe the main features of the data,

what kind of graphical representations are useful for exploratory data analysis,

which methods of statistical inference can be applied.

Measurement scales

Definition: Nominal scale

Lowest level, unordered set of values

Relation or operation: counting values, equality (=)

Units cannot be ordered according to nominal values

No arithmetic operations (addition, subtraction, ratio) possible

Definition: Ordinal scale

Ordered set of values

Relation or operation: counting values, order (<)

Units can be ordered according to ordinal values

No arithmetic operations (addition, subtraction, ratio) possible

Definition: Metric scale

Interval scale

All features of ordinal scale

Differences of values are meaningful

Zero value arbitrary

Ratio scale

All features of interval scale

Ratios of values are meaningful

Zero value not arbitrary

Examples: nominal scale

Hair colour

Gender

Examples: ordinal scale

How often in a week do you eat carrots? Possible answers: 0 – 1 – 2 – 3 – more than 3 times

School grades

Examples: metric scale

Temperature in degrees Celsius (Fahrenheit): interval scale

Temperature in kelvin: ratio scale

Monthly income of a household: ratio scale
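The difference between interval and ratio scales can be checked numerically with the temperature example. The sketch below uses two made-up temperatures (10 and 20 degrees Celsius, an assumption for illustration) and compares differences and ratios on the Celsius and Kelvin scales.

```python
def celsius_to_kelvin(t_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_c + 273.15

# Two hypothetical temperatures in degrees Celsius.
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences agree on both scales (meaningful on an interval scale).
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios disagree: the Celsius zero is arbitrary, the Kelvin zero is not.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 4))
```

The Celsius ratio of 2 does not mean "twice as warm"; only the Kelvin ratio, computed from a non-arbitrary zero, has a physical interpretation.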

Absolute frequencies

Let X be the variable of interest and suppose a sample of size n is given with observed values $x_1, x_2, \ldots, x_n$.

Determine the number k of different variable values (k ≤ n): $a_j$ ($j = 1, \ldots, k$).

For each j ($j = 1, \ldots, k$): count the number $n_j$ of elementary units with variable value $a_j$, so that $\sum_{j=1}^{k} n_j = n$.

Frequency table of $a_j$ and $n_j$ for $j = 1, \ldots, k$.

Graphical display: bar chart. The x-axis gives the variable values $a_j$ (ordered if the scale is at least ordinal); the bars along the y-axis have length proportional to $n_j$.
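The counting steps above can be sketched in a few lines. The sample `x` is invented for illustration; the names `a` and `n_j` mirror the notation of the slide.

```python
from collections import Counter

# Hypothetical sample of a discrete variable (n = 10 observations).
x = [2, 0, 1, 2, 3, 1, 2, 0, 2, 1]
n = len(x)

counts = Counter(x)
a = sorted(counts)               # distinct values a_1 < ... < a_k
n_j = [counts[v] for v in a]     # absolute frequencies n_1, ..., n_k
k = len(a)

# Frequency table with a crude text bar chart (bar length equals n_j).
for value, count in zip(a, n_j):
    print(f"{value:>3} {count:>3} {'#' * count}")
```

Sorting the distinct values only makes sense when the scale is at least ordinal; for a nominal variable any fixed order of the categories would do.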

Absolute frequencies: Example

Absolute frequencies: Example II

Relative frequencies

Given the absolute frequencies, divide each $n_j$ by the sample size n: $f_j = n_j / n$ for $j = 1, \ldots, k$, so that $\sum_{j=1}^{k} f_j = 1$.

Frequency table of $a_j$, $n_j$ and $f_j$ for $j = 1, \ldots, k$.

Graphical display: bar chart. The x-axis gives the variable values $a_j$ (ordered if the scale is at least ordinal); the bars along the y-axis have length proportional to $f_j$.
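A short sketch of the relative-frequency calculation for a nominal variable follows. The hair-colour sample is made up for illustration and is not data from the lecture.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (n = 8 observations).
x = ["brown", "red", "brown", "blond", "brown", "blond", "black", "brown"]
n = len(x)

counts = Counter(x)                                   # absolute frequencies n_j
f = {value: count / n for value, count in counts.items()}  # f_j = n_j / n

# Frequency table of a_j, n_j and f_j.
for value in counts:
    print(f"{value:<6} {counts[value]:>2} {f[value]:.3f}")
```

Because each `f[value]` is a count divided by the same n, the relative frequencies sum to one up to floating-point rounding.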

Relative frequencies: Example

Relative frequencies: Example II

Metric variables

Bar charts are not useful if k ≈ n

If k ≈ n it may be worth defining classes or intervals.

Count how many values fall within the range of each interval. Example: [72, 86], (86, 100], (100, 114], (114, 128]
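Counting values per class can be sketched as below, using the interval limits from the example. The data values are invented, and the helper `class_index` is a hypothetical name; it treats the first interval as closed on both sides and all later intervals as open on the left and closed on the right.

```python
from bisect import bisect_left

# Interval limits from the example: [72, 86], (86, 100], (100, 114], (114, 128]
breaks = [72, 86, 100, 114, 128]

def class_index(value, breaks):
    """Return the 0-based index of the interval that contains value."""
    if value < breaks[0] or value > breaks[-1]:
        raise ValueError(f"{value} lies outside the range of the classes")
    if value == breaks[0]:
        return 0  # the first interval includes its left end point
    # bisect_left gives the first position i with breaks[i] >= value,
    # so the value lies in (breaks[i-1], breaks[i]].
    return bisect_left(breaks, value) - 1

# Hypothetical observations, including values on the class limits.
x = [72, 80, 86, 87, 95, 100, 101, 113, 114, 120, 128]

class_counts = [0] * (len(breaks) - 1)
for value in x:
    class_counts[class_index(value, breaks)] += 1

print(class_counts)
```

Deciding in advance which end of each interval is closed matters whenever observations fall exactly on a class limit, as 86, 100 and 114 do here.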

Graphical displays:

Histograms

The number of values falling into each interval is counted

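The histogram consists of a series of rectangles whose widths are given by the interval (bin) limits and whose heights depend on the number of values in each bin.

Usually the widths of the bins are chosen to be equal. In this case the heights of the histogram bars are proportional to the number of counts (absolute or relative frequencies).

If the histogram bins are chosen to have unequal widths, it is the area of each bar, rather than its height, that is proportional to the number of counts.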
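For bins of unequal width, a common convention (assumed here) is to scale each bar so that its area equals the relative frequency of the bin. The sketch below uses invented bin limits and counts to show how two bins with the same count get different heights.

```python
# Hypothetical bin limits with unequal widths and the counts per bin.
breaks = [0.0, 1.0, 2.0, 4.0, 8.0]
counts = [4, 6, 6, 4]
n = sum(counts)

widths = [right - left for left, right in zip(breaks[:-1], breaks[1:])]

# Density scaling: height = relative frequency / bin width,
# so that area = height * width = relative frequency.
heights = [c / (n * w) for c, w in zip(counts, widths)]
areas = [h * w for h, w in zip(heights, widths)]

for left, right, c, h in zip(breaks[:-1], breaks[1:], counts, heights):
    print(f"({left}, {right}]  count={c}  height={h:.3f}")
```

The second and third bins both contain six values, but the third is twice as wide, so its bar is half as tall; the total area of all bars is one.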

Histogram: Example

Histogram: Example II

Kernel density smoothing

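An alternative to the histogram that produces a smooth result is kernel density smoothing.

It produces the kernel density estimate, which is a smooth, nonparametric estimate of the underlying probability density.

It is easiest to understand kernel density smoothing as an extension of the histogram.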

Some commonly used kernels

Epanechnikov: $K(u) = \frac{3}{4}(1 - u^2)$ for $-1 < u < 1$, 0 elsewhere

Bisquare/Quartic: $K(u) = \frac{15}{16}(1 - u^2)^2$ for $-1 < u < 1$, 0 elsewhere
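The two kernels can be written down directly and checked numerically. The sketch below also verifies, with a simple midpoint rule, that each kernel integrates to approximately one over (−1, 1); the helper name `integrate` and the number of steps are illustrative choices.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 3/4 (1 - u^2) on (-1, 1), 0 elsewhere."""
    return 0.75 * (1.0 - u * u) if -1.0 < u < 1.0 else 0.0

def quartic(u):
    """Bisquare (quartic) kernel: 15/16 (1 - u^2)^2 on (-1, 1), 0 elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if -1.0 < u < 1.0 else 0.0

def integrate(kernel, lo=-1.0, hi=1.0, steps=10000):
    """Approximate the integral of kernel over [lo, hi] by the midpoint rule."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width

print(round(integrate(epanechnikov), 6), round(integrate(quartic), 6))
```

Integrating to one is what makes a kernel a valid probability density, which in turn makes the resulting density estimate integrate to one.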

Kernel density estimate

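For data $x_1, \ldots, x_n$, the kernel density estimate of $f(x_0)$ at a given value $x_0$ is defined as

$$\hat{f}(x_0) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x_0 - x_i}{h}\right)$$

$f(x_0)$ is meant to be the true, unknown population density of X at $x_0$.

The bandwidth h controls the smoothness of the kernel density estimate.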
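A minimal sketch of a kernel density estimate follows, assuming the usual form in which the estimate at x0 is the sum of K((x0 − xi)/h) over the data, divided by n·h, with h the bandwidth. The Epanechnikov kernel is used, and the data values and bandwidths are invented for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 3/4 (1 - u^2) on (-1, 1), 0 elsewhere."""
    return 0.75 * (1.0 - u * u) if -1.0 < u < 1.0 else 0.0

def kde(x0, data, h, kernel=epanechnikov):
    """Kernel density estimate at x0 for the given data and bandwidth h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    return sum(kernel((x0 - xi) / h) for xi in data) / (n * h)

# Hypothetical temperature-like observations.
data = [24.1, 24.8, 25.0, 25.3, 26.2, 26.5]

# Evaluate on a grid for a small and a large bandwidth.
grid = [23.0 + 0.5 * i for i in range(10)]
for h in (0.5, 1.5):
    print(h, [round(kde(x0, data, h), 3) for x0 in grid])
```

A small h yields a bumpy estimate that follows individual observations, while a large h spreads each observation's contribution more widely and gives a smoother curve.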

Kernel density smoothing: Example

Kernel density smoothing: Example II

Figure: Kernel density estimates for the June temperature data in Guayaquil, Ecuador (1951-1970) for two different choices of h.

Empirical cumulative distribution function (ECDF)

Sort the different observed values in ascending order: $a_{(1)} < a_{(2)} < \cdots < a_{(k)}$

Compute relative frequencies $f_{a_{(j)}}$ ($j = 1, \ldots, k$)

Compute cumulative relative frequencies: $f_{a_{(1)}},\; f_{a_{(1)}} + f_{a_{(2)}},\; \ldots,\; f_{a_{(1)}} + f_{a_{(2)}} + \cdots + f_{a_{(k)}}$

The ECDF is the step function defined as

$$F_n(x) = \sum_{a_{(j)} \le x} f_{a_{(j)}}$$
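The steps above can be sketched as a small function that returns the sorted distinct values, the cumulative relative frequencies and the step function itself. The function name `ecdf` and the sample are illustrative assumptions.

```python
from bisect import bisect_right
from collections import Counter

def ecdf(sample):
    """Return (values, cumulative, F) for the ECDF of a non-empty sample."""
    n = len(sample)
    counts = Counter(sample)
    values = sorted(counts)          # a_(1) < a_(2) < ... < a_(k)

    cumulative = []
    running = 0
    for v in values:
        running += counts[v]         # cumulative absolute frequency
        cumulative.append(running / n)

    def F(x):
        # Number of distinct values a_(j) that are <= x.
        idx = bisect_right(values, x)
        return cumulative[idx - 1] if idx > 0 else 0.0

    return values, cumulative, F

# Hypothetical sample (n = 8).
x = [3, 1, 2, 3, 1, 3, 4, 2]
values, cumulative, F = ecdf(x)
print(values, cumulative)
print(F(0), F(1), F(2.5), F(3), F(10))
```

Accumulating counts and dividing once by n avoids the small rounding drift that can occur when many relative frequencies are added one after another.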

ECDF: Example

ECDF: Example II

Stem-and-leaf display

A stem-and-leaf plot provides the analyst with an initial exposure to the individual data values.

In its simplest form, the stem-and-leaf display groups the data values according to their all-but-least significant digits.

These values are written in either ascending or descending order to the left of a vertical bar, constituting the “stems”.

Each least significant digit is then written to the right of the vertical bar, on the same line as the more significant digits with which it belongs. These least significant values constitute the “leaves”.
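The simplest form of the display can be sketched for non-negative integers, taking the last digit as the leaf and all remaining digits as the stem. The function name `stem_and_leaf` and the data are illustrative assumptions; this sketch does not split stems into 0 to 4 and 5 to 9.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a basic stem-and-leaf display.

    Assumes non-negative integers: the stem is the value without its
    last digit, the leaf is the last digit.
    """
    if not values:
        return []
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)

    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical data.
data = [23, 31, 25, 47, 30, 38, 25, 31, 29]
for line in stem_and_leaf(data):
    print(line)
```

Unlike a histogram, the display keeps every individual value readable, while the length of each row still shows the shape of the distribution.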

Stem-and-leaf display: Example

The decimal point is 1 digit(s) to the right of the |

Stem-and-leaf plot for the January 1987 Ithaca maximum temperatures. Separate stems are used for least-significant digits from 0 to 4 and from 5 to 9.

Stem-and-leaf display: Example II

The decimal point is 1 digit(s) to the left of the |

Date posted: 04/12/2015, 17:07
