Biostatistics
Prof. Dr. Lê Hoàng Ninh
© 2006 Evidence-based Chiropractic
Definitions of some terms used in biostatistics
• Data
– Measurements or observations of a variable
• Variable
– A characteristic that is observed or measured
– Can take different values from one subject to another
Definitions of terms used in statistics
• Independent variable
– Precedes the dependent variable; the cause (etiology) of some outcome
– Smoking -> lung cancer
– Drug A -> recovery
• Dependent variable
– A measure of the effect or outcome
– Its value depends on the independent variable
Population
• A population is the set of individuals from which a sample is drawn
– e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room
• In research, it is usually not feasible to measure or survey the entire population
• Therefore, a sample (a subset of the population) must be taken
Random sample
• Subjects are drawn from the population so that every individual has an equal chance of being selected
• A random sample tends to be representative of the population
• A non-random sample may not be representative
– May be biased regarding age, severity of the condition, socioeconomic status, etc.
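A minimal Python sketch of drawing a simple random sample, assuming the population is available as a list (here a hypothetical register of 500 patient IDs). `random.Random(seed).sample` draws without replacement, so every individual has the same chance of being selected; the seed is only there to make the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; each has the same chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: patient IDs 1..500 from a clinic register
population = list(range(1, 501))
sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```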
Random sample (cont.)
• Random samples are rare in patient-care studies
• Instead, subjects are randomly allocated to 2 groups: a treatment group and a control group
– Each person has an equal chance of being assigned to either of the groups
• Random allocation to groups = randomization
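Randomization can be sketched in Python as "shuffle, then split in half". This is one simple scheme chosen for illustration (the slide does not prescribe a method); the function name `randomize` and the patient labels are hypothetical.

```python
import random

def randomize(subjects, seed=None):
    """Randomly allocate subjects to a treatment group and a control group.

    Shuffling and then splitting the list in half gives every subject the same
    chance of ending up in either group and keeps the group sizes balanced.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical study with 10 enrolled patients
patients = [f"P{i:02d}" for i in range(1, 11)]
groups = randomize(patients, seed=7)
print(groups["treatment"])
print(groups["control"])
```

With an odd number of subjects, this sketch puts the extra person in the control group; block or stratified randomization are common alternatives not covered here.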
• Ways of summarizing data
• Describing a data set = the shape, central tendency, and variability of the data
– The shape of the data relates to the frequencies of the observed values
• Descriptive statistics are different from inferential statistics
– Descriptive statistics cannot test hypotheses
• A distribution provides a summary of:
– The frequency of each of the values
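A frequency distribution is just a count of how often each value occurs. A minimal Python sketch using `collections.Counter`, with hypothetical pain scores:

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs, sorted by value."""
    return sorted(Counter(values).items())

# Hypothetical pain scores (0-10 scale) from 12 patients
pain = [3, 5, 5, 6, 2, 5, 7, 3, 6, 5, 8, 6]
for value, freq in frequency_table(pain):
    print(value, freq)
```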
FREQUENCY DISTRIBUTIONS ARE DISPLAYED AS HISTOGRAMS
Histograms (cont.)
• A histogram is a type of bar chart, but there are
no spaces between the bars
• Histograms are used to visually depict frequency distributions of continuous data
• Bar charts are used to depict categorical
information
– e.g., Male–Female, Mild–Moderate–Severe, etc
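For continuous data, a histogram first groups values into equal-width bins and then counts each bin. A minimal Python sketch, assuming fixed bins of width 10 starting at 20 (the ages and bin choices are hypothetical); a plotting library would then draw the bars with no gaps between them.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Adjacent bins share their edges, which is why histogram bars touch.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical ages (continuous data) of 12 patients
ages = [23, 27, 31, 35, 36, 38, 41, 44, 45, 47, 52, 58]
counts = histogram_counts(ages, start=20, width=10, n_bins=4)
for k, c in enumerate(counts):
    low = 20 + k * 10
    print(f"{low}-{low + 10}: {'#' * c}")
```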
MEASURES OF CENTRAL TENDENCY
• Mean
– The most commonly used descriptive statistic
• Calculating the mean
– Add all the values in a series of numbers, then divide by the total number of values
• Σ (sigma) is a command that adds all of the X values
• n is the total number of values in a sample, and N is the same for a population
$$\bar{X} = \frac{\sum X}{n}$$
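The mean (ΣX divided by n) translates directly into code. A minimal Python sketch; the function name `mean` is ours, and Python's standard library also offers `statistics.mean`.

```python
def mean(values):
    """Sample mean: the sum of all values (ΣX) divided by the number of values (n)."""
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)

print(mean([4, 8, 6, 5, 7]))  # → 6.0
```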
Measures of central tendency
• Mode
– The most frequently occurring value in a series
– The modal value is the highest bar in a histogram
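A minimal Python sketch of finding the mode. It returns every most-frequent value, since a series can have more than one mode; the standard library's `statistics.multimode` behaves similarly. The data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (a series can have more than one mode)."""
    if not values:
        raise ValueError("the mode needs at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 5, 5, 6, 2, 5, 7, 3, 6, 5, 8, 6]))  # → [5]
print(modes([1, 1, 2, 2, 3]))  # → [1, 2]
```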
Measures of central tendency (cont.)
• Median
– The value that divides a series of values in half when they are all listed in order
– When there is an odd number of values
• The median is the middle value
– When there is an even number of values
• Count from each end of the series toward the middle and then average the 2 middle values
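The two median rules above can be sketched in Python as follows (the function name is ours; `statistics.median` in the standard library applies the same rules):

```python
def median(values):
    """Middle value of the ordered series; with an even count, the average of the two middle values."""
    if not values:
        raise ValueError("the median needs at least one value")
    s = sorted(values)        # the values must be listed in order first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:            # odd number of values: take the middle one
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even: average the 2 middle values

print(median([7, 1, 5, 3, 9]))      # → 5
print(median([7, 1, 5, 3, 9, 11]))  # → 6.0
```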
Measures of central tendency (cont.)
• Each of the three measures of central tendency has certain advantages and disadvantages
• Which measure should be used?
– It depends on the type of data being analyzed
– e.g., categorical or continuous, and the level of measurement involved
Levels of measurement
• There are 4 levels of measurement
– Nominal, ordinal, interval, and ratio
1 Nominal
– Data are coded by a number, name, or letter that is assigned to a category or group
– Examples
• Gender (e.g., male, female)
• Treatment preference (e.g., manipulation, mobilization, massage)
Levels of measurement (cont.)
2 Ordinal
– Is similar to nominal because the measurements involve categories
– However, the categories are ordered by rank
– Examples
• Pain level (e.g., mild, moderate, severe)
• Military rank (e.g., lieutenant, captain, major, colonel, general)
Levels of measurement (cont.)
• Ordinal values only describe order, not quantity
– Thus, severe pain is not the same as 2 times mild pain
• The only mathematical operations allowed for nominal and ordinal data are counting of
categories
– e.g., 25 males and 30 females
Levels of measurement (cont.)
3 Interval
– Measurements are ordered (like ordinal data)
– Have equal intervals
– Do not have a true zero
– Examples
• The Fahrenheit scale, where 0° does not
correspond to an absence of heat (no true zero)
• In contrast to Kelvin, which does have a true zero
Levels of measurement (cont.)
4 Ratio
– Measurements have equal intervals
– There is a true zero
– Ratio is the most advanced level of
measurement, which can handle most types
of mathematical operations
Levels of measurement (cont.)
• Ratio examples
– Range of motion
• No movement corresponds to zero degrees
• The interval between 10 and 20 degrees is the same as between 40 and 50 degrees
– Lifting capacity
• A person who is unable to lift scores zero
• A person who lifts 30 kg can lift twice as much as one who lifts 15 kg
Levels of measurement (cont.)
• NOIR is a mnemonic to help remember the names and order of the levels of measurement
– Nominal
– Ordinal
– Interval
– Ratio
Levels of measurement (cont.)
Measurement scale | Permissible mathematical operations | Best measure of central tendency
Ordinal | Greater than or less than | Median
Interval | Addition and subtraction | Symmetrical – Mean; Skewed – Median
Ratio | Addition, subtraction, multiplication, and division | Symmetrical – Mean; Skewed – Median
Shape of a data set
• Histograms of frequency distributions have a shape
• Distributions are often symmetrical, with most scores falling in the middle and fewer toward the extremes
• Many biological measurements are approximately symmetrically distributed and form a normal (bell-shaped) curve
The normal distribution
• Data whose frequency distribution follows a normal curve are said to have a normal (Gaussian) distribution
• Properties of a normal distribution
– It is symmetric about its mean
– The highest point is at its mean
The normal distribution (cont.)
• A normal distribution is symmetric about its mean
• The highest point of the normal curve is at the mean
• As one moves away from the mean in either direction, the height of the curve decreases, approaching but never reaching zero
The normal distribution (cont.)
Mean = Median = Mode
Skewed distributions
– A small number of extreme values lie far out toward one end of the distribution, forming a long tail
Skewed distributions (cont.)
• Skew is always in the direction of the longer tail
– Positive if skewed to the right
– Negative if skewed to the left
• The mean is shifted the most toward the tail (more than the median or the mode)
Skewed distributions (cont.)
• The median is the central point of any distribution, skewed or not
– 50% of the values are above and 50% below the median
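A small numeric illustration of how skew pulls the three measures apart, using Python's `statistics` module on hypothetical length-of-stay data with a long right tail (positive skew):

```python
from statistics import mean, median, mode

# Hypothetical length-of-stay data in days: most stays are short, one is very long,
# giving a long right tail (positive skew)
stays = [2, 2, 3, 3, 3, 4, 4, 5, 6, 30]

print("mode:", mode(stays))      # most frequent value
print("median:", median(stays))  # middle of the ordered values
print("mean:", mean(stays))      # pulled farthest toward the long tail
```

This prints a mode of 3, a median of 3.5, and a mean of 6.2: the single extreme value moves the mean far more than the median.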
Properties of normal curves
• About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean
• About 95.5% is within two SDs
• About 99.7% is within three SDs
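These percentages can be checked numerically. For a normal curve, the area within k SDs of the mean equals erf(k/√2); a minimal Python sketch using `math.erf` (the function name `proportion_within` is ours):

```python
import math

def proportion_within(k):
    """Area under a normal curve within k SDs of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * proportion_within(k):.2f}%")
```

This prints about 68.27%, 95.45%, and 99.73%.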
More properties of normal curves (cont.)
Standard deviation (SD)
• SD is a measure of the variability of a set of data
• The mean represents the average of a group of scores, with some of the scores above the mean and some below
– This range of scores is referred to as variability or spread
• Variance (S²) is another measure of spread
SD (cont.)
• Ages are spread out along an X axis
• The amount by which the ages are spread out is known as dispersion or spread
• Deviations are the distances by which the ages lie above and below the mean
• Adding the deviations always gives zero
• However, because the total of the deviations always equals zero, they cannot simply be added
– The deviations must first be squared, which removes the negative signs
• S² is not in the same units as the data (it is in years squared, not years), but SD, the square root of S², is
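The steps above (deviations, squaring, averaging, square root) can be sketched in Python. This assumes the sample formula, which divides by n − 1 as `statistics.variance` does; for a whole population one would divide by N instead. The ages are hypothetical.

```python
import math

def variance_and_sd(values):
    """Sample variance S^2 and SD, built from squared deviations about the mean.

    Assumes the sample formula (divide by n - 1); divide by N instead for a whole population.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    deviations = [x - m for x in values]  # distances above (+) and below (-) the mean
    # sum(deviations) is zero (up to rounding), so square them to remove the signs
    sum_of_squares = sum(d * d for d in deviations)
    s2 = sum_of_squares / (n - 1)         # variance: in squared units (years^2)
    return s2, math.sqrt(s2)              # SD: back in the original units (years)

ages = [20, 25, 30, 35, 40]
s2, sd = variance_and_sd(ages)
print(s2, round(sd, 2))  # → 62.5 7.91
```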
• A wide spread results in a higher SD; a narrow spread results in a lower SD
• Spread is important when comparing 2 or more group means
– In the upper example, it is more difficult to see a clear distinction between the groups because the spread is wider, even though the group means are the same as in the lower example
z-scores
• A z-score is the number of SDs that a specific score lies above or below the mean of its distribution
• Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD
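The conversion z = (X − X̄) / SD as a minimal Python sketch; the function name and the example mean and SD are hypothetical:

```python
def z_score(x, mean, sd):
    """Number of SDs that the raw score x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

# Hypothetical example: ages with mean 30 and SD 5
print(z_score(40, 30, 5))  # → 2.0 (two SDs above the mean)
print(z_score(25, 30, 5))  # → -1.0 (one SD below the mean)
```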
z-scores (cont.)
• Standardization
– The process of converting raw scores to z-scores
– The resulting distribution of z-scores always has a mean of zero, an SD of one, and an area under the curve equal to one
• The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
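Standardization can be checked numerically: converting every raw score to a z-score yields values whose mean is 0 and whose SD is 1. A minimal Python sketch, assuming the sample mean and sample SD (`statistics.stdev`) are used; the scores are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw scores to z-scores using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

scores = [20, 25, 30, 35, 40]
z = standardize(scores)
print([round(v, 3) for v in z])
print(mean(z), stdev(z))  # mean ≈ 0, SD ≈ 1 (up to floating-point rounding)
```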
z-scores (cont.)
• Refer to a z-table to find the proportion of the area under the curve
z-scores (cont.)
Partial z-table (z = 0.00 to 1.49) showing the proportion of the area under a normal curve lying below each value of z. The row gives z to the first decimal place; the column gives the second decimal place.
z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
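The z-table entries are values of the standard normal cumulative area Φ(z) = ½(1 + erf(z/√2)). They can therefore be reproduced, or extended beyond z = 1.49, with `math.erf`. A minimal Python sketch; the function name `phi` is ours:

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z (what the z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"{phi(1.00):.4f}")      # → 0.8413 (row 1.0, column 0.00)
print(f"{phi(1.25):.4f}")      # → 0.8944 (row 1.2, column 0.05)
print(f"{1 - phi(1.25):.4f}")  # → 0.1056, the proportion of scores above z = 1.25
```

For a negative z, the symmetry of the curve gives Φ(−z) = 1 − Φ(z), so the table only needs positive values.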