C. Describing the relation between two qualitative variables
Cross table with the levels of one variable in rows and the levels of the second variable in columns:
- The second variable (columns) is the dependent (described, result, output) variable
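         Y(1)    Y(2)    ...   Y(m)    Total
X(1)     n1,1    n1,2    ...   n1,m    M1
X(2)     n2,1    n2,2    ...   n2,m    M2
...
X(k)     nk,1    nk,2    ...   nk,m    Mk
Total    K1      K2      ...   Km      N
Here ni,j is the count in row i and column j, Mi is the total of row i, Kj is the total of column j, and N is the sample size.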
Table with percentages (%) across rows: ni,j / Mi gives
information about the distribution of the “output” variable Y within each level of the “input” variable X
         Y(1)         Y(2)         ...   Y(m)
X(1)     n1,1 / M1    n1,2 / M1    ...   n1,m / M1
X(2)     n2,1 / M2    n2,2 / M2    ...   n2,m / M2
...
X(k)     nk,1 / Mk    nk,2 / Mk    ...   nk,m / Mk
Table with percentages (%) across columns: ni,j / Kj gives
information about the distribution of the “input” variable X within each level of the “output” variable Y
         Y(1)         Y(2)         ...   Y(m)
X(1)     n1,1 / K1    n1,2 / K2    ...   n1,m / Km
X(2)     n2,1 / K1    n2,2 / K2    ...   n2,m / Km
...
X(k)     nk,1 / K1    nk,2 / K2    ...   nk,m / Km
Table with percentages (%) of the whole sample: ni,j / N gives information about the joint distribution in the sample
         Y(1)        Y(2)        ...   Y(m)
X(1)     n1,1 / N    n1,2 / N    ...   n1,m / N
X(2)     n2,1 / N    n2,2 / N    ...   n2,m / N
...
X(k)     nk,1 / N    nk,2 / N    ...   nk,m / N
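The three percentage tables differ only in the denominator (row total, column total, or sample size). A minimal Python sketch of that idea follows; the sample data, the level names and the helper `percentages` are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

# Hypothetical sample: one (level of X, level of Y) pair per observation.
sample = [
    ("X1", "Y1"), ("X1", "Y1"), ("X1", "Y2"),
    ("X2", "Y1"), ("X2", "Y2"), ("X2", "Y2"), ("X2", "Y2"),
    ("X3", "Y2"),
]

counts = Counter(sample)                     # cell counts n(i, j)
row_totals = Counter(x for x, _ in sample)   # row totals M(i)
col_totals = Counter(y for _, y in sample)   # column totals K(j)
total = len(sample)                          # sample size N


def percentages(kind):
    """Return {(x, y): percent} for kind in {'row', 'column', 'total'}."""
    denominators = {
        "row": lambda x, y: row_totals[x],
        "column": lambda x, y: col_totals[y],
        "total": lambda x, y: total,
    }
    denom = denominators[kind]
    return {
        (x, y): 100.0 * counts[(x, y)] / denom(x, y)
        for x in sorted(row_totals)
        for y in sorted(col_totals)
    }


row_pct = percentages("row")      # each row sums to 100
col_pct = percentages("column")   # each column sums to 100
tot_pct = percentages("total")    # the whole table sums to 100
```

Row percentages answer "how is Y distributed inside one level of X", column percentages answer the reverse question, so the choice depends on which variable is treated as the input.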
To calculate percentages across rows or across columns,
we additionally use the Cells command box and choose
Row or Column, respectively
D. Describing the relation between a qualitative variable and a quantitative variable
Using a cross table whose columns present the groups determined
by the qualitative variable and the mean value of the quantitative
variable taken separately in each group:
Information from the table:
- In which group of variable Y the mean value of variable X is highest or lowest
- The difference between the mean values of variable X taken at
different levels of variable Y, etc.
Example:
- Qualitative variable Y: “Economic status of household”
- Quantitative variable X: “Food expenditure per capita
in household”
Mean values of X in the table: 3.511  4.808  7.105  8.455  9.650
Remark: In the table, instead of the mean value we can use other
statistical parameters of the quantitative variable: Median, Min, Max, Standard Deviation, Range, etc.
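A table of this kind can be sketched in Python as follows. The records and group names are made up for illustration (they are not the survey data of the example), and the layout of `summary` is an assumption.

```python
from statistics import mean, median

# Hypothetical records: (group of qualitative variable Y, value of quantitative variable X).
records = [
    ("poor", 3.2), ("poor", 3.8),
    ("middle", 6.9), ("middle", 7.3), ("middle", 7.1),
    ("rich", 9.4), ("rich", 9.9),
]

# Collect the X values separately for each group of Y.
groups = {}
for group, value in records:
    groups.setdefault(group, []).append(value)

# One row of statistical parameters per group.
summary = {
    g: {
        "mean": mean(v),
        "median": median(v),
        "min": min(v),
        "max": max(v),
        "range": max(v) - min(v),
    }
    for g, v in groups.items()
}

# Groups with the highest and lowest mean value of X.
highest = max(summary, key=lambda g: summary[g]["mean"])
lowest = min(summary, key=lambda g: summary[g]["mean"])
```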
Using a bar chart
Columns of the bar chart present statistical parameters of the
quantitative variable X (Mean(X), Med(X), Min(X), etc.) in the groups of the qualitative variable Y:
[Figure: bar chart by group of MAINHA3 (categories 1, 2, 3, 4)]
Using a box plot
A box plot is used to compare the distributions of a quantitative variable in different groups of a qualitative variable:
[Figure: box plots by group of MAINHA3 (categories 1, 2, 3, 4), with outlying cases labelled by case number]
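The numbers behind a box plot can be sketched in Python as below. The data are hypothetical, and both the quartile method (the default of `statistics.quantiles`) and the 1.5 × IQR outlier rule are assumptions; statistical packages may use slightly different conventions.

```python
from statistics import quantiles

# Hypothetical values of the quantitative variable in two groups.
groups = {
    1: [2.0, 3.0, 3.5, 4.0, 5.0, 20.0],
    2: [5.0, 6.0, 7.0, 8.0, 9.0],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the quantities drawn in a box plot."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (assumed outlier rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


summaries = {g: five_number_summary(v) for g, v in groups.items()}
flagged = {g: outliers(v) for g, v in groups.items()}
```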
Using SPSS to describe the relation between qualitative
and quantitative variables
Chart, graph, plot
SPSS: use the command
Graphs
Bar
(or Pie or Boxplot …)
Then choose Other summary function and a suitable
statistical parameter function (mean, median, min, max, etc.)
E. Describing the relation between two quantitative variables
For a preliminary description of the relation between two quantitative
variables we can use a dot (scatter) plot, the covariance and the linear correlation coefficient of the two variables
(a) Scatter plot. For quantitative variables X, Y with sample
E = {(x1, y1), (x2, y2), ..., (xn, yn)}
the dot plot of sample E is made by drawing the n points
with coordinates (xi, yi), i = 1, ..., n:
[Figure: point (xi, yi) plotted against the x and y axes]
Notes
(a) A scatter plot gives a two-dimensional picture of the data and shows how they are distributed. In the plot we can see where the data are concentrated, whether there are outliers or abnormal points, etc.
(b) A scatter plot can be used to compare several populations: draw several samples (in different colours) on a common plot
b) Covariance and linear correlation coefficient
(i) To describe the relationship between two quantitative variables X and Y we use the covariance Cov(X,Y). For all numbers
a and b the following equality always holds:
Cov(X+a, Y+b) = Cov(X,Y)
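The shift invariance can be checked numerically. The sketch below assumes the covariance is computed with divisor n (the slides do not show which convention is used; the property holds equally with n - 1), and the data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Covariance of paired samples, divisor n (assumed convention)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical paired observations.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 8.0]

a, b = 10.0, -3.0
original = cov(xs, ys)
shifted = cov([x + a for x in xs], [y + b for y in ys])
# Adding constants moves the means by the same amounts, so the deviations
# (x - mean) and (y - mean) are unchanged and the two values agree.
```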
(ii) Linear correlation coefficient:
r(X,Y) = Cov(X,Y) / (σ(X)·σ(Y))
where σ(X) and σ(Y) are the standard deviations of X and Y
Properties:
2) r does not depend on the measurement scale of the variables: for all numbers
a, b different from 0 with a·b > 0 we have
r(aX, bY) = r(X,Y)
For all numbers a, b the following is also true:
r(X+a, Y+b) = r(X,Y)
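Both invariance properties can be verified on a small example. This is a sketch with hypothetical data; `pearson_r` is an illustrative helper that uses divisor n throughout (the divisor cancels in the ratio, so the choice does not affect r).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient: covariance over product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Hypothetical paired observations.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 8.0]

r = pearson_r(xs, ys)
r_scaled = pearson_r([2.0 * x for x in xs], [0.5 * y for y in ys])    # a*b > 0
r_flipped = pearson_r([-2.0 * x for x in xs], [0.5 * y for y in ys])  # a*b < 0
r_shifted = pearson_r([x + 10.0 for x in xs], [y - 3.0 for y in ys])
```

The condition a·b > 0 matters: when a and b have opposite signs the coefficient keeps its magnitude but changes sign, which is what `r_flipped` shows.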
The linear correlation coefficient measures the linear
dependence between two variables:
a) -1 ≤ r(X,Y) ≤ 1
r(X,Y) = 1 if and only if Y = aX + b with a > 0,
r(X,Y) = -1 if and only if Y = aX + b with a < 0
b) If r(X,Y) is close to 1 (or -1) then X and Y are very strongly related and their relation is close to linear
c) If r(X,Y) is close to 0 then X and Y are linearly uncorrelated: there is no linear relation between them
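The extreme cases can be illustrated with a short Python sketch. The data are hypothetical and `pearson_r` is an illustrative helper. The last case is a reminder that r = 0 only rules out a linear relation: Y = X² is fully determined by X and still gives r = 0 on data symmetric around zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = pearson_r(xs, [3.0 * x + 1.0 for x in xs])      # Y = aX + b with a > 0
r_down = pearson_r(xs, [-0.5 * x + 4.0 for x in xs])   # Y = aX + b with a < 0
r_square = pearson_r(xs, [x ** 2 for x in xs])         # exact but nonlinear relation
```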
[Figures: scatter plots for the cases σ(X) < σ(Y), σ(X) = σ(Y), σ(X) > σ(Y)]
Remarks
1) If E(1) and E(2) are two samples, then
r(E(1)) ~ 1 and r(E(2)) ~ 1 together do not imply
5) r(X,Y) ~ 1 and r(Y,Z) ~ 1 together do not imply
r(X,Z) ~ 1
r(X,Y) = 0 ,
in spite of
r(X,Y) ~ 1 ,
6) r(X,Y) ~ 0 and r(Y,Z) ~ 0 together do not imply
r(X,Z) ~ 0
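Remark 6 can be seen on a tiny constructed example: X and Z are each uncorrelated with Y, yet Z is an exact linear function of X. The data and the helper `pearson_r` are hypothetical and serve only as a counterexample.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Constructed counterexample: Y is uncorrelated with X, and Z = 2X + 5.
X = [-1.0, -1.0, 1.0, 1.0]
Y = [-1.0, 1.0, -1.0, 1.0]
Z = [2.0 * x + 5.0 for x in X]

r_xy = pearson_r(X, Y)   # 0: no linear relation between X and Y
r_yz = pearson_r(Y, Z)   # 0: no linear relation between Y and Z
r_xz = pearson_r(X, Z)   # 1: Z is an increasing linear function of X
```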
Applets
Chapter 8.1