Chapter 20 - Discriminant, factor and cluster analysis. In this chapter, the following content will be discussed: Discriminant analysis, objectives of discriminant analysis, basic concept, discriminant function, discriminant function – a graphical illustration,...
Discriminant, Factor and Cluster Analysis
Objectives of Discriminant Analysis
• Determining linear combinations of the predictor variables to separate groups by measuring between-group variation relative to within-group variation
• Developing procedures for assigning new objects, firms, or individuals, whose profiles, but not group identities, are known, to one of the two groups
• Testing whether significant differences exist between the two groups based on the group centroids
• Determining which variables count most in explaining intergroup differences
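The first two objectives above can be illustrated with a minimal sketch of a two-group linear discriminant function for two predictors. Everything here is an assumption made for illustration: the data are invented, the weights follow Fisher's rule (inverse pooled within-group covariance times the difference in group means), and the cutoff is taken as the midpoint of the two group mean scores, which presumes equal group sizes and equal misclassification costs.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_within_cov(group_a, group_b):
    """Pooled within-group covariance matrix (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]


def discriminant_weights(group_a, group_b):
    """Weights proportional to inverse(S_within) * (mean_a - mean_b)."""
    s = pooled_within_cov(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]


def score(weights, x):
    """Discriminant score: a linear combination of the predictors."""
    return weights[0] * x[0] + weights[1] * x[1]


# Hypothetical profiles (two predictors per object) for two known groups.
group_a = [(6, 7), (7, 8), (8, 7), (7, 6)]
group_b = [(2, 3), (3, 2), (3, 4), (4, 3)]

weights = discriminant_weights(group_a, group_b)
mean_score_a = score(weights, mean_vector(group_a))
mean_score_b = score(weights, mean_vector(group_b))
cutoff = (mean_score_a + mean_score_b) / 2  # assumed midpoint rule


def classify(x):
    """Assign a new object with a known profile to group A or B."""
    return "A" if score(weights, x) > cutoff else "B"


print(classify((6, 6)), classify((2, 4)))
```

Because the weights point from the mean of group B toward the mean of group A, group A always has the higher mean score, so a score above the cutoff is read as membership in group A.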
Basic Concept
The usual value of C is
C = (X̄I + X̄II) / 2
where X̄I and X̄II are the mean values for the two groups, respectively.
A Graphical Illustration
• Null Hypothesis: In the population, the group means of the discriminant function are equal
H0: μA = μB
• Generally, predictors with relatively large standardized
coefficients contribute more to the discriminating power of the function
• Canonical or discriminant loadings show the variance that
Holdout Method
• Uses part of sample to construct classification rule; other subsample used for validation
• Uses classification matrix and hit ratio to evaluate group classification
• Uses discriminant weights to generate discriminant scores for cases in subsample
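As a rough sketch of the evaluation step in the holdout method, the snippet below builds a classification matrix and a hit ratio from actual and predicted group labels of a validation subsample. The labels are made up for illustration, and the function names are hypothetical rather than taken from any package.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Rows are actual groups, columns are predicted groups."""
    counts = Counter(zip(actual, predicted))
    groups = sorted(set(actual) | set(predicted))
    return {a: {p: counts[(a, p)] for p in groups} for a in groups}


def hit_ratio(actual, predicted):
    """Share of holdout cases assigned to their true group."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Hypothetical holdout subsample: true groups and groups assigned
# by a discriminant function estimated on the other subsample.
actual = ["A", "A", "A", "B", "B", "B", "B", "A"]
predicted = ["A", "A", "B", "B", "B", "A", "B", "A"]

matrix = classification_matrix(actual, predicted)
ratio = hit_ratio(actual, predicted)
print(matrix)
print(ratio)
```

The diagonal cells of the matrix count correct classifications, so the hit ratio is the sum of the diagonal divided by the number of holdout cases.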
U Method or Cross-Validation
• Uses all available data without serious bias in estimating error rates
• Estimated classification error rates
P1 = m1 / n1, P2 = m2 / n2
where m1 and m2 = number of sample observations misclassified in groups G1 and G2
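The error rates above can be sketched with a leave-one-out loop: each observation is held out, the rule is re-estimated on the rest, and the held-out case is classified. This is only an illustration under stated assumptions: a single predictor, a nearest-group-mean rule standing in for the discriminant function, invented data, and n1 and n2 read as the group sample sizes (the source does not define them).

```python
def nearest_mean_group(x, g1, g2):
    """Assign x to the group whose mean is closer (ties go to group 1)."""
    mean1 = sum(g1) / len(g1)
    mean2 = sum(g2) / len(g2)
    return 1 if abs(x - mean1) <= abs(x - mean2) else 2


def u_method_error_rates(g1, g2):
    """Leave-one-out estimates P1 = m1 / n1 and P2 = m2 / n2."""
    m1 = 0
    for i, x in enumerate(g1):
        rest = g1[:i] + g1[i + 1:]  # re-estimate without the held-out case
        if nearest_mean_group(x, rest, g2) != 1:
            m1 += 1
    m2 = 0
    for i, x in enumerate(g2):
        rest = g2[:i] + g2[i + 1:]
        if nearest_mean_group(x, g1, rest) != 2:
            m2 += 1
    return m1 / len(g1), m2 / len(g2)


# Hypothetical scores on one predictor for groups G1 and G2.
g1 = [1.0, 2.0, 3.0, 6.0]
g2 = [5.0, 7.0, 8.0, 9.0]

p1, p2 = u_method_error_rates(g1, g2)
print(p1, p2)
```

Every observation is used both for estimation and for validation, but never for both at the same time, which is why the method avoids the optimistic bias of classifying the cases used to fit the rule.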
1. Form groups
2. Estimate discriminant function
3. Determine significance of function and variables
4. Interpret the discriminant function
5. Perform classification and validation
Multiple Discriminant Analysis
Factor Analysis Example
Factor
▫ A variable or construct that is not directly observable but needs to be inferred from the input variables
Eigenvalue Criteria
▫ Represents the amount of variance in the original variables that is associated with a factor
▫ Sum of the squares of the factor loadings of each variable on a factor represents the eigenvalue
▫ All included factors (prior to rotation) must explain at least as much variance as an “average variable”
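A small sketch of the eigenvalue criterion: the eigenvalue of each factor is computed as the sum of squared loadings in its column, and a factor is kept only if that value is at least 1, the variance of one standardized ("average") variable. The loading matrix is hypothetical, and the threshold of 1 assumes the variables were standardized.

```python
# Hypothetical unrotated loadings: rows are variables, columns are factors.
loadings = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.4],
]

n_variables = len(loadings)
n_factors = len(loadings[0])

# Eigenvalue of a factor = sum of squared loadings down its column.
eigenvalues = [
    sum(row[k] ** 2 for row in loadings) for k in range(n_factors)
]

# Share of total variance per factor (each standardized variable adds 1).
variance_share = [ev / n_variables for ev in eigenvalues]

# Keep factors that explain at least as much as one average variable.
retained = [k for k, ev in enumerate(eigenvalues) if ev >= 1.0]

print(eigenvalues, variance_share, retained)
```

With these invented loadings only the first factor passes the criterion, since the second explains less variance than a single original variable.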
Marketing Research 12th Edition /
▫ Solutions generated by factor analysis for a data set.
▫ Each factor tends to load high (-1 or 1) on a smaller number of variables and low, or very low (close to zero), on other variables, to make interpretation of the resulting factors easier.
▫ The variance explained by each unrotated factor is simply rearranged by the rotation, while the total variance explained by the rotated factors still remains the same.
▫ The first rotated factor will no longer necessarily account for the maximum variance and the amount of variance each factor accounts for has to be recalculated.
▫ The factors are rotated for better interpretation, such that the orthogonality is not
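The point about variance being rearranged but not changed can be checked numerically. The sketch below applies an orthogonal rotation to a hypothetical two-factor loading matrix; the 45 degree angle is picked by hand for illustration and is not the output of a rotation criterion such as varimax.

```python
import math


def rotate(loadings, degrees):
    """Orthogonal rotation of two-factor loadings by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def variance_by_factor(loadings):
    """Sum of squared loadings for each of the two factors."""
    return [sum(row[k] ** 2 for row in loadings) for k in range(2)]


# Hypothetical unrotated solution: a general first factor and a
# bipolar second factor.
unrotated = [(0.8, 0.4), (0.7, 0.5), (0.7, -0.5), (0.8, -0.4)]
rotated = rotate(unrotated, 45)

before = variance_by_factor(unrotated)
after = variance_by_factor(rotated)

print(before, sum(before))
print(after, sum(after))
```

In this example the first factor no longer dominates after rotation, while the total across both factors is unchanged, which is why the variance per factor has to be recalculated after rotating.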
correlation matrix
Common Factor Analysis – Results
Steps in Cluster Analysis
Hierarchical Clustering
▫ Can start with all objects in one cluster and divide and subdivide them until all objects are in their own single-object cluster (‘top-down’ or divisive approach)
▫ Can start with each object in its own single-object cluster and systematically combine clusters until all objects are in one cluster (‘bottom-up’ or agglomerative approach)
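The bottom-up approach can be sketched in a few lines. This is an illustrative version only: it assumes Euclidean distance and single linkage (distance between clusters is the distance between their closest members), and the points are invented. Other linkage rules would change which clusters merge.

```python
import math


def single_linkage(points):
    """Agglomerative clustering; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # one object per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across clusters.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # combine the two clusters
        del clusters[b]
    return merges


# Hypothetical objects described by two variables.
points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)]

merges = single_linkage(points)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The merge history, read from first to last, is the information a dendrogram displays: which clusters were joined and at what distance.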
Nonhierarchical Clustering
▫ Permits objects to leave one cluster and join another as clusters are being formed
▫ A cluster center is initially selected and all the objects within a prespecified threshold distance are included in that cluster
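The threshold-based procedure in the last bullet can be sketched as follows. The choice of the first unassigned object as each new cluster center is an assumption made for illustration, as are the points and the threshold value; this simple version does not reassign objects once they are placed.

```python
import math


def sequential_threshold(points, threshold):
    """Pick a center, absorb everything within the threshold, repeat."""
    unassigned = list(range(len(points)))
    clusters = []
    while unassigned:
        center = points[unassigned[0]]  # assumed rule for picking a center
        members = [
            i for i in unassigned
            if math.dist(points[i], center) <= threshold
        ]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


# Hypothetical objects described by two variables.
points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)]

clusters = sequential_threshold(points, threshold=2.0)
print(clusters)
```

The result depends on both the threshold and the order in which centers are chosen, so it is worth rerunning with different settings before interpreting the clusters.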
(the point whose coordinates are the means of all the observations in the cluster)
Example
A dendrogram for hierarchical clustering of bank data
▫ Objects can be later reassigned to clusters on the basis of optimizing
Nonhierarchical Cluster Analysis Example
• Apply two or more different clustering approaches to the same data or use different distance measures and compare the results.
• Split the data randomly into two halves and perform clustering on each half and then examine the average profile values of each cluster across subsamples.
• Delete various columns (variables) from the original data, compute dissimilarity measures across remaining variables and compare these results with the results obtained using the full set.
• Using simulation procedures, create a data set with the properties matching the overall properties of the original data but containing no clusters. Use the same clustering method
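The split-half check in the list above might look like the sketch below. It is a toy version under several assumptions: one variable, exactly two clusters, a bare-bones two-means routine started from the minimum and maximum values, invented data, and an arbitrary seed for the random split.

```python
import random


def two_means_1d(values, iterations=20):
    """Tiny two-cluster k-means on one variable; returns (low, high) means."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_members = [v for v in values if abs(v - low) <= abs(v - high)]
        high_members = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(low_members) / len(low_members)
        if high_members:  # guard against an empty second cluster
            high = sum(high_members) / len(high_members)
    return low, high


# Hypothetical data with two apparent clusters.
values = [1.0, 1.5, 2.0, 2.5, 3.0, 1.2, 2.2, 2.8,
          10.0, 10.5, 11.0, 11.5, 12.0, 10.2, 11.2, 11.8]

shuffled = values[:]
random.Random(42).shuffle(shuffled)  # arbitrary seed for a repeatable split
half_a = shuffled[: len(shuffled) // 2]
half_b = shuffled[len(shuffled) // 2:]

profile_a = two_means_1d(half_a)
profile_b = two_means_1d(half_b)

# Gap between matching cluster means in the two halves; small gaps
# suggest the cluster solution is stable across subsamples.
profile_gap = [abs(a - b) for a, b in zip(profile_a, profile_b)]
print(profile_a, profile_b, profile_gap)
```

If the average profiles of matching clusters differ widely between the two halves, the clusters found on the full data deserve less confidence.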
• Assumptions
▫ The basic measure of similarity on which the clustering is based is a valid measure of the similarity between the objects
End of Chapter Twenty