Cluster analysis classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of interval variables. The purpose of cluster analysis is to discover a system of organizing observations, usually people, into groups whose members share properties in common. It is cognitively easier to predict the behavior or properties of people or objects from group membership, because all members of a group share similar properties. It is generally much harder to treat each observation as an individual and predict its behavior or properties from observations of its other behaviors or properties.
For example, a person might wish to predict how an animal would respond to an invitation to go for a walk. He or she could be given information about the size and weight of the animal, top speed, average number of hours spent sleeping per day, and so forth and then combine that information into a prediction of behavior. Alternatively, the person could be told that an animal is either a cat or a dog. The latter information allows a much broader range of behaviors to be predicted. The trick in cluster analysis is to collect information and combine it in ways that allow classification into useful groups, such as dog or cat.
Cluster analysis classifies unknown groups, while discriminant function analysis classifies known groups. The procedure for doing a discriminant function analysis is well established. Few options, other than the type of output, need to be specified when doing a discriminant function analysis. Cluster analysis, on the other hand, allows many choices about the nature of the algorithm for combining groups. Each choice may result in a different grouping structure.
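The point that each choice may result in a different grouping structure can be seen in a small sketch. The code below is an illustration only, not a method taken from this chapter. It assumes five hypothetical scores on a single interval variable and two common rules for measuring the distance between groups: the smallest distance between any two members (often called single linkage) and the largest distance between any two members (often called complete linkage). The function name `agglomerate` and the data are invented for the example.

```python
from itertools import combinations


def agglomerate(points, linkage, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    `linkage` is min (single linkage) or max (complete linkage); it reduces
    all pairwise member distances between two clusters to one number.
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > n_clusters:

        def cluster_distance(pair):
            a, b = pair
            return linkage(abs(x - y) for x in clusters[a] for y in clusters[b])

        # combinations yields (i, j) with i < j, so deleting j keeps i valid
        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical scores on one interval variable (illustrative data only)
scores = [0, 10, 25, 45, 69]

single = agglomerate(scores, min, 2)    # nearest-member rule
complete = agglomerate(scores, max, 2)  # farthest-member rule

print(single)    # [[0, 10, 25, 45], [69]]
print(complete)  # [[0, 10], [25, 45, 69]]
```

With the same five scores, the nearest-member rule chains four observations together and leaves one by itself, while the farthest-member rule produces two more compact groups. Neither answer is wrong; they reflect different definitions of what it means for two groups to be close.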
It is not the purpose of this chapter to be an extensive presentation of methods of doing a cluster analysis. The purpose of this chapter is to give the student an understanding of how cluster analysis works. The options selected for inclusion in this chapter are designed to be illustrative and easy to compute rather than useful in real work. The interested reader is directed to a more comprehensive work on cluster analysis.
Cluster analysis has proven to be very useful in marketing. Larson (1992) describes the efforts of a company, Claritas, to cluster neighborhoods (zip codes) into forty different groups based on census information, such as population density, income, and age. These groups were eventually given names like "Blue Blood Estates," "Shotguns and Pickups," and "Bohemian Mix." This classification scheme, called PRIZM for Potential Rating Index for Zip Markets, has proven to be very useful in direct mail advertising, radio station formats, and decisions about where to locate stores.