Jessica Franzén
-- licentiate thesis --
On Cluster Analysis - A Bayesian and Model-Based Approach
Abstract
Cluster analysis is the automated search for homogeneous and cohesive groups in
a given data set. Traditional cluster analysis is based on deterministic methods
which use distance measures between objects, and between objects and centroids, to create
well-separated groups. Despite considerable research, there is little guidance on
practical questions such as how many clusters there are and how to handle outlier
objects. A model-based approach to cluster analysis is presented. As opposed to
the mechanical classification used in deterministic clustering, we regard observations
as outcomes of different distributions. A finite mixture model is used, where
each probability distribution corresponds to a cluster. This approach opens up
new possibilities. The model can handle groups of different sizes, shapes,
and directions by allowing for different distributions and parametrizations among
clusters. In reality, clusters seldom appear well separated. The method handles
overlapping groups by taking into account cluster membership probabilities
in these areas. In many data sets there are objects not suitable for classification.
A distinctive feature of this thesis is the creation of a deviant cluster of larger variance,
consisting of these outlier objects. Bayesian inference via Gibbs sampling is used to
estimate distribution parameters and proportions between clusters. The method is
tested on simulated and real data sets and shows promising results. Model selection
by an approximation of Bayes factors is applied to select the
number of clusters and to decide whether a deviant group should be included in the model.
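
As an illustration of the kind of procedure the abstract describes, the following is a minimal sketch of a Gibbs sampler for a univariate Gaussian mixture with density p(x) = sum_k w_k N(x | mu_k, s2_k). It is not the thesis's implementation. The function name gibbs_mixture, the choice to treat the component variances as known, the normal prior on the means, and the symmetric Dirichlet prior on the proportions are all assumptions made for this example. Under these assumptions, a deviant group could be mimicked by giving one component a much larger fixed variance.

```python
import math
import random


def gibbs_mixture(data, variances, n_iter=300, burn_in=100,
                  prior_mean=0.0, prior_var=100.0, alpha=1.0, seed=0):
    """Gibbs sampler for a 1-D Gaussian mixture with known variances.

    Hypothetical helper for illustration: returns posterior-mean estimates
    of the component means and mixing proportions.
    """
    rng = random.Random(seed)
    k = len(variances)
    lo, hi = min(data), max(data)
    # Start the means spread evenly over the data range.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    weights = [1.0 / k] * k
    mean_sums = [0.0] * k
    weight_sums = [0.0] * k
    kept = 0

    for it in range(n_iter):
        # Step 1: draw each label from its membership probabilities.
        counts = [0] * k
        sums = [0.0] * k
        for x in data:
            logp = [math.log(weights[j]) - 0.5 * math.log(variances[j])
                    - 0.5 * (x - means[j]) ** 2 / variances[j]
                    for j in range(k)]
            top = max(logp)  # subtract the max to avoid underflow
            probs = [math.exp(v - top) for v in logp]
            z = rng.choices(range(k), weights=probs)[0]
            counts[z] += 1
            sums[z] += x

        # Step 2: draw proportions from Dirichlet(alpha + counts),
        # built from normalized gamma draws.
        gammas = [rng.gammavariate(alpha + counts[j], 1.0) for j in range(k)]
        total = sum(gammas)
        weights = [g / total for g in gammas]

        # Step 3: draw each mean from its conjugate normal posterior.
        for j in range(k):
            post_var = 1.0 / (1.0 / prior_var + counts[j] / variances[j])
            post_mean = post_var * (prior_mean / prior_var
                                    + sums[j] / variances[j])
            means[j] = rng.gauss(post_mean, math.sqrt(post_var))

        # Average the draws after burn-in (ignores label switching).
        if it >= burn_in:
            kept += 1
            for j in range(k):
                mean_sums[j] += means[j]
                weight_sums[j] += weights[j]

    return ([s / kept for s in mean_sums],
            [s / kept for s in weight_sums])


# Usage on simulated data: two clusters centred at -5 and 5.
sim = random.Random(1)
data = ([sim.gauss(-5.0, 1.0) for _ in range(100)]
        + [sim.gauss(5.0, 1.0) for _ in range(100)])
est_means, est_weights = gibbs_mixture(data, [1.0, 1.0], seed=2)
print(est_means, est_weights)
```

Averaging the draws is only sensible here because the simulated clusters are well separated. For overlapping clusters, label switching would have to be addressed, and unknown variances would need their own sampling step.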
Download Introduction and Summary of Reports
Download report 1: Bayesian Inference for a Mixture Model using Gibbs Sampler
Download report 2: Model-Based Cluster Analysis - Classification of Twelve Year Old Children with a Deviant Group