Madhawa's Note: Clustering

Cluster analysis is an active research topic in data mining because of its wide range of applications. It is broadly used in many fields, including statistics, image processing, computational biology, mobile communication, medicine and economics. Clustering is a process that partitions a given data set into homogeneous groups based on given features, so that similar objects are placed in the same group and dissimilar objects are placed in different groups.

It is often regarded as the most important unsupervised learning problem: it deals with finding structure in a collection of unlabeled data.

Here you can find my assignment submission for K Means:

K Means Code

Properties of K Means

1) The learning algorithm requires prior specification of the number of cluster centers.
2) It uses exclusive assignment: each point belongs to exactly one cluster, so if two groups of data overlap heavily, k-means will not be able to resolve that there are two clusters.
3) The learning algorithm is not invariant to non-linear transformations.
4) The learning algorithm is only guaranteed to reach a local optimum of the squared error function, not the global one.
5) Randomly chosen initial cluster centers can lead to a poor result.
6) Applicable only when a mean is defined for the data.
7) Sensitive to noisy data and outliers.
8) The algorithm fails on data sets whose clusters are not linearly separable.
9) Relatively efficient: O(tknd), where
n is the number of objects,
k is the number of clusters,
d is the number of dimensions of each object,
and t is the number of iterations. Normally, k, t, d << n.
10) Gives the best results when the clusters are distinct or well separated from each other.
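
To make a few of these properties concrete, here is a minimal sketch of the standard k-means iteration in plain Python. It is not the assignment code linked above; the function names (k_means, k_means_restarts), the seed parameter and the toy data are my own assumptions for illustration. It shows that k must be given up front (property 1), that each point is assigned to exactly one center (property 2), and that the result depends on the random starting centers, which is why restarting from several seeds and keeping the lowest squared error is a common workaround (properties 4 and 5).

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Illustrative k-means; returns (centers, clusters, squared_error)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centers.
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: every point goes to exactly one nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            mean_point(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so we have converged
            break
        centers = new_centers
    # Squared error function: sum of squared distances to the assigned center.
    sse = sum(
        squared_distance(p, centers[i])
        for i, c in enumerate(clusters)
        for p in c
    )
    return centers, clusters, sse


def k_means_restarts(points, k, n_restarts=5):
    """Run k-means from several seeds and keep the lowest squared error."""
    runs = (k_means(points, k, seed=s) for s in range(n_restarts))
    return min(runs, key=lambda run: run[2])


# Toy data (made up): two well separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters, sse = k_means_restarts(points, k=2)
print(centers, sse)
```

Each pass over the data costs roughly n * k distance computations of d terms each, which is where the O(tknd) figure comes from. On well separated data like the toy example, every starting choice ends in the same partition; on overlapping or noisy data the restarts can return noticeably different errors.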

Monday, September 15, 2014

Clustering - K Means
