FUZZY CLASSIFICATION
Classification Methods
Two popular methods of classification
• Classification By Equivalence
Crisp Relations
Fuzzy Relations
• Fuzzy C-means (FCM)
Fuzzy c-means (FCM) is a method of clustering which allows one piece of
data to belong to two or more clusters.
Classification By Equivalence
Crisp Relation
• A crisp relation is defined on the Cartesian product of two universal sets, given by
X × Y = {(x, y) | x ∈ X, y ∈ Y}
• The crisp relation R ⊆ X × Y is defined by its membership function
χR(x, y) = 1 if (x, y) ∈ R, and 0 if (x, y) ∉ R
• Here “1” implies complete truth degree for the pair to be in relation and “0” implies no relation.
• Define the set [xi] = {xj | (xi, xj) ∈ R} as the equivalence class of xi on a universe of
data points, X. This class is contained in a special relation, R, known as an equivalence relation.
• This class is the set of all elements related to xi, and it has the following properties:
1. xi ∈ [xi], therefore (xi, xi) ∈ R
2. [xi] ≠ [xj] ⇒ [xi] ∩ [xj] = Ø
3. ⋃x∈X [x] = X
• The first property is reflexivity.
• The second property indicates that distinct equivalence classes do not overlap.
• The third property simply expresses that the union of all equivalence classes exhausts the universe.
• Hence, the equivalence relation R divides the universe X into mutually exclusive equivalence classes, that is,
X|R = {[x] | x ∈ X}
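As an illustration of how an equivalence relation partitions a universe, the following minimal Python sketch builds X|R from a relation stored as a set of ordered pairs. The function name equivalence_classes and the example relation (same remainder modulo 3) are hypothetical and chosen only for illustration; the code assumes R is already reflexive, symmetric and transitive and does not check it.

```python
def equivalence_classes(universe, relation):
    """Return the classes [x] = {y | (x, y) in relation} as a list of frozensets.

    Assumption: `relation` is a set of ordered pairs that is already an
    equivalence relation on `universe`; this is not verified here.
    """
    classes = []
    seen = set()
    for x in universe:
        if x in seen:
            continue  # x already belongs to a class found earlier
        cls = frozenset(y for y in universe if (x, y) in relation)
        classes.append(cls)
        seen |= cls
    return classes


# Hypothetical example: integers 0..5, related when they share a remainder mod 3.
X = list(range(6))
R = {(a, b) for a in X for b in X if a % 3 == b % 3}
quotient = equivalence_classes(X, R)
print([sorted(c) for c in quotient])  # [[0, 3], [1, 4], [2, 5]]
```

The classes do not overlap and their union is the whole universe, matching properties 2 and 3.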
Fuzzy Relation
• Fuzzy relations map elements of one universe, X, to those of another universe, Y, through
the Cartesian product of the two universes.
• Crisp equivalence relations can be used to divide the universe X into mutually exclusive classes.
• For every fuzzy equivalence relation, each of its λ-cuts is an ordinary (crisp) equivalence
relation.
• Hence, to classify data points in the universe using fuzzy relations, we need to find the associated
fuzzy equivalence relation.
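One common way to obtain a fuzzy equivalence relation, sketched below under stated assumptions, is to start from a reflexive and symmetric fuzzy relation and repeat max-min composition until it stops changing. A λ-cut of the result then yields crisp classes. The function names and the 4×4 example matrix are hypothetical and not taken from the slides.

```python
def max_min_compose(R, S):
    """Max-min composition of two square fuzzy relations given as nested lists."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(R):
    """Compose R with itself (keeping existing strengths) until nothing changes."""
    while True:
        composed = max_min_compose(R, R)
        nxt = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(R, composed)]
        if nxt == R:
            return R
        R = nxt


def lambda_cut_classes(R, lam):
    """Group indices i, j into one class when R[i][j] >= lam.

    Assumes R is a fuzzy equivalence relation, so the cut is a crisp
    equivalence relation.
    """
    classes, seen = [], set()
    for i in range(len(R)):
        if i in seen:
            continue
        cls = [j for j in range(len(R)) if R[i][j] >= lam]
        classes.append(cls)
        seen.update(cls)
    return classes


# Hypothetical reflexive, symmetric relation on four data points.
T = [[1.0, 0.8, 0.0, 0.1],
     [0.8, 1.0, 0.4, 0.0],
     [0.0, 0.4, 1.0, 0.0],
     [0.1, 0.0, 0.0, 1.0]]
E = transitive_closure(T)
print(lambda_cut_classes(E, 0.5))  # [[0, 1], [2], [3]]
print(lambda_cut_classes(E, 0.4))  # [[0, 1, 2], [3]]
```

Lowering λ merges classes, so the choice of λ controls how coarse the classification is.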
Fuzzy C Means
• Fuzzy c-means (FCM) is a data clustering technique in which a data set is grouped
into c clusters, with every data point in the data set belonging to every cluster to a certain
degree.
• It is frequently used in pattern recognition.
• For example, a data point that lies close to the center of a cluster will have a high degree of
membership in that cluster, and another data point that lies far away from the center of a
cluster will have a low degree of membership to that cluster.
Cluster Analysis
• Cluster analysis is a statistical classification technique in which a set of objects or points with
similar characteristics are grouped together in clusters.
• The aim of cluster analysis is to organize observed data into meaningful structures in order to
gain further insight from them.
Cluster Validity
• Cluster validation is the procedure of evaluating the goodness of the results of a clustering
algorithm.
• This is important to avoid finding patterns in random data, and also when
comparing two clustering algorithms.
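The slides do not name a specific validity measure. As one illustrative possibility, the sketch below computes Bezdek's partition coefficient, the mean of the squared memberships, which ranges from 1/c for a completely fuzzy partition to 1 for a crisp one. The function name and the two small membership matrices are assumptions made for this example.

```python
def partition_coefficient(U):
    """Mean of squared memberships; U[i][j] is the membership of point i in cluster j.

    Illustrative choice of validity index (Bezdek's partition coefficient);
    the slides do not prescribe a particular measure.
    """
    n = len(U)
    return sum(u * u for row in U for u in row) / n


crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point fully in one cluster
vague = [[0.5, 0.5], [0.5, 0.5]]   # every point split evenly
print(partition_coefficient(crisp))  # 1.0
print(partition_coefficient(vague))  # 0.5
```

A value close to 1/c suggests the algorithm found little structure, which is one way to guard against reading patterns into random data.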
Algorithm
• This algorithm works by assigning membership to each data point corresponding to each
cluster center on the basis of distance between the cluster center and the data point.
• The nearer a data point is to a cluster center, the higher its membership in that
cluster.
• The memberships of each data point, summed over all clusters, must equal one.
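To make the distance-to-membership idea concrete, here is a minimal sketch for a single data point. It assumes the standard FCM membership formula with a fuzziness index m > 1 (m = 2 by default, an assumption of this example); the function name memberships is hypothetical.

```python
def memberships(distances, m=2.0):
    """Memberships of one data point in each cluster, given its distances to the centers.

    Assumes the standard FCM rule: u_j = 1 / sum_l (d_j / d_l) ** (2 / (m - 1)).
    """
    zero = [d == 0 for d in distances]
    if any(zero):
        # The point coincides with one or more centers: share full membership among them.
        hits = sum(zero)
        return [1.0 / hits if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dl) ** power for dl in distances) for dj in distances]


u = memberships([1.0, 3.0])  # the point is three times closer to the first center
print(u)       # roughly [0.9, 0.1]
print(sum(u))  # roughly 1.0
```

The closer center receives the larger membership, and the memberships add up to one.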
Algorithmic steps for Fuzzy c-means clustering
Let X = {x1, x2, x3 ..., xn} be the set of data points and V = {v1, v2, v3 ..., vc} be the set of centers.
1) Randomly select ‘c’ cluster centers.
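2) Calculate the fuzzy membership 'µij' using:
µij = 1 / Σ(l=1..c) (dij / dil)^(2/(m−1))
3) Compute the fuzzy centers 'vj' using:
vj = ( Σ(i=1..n) (µij)^m xi ) / ( Σ(i=1..n) (µij)^m ),  j = 1, 2, ..., c
4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or ||U(k+1) - U(k)|| < β.
where,
‘k’ is the iteration step.
‘β’ is the termination criterion between [0, 1].
‘U = (µij)n×c’ is the fuzzy membership matrix.
‘dij’ = ||xi − vj|| is the Euclidean distance between the i-th data point and the j-th cluster center.
‘m’ is the fuzziness index, with m > 1 (m = 2 is a common choice).
‘J’ is the objective function, J(U, V) = Σ(i=1..n) Σ(j=1..c) (µij)^m ||xi − vj||².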
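The four steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. It assumes the standard FCM update rules with a fuzziness index m (default 2), measures ||U(k+1) − U(k)|| as the largest absolute change in any membership, and picks the initial centers from the data points. The function name fcm, the parameters max_iter and seed, and the sample points are hypothetical.

```python
import math
import random


def fcm(points, c, m=2.0, beta=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means sketch; returns (centers, U) with U[i][j] = membership of point i in cluster j."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)            # step 1: random initial centers
    n, dim = len(points), len(points[0])
    power = 2.0 / (m - 1.0)
    U = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # Step 2: memberships from distances to the current centers.
        new_U = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:                       # point lies exactly on a center
                hits = d.count(0.0)
                row = [1.0 / hits if dj == 0.0 else 0.0 for dj in d]
            else:
                row = [1.0 / sum((dj / dl) ** power for dl in d) for dj in d]
            new_U.append(row)
        # Step 3: centers as membership-weighted means of the data points.
        centers = []
        for j in range(c):
            w = [new_U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(wi * x[k] for wi, x in zip(w, points)) / total
                for k in range(dim)
            ))
        # Step 4: stop once the membership matrix changes by less than beta.
        change = max(abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if change < beta:
            break
    return centers, U


# Hypothetical data: two small, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, U = fcm(points, c=2)
labels = [row.index(max(row)) for row in U]   # hardened labels, for inspection only
print(labels)
```

Each row of U sums to one, and taking the largest membership per row gives a crisp labelling when one is needed.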
Advantages
1) Gives good results for overlapping data sets, and often comparatively better results than the
k-means algorithm.
2) Unlike k-means, where a data point must belong exclusively to one cluster center,
here a data point is assigned a membership to each cluster center, so a data point
may belong to more than one cluster center.
Disadvantages
1) The number of clusters must be specified a priori.
2) A lower value of β gives a better result, but at the expense of a larger
number of iterations.
3) Euclidean distance measures can unequally weight underlying factors.
THANK YOU