Slide 1:
Fuzzy c-means clustering
And
Its application in case of forest fires intelligent system Maedeh Zirak Javanmard
M.Sc. Computer Application
Roll No.4812
mdhjavanmard@iasri.res.in
November 2010 Fuzzy c-means clustering
And
Its application in case of forest fires intelligent system Course Seminar II Slide 2:
Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each cluster share some common attributes Cluster analysis attempts to isolate regions of similarity within a dataset and find the relationships between multiple clusters. The differences among members of a cluster, in terms of their absolute difference from the cluster’s calculated center or centroid, define a metric of compactness and homogeneity Slide 3:
Cluster Analysis Crisp Clustering Technique Fuzzy Clustering Technique Density Based Algorithm Hierarchical Algorithm Partitional Algorithm Grid Based Algorithm Model Based Method Various algorithmic approaches that perform group of cases in various clusters based on specific attributes Slide 4:
Attempts to cluster data by grouping related attributes in uniquely defined clusters. Each data point in the dataset is assigned to only one cluster. In partitioning the data, only the centers of the clusters are moved and condition of all the data points are fixed clustering is an iterative process of finding better and better cluster centers. Slide 5:
Cluster centers are randomly initialized and data point (xi) assigned into clusters (Cj , j = 1 to k). Distance metric measures (Euclidean distance wildly use) calculate how far away a point is from a cluster center. When all data points have been assigned to clusters, new cluster centers (centroids) are calculated. The process of calculating cluster memberships and recalculating cluster centers continues until the cluster centers no longer change from one cycle to the next. Thus, the cluster centers were stabled on the final In the typical crisp cluster analysis the identification of the clusters requires the determination of the relation “belongs to”, “does not belong to” and also the estimation of the cluster centre. The “belongs to” relation between a data point and a crisp cluster might have two potential values, 1 or 0. There is no overlapping in clusters, so there are no data points belonging to several clusters at the same time C1: 1
C2: 0 C1 C2 Slide 6:
C1 C2 ? Some data can be assigned to the cluster C1 and C2, the middle data point has an ambiguous status with respect to class membership. Slide 7:
Introduced by Zadeh in 1969 to overcome the idea that all things can be absolutely T or F. Zadeh was the first to claim that something can be 0.70 true. According to fuzzy Algebra every element of the universe can belong to any fuzzy set with a degree of membership that varies from 0 to 1 taking real values. if an element of the universe belongs to a FS with a degree of μ1 then it belongs to its complement with a degree of 1 − μ1 Slide 8:
Degree of membership Member of cluster 1 0 Fuzzy clustering is an approach operating towards fuzzy logic and it provides the flexible method of assigns the data points to the clusters. Slide 9:
“linguistic” terms like “much”, “much more”, “less”, “more or less”, “more than” and others can use in fuzzy clustering. data points are given partial degree of membership in multiple nearby clusters. Central point in the fuzzy clustering is always no unique partitioning of the data in a collection of clusters. In this membership value is assign to the each cluster. Sometimes this membership has been used to decide whether the data points belong to the cluster or not Slide 10:
One dimention data point Crisp clustering of data poin 1 A B 0.2 Fuzzy clustering of data poin Slide 11:
The most well-known fuzzy clustering algorithm is fuzzy c-means, a modification by Bezdek of an original crisp clustering methodology. Bezdek introduced the idea of a fuzzification parameter (m) in the range [1, n], which determines the degree of fuzziness in the clusters. Slide 12:
The most well-known fuzzy clustering algorithm is fuzzy c-means, a modification by Bezdek of an original crisp clustering methodology. Bezdek introduced the idea of a fuzzification parameter (m) in the range [1, n], which determines the degree of fuzziness in the clusters. When m = 1 the effect is a crisp clustering of points.
when m > 1 the degree of fuzziness among points in the decision space increases. Slide 13:
the membership value for each sample point using the cluster centers. the cluster centers using all new membership values. When the cluster centers stabilize the clustering algorithm is finished. Step 1 Step 2 Slide 14:
is the ith data point The centroid of a fuzzy cluster ( j = 1, 2, . . ., p). This value is repeatedly calculated by the algorithm The distance of the ith data point from the jth cluster center with using the Euclidean distance. Slide 15:
The number of fuzzy clusters specified as part of the algorithm. A fuzzification parameter A fuzzy membership qualification indicating the membership of sample xi to the jth cluster. ` Slide 16:
FCC applied on public database of automobile property information.
clustering is done in two attributes ACCEL (acceleration) and WGT (weight)
m= 1.25
P= 2 Slide 17:
Step 1:
Randomly initializing the cluster center Slide 18:
Step2:
Creating distance matrix
from a point xi to each of the cluster
centers to with taking the Euclidean
distance between the point and the cluster center. Slide 19:
Step3: Creating membership matrix
takes the fractional distance from the point to the cluster center and makes this a fuzzy measurement by raising the fraction to the inverse fuzzification parameter.
This is divided by the sum of all fractional distances, thereby ensuring that the sum of all memberships is 1. Slide 20:
Step3: Creating membership matrix
Fuzzy c-means imposes a direct constraint on the fuzzy membership function associated with each point, as follows. The total membership for a point in sample or decision space must add to 1. Slide 21:
Step3:
Generating new centroied for each cluster Slide 22:
Step3:
Generating new centroied for each cluster
With iteration all this step optimize cluster centers will generate. Slide 23:
Step4:
WeightxAcceleration Cluster Assignments Slide 24:
Rational consideration of the problem is in Greece there are 5–10 areas that are severely burned down (in terms of burned hectares). The civil protection authority would manage to design an effective protection and prevention policy if they could spot as many of them as possible before the start of the fire season . From a conceptual point of view, in risk problems, fuzzy clustering performs more flexible than pure statistical k-means, because it offers an additional differentiation of the areas that belong to the same characteristic cluster. Slide 25:
For example, the k-means approach might determine that areas a1, a2, a5, a6, a7 belong to the “Very Risky” cluster. On the other hand, fuzzy c-means offers the additional production of the DOM. though the areas under examination have been assigned the “Very Risky” cluster, they are not considered by the fuzzy c-means algorithm as having the same risk. This is due to their differentiation based on their DOM. This means that even cases of the same cluster can be ranked accordingly. Slide 26:
L.S. lliadis and et al. in Greece ,GFD, have developed intelligent fuzzy information system that uses fuzzy set theory operators and fuzzy c-means clustering algorithm . This integrated model performs fuzzy clustering introducing two risk degree indices, “high risk” and ”moderate risk” based on historical data. The system has been tested and has showed with remarkable results . Slide 27:
After applying FCC in Greek forest department (GFD) historical data the system output the coordinates of the cluster centers.
This is done for each year separately. The cluster centers are used for the determination of its corresponding linguistic which of course describes the proper degree of risk. Slide 28:
k-means algorithm applied in same data, it is a fact that the crisp k-means analysis does not make any distinction between the GFD of the most risky cluster. They all belong to the most risky cluster either with a DOM of 1 or 0. Slide 29:
Restriction the sum of the cluster memberships for a sample data point must equal 1 Assign high degrees of membership to outlier points. Outliers are obviously outside the cluster but fuzzy c-means algorithm calculates high membership degrees in clusters than is warranted by their locations in the dataset Slide 30:
[1] L.S. Iliadis M. Vangeloudha and S. Spartalisb, An intelligent system employing an enhanced fuzzy c-means clustering model, Application in the case of forest fires: Computers and Electronics in Agriculture 70 ,276–284, 2010.
[2] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Fuzzy Sets and Systems 13 (2), 153-167, 1984.
[3] E. Cox, Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration, Elsevier Inc., USA, 2005.
[4] A Tutorial on Clustering Algorithms, available online at:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html Slide 31:
[5] S. Akthar, Sk.Md.Rafiet, Improving the software architecture therough fuzzy clustering technique, Indian Journal of Computer Science and Engineering 1 (1), 54-57.
[6] L. Iliadis, Application of expert systems in the problem of forest fires. Ph.D. Thesis. Aristotle University of Thessaloniki, Greece, 1998.
[7] L. Iliadis, A decision support system applying an integrated fuzzy model for long-term forest fire risk estimation, Environmental Modelling and Software 20 (5), 613-621, 2005.
[8] Clustering methods, available online at: http://www.cis.hut.fi/~sami/thesis/node9.html
[9] Pattern recognition, available online at:
http://pami.uwaterloo.ca/tizhoosh/pattern_recognition.htm Slide 32:
Thank you