Decision Tree: Outline :Babu Ram Dawadi 1 Decision Tree: Outline Decision tree representation
ID3 learning algorithm
Entropy, information gain
Overfitting
Defining the Task :Babu Ram Dawadi 2 Defining the Task Imagine we’ve got a set of data containing several types, or classes.
E.g. information about customers, and class=whether or not they buy anything.
Can we predict, i.e classify, whether a previously unseen customer will buy something?
An Example Decision Tree :3 An Example Decision Tree We create a ‘decision tree’. It acts like a function that can predict and output given an input Attributen Attributem Attributek Attributel vn1 vn2 vn3 vm1 vm2 vl1 vl2 vk1 vk2 Class1 Class2 Class2 Class2 Class1 Class1
Decision Trees :Babu Ram Dawadi 4 Decision Trees The idea is to ask a series of questions, starting at the root, that will lead to a leaf node.
The leaf node provides the classification.
Slide 5:5 Classification by Decision Tree Induction Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Once the tree is build
Use of decision tree: Classifying an unknown sample
Decision Tree for PlayTennis :6 Decision Tree for PlayTennis Outlook Sunny Overcast Rain Humidity High Normal Wind Strong Weak No Yes Yes Yes No
Decision Tree for PlayTennis :7 Decision Tree for PlayTennis Outlook Sunny Overcast Rain Humidity High Normal No Yes
Decision Tree for PlayTennis :8 Decision Tree for PlayTennis Outlook Temperature Humidity Wind PlayTennis
Sunny Hot High Weak ?
Decision Trees :9 Decision Trees Consider these data:
A number of examples of weather, for several days, with a classification ‘PlayTennis.’
Decision Tree Algorithm :10 Decision Tree Algorithm Building a decision tree
Select an attribute
Create the subsets of the example data for each value of the attribute
For each subset
if not all the elements of the subset belongs to same class repeat the steps 1-3 for the subset
Building Decision Trees :Babu Ram Dawadi 11 Building Decision Trees Let’s start building the tree from scratch. We first need to decide which attribute to make a decision. Let’s say we selected “humidity” Humidity high normal D1,D2,D3,D4
D8,D12,D14 D5,D6,D7,D9
D10,D11,D13
Building Decision Trees :12 Building Decision Trees Now lets classify the first subset D1,D2,D3,D4,D8,D12,D14 using attribute “wind” Humidity high normal D1,D2,D3,D4
D8,D12,D14 D5,D6,D7,D9
D10,D11,D13
Building Decision Trees :13 Building Decision Trees Subset D1,D2,D3,D4,D8,D12,D14 classified by attribute “wind” Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak D1,D3,D4,D8 D2,D12,D14
Building Decision Trees :14 Building Decision Trees Now lets classify the subset D2,D12,D14 using attribute “outlook” Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak D1,D3,D4,D8 D2,D12,D14
Building Decision Trees :15 Building Decision Trees Subset D2,D12,D14 classified by “outlook” Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak D1,D3,D4,D8 D2,D12,D14
Building Decision Trees :16 Building Decision Trees subset D2,D12,D14 classified using attribute “outlook” Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak D1,D3,D4,D8 outlook Sunny Rain Overcast Yes No No
Building Decision Trees :17 Building Decision Trees Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak D1,D3,D4,D8 outlook Sunny Rain Overcast Yes No No Now lets classify the subset D1,D3,D4,D8 using attribute “outlook”
Building Decision Trees :18 Building Decision Trees Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak outlook Sunny Rain Overcast Yes No No subset D1,D3,D4,D8 classified by “outlook” outlook Sunny Rain Overcast Yes No Yes
Building Decision Trees :19 Building Decision Trees Humidity high normal D5,D6,D7,D9
D10,D11,D13 wind strong weak outlook Sunny Rain Overcast Yes No No Now classify the subset D5,D6,D7,D9,D10,D11,D13 using attribute “outlook” outlook Sunny Rain Overcast Yes No Yes
Building Decision Trees :20 Building Decision Trees Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No subset D5,D6,D7,D9,D10,D11,D13 classified by “outlook” outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes D5,D6,D10
Building Decision Trees :21 Building Decision Trees Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No Finally classify subset D5,D6,D10by “wind” outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes D5,D6,D10
Building Decision Trees :22 Building Decision Trees Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No subset D5,D6,D10 classified by “wind” outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes wind strong weak Yes No
Decision Trees and Logic :23 Decision Trees and Logic Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No (humidity=high ? wind=strong ? outlook=overcast) ?
(humidity=high ? wind=weak ? outlook=overcast) ?
(humidity=normal ? outlook=sunny) ?
(humidity=normal ? outlook=overcast) ?
(humidity=normal ? outlook=rain ? wind=weak) ? ‘Yes’ outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes wind strong weak Yes No The decision tree can be expressed as an expression or if-then-else sentences:
Using Decision Trees :24 Using Decision Trees Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No Now let’s classify an unseen example: =? outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes wind strong weak Yes No
Using Decision Trees :25 Using Decision Trees Classifying: =? Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No outlook Sunny Rain Overcast Yes No Yes Rain Overcast Yes wind strong weak Yes No outlook Sunny Yes
Using Decision Trees :26 Using Decision Trees Classification for: =Yes Humidity high normal wind strong weak outlook Sunny Rain Overcast Yes No No outlook Sunny Rain Overcast Yes No Yes outlook Sunny Rain Overcast Yes Yes wind strong weak Yes No
A Big Problem… :27 A Big Problem… Here’s another tree from the same training data that has a different attribute order: Which attribute should we choose for each branch?
Choosing Attributes :28 Choosing Attributes We need a way of choosing the best attribute each time we add a node to the tree.
Most commonly we use a measure called entropy.
Entropy measure the degree of disorder in a set of objects.
Entropy :29 Entropy In our system we have
9 positive examples
5 negative examples
The entropy, E(S), of a set of examples is:
E(S) = ?-pi log pi
Where c = no of classes and pi = ratio of the number of examples of this value over the total number of examples. P+ = 9/14
P- = 5/14
E = - 9/14 log2 9/14 - 5/14 log2 5/14
E = 0.940 - In a homogenous (totally ordered) system, the entropy is 0.
- In a totally heterogeneous system (totally disordered), all classes have equal numbers of instances; the entropy is 1
Entropy :30 Entropy We can evaluate each attribute for their entropy.
E.g. evaluate the attribute “Temperature”
Three values: ‘Hot’, ‘Mild’, ‘Cool.’
So we have three subsets, one for each value of ‘Temperature’. Shot={D1,D2,D3,D13}
Smild={D4,D8,D10,D11,D12,D14}
Scool={D5,D6,D7,D9}
We will now find:
E(Shot)
E(Smild)
E(Scool)
Entropy :31 Entropy Shot= {D1,D2,D3,D13}
Examples:
2 positive
2 negative
Totally heterogeneous + disordered therefore:
p+= 0.5
p-= 0.5
Entropy(Shot),=
-0.5log20.5
-0.5log20.5 = 1.0 Smild= {D4,D8,D10,
D11,D12,D14}
Examples:
4 positive
2 negative
Proportions of each class in this subset:
p+= 0.666
p-= 0.333
Entropy(Smild),=
-0.666log20.666
-0.333log20.333 = 0.918 Scool={D5,D6,D7,D9}
Examples:
3 positive
1 negative
Proportions of each class in this subset:
p+= 0.75
p-= 0.25
Entropy(Scool),=
-0.25log20.25
-0.75log20.75 = 0.811
Gain :32 Gain Now we can compare the entropy of the system before we divided it into subsets using “Temperature”, with the entropy of the system afterwards. This will tell us how good “Temperature” is as an attribute.
The entropy of the system after we use attribute “Temperature” is:
(|Shot|/|S|)*E(Shot) + (|Smild|/|S|)*E(Smild) + (|Scool|/|S|)*E(Scool) This difference between the entropy of the system before and after the split into subsets is called the gain: (4/14)*1.0 + (6/14)*0.918 + (4/14)*0.811 = 0.9108 Gain(S,Temperature) = 0.940 - 0.9108 = 0.029 E(before) E(afterwards)
Decreasing Entropy :33 Decreasing Entropy 7red class 7pink class: E=1.0 All subset: E=0.0 Both subsets
E=-2/7log2/7 –5/7log5/7 Has a cross? Has a ring? Has a ring? no no no yes yes yes From the initial state,
Where there is total disorder… …to the final state where all subsets contain a single class
Tabulating the Possibilities :34 Tabulating the Possibilities … etc This shows the entropy calculations…
Table continued… :35 Table continued… …and this shows the gain calculations
Gain :36 Gain We calculate the gain for all the attributes.
Then we see which of them will bring more ‘order’ to the set of examples. Gain(S,Outlook) = 0.246
Gain(S,Humidity) = 0.151
Gain(S,Wind) = 0.048
Gain(S, Temp’) = 0.029
The first node in the tree should be the one with the highest value, i.e. ‘Outlook’.
ID3 (Decision Tree Algorithm: (Quinlan 1979)) :37 ID3 (Decision Tree Algorithm: (Quinlan 1979)) ID3 was the first proper decision tree algorithm to use this mechanism: Building a decision tree with ID3 algorithm
Select the attribute with the most gain
Create the subsets for each value of the attribute
For each subset
if not all the elements of the subset belongs to same class repeat the steps 1-3 for the subset Main Hypothesis of ID3: The simplest tree that classifies training examples will work best on future examples (Occam’s Razor)
ID3 (Decision Tree Algorithm) :38 ID3 (Decision Tree Algorithm) Function DecisionTtreeLearner(Examples, TargetClass, Attributes)
create a Root node for the tree
if all Examples are positive, return the single-node tree Root, with label = Yes
if all Examples are negative, return the single-node tree Root, with label = No
if Attributes list is empty,
return the single-node tree Root, with label = most common value of TargetClass in Examples
else
A = the attribute from Attributes with the highest information gain with respect to Examples
Make A the decision attribute for Root
for each possible value v of A:
add a new tree branch below Root, corresponding to the test A = v
let Examplesv be the subset of Examples that have value v for attribute A
if Examplesv is empty then
add a leaf node below this new branch with label = most common value of TargetClass in Examples
else
add the subtree DTL(Examplesv, TargetClass, Attributes - { A })
end if
end
return Root
The Problem of Overfitting :39 The Problem of Overfitting Trees may grow to include irrelevant attributes
Noise may add spurious nodes to the tree.
This can cause overfitting of the training data relative to test data. Hypothesis H overfits the data if there exists H’ with greater error than H, over training examples, but less error than H over entire distribution of instances.
Fixing Over-fitting :40 Fixing Over-fitting Two approaches to pruning
Prepruning: Stop growing tree during the training when it is determined that there is not enough data to make reliable choices.
Postpruning: Grow whole tree but then remove the branches that do not contribute good overall performance.
Rule Post-Pruning :41 Rule Post-Pruning Rule post-pruning
prune (generalize) each rule by removing any preconditions (i.e., attribute tests) that result in improving its accuracy over the validation set
sort pruned rules by accuracy, and consider them in this order when classifying subsequent instances IF (Outlook = Sunny) ^ (Humidity = High) THEN PlayTennis = No
Try removing (Outlook = Sunny) condition or (Humidity = High) condition from the rule and select whichever pruning step leads to the biggest improvement in accuracy on the validation set (or else neither if no improvement results).
converting to rules improves readability
Advantage and Disadvantages of Decision Trees :42 Advantage and Disadvantages of Decision Trees Advantages:
Easy to understand and map nicely to a production rules
Suitable for categorical as well as numerical inputs
No statistical assumptions about distribution of attributes
Generation and application to classify unknown outputs is very fast
Disadvantages:
Output attributes must be categorical
Unstable: slight variations in the training data may result in different attribute selections and hence different trees
Numerical input attributes leads to complex trees as attribute splits are usually binary
Homework :43 Homework Given the training data set, to identify whether a customer buys computer or not, Develop a Decision Tree using ID3 technique.
Association Rules :44 Association Rules Example1: a female shopper buys a handbag is likely to buy shoes
Example2: when a male customer buys beer, he is likely to buy salted peanuts
It is not very difficult to develop algorithms that will find this associations in a large database
The problem is that such an algorithm will also uncover many other associations that are of very little value.
Association Rules :45 Association Rules It is necessary to introduce some measures to distinguish interesting associations from non-interesting ones
Look for associations that have a lots of examples in the database: support of an association rule
May be that a considerable group of people who read all three magazines but there is a much larger group that buys A & B, but not C; association is very weak here although support might be very high.
Associations…. :46 Associations…. Percentage of records for which C holds, within the group of records for which A & B hold: confidence
Association rules are only useful in data mining if we already have a rough idea of what is we are looking for.
We will represent an association rule in the following way:
MUSIC_MAG, HOUSE_MAG=>CAR_MAG
Somebody that reads both a music and a house magazine is also very likely to read a car magazine
Associations… :Babu Ram Dawadi 47 Associations… Example: shopping Basket analysis
Example… :48 Example… 1. find all frequent Itemsets:
(a) 1-itemsets
K= [{Chips}C=1,{Rasbari}C=3,{Samosa}C=2, {Tea}C=1]
(b) extend to 2-itemsets:
L=[{Chips, Rasbari}C=1, {Rasbari,Samosa}C=2,{Rasbari,Tea}C=1,{Samosa,Tea}C=1]
(c) Extend to 3-Itemsets:
M=[{Rasbari, Samosa,Tea}C=1
Examples.. :49 Examples.. Match with the requirements:
Min. Support is 2 (66%)
(a) >> K1={{Rasbari}, {Samosa}}
(b) >>L1={Rasbari,Samosa}
(c) >>M1={}
Build All possible rules:
(a) no rule
(b) >> possible rules:
Rasbari=>Samosa
Samosa=>Rasbari
(c) No rule
Support: given the association rule X1,X2…Xn=>Y, the support is the Percentage of records for which X1,X2…Xn and Y both hold true.
Example.. :50 Example.. Calculate Confidence for b:
Confidence of [Rasbari=>Samosa]
{Rasbari,Samosa}C=2/{Rasbari}C=3
=2/3
66%
Confidence of Samosa=> Rasbari
{Rasbari,Samosa}C=2/{Samosa}C=2
=2/2
100%
Confidence: Given the association rule X1,X2….Xn=>Y, the confidence is the percentage of records for which Y holds within the group of records for which X1,X2…Xn holds true.
The A-Priori Algorithm :51 The A-Priori Algorithm Set the threshold for support rather high – to focus on a small number of best candidates,
Observation: If a set of items X has support s, then each subset of X must also have support at least s.
( if a pair {i,j} appears in say, 1000 baskets, then we know there are at least 1000 baskets with item i and at least 1000 baskets with item j )
Algorithm:
Find the set of candidate items – those that appear in a sufficient number of baskets by themselves
Run the query on only the candidate items
Apriori Algorithm :52 Apriori Algorithm Scan the database and count the frequency of the candidate item-sets, then Large Item-sets are decided based on the user specified min_sup. Based on the Large Item-sets, expand them with one more item to generate new Candidate item-sets. Initialise the candidate Item-sets as single items in database. Any new Large
Item-sets? Stop Begin NO YES
Apriori: A Candidate Generation-and-test Approach :53 Apriori: A Candidate Generation-and-test Approach Any subset of a frequent itemset must be frequent
if {beer, diaper, nuts} is frequent, so is {beer, diaper}
Every transaction having {beer, diaper, nuts} also contains {beer, diaper}
Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested!
The performance studies show its efficiency and scalability
The Apriori Algorithm — An Example :54 The Apriori Algorithm — An Example Database TDB 1st scan C1 L1 L2 C2 C2 2nd scan C3 L3 3rd scan
Problems with A-priori Algorithms :55 Problems with A-priori Algorithms It is costly to handle a huge number of candidate sets. For example if there are 104 large 1-itemsets, the Apriori algorithm will need to generate more than 107 candidate 2-itemsets. Moreover for 100-itemsets, it must generate more than 2100 ? 1030 candidates in total.
The candidate generation is the inherent cost of the Apriori Algorithms, no matter what implementation technique is applied.
To mine a large data sets for long patterns – this algorithm is NOT a good idea.
When Database is scanned to check Ck for creating Lk, a large number of transactions will be scanned even they do not contain any k-itemset.
Artificial Neural Network: Outline :Babu Ram Dawadi 56 Artificial Neural Network: Outline Perceptrons
Multi-layer networks
Backpropagation Neuron switching time : > 10-3 secs
Number of neurons in the human brain: ~1011
Connections (synapses) per neuron : ~104–105
Face recognition : 0.1 secs
High degree of parallel computation
Distributed representations
Neural Network: Characteristics :57 Neural Network: Characteristics Highly parallel structure; hence a capability for fast computing
Ability to learn and adapt to changing system parameters
High degree of tolerance to damage in the connections
Ability to learn through parallel and distributed processing
Neural Networks :58 Neural Networks A neural Network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it.
Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time.
Slide 59:59 Linear treshold unit (LTU) ? .
.
. w1 w2 wn w0 x0=1 ?i=0n wi xi 1 if ?i=0n wi xi >0
o(xi)=
-1 otherwise o { Input Unit Activation Unit Output Unit
Layered network :60 Layered network Single layered
Multi layered I1 I2 H3 H4 O5 w13 w24 w14 w23 w35 w45 Two layer, feed forward network with two inputs, two hidden nodes and one output node.
Perceptrons :61 Perceptrons A single-layered, feed-forward network can be taken as a perceptron. Ij Wj,i Oi Ij Wj O Single Perceptron
Perceptron Learning Rule :62 Perceptron Learning Rule wi = wi + ?wi
?wi = ? (t - o) xi
t=c(x) is the target value
o is the perceptron output
? Is a small constant (e.g. 0.1) called learning rate If the output is correct (t=o) the weights wi are not changed
If the output is incorrect (t?o) the weights wi are changed
such that the output of the perceptron for the new weights
is closer to t.
Backpropagation :63 Backpropagation A more sophisticated architecture that contains the hidden layers in which it has random weightings on its synapses in its initial stage.
For each training instance, the actual output of the network is compared with the desired output that would give a correct answer.
If there is a difference between the correct answer and the actual answer
Then the weightings of the individual nodes and synapses of the network are adjusted
Backpropagation :64 Backpropagation Process is repeated until the responses are more or less accurate
Once the structure of the network stablizes, the learning stage is over, and the network is now trained and ready to categorize unknown input.
During the training stage, the network receives examples of input and output pairs corresponding to records in the database, and adapts the weights of the different branches until all the inputs match the appropriate outputs.
Genetic Algorithm :65 Genetic Algorithm Derived inspiration from biology
The most fertile area for exchange of views between biology and computer science is ‘evolutionary computing’
This area evolved from three stages or less independent development:
Genetic algorithms
Evolutionary programming
Evolution strategies
GA.. :66 GA.. The investigators began to see a strong relationship between these areas, and at present, genetic algorithms are consideered to be among the most successful machine-learning techniques.
In the “origin of species”, Darwin described the theory of evolution, with the ‘natural selection’ as the central notion.
Each species has an overproduction of individuals and in a tough struggle for life, only those individuals that are best adapted to the environment survive.
The long DNA molecules, consisting of only four building blocks, suggest that all the heriditary information of a human individual, or of any living creature, has been laid down in a language of only four letters (C,G,A & T in language of genetics)
GA.. :67 GA.. The collection of genetic instruction for human is about 3 billion letters long
Each individual inherits some characteristics of the father and some of the mother.
Individual differences between people, such as hair color and eye color, and also pre-disposition for diseases, are caused by differences in genetic coding
Even the twins are different in numerous aspects.
GA.. :68 GA.. Following are the formula for constructing a genetic algorithm for the solution of problem
Write a good coding in terms of strings of limited alphabets
Invent an artificial environment in the computer where solution can join each other
Develop ways in which possible solutions can be combined. Like father’s and mother’s strings are simply cut and after changing, stuck together again called cross- over
Provide an initial population or solution set and make the computer play evolution by removing bad solutions from each generation and replacing them with mutations of good solutions
Stop when a family of successful solutions has been produced
Example :69 Example
Genetic algorithms :70 Genetic algorithms
Clustering :71 Clustering cluster is a collection of data objects, in which the objects similar to one another within the same cluster and dissimilar to the objects in other clusters
Cluster analysis is the process of finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters.
Clustering: Given a database D = {t1, t2, .., tn}, a distance measure dis(ti, tj) defined between any two objects ti and tj, and an integer value k, the clustering problem is to define a mapping f: D -> {1, …, k} where each ti is assigned to one cluster Kj, 1<=j<=k. here ‘k’ is the number of clusters.
Examples of Clustering Applications :72 Examples of Clustering Applications Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
Land use: Identification of areas of similar land use in an earth observation database
Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
City-planning: Identifying groups of houses according to their house type, value, and geographical location
Earth-quake studies: Observed earth quake epicenters should be clustered along continent faults
Clustering: Classification :73 Clustering: Classification Partitioning Clustering
Construct various partitions and then evaluate them by some criterion
Hierarchical Clustering
Create a hierarchical decomposition of the set of data (or objects) using some criterion
Partitioning Algorithms: Basic Concept :74 Partitioning Algorithms: Basic Concept Partitioning method: Construct a partition of a database D of n objects into a set of k clusters
Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion
Heuristic methods: k-means and k-medoids algorithms
k-means (MacQueen’67): Each cluster is represented by the center of the cluster
k-medoids or PAM (Partition around medoids): Each cluster is represented by one of the objects in the cluster
The K-Means Clustering Method :75 The K-Means Clustering Method Chose k number of clusters to be determined
Chose k objects randomly as the initial cluster centers
Repeat
Assign each object to their closest cluster center, using Euclidean distance
Compute new cluster centers, calculate mean point
Until
No change in cluster centers or
No object change its clusters
The K-Means Clustering Method :76 The K-Means Clustering Method
K-Means Clustering :77 K-Means Clustering
K-Means Clustering :78 K-Means Clustering
K-Means Clustering :79 K-Means Clustering
K-Means Clustering :80 K-Means Clustering
Weakness of K-means :81 Weakness of K-means Applicable only when mean is defined, then what about categorical data?
Need to specify K, the number of clusters, in advance
run the algorithm with different K values
Unable to handle noisy data and outliers
Works best when clusters are of approximately of equal size
Hierarchical Clustering :82 Hierarchical Clustering With hierarchal clustering, a nested set of cluster is created. Each level in the hierarchy has the separate set of clusters.
At the lowest level, each item is in its own unique cluster.
At the highest level, all items belong to the same cluster.
Hierarchical Clustering: Types :83 Hierarchical Clustering: Types Agglomerative:
starts with as many clusters as there are records, with each cluster having only one record. Then pairs of clusters are successively merged until the number of clusters reduces to k.
at each stage, the pair of clusters are merged which are nearest to each other. If the merging is continued, it terminates in the hierarchy of clusters which is built with just a single cluster containing all the records.
Divisive algorithm takes the opposite approach from the agglomerative techniques. These starts with all the records in one cluster, and then try to split that clusters into smaller pieces.
Slide 84:84 Agglomerative Divisive Remove
Outlier
A Dendrogram Shows How the Clusters are Merged Hierarchically :85 A Dendrogram Shows How the Clusters are Merged Hierarchically Decompose data objects into a several levels of nested partitioning (tree of clusters), called a dendrogram.
A clustering of the data objects is obtained by cutting the dendrogram at the desired level, then each connected component forms a cluster. A B C D E