cs206 4

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Decision Trees: 

Decision Trees

Example Decision Tree: 

Example Decision Tree Married? y n Own dog? y n Own home? y n Own home? y n Own dog? y n Bad ?? Good Bad Bad Good

Applications: 

Applications Credit-card companies and banks develop DT’s to decide whether to grant a card or loan. Medical apps, e.g., given information about patients, decide which will benefit from a new drug. Many others.

Issues: 

Issues How do you use a decision tree? I.e., given a record, how do you decide whether it is good or bad? How do you design decision trees? Ad-hoc: decide yourself. Training: algorithm to construct “best” DT from given data. Hope future records match the training set.

Designing a Decision Tree: 

Designing a Decision Tree Typically, we are given data consisting of a number of records, perhaps representing individuals. Each record has a value for each of several attributes. Often binary attributes, e.g., “has dog.” Sometimes numeric, e.g. “age”, or discrete, multiway, like “school attended.”

Designing a Decision Tree 2: 

Designing a Decision Tree 2 Records are classified into “good” or “bad.” More generally: some number of outcomes. The goal is to make a small number of tests involving attributes to decide as best we can whether a record is good or bad.

Using a Decision Tree: 

Using a Decision Tree Given a record to classify, start at the root, and answer the question at the root for that record. E.g., is the record for a married person? Move next to the indicated child. Recursively, apply the DT rooted at that child, until we reach a decision.

Training Sets: 

Training Sets Decision-tree construction is today considered a type of “machine learning.” We are given a training set of example records, properly classified, with which to construct our decision tree.

Example: 

Example Here is the data on which our example DT was based: Married? Home? Dog? Rating 0 1 0 G 0 0 1 G 0 1 1 G 1 0 0 G 1 0 0 B 0 0 0 B 1 0 1 B 1 1 0 B

Binary Attributes: 

Binary Attributes When all attributes are binary, we can pick an attribute to place at the root by considering how nonrandom are the sets of records that go to each side. Branches correspond to the value of the chosen attribute.

Entropy: A Measure of Goodness: 

Entropy: A Measure of Goodness Consider the pools of records on the “yes” and “no” sides. If fraction p on on a side are “good,” the entropy of that side is -(p log2p + (1-p) log2(1-p)). = p log2(1/p) + (1-p) log2(1/(1-p)) Pick attribute that minimizes maximum entropies of the sides.

Shape of Entropy Function: 

Shape of Entropy Function 0 1 0 1/2 1

Intuition: 

Intuition Entropy 1 = random behavior, no useful information. Low entropy = significant information. At entropy = 0, we know exactly. Ideally, we find an attribute such that most of the “good’s” are on one side, and most of the “bad’s” are on the other.

Example: 

Example Our Married, Home, Dog, Rating data: 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Married: 1/4 of Y is G; 1/4 of N is B. Entropy = ((1/4) log24 + (3/4) log2(4/3)) = .81 on both sides.

Example, Continued: 

Example, Continued 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Home: 1/3 of Y is B; 2/5 of N is G. Entropy is (1/3) log23 + (2/3) log2(3/2) = .92 on Y side. Entropy is (2/5) log2(5/2) + (3/5) log2(5/3) = .98 on N side. Max = .98, greater than for Married. Dog is similar, so Married “wins.”

The “Training” Process: 

The “Training” Process Married? 100G, 100B, y n 010G, 001G 101B, 110B 011G, 000B Dog? 101B y n 100G, 100B 110B Bad Home? 010G, y n 001G, 011G 000B Good Home? 110B y n 100B, 100G Bad ?? Dog? 001G y n 000B Good Bad

Handling Numeric Data: 

Handling Numeric Data While complicated tests at a node are permissible, e.g., “age = 30 or age < 50 and age > 42,” the simplest thing is to pick one breakpoint, and divide records by value < breakpoint and value > breakpoint. Rate an attribute and breakpoint by min-max entropy of the two sides.

Overfitting: 

Overfitting A major problem in designing decision trees is that one tends to create too many levels. The number of records reaching a node is small, so significance is lost. Extreme example: our data was generated by coin-flips; the tree is unlikely to reflect additional data that it would be used to classify.

Possible Solutions: 

Possible Solutions 1. Limit depth of tree so that each decision is based on a sufficiently large pool of training data. 2. Create several trees independently (needs randomness in choice of attribute). Decision based on vote of D.T.’s. Filters out irrelevant factors.