logging in or signing up cs206 4 Jade Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 344 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 29, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Decision Trees: Decision Trees Example Decision Tree: Example Decision Tree Married? y n Own dog? y n Own home? y n Own home? y n Own dog? y n Bad ?? Good Bad Bad GoodApplications: Applications Credit-card companies and banks develop DT’s to decide whether to grant a card or loan. Medical apps, e.g., given information about patients, decide which will benefit from a new drug. Many others.Issues: Issues How do you use a decision tree? I.e., given a record, how do you decide whether it is good or bad? How do you design decision trees? Ad-hoc: decide yourself. Training: algorithm to construct “best” DT from given data. Hope future records match the training set.Designing a Decision Tree: Designing a Decision Tree Typically, we are given data consisting of a number of records, perhaps representing individuals. Each record has a value for each of several attributes. Often binary attributes, e.g., “has dog.” Sometimes numeric, e.g. “age”, or discrete, multiway, like “school attended.”Designing a Decision Tree 2: Designing a Decision Tree 2 Records are classified into “good” or “bad.” More generally: some number of outcomes. The goal is to make a small number of tests involving attributes to decide as best we can whether a record is good or bad.Using a Decision Tree: Using a Decision Tree Given a record to classify, start at the root, and answer the question at the root for that record. E.g., is the record for a married person? Move next to the indicated child. Recursively, apply the DT rooted at that child, until we reach a decision.Training Sets: Training Sets Decision-tree construction is today considered a type of “machine learning.” We are given a training set of example records, properly classified, with which to construct our decision tree.Example: Example Here is the data on which our example DT was based: Married? Home? Dog? Rating 0 1 0 G 0 0 1 G 0 1 1 G 1 0 0 G 1 0 0 B 0 0 0 B 1 0 1 B 1 1 0 BBinary Attributes: Binary Attributes When all attributes are binary, we can pick an attribute to place at the root by considering how nonrandom are the sets of records that go to each side. Branches correspond to the value of the chosen attribute.Entropy: A Measure of Goodness: Entropy: A Measure of Goodness Consider the pools of records on the “yes” and “no” sides. If fraction p on on a side are “good,” the entropy of that side is -(p log2p + (1-p) log2(1-p)). = p log2(1/p) + (1-p) log2(1/(1-p)) Pick attribute that minimizes maximum entropies of the sides.Shape of Entropy Function: Shape of Entropy Function 0 1 0 1/2 1Intuition: Intuition Entropy 1 = random behavior, no useful information. Low entropy = significant information. At entropy = 0, we know exactly. Ideally, we find an attribute such that most of the “good’s” are on one side, and most of the “bad’s” are on the other.Example: Example Our Married, Home, Dog, Rating data: 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Married: 1/4 of Y is G; 1/4 of N is B. Entropy = ((1/4) log24 + (3/4) log2(4/3)) = .81 on both sides.Example, Continued: Example, Continued 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Home: 1/3 of Y is B; 2/5 of N is G. Entropy is (1/3) log23 + (2/3) log2(3/2) = .92 on Y side. Entropy is (2/5) log2(5/2) + (3/5) log2(5/3) = .98 on N side. Max = .98, greater than for Married. Dog is similar, so Married “wins.”The “Training” Process: The “Training” Process Married? 100G, 100B, y n 010G, 001G 101B, 110B 011G, 000B Dog? 101B y n 100G, 100B 110B Bad Home? 010G, y n 001G, 011G 000B Good Home? 110B y n 100B, 100G Bad ?? Dog? 001G y n 000B Good BadHandling Numeric Data: Handling Numeric Data While complicated tests at a node are permissible, e.g., “age = 30 or age < 50 and age > 42,” the simplest thing is to pick one breakpoint, and divide records by value < breakpoint and value > breakpoint. Rate an attribute and breakpoint by min-max entropy of the two sides.Overfitting: Overfitting A major problem in designing decision trees is that one tends to create too many levels. The number of records reaching a node is small, so significance is lost. Extreme example: our data was generated by coin-flips; the tree is unlikely to reflect additional data that it would be used to classify.Possible Solutions: Possible Solutions 1. Limit depth of tree so that each decision is based on a sufficiently large pool of training data. 2. Create several trees independently (needs randomness in choice of attribute). Decision based on vote of D.T.’s. Filters out irrelevant factors. You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
cs206 4 Jade Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 344 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 29, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Decision Trees: Decision Trees Example Decision Tree: Example Decision Tree Married? y n Own dog? y n Own home? y n Own home? y n Own dog? y n Bad ?? Good Bad Bad GoodApplications: Applications Credit-card companies and banks develop DT’s to decide whether to grant a card or loan. Medical apps, e.g., given information about patients, decide which will benefit from a new drug. Many others.Issues: Issues How do you use a decision tree? I.e., given a record, how do you decide whether it is good or bad? How do you design decision trees? Ad-hoc: decide yourself. Training: algorithm to construct “best” DT from given data. Hope future records match the training set.Designing a Decision Tree: Designing a Decision Tree Typically, we are given data consisting of a number of records, perhaps representing individuals. Each record has a value for each of several attributes. Often binary attributes, e.g., “has dog.” Sometimes numeric, e.g. “age”, or discrete, multiway, like “school attended.”Designing a Decision Tree 2: Designing a Decision Tree 2 Records are classified into “good” or “bad.” More generally: some number of outcomes. The goal is to make a small number of tests involving attributes to decide as best we can whether a record is good or bad.Using a Decision Tree: Using a Decision Tree Given a record to classify, start at the root, and answer the question at the root for that record. E.g., is the record for a married person? Move next to the indicated child. Recursively, apply the DT rooted at that child, until we reach a decision.Training Sets: Training Sets Decision-tree construction is today considered a type of “machine learning.” We are given a training set of example records, properly classified, with which to construct our decision tree.Example: Example Here is the data on which our example DT was based: Married? Home? Dog? Rating 0 1 0 G 0 0 1 G 0 1 1 G 1 0 0 G 1 0 0 B 0 0 0 B 1 0 1 B 1 1 0 BBinary Attributes: Binary Attributes When all attributes are binary, we can pick an attribute to place at the root by considering how nonrandom are the sets of records that go to each side. Branches correspond to the value of the chosen attribute.Entropy: A Measure of Goodness: Entropy: A Measure of Goodness Consider the pools of records on the “yes” and “no” sides. If fraction p on on a side are “good,” the entropy of that side is -(p log2p + (1-p) log2(1-p)). = p log2(1/p) + (1-p) log2(1/(1-p)) Pick attribute that minimizes maximum entropies of the sides.Shape of Entropy Function: Shape of Entropy Function 0 1 0 1/2 1Intuition: Intuition Entropy 1 = random behavior, no useful information. Low entropy = significant information. At entropy = 0, we know exactly. Ideally, we find an attribute such that most of the “good’s” are on one side, and most of the “bad’s” are on the other.Example: Example Our Married, Home, Dog, Rating data: 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Married: 1/4 of Y is G; 1/4 of N is B. Entropy = ((1/4) log24 + (3/4) log2(4/3)) = .81 on both sides.Example, Continued: Example, Continued 010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B. Home: 1/3 of Y is B; 2/5 of N is G. Entropy is (1/3) log23 + (2/3) log2(3/2) = .92 on Y side. Entropy is (2/5) log2(5/2) + (3/5) log2(5/3) = .98 on N side. Max = .98, greater than for Married. Dog is similar, so Married “wins.”The “Training” Process: The “Training” Process Married? 100G, 100B, y n 010G, 001G 101B, 110B 011G, 000B Dog? 101B y n 100G, 100B 110B Bad Home? 010G, y n 001G, 011G 000B Good Home? 110B y n 100B, 100G Bad ?? Dog? 001G y n 000B Good BadHandling Numeric Data: Handling Numeric Data While complicated tests at a node are permissible, e.g., “age = 30 or age < 50 and age > 42,” the simplest thing is to pick one breakpoint, and divide records by value < breakpoint and value > breakpoint. Rate an attribute and breakpoint by min-max entropy of the two sides.Overfitting: Overfitting A major problem in designing decision trees is that one tends to create too many levels. The number of records reaching a node is small, so significance is lost. Extreme example: our data was generated by coin-flips; the tree is unlikely to reflect additional data that it would be used to classify.Possible Solutions: Possible Solutions 1. Limit depth of tree so that each decision is based on a sufficiently large pool of training data. 2. Create several trees independently (needs randomness in choice of attribute). Decision based on vote of D.T.’s. Filters out irrelevant factors.