logging in or signing up selfTraining Tatlises Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 121 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: December 17, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Semi-Supervised Time Series Classification: Semi-Supervised Time Series Classification Li Wei Eamonn Keogh University of California, Riverside {wli, eamonn}@cs.ucr.eduIndexing of Handwritten Documents: A) B) C) A) B) C) Figure 1: A) A sample of text written by George Washington. B) The word “Alexandria” after having its slant removed. C) A time series created by tracing the upper profile of the word (Image courtesy of Raghavan Manmatha, used with permission) There has been a recent explosion of interest in indexing handwritten documents. Note that simply treating the words as “time series” (see Figure 1) is an extremely competitive approach for classifying (and thus indexing) handwritten documents. Handwriting classifiers must be trained on each individual’s particular handwriting. However the cost of obtaining labeled data for each word, for every individual is very expensive as measured in human time. A semi-supervised approach where a user annotates just a few training examples would have great utility. Indexing of Handwritten DocumentsValue of Unlabeled Data: Figure 3: The small dataset of labeled instances shown in Figure 2 has been augmented by incorporating the previously unlabeled examples. Now the instance to be classified (marked with “?”) is closest to F5, and is correctly classified Figure 2: A simple example to motivate semi-supervised classification. The instance to be classified (marked with “?”) is actually a F (female) but happens to be closer to a M (male) in this small dataset of labeled instances Unlabeled data do contain information which can help classification. For example in Figure 2, we need to classify the instance marked with “?”, which clearly belongs to the F (female) class. However this particular image happens to show the actor in a pose which is very similar to one of the M (male) instances, M1, and is thus misclassified. Note that F1 is a very close match to the unlabeled instance U4, and we could simply change the label from U4 to F2, and add it to our dataset of labeled instances. In fact, the basic tenet of semi-supervised learning is that we can do this repeatedly, and thus end up with the situation shown in Figure 3. Value of Unlabeled DataTraining the Classifier: Training the Classifier A) B) C) Single positively labeled example Positive Class Negative Class Positive Class Negative Class Positive Class Negative Class Added in first iteration Added in second iteration … Added in seventeenth iteration Single positively labeled example Single positively labeled example Figure 4: Semi-supervised training on a simple two-class dataset yields much higher accuracy than a naive k-nearest-neighbor classifier We let the classifier teach itself by its own predication. For example in Figure 4 we have a two-class dataset, where initially only one example is known as positive (the solid square in subplot A). In subplot B, we can see the chaining effect of semi-supervised learning: a positive example is labeled which helps labeling other positive examples and so on. Eventually all positive examples are correctly classified. In contrast, if we simply put the seventeen nearest neighbors of the single labeled example to the positive class, we will get very poor accuracy (see subplot C).Stopping Heuristic I: Stopping Heuristic I 0.2 0.4 0.6 0.8 1 Precision-recall breakeven point 50 100 150 200 250 300 350 400 0 0.5 1 1.5 2 2.5 Number of Iteration Distance between the closest pair in P 2.5 moving into negative space adding positive examples find the closest pair Figure 5: Changing of the minimal nearest neighbor distance of the labeled set on ECG dataset Because we do not know the ground truth of the data, it is very hard (if not impossible) to know the true performance of the classifier. Fortunately, the distance statistics give us some hint about how well the classifier is doing. In Figure 4, we can see that the minimal nearest neighbor distance decreases dramatically in the first few iterations, stabilizes for a relatively long time, and drops again. Interestingly, the precision-recall breakeven point achieved by the classifier has a corresponding trend of increasing, stabilizing, and decreasing. Stopping Heuristic II: Stopping Heuristic II -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 Closest pair in labeled Positive set Closest pair in labeled Positive set A) B) C) D) A negative instance is added into labeled positive set Figure 6: A sample dataset shown in two-dimensional space. A) Initially the two solid (red) squares are labeled as positive. B) At some point the closest pair in the positive set is added into labeled positive set. C) A negative instance is being added into labeled positive set. D) The closest pair in labeled positive set changes to two negative instances In hindsight, the phenomenon in Figure 5 is not surprising. In the first few iterations, the labeled positive set is relatively small. By adding more positive examples into it, the space gets denser, and as a result, the minimal nearest neighbor distance decreases. At some point, the closest pair of the positive examples is incorporated in the labeled set. The minimal nearest neighbor distance will be the distance between them. However if a negative example is being labeled as positive, chances are high that we will keep adding negative examples because the negative space is much denser than the positive space. Thus we will see a drop of the minimal nearest neighbor distance of the positive set. Figure 6 illustrates the process on a small sample dataset.ECG Dataset: ECG Dataset 20 40 60 80 100 120 140 160 180 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Number of iterations Precision-recall breakeven point Figure 7: Classification performance on ECG DatasetWord Spotting Dataset: Word Spotting Dataset Figure 9: Ranking changes of two instances in Word Spotting dataset during semi-supervised training Figure 8: Classification performance on Word Spotting DatasetGun Dataset: Gun Dataset 5 10 15 20 25 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 Number of iterations Precision-recall breakeven point Figure 10: Classification performance on Gun DatasetWafer Dataset: Wafer Dataset 5 10 15 20 25 30 35 40 45 50 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 Number of iterations Precision-recall breakeven point Figure 11: Classification performance on Wafer DatasetYoga Dataset: Yoga Dataset Figure 13: Classification performance on Yoga Dataset Figure 12: Shapes can be converted to time series. The distance from every point on the profile to the center is measured and treated as the Y-axis of a time series You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
selfTraining Tatlises Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 121 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: December 17, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Semi-Supervised Time Series Classification: Semi-Supervised Time Series Classification Li Wei Eamonn Keogh University of California, Riverside {wli, eamonn}@cs.ucr.eduIndexing of Handwritten Documents: A) B) C) A) B) C) Figure 1: A) A sample of text written by George Washington. B) The word “Alexandria” after having its slant removed. C) A time series created by tracing the upper profile of the word (Image courtesy of Raghavan Manmatha, used with permission) There has been a recent explosion of interest in indexing handwritten documents. Note that simply treating the words as “time series” (see Figure 1) is an extremely competitive approach for classifying (and thus indexing) handwritten documents. Handwriting classifiers must be trained on each individual’s particular handwriting. However the cost of obtaining labeled data for each word, for every individual is very expensive as measured in human time. A semi-supervised approach where a user annotates just a few training examples would have great utility. Indexing of Handwritten DocumentsValue of Unlabeled Data: Figure 3: The small dataset of labeled instances shown in Figure 2 has been augmented by incorporating the previously unlabeled examples. Now the instance to be classified (marked with “?”) is closest to F5, and is correctly classified Figure 2: A simple example to motivate semi-supervised classification. The instance to be classified (marked with “?”) is actually a F (female) but happens to be closer to a M (male) in this small dataset of labeled instances Unlabeled data do contain information which can help classification. For example in Figure 2, we need to classify the instance marked with “?”, which clearly belongs to the F (female) class. However this particular image happens to show the actor in a pose which is very similar to one of the M (male) instances, M1, and is thus misclassified. Note that F1 is a very close match to the unlabeled instance U4, and we could simply change the label from U4 to F2, and add it to our dataset of labeled instances. In fact, the basic tenet of semi-supervised learning is that we can do this repeatedly, and thus end up with the situation shown in Figure 3. Value of Unlabeled DataTraining the Classifier: Training the Classifier A) B) C) Single positively labeled example Positive Class Negative Class Positive Class Negative Class Positive Class Negative Class Added in first iteration Added in second iteration … Added in seventeenth iteration Single positively labeled example Single positively labeled example Figure 4: Semi-supervised training on a simple two-class dataset yields much higher accuracy than a naive k-nearest-neighbor classifier We let the classifier teach itself by its own predication. For example in Figure 4 we have a two-class dataset, where initially only one example is known as positive (the solid square in subplot A). In subplot B, we can see the chaining effect of semi-supervised learning: a positive example is labeled which helps labeling other positive examples and so on. Eventually all positive examples are correctly classified. In contrast, if we simply put the seventeen nearest neighbors of the single labeled example to the positive class, we will get very poor accuracy (see subplot C).Stopping Heuristic I: Stopping Heuristic I 0.2 0.4 0.6 0.8 1 Precision-recall breakeven point 50 100 150 200 250 300 350 400 0 0.5 1 1.5 2 2.5 Number of Iteration Distance between the closest pair in P 2.5 moving into negative space adding positive examples find the closest pair Figure 5: Changing of the minimal nearest neighbor distance of the labeled set on ECG dataset Because we do not know the ground truth of the data, it is very hard (if not impossible) to know the true performance of the classifier. Fortunately, the distance statistics give us some hint about how well the classifier is doing. In Figure 4, we can see that the minimal nearest neighbor distance decreases dramatically in the first few iterations, stabilizes for a relatively long time, and drops again. Interestingly, the precision-recall breakeven point achieved by the classifier has a corresponding trend of increasing, stabilizing, and decreasing. Stopping Heuristic II: Stopping Heuristic II -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 Closest pair in labeled Positive set Closest pair in labeled Positive set A) B) C) D) A negative instance is added into labeled positive set Figure 6: A sample dataset shown in two-dimensional space. A) Initially the two solid (red) squares are labeled as positive. B) At some point the closest pair in the positive set is added into labeled positive set. C) A negative instance is being added into labeled positive set. D) The closest pair in labeled positive set changes to two negative instances In hindsight, the phenomenon in Figure 5 is not surprising. In the first few iterations, the labeled positive set is relatively small. By adding more positive examples into it, the space gets denser, and as a result, the minimal nearest neighbor distance decreases. At some point, the closest pair of the positive examples is incorporated in the labeled set. The minimal nearest neighbor distance will be the distance between them. However if a negative example is being labeled as positive, chances are high that we will keep adding negative examples because the negative space is much denser than the positive space. Thus we will see a drop of the minimal nearest neighbor distance of the positive set. Figure 6 illustrates the process on a small sample dataset.ECG Dataset: ECG Dataset 20 40 60 80 100 120 140 160 180 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Number of iterations Precision-recall breakeven point Figure 7: Classification performance on ECG DatasetWord Spotting Dataset: Word Spotting Dataset Figure 9: Ranking changes of two instances in Word Spotting dataset during semi-supervised training Figure 8: Classification performance on Word Spotting DatasetGun Dataset: Gun Dataset 5 10 15 20 25 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 Number of iterations Precision-recall breakeven point Figure 10: Classification performance on Gun DatasetWafer Dataset: Wafer Dataset 5 10 15 20 25 30 35 40 45 50 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 Number of iterations Precision-recall breakeven point Figure 11: Classification performance on Wafer DatasetYoga Dataset: Yoga Dataset Figure 13: Classification performance on Yoga Dataset Figure 12: Shapes can be converted to time series. The distance from every point on the profile to the center is measured and treated as the Y-axis of a time series