ExcelR has delivered a multitude of corporate trainings and is considered one of the leaders in the JIRA corporate training space.

Presentation Transcript

Machine Learning:

k-Nearest Neighbor Classifiers:

1-Nearest Neighbor Classifier:

[Figure: training examples (instances), some for each class, and test examples (which class should each be assigned?).]

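1-Nearest Neighbor:

[Figure: test point x and its nearest neighbor. Source: http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN_Talk.pdf]

2-Nearest Neighbor:

[Figure: test point with its 2 nearest neighbors; the class is marked "?".]

3-Nearest Neighbor:

[Figure: test point x with its 3 nearest neighbors.]

8-Nearest Neighbor:

[Figure: test point x with its 8 nearest neighbors.]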

Controlling COMPLEXITY in k-NN:

Measuring similarity with distance:

Locating the tomato's nearest neighbors requires a distance function, a formula that measures the similarity between two instances. There are many ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance, which is the distance one would measure if it were possible to use a ruler to connect two points, illustrated in the previous figure by the dotted lines connecting the tomato to its neighbors.

Euclidean distance:

Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p_1 refers to the value of the first feature of example p, while q_1 refers to the value of the first feature of example q:

$$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2}$$
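As a minimal sketch, Euclidean distance between two examples p and q with n features each could be computed as follows. The function name `euclidean_distance` and the neighbor's feature values are illustrative assumptions, not part of the slides.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of features")
    # Sum the squared per-feature differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

tomato = (6, 4)    # (sweetness, crunchiness)
neighbor = (3, 8)  # hypothetical neighbor, chosen only for easy arithmetic
print(euclidean_distance(tomato, neighbor))  # sqrt(3^2 + 4^2) = 5.0
```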

Application of KNN:

Which class does the tomato belong to, given its feature values? Tomato (sweetness = 6, crunchiness = 4).
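The transcript does not include the labeled training data for this slide, so the following is only a sketch of how a k-NN vote for the tomato might look. The training examples, the class names, and the function name `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled foods: ((sweetness, crunchiness), class).
# Illustrative values only; the slide's actual training data is not in the transcript.
training = [
    ((8, 5), "fruit"),
    ((7, 3), "fruit"),
    ((10, 1), "fruit"),
    ((3, 7), "vegetable"),
    ((2, 9), "vegetable"),
    ((1, 8), "vegetable"),
    ((3, 6), "protein"),
    ((1, 1), "protein"),
    ((1, 2), "protein"),
]

def knn_predict(query, examples, k):
    """Classify query by majority vote among its k nearest examples."""
    # Sort the training examples by Euclidean distance to the query.
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

tomato = (6, 4)
for k in (1, 3, 5):
    print(k, knn_predict(tomato, training, k))
```

With this made-up data the tomato lands in the "fruit" class for k = 1, 3, and 5. Larger k values smooth the decision but can produce tied votes, which this sketch does not handle explicitly.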

K = 3, 5, 7, 9:

K = 11, 13, 15, 17:

Bayesian Classifiers:

Slide 19:

Understanding probability

The probability of an event is estimated from the observed data by dividing the number of trials in which the event occurred by the total number of trials. For instance, if it rained 3 out of 10 days with similar conditions as today, the probability of rain today can be estimated as 3 / 10 = 0.30 or 30 percent. Similarly, if 10 out of 50 prior email messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20 or 20 percent.

Note: The probabilities of all the possible outcomes of a trial must always sum to 1. For example, given the value P(spam) = 0.20, we can calculate P(ham) = 1 - 0.20 = 0.80.

Slide 20:

Understanding probability (cont.)

Because an event cannot simultaneously happen and not happen, an event is always mutually exclusive and exhaustive with its complement. The complement of event A is typically denoted A^c or A'. Additionally, the shorthand notation P(¬A) can be used to denote the probability of event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to P(A^c).
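The counting estimate and the complement rule can be sketched in a few lines. The function name `estimate_probability` is an illustrative choice; the numbers are the ones used in the slides.

```python
def estimate_probability(event_count, trial_count):
    """Estimate P(event) as the fraction of trials in which the event occurred."""
    if trial_count <= 0:
        raise ValueError("trial_count must be positive")
    return event_count / trial_count

p_rain = estimate_probability(3, 10)   # rain on 3 of 10 similar days, about 0.30
p_spam = estimate_probability(10, 50)  # 10 of 50 prior messages were spam, about 0.20
p_ham = 1 - p_spam                     # complement of spam, about 0.80

print(round(p_rain, 2), round(p_spam, 2), round(p_ham, 2))
```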

Slide 21:

Understanding joint probability

Often, we are interested in monitoring several non-mutually exclusive events for the same trial.

[Figure: all emails, divided into Spam (20%) and Ham (80%), with Lottery appearing in 5%.]

Slide 22:

Understanding joint probability

[Figure labels: Lottery without appearing in Spam; Lottery appearing in Ham; Lottery appearing in Spam.]

Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur.

Slide 23:

Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, or how the probability of one event is related to the probability of the other. If the two events are totally unrelated, they are called independent events. If spam and Lottery were independent, we could easily calculate P(spam ∩ Lottery), the probability of both events happening at the same time. Because 20 percent of all messages are spam and 5 percent of all emails contain the word Lottery, we could assume that 1 percent of all messages are spam with the term Lottery: 0.05 * 0.20 = 0.01. More generally, for independent events A and B, the probability of both happening can be expressed as P(A ∩ B) = P(A) * P(B).
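The product rule for independent events can be checked directly with the slide's numbers. The helper name `joint_if_independent` is hypothetical, and the result is only valid if the independence assumption actually holds.

```python
def joint_if_independent(p_a, p_b):
    """P(A and B), assuming A and B are independent events."""
    return p_a * p_b

p_spam = 0.20
p_lottery = 0.05
p_spam_and_lottery = joint_if_independent(p_spam, p_lottery)
print(round(p_spam_and_lottery, 4))  # 0.01
```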

Bayes Rule:

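Bayes Rule: the most important equation in ML!

$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class}) \, P(\text{class})}{P(\text{data})}$$

Posterior probability, P(class | data): the probability of the class AFTER seeing the data.
Class prior: P(class).
Data likelihood given class: P(data | class).
Data prior (marginal): P(data).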
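As a rough illustration, Bayes rule can be applied to the spam and Lottery example. The prior P(spam) = 0.20 and the marginal P(Lottery) = 0.05 come from the earlier slides; the likelihood P(Lottery | spam) = 0.20 is a hypothetical value chosen for this sketch, as is the function name `posterior`.

```python
def posterior(prior, likelihood, marginal):
    """Bayes rule: P(class | data) = P(data | class) * P(class) / P(data)."""
    if marginal <= 0:
        raise ValueError("marginal must be positive")
    return likelihood * prior / marginal

p_spam = 0.20                # class prior, from the slides
p_lottery = 0.05             # data prior (marginal), from the slides
p_lottery_given_spam = 0.20  # data likelihood given class: HYPOTHETICAL value

p_spam_given_lottery = posterior(p_spam, p_lottery_given_spam, p_lottery)
print(round(p_spam_given_lottery, 2))  # 0.8
```

Under these assumed numbers, seeing the word Lottery raises the probability of spam from 20 percent to about 80 percent.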

Naïve Bayes Classifier:

Conditional Independence:

Simple independence between two variables: P(A, B) = P(A) P(B).
Class-conditional independence assumption: P(Fever, Body Ache | Viral Infection) = P(Fever | Viral Infection) P(Body Ache | Viral Infection).

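Naïve Bayes Classifier:

Conditional independence among variables given the class! This is a simplifying assumption. Naïve Bayes serves as a baseline model, especially when there is a large number of features. Taking the log and ignoring the denominator:

$$\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c) \right]$$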

Naïve Bayes Classifier for Categorical Valued Variables:

Let’s Naïve Bayes!:

#EXMPLS  COLOR  SHAPE     LIKE
20       Red    Square    Y
10       Red    Circle    Y
10       Red    Triangle  N
10       Green  Square    N
5        Green  Circle    Y
5        Green  Triangle  N
10       Blue   Square    N
10       Blue   Circle    N
20       Blue   Triangle  Y

Parameter Estimation:

What are the parameters, and how many are there?
Class priors: P(LIKE), one per class.
Conditional probabilities: P(COLOR | LIKE) and P(SHAPE | LIKE), one per feature value and class.
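A minimal sketch of estimating these parameters from the COLOR / SHAPE / LIKE table by relative frequency, with no smoothing. The rows are copied from the slide; the variable and function names (`priors`, `conditional`, `predict`) are illustrative assumptions.

```python
from collections import Counter, defaultdict

# (number of examples, color, shape, like) rows from the slide's table.
rows = [
    (20, "Red", "Square", "Y"),
    (10, "Red", "Circle", "Y"),
    (10, "Red", "Triangle", "N"),
    (10, "Green", "Square", "N"),
    (5, "Green", "Circle", "Y"),
    (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"),
    (10, "Blue", "Circle", "N"),
    (20, "Blue", "Triangle", "Y"),
]

class_counts = Counter()
feature_counts = defaultdict(Counter)  # (feature name, class) -> value counts

for n, color, shape, like in rows:
    class_counts[like] += n
    feature_counts[("color", like)][color] += n
    feature_counts[("shape", like)][shape] += n

total = sum(class_counts.values())
priors = {c: n / total for c, n in class_counts.items()}  # class priors

def conditional(feature, value, cls):
    """P(feature = value | class = cls) as a relative frequency (no smoothing)."""
    return feature_counts[(feature, cls)][value] / class_counts[cls]

def predict(color, shape):
    """Return the class with the larger unnormalized Naive Bayes score, plus all scores."""
    scores = {
        c: priors[c] * conditional("color", color, c) * conditional("shape", shape, c)
        for c in priors
    }
    return max(scores, key=scores.get), scores

print(priors)
print(predict("Red", "Square"))
```

For this table there are 2 class priors and 12 conditional probabilities (3 colors and 3 shapes for each of the 2 classes).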

Naïve Bayes Classifier for Text Classification:

Text Classification Example:

Doc1 = {buy two shirts get one shirt half off}
Doc2 = {get a free watch. send your contact details now}
Doc3 = {your flight to chennai is delayed by two hours}
Doc4 = {you have three tweets from @sachin}

Four-class problem: Spam, Promotions, Social, Main.

Bag-of-Words Representation:

Structured (e.g. multivariate) data: a fixed number of features.
Unstructured (e.g. text) data: arbitrary-length documents, a high-dimensional feature space (many words in the vocabulary), and sparse (only a small fraction of vocabulary words is present in a document).

Bag-of-Words representation: ignore the sequential order of words and represent the document as a weighted set, using the term frequency of each term.

RawDoc = {buy two shirts get one shirt half off}
Stemming = {buy two shirt get one shirt half off}
BoW = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}
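The raw document, stemming, and Bag-of-Words steps can be sketched as below. The `naive_stem` function is a toy stand-in that only strips a trailing "s"; a real pipeline would use a proper stemmer, and both function names are assumptions made for this example.

```python
from collections import Counter

def naive_stem(word):
    """Toy stemmer: strip one trailing 's' from longer words (not a real stemmer)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def bag_of_words(text):
    """Lowercase, split on whitespace, stem, and count term frequencies."""
    tokens = text.lower().split()
    return Counter(naive_stem(token) for token in tokens)

bow = bag_of_words("buy two shirts get one shirt half off")
print(dict(bow))
# {'buy': 1, 'two': 1, 'shirt': 2, 'get': 1, 'one': 1, 'half': 1, 'off': 1}
```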

Naïve Bayes Classifier with BoW:

Make an "independence assumption" about words given the class.

BoW = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}

Naïve Bayes Text Classifiers:

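Log likelihood of a document d given class c, where n(w, d) is the number of times word w occurs in d:

$$\log P(d \mid c) = \sum_{w \in d} n(w, d) \, \log P(w \mid c)$$

Parameters in Naïve Bayes text classifiers: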

Naïve Bayes Parameters:

Likelihood of a word given a class, P(word | class): one parameter for each word and each class. These parameters are estimated from the data.
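A small sketch of how the word likelihoods and the document log likelihood might be estimated from the four example documents. Two things here are assumptions not stated in the slides: the mapping of documents to classes, and the use of add-one (Laplace) smoothing so that unseen words do not produce a zero probability. Function names are illustrative, and stemming is skipped for brevity.

```python
import math
from collections import Counter

# Documents from the slides; the class assigned to each one is an ASSUMED mapping.
docs = {
    "Promotions": "buy two shirts get one shirt half off",
    "Spam": "get a free watch send your contact details now",
    "Main": "your flight to chennai is delayed by two hours",
    "Social": "you have three tweets from @sachin",
}

word_counts = {c: Counter(text.split()) for c, text in docs.items()}
vocabulary = set().union(*word_counts.values())

def word_likelihood(word, cls, alpha=1.0):
    """P(word | class) with add-alpha smoothing (smoothing is an assumption here)."""
    counts = word_counts[cls]
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocabulary))

def log_likelihood(bow, cls):
    """log P(doc | class): sum over words of count * log P(word | class)."""
    return sum(n * math.log(word_likelihood(w, cls)) for w, n in bow.items())

def classify(text):
    """Pick the class with the highest log prior plus log likelihood."""
    bow = Counter(text.split())
    log_prior = math.log(1 / len(docs))  # one document per class: uniform priors
    scores = {c: log_prior + log_likelihood(bow, c) for c in docs}
    return max(scores, key=scores.get)

print(classify("free watch"))
print(classify("flight delayed"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.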

Bayesian Classifier for Multivariate Real-Valued Data:

Bayes Rule:

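$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class}) \, P(\text{class})}{P(\text{data})}$$

Posterior probability, P(class | data): the probability of the class AFTER seeing the data.
Class prior: P(class).
Data likelihood given class: P(data | class).
Data prior (marginal): P(data).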

Simple Bayesian Classifier:

Each class-conditional probability is assumed to be a unimodal (single cloud) normal distribution.
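To illustrate the idea, here is a sketch that fits one normal distribution per class and classifies with Bayes rule. It is deliberately simplified to a single real-valued feature, whereas the slides address multivariate data. The training samples and class names "A" and "B" are hypothetical.

```python
from statistics import NormalDist

# Hypothetical one-dimensional training data per class (not from the slides).
samples = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9],
    "B": [3.0, 3.3, 2.7, 3.1, 2.9],
}

total = sum(len(xs) for xs in samples.values())
priors = {c: len(xs) / total for c, xs in samples.items()}
# One unimodal normal distribution per class, fitted from that class's samples.
densities = {c: NormalDist.from_samples(xs) for c, xs in samples.items()}

def posterior(x):
    """Return P(class | x) for every class using Bayes rule."""
    joint = {c: priors[c] * densities[c].pdf(x) for c in samples}
    evidence = sum(joint.values())  # data prior (marginal)
    return {c: p / evidence for c, p in joint.items()}

def classify(x):
    """Return the class with the highest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)

print(classify(1.1), classify(2.8))
```

In the multivariate case each class would instead have a mean vector and a covariance matrix, but the classification rule stays the same.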

Controlling COMPLEXITY:
