# data scientist course in Mumbai

Category: Entertainment

## Presentation Transcript

### slide 2:

© 2013 ExcelR Solutions. All Rights Reserved

Multinomial Regression

- Logistic regression (binomial distribution) is used when the output has 2 categories
- A multinomial regression classification model is used when the output has more than 2 categories
- It is an extension of logistic regression
- There is no natural ordering of the categories
- The response variable here has more than 2 categories, hence we apply multilogit
- Goal: understand the impact of cost and time on the various modes of transport

| Mode of transport | Car | Carpool | Bus | Rail | All modes |
| --- | --- | --- | --- | --- | --- |
| Count | 218 | 32 | 81 | 122 | 453 |
| Probability | 0.48 | 0.07 | 0.18 | 0.27 | 1 |
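The probability row on this slide is simply each count divided by the total. A minimal Python sketch of that arithmetic, using only the counts given on the slide (the variable names are illustrative):

```python
# Counts per mode of transport, as given on the slide.
counts = {"Car": 218, "Carpool": 32, "Bus": 81, "Rail": 122}

total = sum(counts.values())

# Empirical probability of each mode = count / total.
probabilities = {mode: n / total for mode, n in counts.items()}

for mode, p in probabilities.items():
    print(f"{mode:8s} {counts[mode]:4d} {p:.2f}")
print(f"{'All':8s} {total:4d} {sum(probabilities.values()):.2f}")
```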

### slide 3:

Multinomial Regression

- Whether we have a response 'Y' or a predictor 'X' that is categorical with 's' categories:
  - The level lowest in numerical / lexicographical value is chosen as the baseline / reference
  - The level missing from the output is the baseline level
  - We can choose a baseline level of our own using the `relevel` function in R
- The model formulates the relationship between the transformed (logit) Y and the numerical X linearly
- Modeling quantitative variables linearly might not always be correct
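The baseline rule can be illustrated with a small Python sketch. It is not R's implementation: `default_baseline` and `relevel` here are hypothetical helpers that only mimic the behaviour described on this slide, and the lowercase level names are an assumption.

```python
def default_baseline(levels):
    """Return the lexicographically lowest level (the default reference)."""
    return sorted(levels)[0]


def relevel(levels, ref):
    """Return the sorted levels with `ref` moved to the front as the new reference.

    Hypothetical helper that loosely mirrors what R's relevel does to a factor.
    """
    if ref not in levels:
        raise ValueError(f"{ref!r} is not one of the levels")
    return [ref] + [lv for lv in sorted(levels) if lv != ref]


modes = ["car", "carpool", "bus", "rail"]  # assumed level names

print(default_baseline(modes))  # lowest in lexicographical order
print(relevel(modes, "car"))    # 'car' becomes the reference level
```

With these assumed names the default reference would be 'bus', so a model that uses 'car' as its baseline would need an explicit relevel.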

### slide 4:

Multinomial Regression - Output

Iteration History:

- An iterative procedure is used to compute the maximum likelihood estimates
- The number of iterations and the convergence status are provided
- -2logL is 2 × the negative log likelihood
- -2logL (more precisely, its difference between nested models) has an approximate χ² distribution, which is used for hypothesis testing of goodness of fit and of parameters
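To make -2logL concrete, here is a rough Python sketch that evaluates it for two very simple models of the mode counts from slide 2. Neither is the model fitted in the deck: one assumes all four modes are equally likely, and the other is an intercept-only model whose fitted probabilities equal the observed shares. The difference between the two is a likelihood-ratio statistic with 3 degrees of freedom.

```python
import math

# Mode counts from slide 2.
counts = {"Car": 218, "Carpool": 32, "Bus": 81, "Rail": 122}
total = sum(counts.values())


def neg2_log_lik(counts, probs):
    """-2 * multinomial log likelihood (the constant combinatorial term is dropped)."""
    return -2.0 * sum(n * math.log(probs[k]) for k, n in counts.items())


# Model 0 (assumption): every mode equally likely, no free parameters.
uniform = {k: 1.0 / len(counts) for k in counts}
# Model 1 (assumption): intercept-only, fitted probabilities = observed shares,
# which uses 3 free parameters.
shares = {k: n / total for k, n in counts.items()}

neg2_uniform = neg2_log_lik(counts, uniform)
neg2_shares = neg2_log_lik(counts, shares)

# The likelihood-ratio statistic is compared with a chi-square distribution, df = 3.
lr_stat = neg2_uniform - neg2_shares
CHI2_CRIT_DF3_5PCT = 7.815  # 95th percentile of chi-square with 3 df

print(f"-2logL uniform: {neg2_uniform:.1f}")
print(f"-2logL shares:  {neg2_shares:.1f}")
print(f"LR statistic:   {lr_stat:.1f}, reject at 5%: {lr_stat > CHI2_CRIT_DF3_5PCT}")
```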

### slide 5:

Multinomial Regression - Output

$$\log\left(\frac{P(\text{choice} = \text{carpool} \mid x)}{P(\text{choice} = \text{car} \mid x)}\right) = \beta_{20} + \beta_{21}\,\text{cost.car} + \beta_{22}\,\text{cost.carpool} + \cdots$$

This equation compares the log of the probabilities of carpool to car.

- 'car' has been chosen as the baseline
- x is the vector representing the values of all inputs
- The regression coefficient 0.636 indicates that for a 1-unit increase in 'cost.car', the log odds of 'carpool' to 'car' increase by 0.636
- The intercept value does not mean anything in this context
- If we also have a categorical X, say Gender (female = 0, male = 1), then a regression coefficient of, say, 0.22 indicates that, relative to females, males increase the log odds of 'carpool' to 'car' by 0.22
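Each non-baseline mode has its own log-odds equation against 'car', and together these equations determine all the choice probabilities. The Python sketch below shows that conversion. The linear-predictor values are made up for illustration and are not taken from the fitted model; `mlogit_probabilities` is a hypothetical helper name.

```python
import math


def mlogit_probabilities(linear_predictors, baseline="car"):
    """Convert baseline-category log odds into choice probabilities.

    linear_predictors maps each non-baseline alternative j to
    eta_j = log(P(choice = j | x) / P(choice = baseline | x)).
    The baseline's own linear predictor is 0 by construction.
    """
    denom = 1.0 + sum(math.exp(eta) for eta in linear_predictors.values())
    probs = {baseline: 1.0 / denom}
    for alt, eta in linear_predictors.items():
        probs[alt] = math.exp(eta) / denom
    return probs


# Hypothetical values of the linear predictors for one traveller.
eta = {"carpool": -1.9, "bus": -1.0, "rail": -0.6}
probs = mlogit_probabilities(eta)

for alt, p in probs.items():
    print(f"{alt:8s} {p:.3f}")

# The log of the probability ratio recovers the linear predictor.
print(math.log(probs["carpool"] / probs["car"]))
```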

### slide 6:

- Probability: let $p = p(x \mid A)$ be the probability of any event (say attrition) under condition A (say gender = female)
- Odds: $\dfrac{p(x \mid A)}{1 - p(x \mid A)}$ is called the odds associated with the event
- Odds Ratio: if there are two conditions, A (gender = female) and B (gender = male), then the ratio $\dfrac{p(x \mid A) / (1 - p(x \mid A))}{p(x \mid B) / (1 - p(x \mid B))}$ is called the odds ratio of A with respect to B
- Relative Risk: $\dfrac{p(x \mid A)}{p(x \mid B)}$ is called the relative risk

https://en.wikipedia.org/wiki/Relative_risk
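The three quantities defined on this slide are easy to mix up, so a small Python sketch may help. The probabilities 0.2 and 0.1 are invented purely for illustration, and the function names are illustrative as well.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of condition A with respect to condition B."""
    return odds(p_a) / odds(p_b)


def relative_risk(p_a, p_b):
    """Relative risk of condition A with respect to condition B."""
    return p_a / p_b


# Hypothetical event probabilities under conditions A and B.
p_a, p_b = 0.2, 0.1

print(odds(p_a))                # odds under A
print(odds_ratio(p_a, p_b))     # odds ratio of A vs B
print(relative_risk(p_a, p_b))  # relative risk of A vs B
```

For rare events the odds ratio and the relative risk are close; they diverge as the probabilities grow.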

### slide 7:

Odds Ratio

- The odds ratio is computed from the coefficients in the linear model equation by simply exponentiating them
- Exponentiated regression coefficients are odds ratios for a unit change in a predictor variable
- The odds ratio for a unit increase in cost.car is 1.88 for choosing carpool vs. car
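The exponentiation step can be checked directly. The Python sketch below uses the two coefficients quoted on slide 5 (0.636 for cost.car and the illustrative 0.22 for gender). Note that exp(0.636) is about 1.889, so the 1.88 reported on the slide presumably comes from truncation or from an unrounded coefficient.

```python
import math

# Coefficients quoted on slide 5 (log odds of 'carpool' vs 'car').
beta_cost_car = 0.636
beta_gender_male = 0.22  # the slide's illustrative value for a categorical predictor

# Exponentiating a log-odds coefficient gives the odds ratio per unit change.
or_cost_car = math.exp(beta_cost_car)
or_gender_male = math.exp(beta_gender_male)

print(f"odds ratio, +1 unit cost.car: {or_cost_car:.3f}")
print(f"odds ratio, male vs female:   {or_gender_male:.3f}")

# A 2-unit increase multiplies the odds by the odds ratio twice.
print(f"odds ratio, +2 units:         {math.exp(2 * beta_cost_car):.3f}")
```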

### slide 8:

Goodness of fit

| Linear | GLM |
| --- | --- |
| Analysis of Variance | Analysis of Deviance |
| Residual Sum of Squares | Residual Deviance |
| OLS | Maximum Likelihood |

- Residual Deviance is -2 log L
- Adding more parameters to the model will reduce the Residual Deviance even if they are not going to be useful for prediction
- To control this, a penalty of 2 × (number of parameters) is added to the Residual Deviance
- This penalized value of -2 log L is called the AIC criterion
- AIC = -2 log L + 2 × (number of parameters)

Note: "Multilogit Model with Interaction"
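The AIC formula can be sketched in a few lines of Python. The log-likelihood values and parameter counts below are hypothetical, chosen only to show how a slightly better fit can still lose once the penalty is applied.

```python
def aic(log_lik, n_params):
    """AIC = -2 log L + 2 * (number of parameters)."""
    return -2.0 * log_lik + 2.0 * n_params


# Hypothetical fits: the larger model has a slightly higher log likelihood,
# so its residual deviance (-2 log L) is lower ...
small = {"log_lik": -500.0, "n_params": 6}
large = {"log_lik": -498.5, "n_params": 12}

aic_small = aic(small["log_lik"], small["n_params"])
aic_large = aic(large["log_lik"], large["n_params"])

# ... but after the penalty the smaller model has the lower (better) AIC.
print(f"small model: deviance={-2 * small['log_lik']:.1f}, AIC={aic_small:.1f}")
print(f"large model: deviance={-2 * large['log_lik']:.1f}, AIC={aic_large:.1f}")
```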