LDAUnrolled (uploaded by vasuki64; added July 07, 2011)

Latent Dirichlet Allocation:
D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003.
Jonathan Huang (jch1@cs.cmu.edu)
Advisor: Carlos Guestrin
11/15/2005

“Bag of Words” Models:
Let’s assume that all the words within a document are exchangeable.

Mixture of Unigrams:
Mixture of Unigrams Model (this is just Naïve Bayes)
For each of M documents,
  Choose a topic z.
  Choose N words by drawing each one independently from a multinomial conditioned on z.
In the Mixture of Unigrams model, we can only have one topic per document!
[Figure: graphical model for document i, with topic node z_i and word nodes w_i1, w_i2, w_i3, w_i4]

The pLSI Model:
Probabilistic Latent Semantic Indexing (pLSI) Model
For each word of document d in the training set,
  Choose a topic z according to a multinomial conditioned on the index d.
  Generate the word by drawing from a multinomial conditioned on z.
In pLSI, documents can have multiple topics.
[Figure: graphical model with document index d, topic nodes z_d1, ..., z_d4, and word nodes w_d1, ..., w_d4]

Motivations for LDA:
In pLSI, the observed variable d is an index into some training set. There is no natural way for the model to handle previously unseen documents.
The number of parameters for pLSI grows linearly with M (the number of documents in the training set).
We would like to be Bayesian about our topic mixture proportions.

Dirichlet Distributions:
In the LDA model, we would like to say that the topic mixture proportions for each document are drawn from some distribution.
So, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one.
The space of all of these multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions.
Criteria for selecting our prior:
  It needs to be defined for a (k-1)-simplex.
  Algebraically speaking, we would like it to play nice with the multinomial distribution.

Dirichlet Examples:
[Figure: Dirichlet examples]

Dirichlet Distributions:
Useful Facts:
This distribution is defined over a (k-1)-simplex. That is, it takes k non-negative arguments which sum to one. Consequently it is a natural distribution to use over multinomial distributions.
In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!)
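The conjugacy fact can be illustrated with a small sketch. It assumes the standard update rule, in which the posterior Dirichlet parameters are the prior parameters plus the observed category counts. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def dirichlet_posterior(alpha, observations):
    """Return posterior Dirichlet parameters given observed category indices.

    Assumes a Dirichlet(alpha) prior and a multinomial likelihood, so the
    update is simply: posterior_i = alpha_i + (count of category i).
    """
    counts = Counter(observations)
    return [a + counts.get(i, 0) for i, a in enumerate(alpha)]


def dirichlet_mean(alpha):
    """Return the mean of a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical toy data: a uniform prior over 3 categories and 5 observations.
prior = [1.0, 1.0, 1.0]
draws = [0, 0, 2, 0, 1]

posterior = dirichlet_posterior(prior, draws)
posterior_mean = dirichlet_mean(posterior)
```

The posterior is again a list of Dirichlet parameters, so the same function can be applied repeatedly as more observations arrive.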
The Dirichlet parameter $\alpha_i$ can be thought of as a prior count of the $i$-th class.

The LDA Model:
[Figure: graphical model with topic nodes z_1, ..., z_4, word nodes w_1, ..., w_4, and the parameter $\beta$]
For each document,
  Choose $\theta \sim \mathrm{Dirichlet}(\alpha)$.
  For each of the N words $w_n$:
    Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$.
    Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$.

Inference:
The inference problem in LDA is to compute the posterior of the hidden variables given a document and corpus parameters $\alpha$ and $\beta$. That is, compute $p(\theta, z \mid w, \alpha, \beta)$.
Unfortunately, exact inference is intractable, so we turn to alternatives…

Variational Inference:
In variational inference, we consider a simplified graphical model with variational parameters $\gamma$, $\phi$ and minimize the KL divergence between the variational and posterior distributions.

Parameter Estimation:
Given a corpus of documents, we would like to find the parameters $\alpha$ and $\beta$ which maximize the likelihood of the observed data.
Strategy (Variational EM):
  Lower bound $\log p(w \mid \alpha, \beta)$ by a function $L(\gamma, \phi; \alpha, \beta)$.
  Repeat until convergence:
    Maximize $L(\gamma, \phi; \alpha, \beta)$ with respect to the variational parameters $\gamma$, $\phi$.
    Maximize the bound with respect to parameters $\alpha$ and $\beta$.

Some Results:
Given a topic, LDA can return the most probable words. For the following results, LDA was trained on 10,000 text articles posted to 20 online newsgroups with 40 iterations of EM.
The number of topics was set to 50.

Some Results:
“politics”    “sports”    “space”     “computers”    “christianity”
Political     Team        Space       Drive          God
Party         Game        NASA        Windows        Jesus
Business      Play        Research    Card           His
Convention    Year        Center      DOS            Bible
Institute     Games       Earth       SCSI           Christian
Committee     Win         Health      Disk           Christ
States        Hockey      Medical     System         Him
Rights        Season      Gov         Memory         Christians

Extensions/Applications:
Multimodal Dirichlet Priors
Correlated Topic Models
Hierarchical Dirichlet Processes
Abstract Tagging in Scientific Journals
Object Detection/Recognition

Visual Words:
Idea: Given a collection of images,
  Think of each image as a document.
  Think of feature patches of each image as words.
  Apply the LDA model to extract topics.
(J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman. Discovering object categories in image collections. MIT AI Lab Memo AIM-2005-005, February 2005.)

Visual Words:
[Figure: examples of ‘visual words’]

Thanks!:
Questions?

References:
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101 (suppl. 1):5228-5235, 2004.
D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems (NIPS) 16, Cambridge, MA, 2004. MIT Press.
J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering object categories in image collections. MIT AI Lab Memo AIM-2005-005, February 2005.
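Supplementary sketch: the generative process from the LDA Model slide (choose topic proportions from a Dirichlet, then for each word choose a topic and then a word) might be simulated roughly as follows. This is only an illustration of sampling from the model, not of inference or parameter estimation. The function names, the two-topic vocabulary, and the values of alpha and beta are hypothetical and do not come from the slides. Dirichlet draws are obtained by normalizing independent Gamma draws, which is a standard construction.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, vocab, n_words, rng):
    """Generate one document following the LDA generative process.

    alpha: Dirichlet parameters, one per topic.
    beta:  beta[k][v] is the probability of vocab[v] under topic k.
    Returns the topic proportions, the per-word topics, and the words.
    """
    theta = sample_dirichlet(alpha, rng)           # topic proportions for this document
    topic_ids = list(range(len(alpha)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]    # topic for this word
        w = rng.choices(vocab, weights=beta[z])[0]      # word drawn from that topic
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy setup: two topics that use disjoint halves of the vocabulary.
vocab = ["nasa", "space", "game", "team"]
beta = [
    [0.5, 0.5, 0.0, 0.0],   # topic 0 only emits "nasa" and "space"
    [0.0, 0.0, 0.5, 0.5],   # topic 1 only emits "game" and "team"
]
alpha = [0.5, 0.5]

rng = random.Random(0)
theta, assignments, words = generate_document(alpha, beta, vocab, 6, rng)
```

Because each word gets its own topic draw, a single document can mix words from several topics, which is the contrast with the Mixture of Unigrams model.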