202 20051117 Sibilla
Added: January 22, 2008. Category: Education. License: All Rights Reserved.

Presentation Transcript

Slide 1:
Prof. Ray Larson, University of California, Berkeley, School of Information Management & Systems
Introduction to Probabilistic Information Retrieval

Today:
Introduction: why probabilistic IR?
Background: probabilistic models
Bayes' theorem and Bayesian inference
Bayesian inference for IR
Probabilistic Indexing (Model 1)
Probabilistic Retrieval (Model 2)
Unified Model (Model 3)
Model 0 and real-world IR
Regression models

Documents in Vector Space:
[Figure: documents D1 to D11 plotted in a space with term axes t1, t2, t3]

Problems with Vector Space:
There is no real theoretical basis for the assumption of a term space; it is more for visualization than having any real basis, and most similarity measures work about the same regardless of model.
Terms are not really orthogonal dimensions.
Terms are not independent of all other terms.

Probabilistic Models:
A rigorous formal model attempts to predict the probability that a given document will be relevant to a given query.
It ranks retrieved documents according to this probability of relevance (Probability Ranking Principle).
It relies on accurate estimates of probabilities.

Probability Ranking Principle:
If a reference retrieval system’s response to each request is a ranking of the
documents in the collections in the order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data has been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data.
Stephen E. Robertson, J. Documentation 1977

Bayes’ Formula:
Bayesian statistical inference is used for a wide variety of inferential situations.
Background assumptions: A, B, and C are some events.

Bayes’ theorem:
P(A|B) = P(B|A) * P(A) / P(B)
For example: A: disease, B: symptom. P(A) and P(B) are the “a priori probabilities”.

Bayes’ Theorem: Application:
Box 1: p(box1) = .5, P(red ball | box1) = .4, P(blue ball | box1) = .6
Box 2: p(box2) = .5, P(red ball | box2) = .5, P(blue ball | box2) = .5
Toss a fair coin. If it lands head up, draw a ball from box 1; otherwise, draw a ball from box 2. If the ball is blue, what is the probability that it is drawn from box 2?

Bayes Example:
(The following examples are from http://www.dcs.ex.ac.uk/~anarayan/teaching/com2408/)
A drugs manufacturer claims that its roadside drug test will detect the presence of cannabis in the blood (i.e. show positive for a driver who has smoked cannabis in the last 72 hours) 90% of the time. However, the manufacturer admits that 10% of all cannabis-free drivers also test positive. A national survey indicates that 20% of all drivers have smoked cannabis during the last 72 hours. Draw a complete Bayesian tree for the scenario described above.

Bayes Example – cont.:
(ii) One of your friends has just told you that she was recently stopped by the police and the roadside drug test for the presence of cannabis showed positive. She denies having smoked cannabis since leaving university several months ago (and even then she says that she didn’t inhale).
Calculate the probability that your friend smoked cannabis during the 72 hours preceding the drugs test. That is, we calculate the probability of your friend having smoked cannabis given that she tested positive (F = smoked cannabis, E = tests positive):
P(F|E) = P(E|F) * P(F) / (P(E|F) * P(F) + P(E|not F) * P(not F)) = (0.9 * 0.2) / (0.9 * 0.2 + 0.1 * 0.8) = 0.18 / 0.26 ≈ 0.69
That is, there is only a 31% chance that your friend is telling the truth.

Bayes Example – cont.:
New information arrives which indicates that, while the roadside drugs test will now show positive for a driver who has smoked cannabis 99.9% of the time, the proportion of cannabis-free drivers testing positive has gone up to 20%. Re-draw your Bayesian tree and recalculate the probability to determine whether this new information increases or decreases the chances that your friend is telling the truth.
P(F|E) = (0.999 * 0.2) / (0.999 * 0.2 + 0.2 * 0.8) = 0.1998 / 0.3598 ≈ 0.56
That is, the new information has increased the chance that your friend is telling the truth by about 13 percentage points (from 31% to about 44%), but the chances still are that she is lying (just).

More Complex Bayes:
The Bayes Theorem example includes only two events. Consider a more complex tree/network:
[Figure: tree/network of events, including A, C, F and the leaf node M]
If an event E at a leaf node happens (say, M) and we wish to know whether this supports A, we need to ‘chain’ our Bayesian rule as follows:
P(A,C,F,M) = P(A|C,F,M) * P(C|F,M) * P(F|M) * P(M)
That is, P(X1, X2, …, Xn) = Π_i P(Xi | Pai), where Pai = parents(Xi).

Example (taken from IDIS website):
Imagine the following set of rules:
If it is raining or sprinklers are on then the street is wet.
If it is raining or sprinklers are on then the lawn is wet.
If the lawn is wet then the soil is moist.
If the soil is moist then the roses are OK.
[Figure: graph representation of rules]

Bayesian Networks:
We can construct conditional probabilities for each (binary) attribute to reflect our knowledge of the world. (These probabilities are arbitrary.)
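The two single-step Bayes questions earlier in the transcript (the two boxes and the roadside drug test) can be checked with a short Python sketch. The function name `posterior` and its parameter names are illustrative choices, not part of the slides; the numbers are the ones stated in the examples.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) for a binary hypothesis H, via Bayes' theorem.

    prior                = P(H)
    likelihood           = P(E | H)
    likelihood_given_not = P(E | not H)
    """
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Two-box example: H = "ball came from box 2", E = "ball is blue".
p_box2_given_blue = posterior(prior=0.5, likelihood=0.5, likelihood_given_not=0.6)

# Roadside test: H = F (smoked cannabis), E = tests positive.
p_smoked_original = posterior(prior=0.2, likelihood=0.9, likelihood_given_not=0.1)
p_smoked_revised = posterior(prior=0.2, likelihood=0.999, likelihood_given_not=0.2)

print(round(p_box2_given_blue, 3))  # 0.455
print(round(p_smoked_original, 3))  # 0.692
print(round(p_smoked_revised, 3))   # 0.555
```

The chance that the friend is telling the truth is one minus the posterior, which gives roughly 31% for the original test and roughly 44% for the revised one.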
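The chain rule P(X1, …, Xn) = Π_i P(Xi | parents(Xi)) can be sketched for the rain/sprinkler rules as follows. This is a minimal illustration, not the slides' own code: the `parents` and `cpt` structures are hypothetical names, and the table is deliberately partial, holding only the six conditional probabilities that the slides quote for one particular state (values the slides describe as arbitrary).

```python
# Parents of each variable, following the four rules in the example.
parents = {
    "rain": (),
    "sprinklers": (),
    "street": ("rain", "sprinklers"),
    "lawn": ("rain", "sprinklers"),
    "soil": ("lawn",),
    "roses": ("soil",),
}

# Partial conditional probability tables: (value, parent values) -> probability.
# Only the entries needed for the single state evaluated below are filled in.
cpt = {
    "rain": {("T", ()): 0.7},
    "sprinklers": {("F", ()): 0.6},
    "street": {("wet", ("T", "F")): 1.0},
    "lawn": {("wet", ("T", "F")): 1.0},
    "soil": {("dry", ("wet",)): 0.1},
    "roses": {("OK", ("dry",)): 0.2},
}


def joint(assignment):
    """Multiply P(Xi | parents(Xi)) over all variables in the network."""
    p = 1.0
    for var, pa in parents.items():
        key = (assignment[var], tuple(assignment[q] for q in pa))
        p *= cpt[var][key]
    return p


state = {
    "rain": "T", "sprinklers": "F", "street": "wet",
    "lawn": "wet", "soil": "dry", "roses": "OK",
}
print(f"{joint(state):.4f}")  # 0.0084
```

Any other state would need the remaining table entries, which the transcript does not preserve.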
Slide 16:
The joint probability of the state where the roses are OK, the soil is dry, the lawn is wet, the street is wet, the sprinklers are off and it is raining is:
P(sprinklers=F, rain=T, street=wet, lawn=wet, soil=dry, roses=OK)
= P(roses=OK|soil=dry) * P(soil=dry|lawn=wet) * P(lawn=wet|rain=T, sprinklers=F) * P(street=wet|rain=T, sprinklers=F) * P(sprinklers=F) * P(rain=T)
= 0.2 * 0.1 * 1.0 * 1.0 * 0.6 * 0.7 = 0.0084

Calculating probabilities in sequence:
Now imagine we are told that the roses are OK. What can we infer about the state of the lawn? That is, what are P(lawn=wet|roses=OK) and P(lawn=dry|roses=OK)? We have to work through soil first.
P(roses OK|soil=moist) = 0.7; P(roses OK|soil=dry) = 0.2
P(soil=moist|lawn=wet) = 0.9; P(soil=dry|lawn=wet) = 0.1*
P(soil=dry|lawn=dry) = 0.6; P(soil=moist|lawn=dry) = 0.4*
(* inferred)
P(R, S, L) = P(R) * P(R|S) * P(S|L), with P(R) = 1.0 because the roses are known to be OK.
For R=ok, S=moist, L=wet: 1.0 * 0.7 * 0.9 = 0.63
For R=ok, S=dry, L=wet: 1.0 * 0.2 * 0.1 = 0.02
For R=ok, S=moist, L=dry: 1.0 * 0.7 * 0.4 = 0.28
For R=ok, S=dry, L=dry: 1.0 * 0.2 * 0.6 = 0.12
Lawn=wet: 0.63 + 0.02 = 0.65 (un-normalised)
Lawn=dry: 0.28 + 0.12 = 0.40 (un-normalised)
That is, there is a greater chance that the lawn is wet.

Problems with Bayes nets:
Loops can sometimes occur with belief networks and have to be avoided.
We have avoided the issue of where the probabilities come from. The probabilities either are given or have to be learned. Similarly, the network structure also has to be learned. (See http://www.bayesware.com/products/discoverer/discoverer.html)
The number of paths to explore grows exponentially with each node. (The problem of exact probabilistic inference in Bayes networks is NP-hard. Approximation techniques may have to be used.)

Applications:
You have all used Bayes Belief Networks, probably a few dozen times, when you use Microsoft Office!
(See http://research.microsoft.com/~horvitz/lum.htm)
As you have read, Bayesian networks are also used in spam filters.
Another application is IR, where the event you want to estimate a probability for is whether a document is relevant for a particular query.

Bayes’ Theorem: Application in IR:
Goal: estimate the probability that a document D is relevant to a given query.
It is easier to estimate the log odds of the probability of relevance (and odds avoid problems with invalid probabilities in some calculations).
If documents are represented by binary vectors, this leads to Robertson & Sparck Jones term weighting.
The task of estimating the probability of relevance reduces to estimating the class-conditional probability density functions, from which the log odds of relevance are computed.

Overview of Probabilistic Models:
Probabilistic Indexing (Model 1)
Probabilistic Retrieval (Model 2)
Unified Model (Model 3)
Model 0 and real-world IR
Regression Models
Others

Model 1 – Maron and Kuhns:
Concerned with estimating probabilities of relevance at the point of indexing: if a patron came with a request using term ti, what is the probability that she/he would be satisfied with document Dj?

Model 1:
A patron submits a query (call it Q) consisting of some specification of her/his information need. Different patrons submitting the same stated query may differ as to whether or not they judge a specific document to be relevant. The function of the retrieval system is to compute for each individual document the probability that it will be judged relevant by a patron who has submitted query Q.
Robertson, Maron & Cooper, 1982

Model 1 Bayes:
A is the class of events of using the system.
Di is the class of events of Document i being judged relevant.
Ij is the class of queries consisting of the single term Ij.
P(Di|A,Ij) = probability that if a query is submitted to the system then a relevant document is retrieved.

Model 2:
Documents have many different properties; some documents have all the properties that the patron asked for, and other documents have only some or none of the properties. If the inquiring patron were to examine all of the documents in the collection she/he might find that some having all the sought after properties were relevant, but others (with the same properties) were not relevant. And conversely, he/she might find that some of the documents having none (or only a few) of the sought after properties were relevant, others not. The function of a document retrieval system is to compute the probability that a document is relevant, given that it has one (or a set) of specified properties.
Robertson, Maron & Cooper, 1982

Model 2 – Robertson & Sparck Jones:
Given a term t and a query q:

                            Document relevance
Document indexing           Relevant (+)    Non-relevant (-)    Total
Term present (+)            r               n-r                 n
Term absent (-)             R-r             N-n-R+r             N-n
Total                       R               N-R                 N

Robertson-Sparck Jones Weights:
Retrospective formulation --
w = log [ (r / (R-r)) / ((n-r) / (N-n-R+r)) ]

Robertson-Sparck Jones Weights:
Predictive formulation --
w = log [ ((r+0.5) / (R-r+0.5)) / ((n-r+0.5) / (N-n-R+r+0.5)) ]

Probabilistic Models: Some Unifying Notation:
D = All present and future documents
Q = All present and future queries
(Di,Qj) = A document query pair
x = class of similar documents
y = class of similar queries
Relevance is a relation between documents and queries.

Probabilistic Models:
Model 1 -- Probabilistic Indexing, P(R|y,Di)
Model 2 -- Probabilistic Querying, P(R|Qj,x)
Model 3 -- Merged Model, P(R|Qj,Di)
Model 0 -- P(R|y,x)
Probabilities are estimated based on prior usage or relevance estimation.

Probabilistic Models:
[Figure: the sets Q and D, the classes x and y, and the pair Di, Qj]

Logistic Regression:
Another approach to estimating probability of relevance.
Based on work by William Cooper, Fred Gey and Daniel Dabney.
Builds a regression model for relevance prediction based on a set of training data.
Uses less restrictive independence assumptions than Model 2 (Linked Dependence).

So What’s Regression?:
A method for fitting a curve (not necessarily a straight line) through a set of points using some goodness-of-fit criterion.
The most common type of regression is linear regression.

What’s Regression?:
Least Squares Fitting is a mathematical procedure for finding the best fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve.
The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity.

Probabilistic Models: Logistic Regression:
Estimates for relevance are based on a log-linear model with various statistical measures of document content as independent variables.
The log odds of relevance is a linear function of the attributes, term contributions are summed, and the probability of relevance is obtained by inverting the log odds.

Logistic Regression Attributes:
Average Absolute Query Frequency; Query Length; Average Absolute Document Frequency; Document Length; Average Inverse Document Frequency; Inverse Document Frequency; Number of Terms in common between query and document -- logged

Logistic Regression:
Probability of relevance is based on logistic regression from a sample set of documents to determine the values of the coefficients.
At retrieval, the probability estimate is obtained from the fitted coefficients and the 6 X attribute measures shown previously.

Other Probabilistic Models:
Language Models…
There are many more probabilistically based approaches to IR, but LM is the one that has shown best results in evaluations.

Language Models:
A new approach to probabilistic IR, derived from work in automatic speech recognition, OCR and MT.
Language models attempt to statistically model the use of language in a collection to estimate the probability that a query was generated from a particular document.
The assumption is, roughly, that if the query could have come from the document, then that document is likely to be relevant.

Ponte and Croft LM:
For the original
Ponte and Croft Language Models, the goal is to estimate P(Q|Md), that is, the probability of the query given the language model of document d.
One approach would be to use P_ml(t|Md) = tf(t,d) / dld, i.e., the maximum likelihood estimate of the probability of term t in document d, where tf(t,d) is the raw term frequency in document d and dld is the total number of tokens in document d.
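The maximum likelihood estimate tf(t,d) / dld can be sketched in a few lines of Python. The function names and the toy document are illustrative assumptions. Scoring a query as a plain product of per-term estimates is also an assumption made here for illustration (a simplified unigram query likelihood), not the full Ponte and Croft estimator.

```python
from collections import Counter


def mle_term_prob(term, doc_tokens):
    """P_ml(t | Md) = tf(t, d) / dl_d, the raw frequency over document length."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def query_likelihood(query_tokens, doc_tokens):
    """Simplified P(Q | Md): product of the MLE probabilities of the query terms."""
    p = 1.0
    for term in query_tokens:
        p *= mle_term_prob(term, doc_tokens)
    return p


# Hypothetical toy document with 8 tokens.
doc = "probabilistic models rank documents by probability of relevance".split()

print(mle_term_prob("relevance", doc))                        # 0.125
print(query_likelihood(["probabilistic", "relevance"], doc))  # 0.015625
print(query_likelihood(["vector", "relevance"], doc))         # 0.0
```

The last call shows the weakness of the unsmoothed estimate: a single query term missing from the document drives the whole score to zero, which is why practical language-model retrieval smooths the document model.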