04 1 IR Basics 3 Carmela (added January 14, 2008; License: All Rights Reserved)

Web Search – Summer Term 2006
II. Information Retrieval (Basics Cont.)
(c) Wolfgang Hürst, Albert-Ludwigs-University

Organizational Remarks
Exercises: Please register for the exercises by sending me (huerst@informatik.uni-freiburg.de) an email by Friday, May 5th, with
- your name,
- your matriculation number (Matrikelnummer),
- your degree program (Studiengang: BA, MSc, Diploma, …),
- your plans for the exam (yes, no, undecided).
This is only to organize the exercises; it has no effect if you decide to drop this course later.

Recap: IR System & Tasks Involved
[Diagram of an IR system with the components: information need, query, query processing (parsing & term processing), logical view of the information need, documents, selection of data for indexing, parsing & term processing, index, user interface, and performance evaluation.]

Evaluation of IR Systems
Standard approaches for evaluating algorithms and computer systems:
- Speed / processing time
- Storage requirements
- Correctness of the algorithms used and of their implementation
But most importantly:
- Performance, i.e. effectiveness
Another important issue:
- Usability, users' perception
Questions:
- What is a good / better search engine?
- How can search engine quality be measured?
- How should evaluations be performed?
- Etc.
What Does Performance/Effectiveness of IR Systems Mean?
Typical questions:
- How good is the quality of a system?
- Which system should I buy? Which one is better?
- How can I measure the quality of a system?
- What does quality mean for me?
- Etc.
The answers depend on the users, the application, … There are very different views and perceptions: user vs. search engine provider, developer vs. manager, seller vs. buyer, …
And remember: queries can be ambiguous, unspecific, etc.
Hence, in practice, we use restrictions and idealizations, e.g. only binary (relevant / not relevant) decisions.

Precision & Recall
$$\text{Precision} = \frac{\#\,\text{found \& relevant}}{\#\,\text{found}}, \qquad \text{Recall} = \frac{\#\,\text{found \& relevant}}{\#\,\text{relevant}}$$
[Figure: the document collection and the ranked result list 1. Doc. B, 2. Doc. E, 3. Doc. F, 4. Doc. G, 5. Doc. D, 6. Doc. H]
Restrictions: 0/1 relevance; a set instead of an order/ranking.
But: we can use these measures to evaluate rankings, too (via the top N documents).

Calculating Precision & Recall
- Precision: can be calculated directly from the result.
- Recall: requires relevance ratings for the whole (!) data collection.
In practice, there are approaches to estimate recall:
1.) Use a representative sample instead of the whole data collection
2.) Document-source method
3.) Expanding queries
4.) Compare the result with external sources
5.) Pooling method

Precision & Recall – Special Cases
Special treatment is necessary if no document is found or no relevant documents exist (division by zero). Here A, B and C denote the regions of the slide's diagram; consistent with the cases below, A = found & relevant, B = found but not relevant, C = relevant but not found, so $\text{Precision} = A/(A+B)$ and $\text{Recall} = A/(A+C)$.
- No relevant document exists: A = C = 0. 1st case: B = 0; 2nd case: B > 0.
- Empty result set: A = B = 0. 1st case: C = 0; 2nd case: C > 0.

Precision & Recall Graphs
Comparing 2 systems:
- System 1: precision 0.6, recall 0.3
- System 2: precision 0.4, recall 0.6
Which one is better?
[Figure: precision-recall graph]

The F Measure
Alternative measures exist, including ones that combine precision p and recall r into a single value. Example: the F measure
$$F_\beta = \frac{(\beta^2 + 1)\,p\,r}{\beta^2\,p + r}$$
where β is the relative weight of recall, set manually.
Source: N. Fuhr (Univ. Duisburg), Skriptum zur Vorlesung Information Retrieval (lecture notes on information retrieval), SS 2006.
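As a minimal sketch (not the lecture's own code), the set-based definitions and the special cases above can be written in Python. The function name `precision_recall` and the choice to return `None` for an undefined measure are assumptions for illustration; the slides only say that these cases need special treatment. The example reuses the result list from the slide with a hypothetical set of relevant documents, since the actual relevance judgments are not preserved.

```python
from typing import Optional, Set, Tuple


def precision_recall(found: Set[str], relevant: Set[str]) -> Tuple[Optional[float], Optional[float]]:
    """Set-based precision and recall with binary (0/1) relevance.

    A = found & relevant, B = found but not relevant, C = relevant but not found.
    A measure whose denominator is zero is returned as None (assumed convention).
    """
    a = len(found & relevant)
    b = len(found - relevant)
    c = len(relevant - found)
    precision = a / (a + b) if a + b > 0 else None  # empty result set: A = B = 0
    recall = a / (a + c) if a + c > 0 else None     # no relevant doc exists: A = C = 0
    return precision, recall


# Result list from the slide; the relevant set {B, D, F, K} is hypothetical.
result = {"B", "E", "F", "G", "D", "H"}
relevant = {"B", "D", "F", "K"}
print(precision_recall(result, relevant))  # (0.5, 0.75)
print(precision_recall(set(), relevant))   # empty result set: (None, 0.0)
```

Returning `None` keeps the special cases visible to the caller instead of silently choosing 0 or 1.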
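The pooling method listed among the recall-estimation approaches can be sketched as follows, assuming the common setup in which the top `depth` documents of several systems are merged into a pool, only pooled documents are judged, and unjudged documents count as non-relevant. The function names, runs and judgments are hypothetical.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents from each system's ranked result list."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def estimated_recall(ranking: List[str], judged_relevant: Set[str], n: int) -> float:
    """Recall of the top n results relative to the relevant documents found in the pool.

    Relevant documents outside the pool are never judged, so they are missing
    from the denominator and the estimate may be too high.
    """
    if not judged_relevant:
        raise ValueError("no relevant documents in the pool; recall is undefined")
    hits = sum(1 for doc in ranking[:n] if doc in judged_relevant)
    return hits / len(judged_relevant)


# Hypothetical ranked results of two systems for one query.
runs = {"sys1": ["B", "E", "F", "G"], "sys2": ["D", "B", "H", "K"]}
pool = build_pool(runs, depth=3)  # only these documents go to the assessors
print(sorted(pool))               # ['B', 'D', 'E', 'F', 'H']; "K" stays unjudged
# Hypothetical assessor judgments (1 = relevant) for the pooled documents.
judgments = {"B": 1, "E": 0, "F": 1, "D": 1, "H": 0}
judged_relevant = {doc for doc, rel in judgments.items() if rel == 1 and doc in pool}
print(estimated_recall(runs["sys1"], judged_relevant, n=3))
```

The pool depth trades assessment effort against how many relevant documents are likely to be missed.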
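A short sketch of the F measure in its usual weighted form, F_β = (β² + 1)·p·r / (β²·p + r), applied to the two systems from the precision-recall graph slide. Returning 0 when both p and r are 0 is an assumed convention, since the formula is 0/0 there. The loop suggests that the answer to "Which one is better?" depends on the manually chosen β.

```python
def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted F measure: (beta^2 + 1) * p * r / (beta^2 * p + r).

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if p == 0 and r == 0:
        return 0.0  # assumed convention: the formula is 0/0 here
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)


systems = {"System 1": (0.6, 0.3), "System 2": (0.4, 0.6)}
for beta in (0.5, 1.0, 2.0):
    scores = {name: f_measure(p, r, beta) for name, (p, r) in systems.items()}
    best = max(scores, key=scores.get)
    print(f"beta={beta}: " + ", ".join(f"{n}={s:.3f}" for n, s in scores.items()) + f" -> {best}")
# beta=0.5: System 1=0.500, System 2=0.429 -> System 1
# beta=1.0: System 1=0.400, System 2=0.480 -> System 2
# beta=2.0: System 1=0.333, System 2=0.545 -> System 2
```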
[Figure: "Example for different ..." (graphic not preserved)]

Calculating Average Precision Values
1. Macro assessment: estimates the expected precision of a randomly chosen query (query- or user-oriented). Problem: queries with an empty result set.
2. Micro assessment: estimates the likelihood that a randomly chosen retrieved document is relevant (document- or system-oriented). Problem: does not support monotony.

Monotony of Precision & Recall
Monotony: adding a query that delivers the same results for both systems does not change their quality assessment, i.e. which of the two is rated better.
[Figure: example (precision), graphic not preserved]

Precision & Recall for Rankings
- Distinguish between linear and weak rankings (a weak ranking allows ties).
- Basic idea: evaluate precision and recall by looking at the top n results for different n.
- Generally, precision decreases and recall increases with growing n.
[Figure: precision and recall as functions of n]

Precision & Recall for Rankings (Cont.)
[Graphic not preserved]

Realizing Evaluations
Now we have a system to evaluate, plus:
- Measures to quantify performance
- Methods to calculate them
What else do we need?
- Documents $d_j$ (test set)
- Tasks (information needs) and respective queries $q_i$
- Relevance judgments $r_{ij}$ (normally binary)
- Results (delivered by the system)
Evaluation = comparison of the given, perfect result $(q_i, d_j, r_{ij})$ with the result from the system $(q_i, d_j, r_{ij}(S_1))$.

The TREC Conference Series
In the old days, IR evaluation was problematic because:
- There were no good (i.e. big) test sets
- There was no comparability because of different test sets
Motivation for initiatives such as TREC: Text REtrieval Conference (TREC), since 1992, see http://trec.nist.gov/
Goals of TREC:
- Create realistic, significant test sets
- Achieve comparability of different systems
- Establish a common basis for IR evaluation
- Increase technology transfer between industry and research
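The difference between macro and micro assessment, and why micro assessment can violate monotony, can be illustrated with a small Python sketch. Each system's results are summarized per query as a pair (found & relevant, found); skipping queries with an empty result set in the macro average is an assumed convention, not one prescribed by the slides. The counts are made up to produce a counterexample.

```python
from typing import List, Optional, Tuple

# Per query: (number of found & relevant documents, number of found documents).
QueryCounts = List[Tuple[int, int]]


def macro_precision(counts: QueryCounts) -> Optional[float]:
    """Average of the per-query precision values (query/user oriented).

    Queries with an empty result set have no precision; they are skipped here
    (an assumed convention).
    """
    values = [rel / found for rel, found in counts if found > 0]
    return sum(values) / len(values) if values else None


def micro_precision(counts: QueryCounts) -> Optional[float]:
    """Found & relevant documents over all found documents, summed across queries
    (document/system oriented)."""
    total_found = sum(found for _, found in counts)
    total_rel = sum(rel for rel, _ in counts)
    return total_rel / total_found if total_found > 0 else None


# Hypothetical counts for one query: system 1 looks better than system 2.
system1 = [(1, 1)]    # precision 1.0
system2 = [(9, 10)]   # precision 0.9
# A query for which both systems deliver the same result: 100 documents, none relevant.
extra = (0, 100)

print(micro_precision(system1) > micro_precision(system2))                      # True
print(micro_precision(system1 + [extra]) > micro_precision(system2 + [extra]))  # False: order flipped
print(macro_precision(system1 + [extra]) > macro_precision(system2 + [extra]))  # True: order kept
```

The large identical query dominates the micro totals, which is why the order of the two systems can flip there while the macro average keeps it in this example.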
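Evaluating a ranking via its top n results, and comparing relevance judgments (r_ij) with a system's results as described under "Realizing Evaluations", can be sketched as below. The function names are hypothetical, treating ranks beyond the end of the list as non-relevant (dividing by n) is an assumed convention, and the relevant set is made up; the ranking is the result list B, E, F, G, D, H from the precision & recall slide.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at(ranking: List[str], relevant: Set[str], n: int) -> Tuple[float, float]:
    """Precision and recall of the top n documents of a linear ranking (binary relevance).

    Ranks beyond the end of the list count as non-relevant (assumed convention).
    """
    if n < 1 or not relevant:
        raise ValueError("need n >= 1 and at least one relevant document")
    hits = sum(1 for doc in ranking[:n] if doc in relevant)
    return hits / n, hits / len(relevant)


def evaluate(judgments: Dict[str, Set[str]], run: Dict[str, List[str]],
             n: int) -> Dict[str, Tuple[float, float]]:
    """Compare the judged relevant documents of each query with the system's ranking."""
    return {q: precision_recall_at(run.get(q, []), rel, n) for q, rel in judgments.items()}


ranking = ["B", "E", "F", "G", "D", "H"]  # result list from the slide
relevant = {"B", "D", "F", "K"}           # hypothetical judgments
for n in range(1, len(ranking) + 1):
    p, r = precision_recall_at(ranking, relevant, n)
    print(f"top {n}: precision={p:.2f} recall={r:.2f}")
# Recall never decreases; precision mostly falls but rises again at n = 3 and n = 5.

print(evaluate({"q1": relevant}, {"q1": ranking}, n=5))  # {'q1': (0.6, 0.75)}
```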
The TREC Conference Series (Cont.)
TREC offers:
- Various collections of test data
- Standardized retrieval tasks (queries & topics)
- Related relevance measures
- Different tasks (tracks) for certain problems
Examples of tracks targeted by TREC:
- Traditional text retrieval
- Spoken document retrieval
- Non-English or multilingual retrieval
- Information filtering
- User interactions
- Web search, spam (since 2005), blog (since 2005)
- Video retrieval
- Etc.

Advantages and Disadvantages of TREC
TREC (and other IR initiatives) have been very successful, producing progress that otherwise might not have happened. But disadvantages exist as well, e.g.:
- Only performance is compared, not the actual reasons for different behavior
- Unrealistic data (e.g. still too small, not representative enough)
- Often just batch-mode evaluation, no interactivity or user experience (note: there are interactivity tracks!)
- Often no analysis of significance
Note: most of these arguments are general problems of IR evaluation and not necessarily TREC-specific.

TREC Home Page
Visit the TREC site at http://trec.nist.gov and browse the different tracks (this gives you an idea of what is going on in the IR community).

Recap: IR System & Tasks Involved
[Diagram: the IR system overview from the beginning of the lecture, repeated.]