uiuc indri
Kliment
Added: November 16, 2007

Presentation Transcript

An Overview of the Indri Search Engine:
  Don Metzler
  Center for Intelligent Information Retrieval
  University of Massachusetts, Amherst
  Joint work with Trevor Strohman, Howard Turtle, and Bruce Croft

Outline:
  Overview
  Retrieval Model
  System Architecture
  Evaluation
  Conclusions

Zoology 101:
  Lemurs are primates found only in Madagascar
  50 species (17 are endangered)
  Ring-tailed lemurs (Lemur catta)

Zoology 101:
  The indri is the largest type of lemur
  When first spotted, the natives yelled “Indri! Indri!”
  Malagasy for "Look! Over there!"

What is INDRI?:
  INDRI is a “larger” version of the Lemur Toolkit
  Influences
    INQUERY [Callan et al. ’92]
      Inference network framework
      Structured query language
    Lemur [http://www.lemurproject.org/]
      Language modeling (LM) toolkit
    Lucene [http://jakarta.apache.org/lucene/docs/index.html]
      Popular off-the-shelf Java-based IR system
      Based on heuristic retrieval models
  No IR system currently combines all of these features

Design Goals:
  Robust retrieval model
    Inference net + language modeling [Metzler and Croft ’04]
  Powerful query language
    Extensions to INQUERY query language driven by requirements of QA, web search, and XML retrieval
    Designed to be as simple to use as possible, yet robust
  Off the shelf (Windows, *NIX, Mac platforms)
    Separate download, compatible with Lemur
    Simple to set up and use
    Fully functional API w/ language wrappers for Java, etc.
  Scalable
    Highly efficient code
    Distributed retrieval

Comparing Collections:
  [slide content not captured in transcript]

Outline:
  Overview
  Retrieval Model
    Model
    Query Language
    Applications
  System Architecture
  Evaluation
  Conclusions

Document Representation:
  Source document:
    <html>
    <head>
    <title>Department Descriptions</title>
    </head>
    <body>
    The following list describes …
    <h1>Agriculture</h1> …
    <h1>Chemistry</h1> …
    <h1>Computer Science</h1> …
    <h1>Electrical Engineering</h1> …
    …
    <h1>Zoology</h1>
    </body>
    </html>
  <title> context:
    <title>department descriptions</title>
  <title> extents:
    1. department descriptions
  <h1> context:
    <h1>agriculture</h1> <h1>chemistry</h1>… <h1>zoology</h1>
  <h1> extents:
    1. agriculture
    2. chemistry
    …
    36. zoology
  <body> context:
    <body>the following list describes … <h1>agriculture</h1> … </body>
  <body> extents:
    1. the following list describes <h1>agriculture </h1> …

Model:
  Based on original inference network retrieval framework [Turtle and Croft ’91]
  Casts retrieval as inference in simple graphical model
  Extensions made to original model
    Incorporation of probabilities based on language modeling rather than tf.idf
    Multiple language models allowed in the network (one per indexed context)

Model:
  [network diagram; node labels: D, θ_title, θ_body, θ_h1, r_1 … r_N (one set per context), q_1, q_2, I, and α,β_title, α,β_body, α,β_h1]
  Document node (observed)
  Model hyperparameters (observed)
  Context language models
  Representation nodes (terms, phrases, etc.)
  Belief nodes (#combine, #not, #max)
  Information need node (belief node)

Model:
  [network diagram repeated]

P( r | θ ):
  Probability of observing a term, phrase, or “concept” given a context language model
  r_i nodes are binary
  Assume r ~ Bernoulli( θ )
    “Model B” [Metzler, Lavrenko, Croft ’04]
  Nearly any model may be used here
    tf.idf-based estimates (INQUERY)
    Mixture models

Model:
  [network diagram repeated]

P( θ | α, β, D ):
  Prior over context language model determined by α, β
  Assume P( θ | α, β ) ~ Beta( α, β )
    Bernoulli’s conjugate prior
    α_w = μ P( w | C ) + 1
    β_w = μ P( ¬w | C ) + 1
    μ is a free parameter

Model:
  [network diagram repeated]

P( q | r ) and P( I | r ):
  Belief nodes are created dynamically based on query
  Belief node CPTs are derived from standard link matrices
    Combine evidence from parents in various ways
    Allows fast inference by making marginalization computationally tractable
  Information need node is simply a belief node that combines all network evidence into a single value
  Documents are ranked according to: P( I | α, β, D )

Example: #AND:
  [diagram: parent nodes A and B, belief node Q]

Query Language:
  Extension of INQUERY query language
  Structured query language
    Term weighting
    Ordered / unordered windows
    Synonyms
  Additional features
    Language modeling motivated constructs
    Added flexibility to deal with fields via contexts
    Generalization of passage retrieval (extent retrieval)
  Robust query language that handles many current language modeling tasks

Terms:
  [slide content not captured in transcript]

Date / Numeric Fields:
  [slide content not captured in transcript]

Proximity:
  [slide content not captured in transcript]

Context Restriction:
  [slide content not captured in transcript]

Context Evaluation:
  [slide content not captured in transcript]

Belief Operators:
  [slide content not captured in transcript]
  * #wsum is still available in INDRI, but should be used with discretion

Extent / Passage Retrieval:
  [slide content not captured in transcript]

Extent Retrieval Example:
  Document:
    <document>
    <section><head>Introduction</head>
    Statistical language modeling allows formal methods to be applied to information retrieval. ...
    </section>
    <section><head>Multinomial Model</head>
    Here we provide a quick review of multinomial language models. ...
    </section>
    <section><head>Multiple-Bernoulli Model</head>
    We now examine two formal methods for statistically modeling documents and queries based on the multiple-Bernoulli distribution. ...
    </section>
    …
    </document>
  Query: #combine[section]( dirichlet smoothing )
  Steps:
    Treat each section extent as a “document”
    Score each “document” according to #combine( … )
    Return a ranked list of extents.
  Result:
    SCORE  DOCID   BEGIN  END
    0.50   IR-352  51     205
    0.35   IR-352  405    548
    0.15   IR-352  0      50
    …      …       …      …

Other Operators:
  [slide content not captured in transcript]

Example Tasks:
  Ad hoc retrieval
    Flat documents
    SGML/XML documents
  Web search
    Homepage finding
    Known-item finding
  Question answering
  KL divergence based ranking
    Query models
    Relevance modeling

Ad Hoc Retrieval:
  Flat documents
    Query likelihood retrieval: q_1 … q_N ≡ #combine( q_1 … q_N )
  SGML/XML documents
    Can either retrieve documents or extents
    Context restrictions and context evaluations allow exploitation of document structure

Web Search:
  Homepage / known-item finding
  Use mixture model of several document representations [Ogilvie and Callan ’03]
  Example query: Yahoo!
    #combine( #wsum( 0.2 yahoo.(body) 0.5 yahoo.(inlink) 0.3 yahoo.(title) ) )

Question Answering:
  More expressive passage- and sentence-level retrieval
  Example: Where was George Washington born?
    #combine[sentence]( #1( george washington ) born #any:LOCATION )
  Returns a ranked list of sentences containing the phrase George Washington, the term born, and a snippet of text tagged as a LOCATION named entity

KL / Cross Entropy Ranking:
  INDRI handles ranking via KL / cross entropy
    Query models [Zhai and Lafferty ’01]
    Relevance modeling [Lavrenko and Croft ’01]
  Example:
    Form user/relevance/query model P( w | θ_Q )
    Formulate query as: #weight( P( w_1 | θ_Q ) w_1 … P( w_|V| | θ_Q ) w_|V| )
    Ranked list equivalent to scoring by: KL( θ_Q || θ_D )
    In practice, probably want to truncate

Outline:
  Overview
  Retrieval Model
  System Architecture
    Indexing
    Query processing
  Evaluation
  Conclusions

System Overview:
  Indexing
    Inverted lists for terms and fields
    Repository consists of inverted lists, parsed documents, and document vectors
  Query processing
    Local or distributed
    Computing local / global statistics
  Features

Repository Tasks:
  Maintains:
    inverted lists
    document vectors
    field extent lists
    statistics for each field
  Store compressed versions of documents
  Save stopping and stemming information

Inverted Lists:
  One list per term
  One list entry for each term occurrence in the corpus
  Entry: (termID, documentID, position)
  Delta-encoding, byte-level compression
    Significant space savings
    Allows index size to be smaller than collection
    Space savings translate into higher speed

Inverted List Construction:
  All lists stored in one file
    50% of terms occur only once
    Single term entry = approximately 30 bytes
    Minimum file size: 4K
    Directory lookup overhead
  Lists written in segments
    Collect as much information in memory as possible
    Write segment when memory is full
    Merge segments at end

Field Extent Lists:
  Like inverted lists, but with extent information
  List entry
    documentID
    begin (first word position)
    end (last word position)
    number (numeric value of field)

Term Statistics:
  Statistics for collection language models
    total term count
    counts for each term
    document length
  Field statistics
    total term count in a field
    counts for each term in the field
    document field length
  Example:
    “dog” appears:
      45 times in the corpus
      15 times in a title field
    Corpus contains 56,450 words
    Title field contains 12,321 words

Query Architecture:
  [slide content not captured in transcript]

Query Processing:
  Parse query
  Perform query tree transformations
  Collect query statistics from servers
  Run the query on servers
  Retrieve document information from servers

Query Parsing:
  #combine( white house #1(white house) )

Query Optimization:
  [slide content not captured in transcript]

Evaluation:
  [slide content not captured in transcript]

Off the Shelf:
  Indexing and retrieval GUIs
  API / Wrappers
    Java
    PHP
  Formats supported
    TREC (text, web)
    PDF
    Word, PowerPoint (Windows only)
    Text
    HTML

Programming Interface (API):
  Indexing methods
    open / create
    addFile / addString / addParsedDocument
    setStemmer / setStopwords
  Querying methods
    addServer / addIndex
    removeServer / removeIndex
    setMemory / setScoringRules / setStopwords
    runQuery / runAnnotatedQuery
    documents / documentVectors / documentMetadata
    termCount / termFieldCount / fieldList / documentCount

Outline:
  Overview
  Retrieval Model
  System Architecture
  Evaluation
    TREC Terabyte Track
    Efficiency
    Effectiveness
  Conclusions

TREC Terabyte Track:
  Initial evaluation platform for INDRI
  Task: ad hoc retrieval on a web corpus
  Goals:
    Examine how a larger corpus impacts current retrieval models
    Develop new evaluation methodologies to deal with hugely insufficient judgments

Terabyte Track Summary:
  GOV2 test collection
    Collection size: 25,205,179 documents (426 GB)
    Index size: 253 GB (includes compressed collection)
    Index time: 6 hours (parallel across 6 machines), ~12 GB/hr/machine
    Vocabulary size: 49,657,854
    Total terms: 22,811,162,783
  Parsing
    No index-time stopping
    Porter stemmer
    Normalization (U.S. => US, etc.)
  Topics
    50 .gov-related standard TREC ad hoc topics

UMass Runs:
  indri04QL: query likelihood
  indri04QLRM: query likelihood + pseudo relevance feedback
  indri04AW: phrases
  indri04AWRM: phrases + pseudo relevance feedback
  indri04FAW: phrases + fields

indri04QL / indri04QLRM:
  Query likelihood
    Standard query likelihood run
    Smoothing parameter trained on TREC 9 and 10 main web track data
    Example: #combine( pearl farming )
  Pseudo-relevance feedback
    Estimate relevance model from top n documents in initial retrieval
    Augment original query with these terms
    Formulation: #weight( 0.5 #combine( Q_ORIGINAL ) 0.5 #combine( Q_RM ) )

indri04AW / indri04AWRM:
  Goal: Given only a title query, automatically construct an Indri query
  How can we make use of the query language?
    Include phrases in query
      Ordered window (#N)
      Unordered window (#uwN)

Example Query:
  prostate cancer treatment =>
  #weight( 1.5 prostate
           1.5 cancer
           1.5 treatment
           0.1 #1( prostate cancer )
           0.1 #1( cancer treatment )
           0.1 #1( prostate cancer treatment )
           0.3 #uw8( prostate cancer )
           0.3 #uw8( prostate treatment )
           0.3 #uw8( cancer treatment )
           0.3 #uw12( prostate cancer treatment ) )

indri04FAW:
  Combines evidence from different fields
  Fields indexed: anchor, title, body, and header (h1, h2, h3, h4)
  Formulation: #weight( 0.15 Q_ANCHOR 0.25 Q_TITLE 0.10 Q_HEADING 0.50 Q_BODY )
  Needs to be explored in more detail

Slide56:
  Indri Terabyte Track Results
  [results table not captured in transcript]
  T = title, D = description, N = narrative
  Italicized values denote statistical significance over QL

Slide57:
  [indexing throughput chart; system labels not captured in transcript]
  Values shown: 33 GB/hr, 12 GB/hr, 33 GB/hr, 3 GB/hr, 2 GB/hr
  Note on chart: Didn’t index entire collection

Conclusions:
  INDRI extends INQUERY and Lemur
    Off the shelf
    Scalable
  Geared towards tagged (structured) documents
  Employs robust inference net approach to retrieval
  Extended query language can tackle many current retrieval tasks
  Competitive in terms of both effectiveness and efficiency

Questions?:
  Contact Info
    Email: metzler@cs.umass.edu
    Web: http://ciir.cs.umass.edu/~metzler
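
The expansion shown on the indri04AW "Example Query" slide (prostate cancer treatment) can be reproduced with a short Python sketch. The weights 1.5, 0.1 and 0.3 are taken from that slide. Everything else is an assumption inferred from the single example: that ordered windows (#1) are built over every contiguous run of two or more query terms, that unordered windows are built over every subset of two or more terms, and that the unordered window width is 4 times the number of terms (#uw8 for pairs, #uw12 for the triple). The function name build_aw_query and its parameters are hypothetical and are not part of the Indri API.

```python
from itertools import combinations


def build_aw_query(title, w_term=1.5, w_ordered=0.1, w_unordered=0.3,
                   window_per_term=4):
    """Build an indri04AW-style #weight query string from a title query.

    Hypothetical helper: the expansion rules are inferred from one slide
    example, not taken from Indri documentation.
    """
    terms = title.lower().split()
    n = len(terms)

    # Single terms, each with the term weight.
    parts = [f"{w_term} {t}" for t in terms]

    # Exact phrases (#1) over contiguous runs of 2..n terms,
    # shortest runs first, left to right.
    for size in range(2, n + 1):
        for start in range(n - size + 1):
            phrase = " ".join(terms[start:start + size])
            parts.append(f"{w_ordered} #1( {phrase} )")

    # Unordered windows over every subset of 2..n terms (query order kept).
    # Assumed width: window_per_term * number of terms in the subset.
    for size in range(2, n + 1):
        for subset in combinations(terms, size):
            width = window_per_term * size
            parts.append(f"{w_unordered} #uw{width}( {' '.join(subset)} )")

    return "#weight( " + " ".join(parts) + " )"


if __name__ == "__main__":
    print(build_aw_query("prostate cancer treatment"))
```

For the three-term title query on the slide this yields three term clauses, three #1 clauses and four unordered-window clauses, in the same order as the slide. Note that the number of unordered-window clauses grows exponentially with query length, so a real system would presumably cap the subset size.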
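
The Beta prior on the "P( θ | α, β, D )" slide can be connected to a concrete smoothed estimate with a small Python sketch. It uses the prior exactly as given there (α_w = μP(w|C) + 1, β_w = μP(¬w|C) + 1) and the collection counts from the Term Statistics slide ("dog": 45 occurrences in a 56,450-word corpus). Two things are assumptions and are not stated in the slides: that the belief for a term is taken as the mode of the Beta posterior after observing tf occurrences in a document of length |D|, and the illustrative values tf = 2, |D| = 100 and μ = 2500. Under that assumption the estimate reduces to the familiar Dirichlet-smoothed form (tf + μP(w|C)) / (|D| + μ). All function names are hypothetical.

```python
def beta_prior(p_w_given_c, mu):
    """Prior parameters from the slide: alpha_w and beta_w."""
    alpha = mu * p_w_given_c + 1.0
    beta = mu * (1.0 - p_w_given_c) + 1.0  # P(not w | C) = 1 - P(w | C)
    return alpha, beta


def posterior_mode(tf, doc_len, alpha, beta):
    """Mode of Beta(alpha + tf, beta + doc_len - tf).

    Assumption: the point estimate is the posterior mode; the slides do
    not say which estimate Indri uses.
    """
    return (tf + alpha - 1.0) / (doc_len + alpha + beta - 2.0)


def dirichlet_smoothed(tf, doc_len, p_w_given_c, mu):
    """Closed form that the posterior mode reduces to for this prior."""
    return (tf + mu * p_w_given_c) / (doc_len + mu)


if __name__ == "__main__":
    p_dog = 45 / 56450              # collection statistics from the slides
    mu = 2500.0                     # hypothetical smoothing parameter
    tf, doc_len = 2, 100            # hypothetical document
    alpha, beta = beta_prior(p_dog, mu)
    print(posterior_mode(tf, doc_len, alpha, beta))
    print(dirichlet_smoothed(tf, doc_len, p_dog, mu))
```

The same computation applies per context: for the title field, the collection model would instead use the field statistics (15 occurrences in 12,321 title words) together with the term count and length of the document's title extent.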