lars 120903


Clustering for Large Result Sets: 

- Proposed problem: large result sets, most of which are irrelevant to the “intended” meaning
  - Our example: “cheap beautiful Jaguars for sale”
  - Google returns Jaguar car parts, perfume, Jacksonville Jaguars tickets, … (animals? Mac OS X 10.2?)
- Existing technology: teoma.com, vivisimo.com, others?
  - Many clusters are near-duplicates (Classified ads from Fresno, Cars and trucks in San Francisco, Cars and trucks in Los Angeles)
  - Others don't appear at all (animals, even though the top link was “Exotic animals for sale”)
  - Some are just weird (Shop Slashdot? Real Estate? Mexico?)

Research Topics: 

- Context
  - What makes these clustering engines work?
  - What knowledge-base techniques could apply here? Natural language processing?
- Research directions
  - On what basis do we create clusters?
    - Dictionary definitions → document classification using classic IR techniques
    - Fixed number of clusters (easier for the user to visualize?) → minimize some distance metric
    - Neural-net classifiers…?
  - How do we refine the query?
    - Assuming we’ve narrowed it down to cars, we may want to cluster on “cheap” or “beautiful”
    - Unstructured, semi-structured → structured
  - How do we present the results?
    - How do we describe a cluster?
    - What document best represents a cluster?
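One of the research directions above, a fixed number of clusters found by minimizing a distance metric, can be sketched as spherical k-means over TF-IDF vectors of result snippets. The same centroids also give crude answers to two of the presentation questions: a cluster is described by its highest-weighted terms, and represented by the member document closest to its centroid. This is a minimal sketch under stated assumptions, not how teoma.com or Vivisimo actually work. The snippets are invented for illustration (they are not real search results), and the function names (`tfidf`, `kmeans`, `describe`, `representative`), the smoothed IDF, the farthest-first seeding, and `k=3` are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical result snippets, invented for illustration (not real search results).
SNIPPETS = [
    "used jaguar cars cheap",
    "jaguar cars parts dealer",
    "jaguar perfume fragrance men",
    "discount jaguar perfume fragrance",
    "exotic jaguar animals sale",
    "jaguar wild animals rainforest",
]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf(docs):
    """Unit-length TF-IDF vectors; the smoothed IDF keeps terms found in every doc nonzero."""
    tokens = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokens for t in set(toks))
    return [
        normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in Counter(toks).items()})
        for toks in tokens
    ]


def centroid(vectors):
    """Normalized mean direction of a group of vectors (empty group -> empty vector)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds float weights term by term
    return normalize(dict(total))


def kmeans(vectors, k, max_iter=20):
    """Spherical k-means: assign each vector to the most cosine-similar centroid.

    Assumes k <= len(vectors).
    """
    # Deterministic farthest-first seeding: start from document 0, then repeatedly
    # add the document least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move again
        labels = new_labels
        centroids = [centroid([v for v, l in zip(vectors, labels) if l == c])
                     for c in range(k)]
    return labels, centroids


def describe(c, n_terms=2):
    """Label a cluster by the highest-weighted terms of its centroid."""
    return [t for t, _ in sorted(c.items(), key=lambda kv: -kv[1])[:n_terms]]


def representative(vectors, labels, centroids, cluster):
    """Pick the member document most similar to the cluster centroid."""
    members = [i for i, l in enumerate(labels) if l == cluster]
    return max(members, key=lambda i: cosine(vectors[i], centroids[cluster]))


if __name__ == "__main__":
    vecs = tfidf(SNIPPETS)
    labels, cents = kmeans(vecs, k=3)
    for c in range(3):
        rep = representative(vecs, labels, cents, c)
        print(c, describe(cents[c]), "->", SNIPPETS[rep])
```

On these toy snippets the three senses (cars, perfume, animals) separate cleanly because their vocabularies do not overlap apart from “jaguar”. Real result snippets share words like “for sale” across senses, which is exactly how irrelevant near-duplicate or “weird” clusters can arise. `representative` raises `ValueError` if a cluster ends up empty, which k-means does not rule out in general.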