Latent Semantic Indexing (Part 2)

Category: Education

Presentation Transcript

Slide 1: 

Keywords www.lsikeywords.com Latent Semantic Indexing

Slide 2: 

Quick recipe for generating a list of content words from a document collection:

Slide 3: 

- Make a complete list of all the words that appear anywhere in the collection
- Discard articles, prepositions, and conjunctions
- Discard common verbs (know, see, do, be)
- Discard pronouns

Slide 4: 

- Discard any words that appear in only one document
- Discard any words that appear in every document
- Discard frilly words (therefore, thus, however, albeit, etc.)
- Discard common adjectives (big, late, high)
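
A minimal Python sketch of the recipe above, assuming simple lowercase tokenization and a tiny hand-picked stop list (STOP_WORDS and content_words are hypothetical names; real stop lists are much longer). The document-frequency check covers the last two rules: a word survives only if it appears in more than one document but not in every document.

```python
import re
from collections import Counter

# Hypothetical stop list; real systems use far longer lists.
STOP_WORDS = {
    # articles, prepositions, and conjunctions
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "and", "or", "but",
    # pronouns
    "i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them",
    # common verbs
    "know", "see", "do", "be", "is", "are", "was", "were",
    # frilly words
    "therefore", "thus", "however", "albeit",
    # common adjectives
    "big", "late", "high",
}


def content_words(documents):
    """Return the sorted content words of a document collection."""
    # Complete list of words, kept per document as sets of lowercase tokens.
    doc_words = [set(re.findall(r"[a-z']+", doc.lower())) for doc in documents]
    # Document frequency: how many documents each word appears in.
    doc_freq = Counter(word for words in doc_words for word in words)
    n_docs = len(documents)
    return sorted(
        word
        for word, df in doc_freq.items()
        # Drop stop words, words in only one document, and words in every document.
        if word not in STOP_WORDS and 1 < df < n_docs
    )
```

For example, content_words(["The cat chased the mouse", "A dog chased the cat", "The dog ate a bone"]) keeps cat, chased, and dog: mouse, ate, and bone each occur in only one document, and "the" and "a" are discarded as articles.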

Slide 5: 

Thinking Inside the Grid

Slide 6: 

Using our list of content words and documents, we can now generate a term-document matrix. This is a fancy name for a very large grid, with documents listed along the horizontal axis, and content words along the vertical axis. For each content word in our list, we go across the appropriate row and put an 'X' in the column for any document where that word appears. If the word does not appear, we leave that column blank.
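
A small sketch of that grid as nested Python lists, using a hypothetical three-document collection and its content words. The helper term_document_grid is an illustrative name, and it assumes documents can be split on whitespace.

```python
def term_document_grid(terms, documents):
    """One row per content word, one column per document: 'X' if present, '' if not."""
    doc_words = [set(doc.lower().split()) for doc in documents]
    return [["X" if term in words else "" for words in doc_words] for term in terms]


TERMS = ["cat", "chased", "dog"]
DOCS = ["The cat chased the mouse", "A dog chased the cat", "The dog ate a bone"]
GRID = term_document_grid(TERMS, DOCS)

# Print the grid: rows are words, columns are documents 0, 1, 2.
for term, row in zip(TERMS, GRID):
    print(f"{term:<7}|" + "|".join(f"{cell:^3}" for cell in row) + "|")
```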

Slide 7: 

Doing this for every word and document in our collection gives us a mostly empty grid with a sparse scattering of X-es. This grid displays everything that we know about our document collection. We can list all the content words in any given document by looking for X-es in the appropriate column, or we can find all the documents containing a certain content word by looking across the appropriate row.
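
The two ways of reading the grid translate directly into code. A sketch with the same toy grid; words_in_document and documents_containing are hypothetical helper names.

```python
TERMS = ["cat", "chased", "dog"]
GRID = [
    ["X", "X", ""],   # cat
    ["X", "X", ""],   # chased
    ["", "X", "X"],   # dog
]


def words_in_document(grid, terms, doc_index):
    """Read down one column: every content word marked X in that document."""
    return [term for term, row in zip(terms, grid) if row[doc_index] == "X"]


def documents_containing(grid, terms, term):
    """Read across one row: every document (column index) marked X for that word."""
    row = grid[terms.index(term)]
    return [j for j, cell in enumerate(row) if cell == "X"]
```

Here words_in_document(GRID, TERMS, 2) returns ["dog"], and documents_containing(GRID, TERMS, "cat") returns [0, 1].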

Slide 8: 

Notice that our arrangement is binary - a square in our grid either contains an X, or it doesn't. This big grid is the visual equivalent of a generic keyword search, which looks for exact matches between documents and keywords. If we replace blanks and X-es with zeroes and ones, we get a numerical matrix containing the same information.
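
A sketch of both points: converting the toy grid to a 0/1 matrix, and an exact-match keyword search that simply reads that matrix. The function keyword_search is a hypothetical illustration that returns documents containing every query word.

```python
TERMS = ["cat", "chased", "dog"]
GRID = [["X", "X", ""], ["X", "X", ""], ["", "X", "X"]]

# Replace X-es with 1 and blanks with 0: same information, now numeric.
A = [[1 if cell == "X" else 0 for cell in row] for row in GRID]


def keyword_search(matrix, terms, query):
    """Exact-match search: indices of documents containing every query word."""
    rows = [matrix[terms.index(word)] for word in query]
    n_docs = len(matrix[0])
    return [j for j in range(n_docs) if all(row[j] == 1 for row in rows)]
```

For instance, keyword_search(A, TERMS, ["cat", "dog"]) returns [1], the only document with a 1 in both rows.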

Slide 9: 

The key step in LSI is decomposing this matrix using a technique called singular value decomposition (SVD). The mathematics of this transformation is beyond the scope of this presentation, but we can get an intuitive grasp of what SVD does by thinking of the process spatially. An analogy will help.
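
For readers who want to experiment anyway, NumPy's numpy.linalg.svd computes the decomposition. A minimal sketch on the toy 0/1 matrix; keeping only the k largest singular values is how LSI is usually applied, but k = 2 here is an arbitrary choice for illustration. Because the cat and chased rows are identical, this matrix has rank 2, so its third singular value is zero (up to rounding) and A_k reproduces A exactly; on a real collection, A_k would only approximate it.

```python
import numpy as np

# Toy binary term-document matrix (rows: cat, chased, dog; columns: documents).
A = np.array([
    [1, 1, 0],  # cat
    [1, 1, 0],  # chased
    [0, 1, 1],  # dog
], dtype=float)

# Factor A into U @ diag(s) @ Vt; s holds the singular values, largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and their vectors.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each column is one document, described by k numbers instead of one per word.
doc_coords = np.diag(s[:k]) @ Vt[:k, :]
```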

Slide 10: 

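Breakfast in Hyperspace
Imagine you survey the breakfast orders placed one Saturday morning, recording how many times each of three keywords (bacon, eggs, and coffee) occurs in each order.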

Slide 11: 

You can graph the results of your survey by setting up a chart with three orthogonal axes, one for each keyword. The choice of direction is arbitrary: perhaps a bacon axis in the x direction, an eggs axis in the y direction, and the all-important coffee axis in the z direction. To plot a particular breakfast order, you count the occurrences of each keyword, and then take the appropriate number of steps along the axis for that word. When you are finished, you get a cloud of points in three-dimensional space, representing all of that day's breakfast orders.
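
The plotting step is just counting. A sketch with a few made-up orders (the orders and the order_to_point helper are hypothetical); each order becomes an (x, y, z) point of bacon, eggs, and coffee counts.

```python
KEYWORDS = ("bacon", "eggs", "coffee")  # x, y, and z axes, in that order


def order_to_point(order):
    """Count each keyword in an order; the counts are its (x, y, z) coordinates."""
    words = order.lower().replace(",", " ").split()
    return tuple(words.count(keyword) for keyword in KEYWORDS)


# Hypothetical orders from one Saturday morning.
orders = ["bacon eggs coffee coffee", "eggs, eggs, coffee", "coffee", "bacon and eggs"]
points = [order_to_point(o) for o in orders]
```

This matches exact words only, so "egg" would not count toward the eggs axis, just as the 0/1 grid only records exact keyword matches.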

Slide 13: 

If you draw a line from the origin of the graph to each of these points, you obtain a set of vectors in 'bacon-eggs-and-coffee' space. The size and direction of each vector tells you how many of the three key items were in any particular order, and the set of all the vectors taken together tells you something about the kind of breakfast people favor on a Saturday morning.
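
Size and direction can be computed separately. A sketch using only the standard library: length gives a vector's size, direction rescales it to length 1, and the cosine of the angle between two vectors (one common way to compare directions, not something the slides prescribe) ignores size entirely.

```python
import math


def length(v):
    """Size of a vector: its straight-line distance from the origin."""
    return math.sqrt(sum(x * x for x in v))


def direction(v):
    """Unit vector pointing the same way as v (v must not be all zeros)."""
    n = length(v)
    return tuple(x / n for x in v)


def cosine(u, v):
    """Cosine of the angle between u and v: 1 means the same direction."""
    return sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
```

The orders (1, 1, 2) and (2, 2, 4) point the same way (a cosine of 1): the second is the same breakfast, doubled. An all-bacon order (1, 0, 0) and a coffee-only order (0, 0, 1) are at right angles (a cosine of 0).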

Slide 14: 

Singular Value Decomposition

Slide 15: 

Imagine you keep tropical fish and are proud of your prize aquarium, so proud that you want to submit a picture of it to Modern Aquaria magazine, for fame and profit. To get the best possible picture, you will want to choose a good angle from which to take the photo. You want to make sure that as many of the fish as possible are visible in your picture, without being hidden by other fish in the foreground.
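
Read as geometry, a photograph projects the fish's 3-D positions onto a 2-D plane. A sketch, assuming a "good angle" means the plane that keeps the fish as spread out as possible: the SVD of the fish coordinates gives that plane (the first two rows of Vt) and the direction to point the camera (the last row). The fish positions are randomly generated, and subtracting the mean position is an assumption of this sketch (it makes the calculation a principal component analysis); LSI applies SVD to the term-document matrix itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fish positions (x, y, z): a long, wide, but shallow tank.
fish = rng.normal(size=(50, 3)) * np.array([10.0, 5.0, 0.5])

# Measure positions from the centre of the school of fish.
centered = fish - fish.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

photo_plane = Vt[:2]   # the two directions in which the fish are most spread out
camera_axis = Vt[2]    # look along the direction of least spread
photo = centered @ photo_plane.T   # 2-D position of each fish in the picture

# Share of the fish's total spread that survives in the photo.
kept = (s[:2] ** 2).sum() / (s ** 2).sum()
```

Because the tank is shallow, the camera ends up looking almost straight down the z axis, and the photo keeps nearly all of the spread: few fish hide behind one another.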

Slide 16: 

Check it at: www.lsikeywords.com