SET7

Search Engine Technology #7 http://www.cs.columbia.edu/~radev/SET07.html: 

October 18, 2007
Prof. Dragomir R. Radev
radev@umich.edu

Slide2: 

SET – Fall 2007 … 11. Lexical semantics and WordNet …

Lexical Networks: 

Used to represent relationships between words
Example: WordNet, created by George Miller’s team at Princeton
Based on synsets (synonyms, interchangeable words) and lexical matrices

Lexical matrix: 

Lexical matrix

Synsets: 

Disambiguation
  {board, plank}
  {board, committee}
Synonyms
  substitution
  weak substitution
  synonyms must be of the same part of speech

Slide6: 

$ ./wn board -hypen

Synonyms/Hypernyms (Ordered by Frequency) of noun board

9 senses of board

Sense 1
board
   => committee, commission
       => administrative unit
           => unit, social unit
               => organization, organisation
                   => social group
                       => group, grouping

Sense 2
board
   => sheet, flat solid
       => artifact, artefact
           => object, physical object
               => entity, something

Sense 3
board, plank
   => lumber, timber
       => building material
           => artifact, artefact
               => object, physical object
                   => entity, something

Slide7: 

Sense 4
display panel, display board, board
   => display
       => electronic device
           => device
               => instrumentality, instrumentation
                   => artifact, artefact
                       => object, physical object
                           => entity, something

Sense 5
board, gameboard
   => surface
       => artifact, artefact
           => object, physical object
               => entity, something

Sense 6
board, table
   => fare
       => food, nutrient
           => substance, matter
               => object, physical object
                   => entity, something

Slide8: 

Sense 7
control panel, instrument panel, control board, board, panel
   => electrical device
       => device
           => instrumentality, instrumentation
               => artifact, artefact
                   => object, physical object
                       => entity, something

Sense 8
circuit board, circuit card, board, card
   => printed circuit
       => computer circuit
           => circuit, electrical circuit, electric circuit
               => electrical device
                   => device
                       => instrumentality, instrumentation
                           => artifact, artefact
                               => object, physical object
                                   => entity, something

Sense 9
dining table, board
   => table
       => furniture, piece of furniture, article of furniture
           => furnishings
               => instrumentality, instrumentation
                   => artifact, artefact
                       => object, physical object
                           => entity, something

Antonymy: 

“x” vs. “not-x”
“rich” vs. “poor”?
{rise, ascend} vs. {fall, descend}

Other relations: 

Meronymy: X is a meronym of Y when native speakers of English accept sentences similar to “X is a part of Y” or “X is a member of Y”.
Hyponymy: {tree} is a hyponym of {plant}.
Hierarchical structure based on hyponymy (and hypernymy).
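
The hierarchy built on hyponymy and hypernymy can be pictured as a set of parent links. The following is a minimal Python sketch, assuming a hand-copied fragment of the hypernym chains printed by wn for two senses of board. The dictionary `HYPERNYM` and both helper functions are hypothetical names for illustration; they are not part of WordNet or its tools.

```python
# Hand-copied fragment of the hypernym links from the `wn board -hypen` output.
# Keys and values are synset labels; this toy dictionary is an illustration only.
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "artifact, artefact",
    "circuit board, circuit card, board, card": "printed circuit",
    "printed circuit": "computer circuit",
    "computer circuit": "circuit, electrical circuit, electric circuit",
    "circuit, electrical circuit, electric circuit": "electrical device",
    "electrical device": "device",
    "device": "instrumentality, instrumentation",
    "instrumentality, instrumentation": "artifact, artefact",
    "artifact, artefact": "object, physical object",
    "object, physical object": "entity, something",
}


def hypernym_chain(synset):
    """Return the synset followed by all of its ancestors, most specific first."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """Return the first ancestor of `a` that is also an ancestor of `b`."""
    ancestors_of_b = set(hypernym_chain(b))
    for node in hypernym_chain(a):
        if node in ancestors_of_b:
            return node
    return None


plank = "board, plank"
card = "circuit board, circuit card, board, card"
print(" => ".join(hypernym_chain(plank)))
print(lowest_common_hypernym(plank, card))
```

In this fragment the two senses first meet at {artifact, artefact}, which is one way to see that they are only distantly related.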

Other features of WordNet: 

Index of familiarity
Polysemy

Familiarity and polysemy: 

board used as a noun is familiar (polysemy count = 9)
bird used as a noun is common (polysemy count = 5)
cat used as a noun is common (polysemy count = 7)
house used as a noun is familiar (polysemy count = 11)
information used as a noun is common (polysemy count = 5)
retrieval used as a noun is uncommon (polysemy count = 3)
serendipity used as a noun is very rare (polysemy count = 1)

Compound nouns: 

advisory board
appeals board
backboard
backgammon board
baseboard
basketball backboard
big board
billboard
binder's board
binder board
blackboard
board game
board measure
board meeting
board member
board of appeals
board of directors
board of education
board of regents
board of trustees

Overview of senses: 

1. board -- (a committee having supervisory powers; "the board has seven members")
2. board -- (a flat piece of material designed for a special purpose; "he nailed boards across the windows")
3. board, plank -- (a stout length of sawn timber; made in a wide variety of sizes and used for many purposes)
4. display panel, display board, board -- (a board on which information can be displayed to public view)
5. board, gameboard -- (a flat portable surface (usually rectangular) designed for board games; "he got out the board and set up the pieces")
6. board, table -- (food or meals in general; "she sets a fine table"; "room and board")
7. control panel, instrument panel, control board, board, panel -- (an insulated panel containing switches and dials and meters for controlling electrical devices; "he checked the instrument panel"; "suddenly the board lit up like a Christmas tree")
8. circuit board, circuit card, board, card -- (a printed circuit that can be inserted into expansion slots in a computer to increase the computer's capabilities)
9. dining table, board -- (a table at which meals are served; "he helped her clear the dining table"; "a feast was spread upon the board")

Top-level concepts: 

{act, action, activity} {animal, fauna} {artifact} {attribute, property} {body, corpus}
{cognition, knowledge} {communication} {event, happening} {feeling, emotion} {food}
{group, collection} {location, place} {motive} {natural object} {natural phenomenon}
{person, human being} {plant, flora} {possession} {process} {quantity, amount}
{relation} {shape} {state, condition} {substance} {time}

WordNet parameters: 

wn reason -hypen    hypernyms
wn reason -synsn    synsets
wn reason -simsn    synonyms
wn reason -over     overview of senses
wn reason -famln    familiarity/polysemy
wn reason -grepn    compound nouns

Slide17: 

SET – Fall 2007 … 12. Latent semantic indexing Singular value decomposition …

Problems with lexical semantics: 

Polysemy (sim < cos)
  Bar, bank, jaguar, hot
Synonymy (sim > cos)
  Building/edifice, Large/big, Spicy/hot
Relatedness
  Doctor/patient/nurse/treatment
Sparse matrix
Need: dimensionality reduction
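
The two inequalities can be made concrete with a toy bag-of-words model. The following is a minimal Python sketch, assuming a five-word vocabulary and example phrases chosen here for illustration (none of them come from the slides): synonyms with no shared term get cosine 0 although their true similarity is high, while two different senses of "bank" get a positive cosine.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical vocabulary, chosen only for this illustration.
VOCAB = ["building", "edifice", "bank", "river", "money"]


def bag_of_words(text):
    """Term-count vector of `text` over VOCAB."""
    words = text.split()
    return [words.count(term) for term in VOCAB]


# Synonymy: same meaning but no shared term, so the cosine underestimates similarity.
syn = cosine(bag_of_words("building"), bag_of_words("edifice"))
# Polysemy: different senses of "bank", yet the shared term inflates the cosine.
poly = cosine(bag_of_words("river bank"), bag_of_words("money bank"))
print(syn, poly)
```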

Techniques for dimensionality reduction: 

Based on matrix decomposition (goal: preserve clusters, explain away variance)
A quick review of matrices
  Vectors
  Matrices
  Matrix multiplication

Eigenvectors and eigenvalues: 

An eigenvector is an implicit “direction” for a matrix:
  Av = λv
where v (eigenvector) is non-zero, though λ (eigenvalue) can be any complex number in principle
Computing eigenvalues:
  det(A - λI) = 0

Eigenvectors and eigenvalues: 

Example:
  det(A - λI) = (-1 - λ)*(-λ) - 3*2 = 0
  Then: λ + λ^2 - 6 = 0; λ1 = 2; λ2 = -3
  For λ1 = 2:
  Solutions: x1 = x2
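
The arithmetic in this example can be checked numerically. The matrix itself did not survive in the transcript, so the sketch below assumes A = [-1 3; 2 0], which is consistent with the determinant expansion and with the solution x1 = x2 for the eigenvalue 2. The helper names are hypothetical.

```python
import math

# Assumed matrix: the slide's matrix was not preserved; this one is consistent
# with det(A - lambda*I) = (-1 - lambda)(-lambda) - 3*2 and with x1 = x2 for lambda = 2.
A = [[-1.0, 3.0],
     [2.0, 0.0]]


def eigenvalues_2x2(m):
    """Roots of lambda^2 - trace*lambda + det = 0 (real roots assumed)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def mat_vec(m, x):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]


lam1, lam2 = eigenvalues_2x2(A)
x = [1.0, 1.0]          # any vector with x1 = x2
print(lam1, lam2)       # the two eigenvalues
print(mat_vec(A, x))    # A x, which should equal lam1 * x
```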

Matrix decomposition: 

If S is a square matrix with a full set of linearly independent eigenvectors, it can be decomposed into U Λ U^-1, where
  U = matrix of eigenvectors
  Λ = diagonal matrix of eigenvalues
SU = UΛ
U^-1 S U = Λ
S = U Λ U^-1
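
The three identities can be verified on a small case. The following is a minimal Python sketch, assuming an illustrative 2x2 matrix with eigenvalues 2 and -3 (not the matrix from the slides, which was not preserved); `mat_mul` and `inverse_2x2` are hypothetical helpers.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Illustrative matrix (an assumption) with eigenvalues 2 and -3.
S = [[-1.0, 3.0],
     [2.0, 0.0]]
# Columns of U are eigenvectors: (1, 1) for eigenvalue 2 and (3, -2) for eigenvalue -3.
U = [[1.0, 3.0],
     [1.0, -2.0]]
# Diagonal matrix of eigenvalues, in the same order as the columns of U.
LAMBDA = [[2.0, 0.0],
          [0.0, -3.0]]

SU = mat_mul(S, U)
UL = mat_mul(U, LAMBDA)
rebuilt = mat_mul(UL, inverse_2x2(U))
print(SU, UL)     # the two products should agree
print(rebuilt)    # should equal S up to rounding
```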

Example: 

Example

Example: 

Eigenvalues are 3, 2, 0
x is an arbitrary vector, yet Sx depends on the eigenvalues and eigenvectors

SVD: Singular Value Decomposition: 

A = U S V^T
U is the matrix of orthogonal eigenvectors of A A^T
V is the matrix of orthogonal eigenvectors of A^T A
The diagonal components of S are the singular values of A, i.e. the square roots of the eigenvalues of A^T A
This decomposition exists for all matrices, dense or sparse
If A has 5 rows and 3 columns, then U will be 5x5 and V will be 3x3
In Matlab, use [U,S,V] = svd (A)
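
The link between singular values and the eigenvalues of A^T A can be checked by hand on a 2x2 case. The following is a minimal Python sketch using only the standard library; the matrix is an arbitrary illustration and `singular_values_2x2` is a hypothetical helper, not a substitute for a real SVD routine such as Matlab's svd.

```python
import math


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def singular_values_2x2(a):
    """Square roots of the eigenvalues of A^T A, largest first."""
    g = mat_mul(transpose(a), a)
    trace = g[0][0] + g[1][1]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eig_hi = (trace + disc) / 2.0
    eig_lo = (trace - disc) / 2.0
    return math.sqrt(eig_hi), math.sqrt(max(eig_lo, 0.0))


A = [[3.0, 0.0],
     [4.0, 5.0]]   # arbitrary illustrative matrix, not from the slides
s1, s2 = singular_values_2x2(A)
print(s1, s2)
```

For this matrix the eigenvalues of A^T A are 45 and 5, so the singular values are their square roots, and their product equals |det A| = 15.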

Term matrix normalization: 

[Figure: two matrices with columns D1 D2 D3 D4 D5; values not preserved in the transcript]

Example (Berry and Browne): 

T1: baby
T2: child
T3: guide
T4: health
T5: home
T6: infant
T7: proofing
T8: safety
T9: toddler

D1: infant & toddler first aid
D2: babies & children’s room (for your home)
D3: child safety at home
D4: your baby’s health and safety: from infant to toddler
D5: baby proofing basics
D6: your guide to easy rust proofing
D7: beanie babies collector’s guide

Document term matrix: 

Document term matrix
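
The matrix on this slide was not preserved, but it can plausibly be rebuilt from the term and document lists. The following is a minimal Python sketch under stated assumptions: plurals and possessives are mapped to the listed terms by hand (for example "babies" to "baby"), rows are terms and columns are documents, each column is scaled to unit length, and entries are rounded to two decimals. The rounding and all names in the code are assumptions made for illustration.

```python
import math

TERMS = ["baby", "child", "guide", "health", "home",
         "infant", "proofing", "safety", "toddler"]

# Terms assumed to occur in each title after mapping plurals and possessives
# to the listed terms (e.g. "babies" -> "baby", "children's" -> "child").
DOC_TERMS = {
    "D1": ["infant", "toddler"],
    "D2": ["baby", "child", "home"],
    "D3": ["child", "safety", "home"],
    "D4": ["baby", "health", "safety", "infant", "toddler"],
    "D5": ["baby", "proofing"],
    "D6": ["guide", "proofing"],
    "D7": ["baby", "guide"],
}
DOCS = sorted(DOC_TERMS)


def build_matrix(decimals=2):
    """Rows are terms, columns are documents. Each column is scaled to unit
    length and then rounded (the two-decimal rounding is an assumption)."""
    columns = []
    for doc in DOCS:
        raw = [1.0 if term in DOC_TERMS[doc] else 0.0 for term in TERMS]
        norm = math.sqrt(sum(raw))   # entries are 0 or 1, so this is the length
        columns.append([round(x / norm, decimals) for x in raw])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def column(matrix, j):
    """Column j of a matrix given as a list of rows."""
    return [row[j] for row in matrix]


A = build_matrix()
baby_baby = dot(A[0], A[0])                  # term-term product for "baby"
d1_d4 = dot(column(A, 0), column(A, 3))      # document-document product for D1, D4
print(A[0])
print(round(baby_baby, 4), round(d1_d4, 4))
```

Under these assumptions the two printed products come out as 1.5471 and 0.639, which agree with the corresponding entries of A*A' and A'*A on the Properties slide.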

Decomposition: 

u =
  -0.6976 -0.0945  0.0174 -0.6950  0.0000  0.0153  0.1442 -0.0000  0
  -0.2622  0.2946  0.4693  0.1968 -0.0000 -0.2467 -0.1571 -0.6356  0.3098
  -0.3519 -0.4495 -0.1026  0.4014  0.7071 -0.0065 -0.0493 -0.0000  0.0000
  -0.1127  0.1416 -0.1478 -0.0734  0.0000  0.4842 -0.8400  0.0000 -0.0000
  -0.2622  0.2946  0.4693  0.1968  0.0000 -0.2467 -0.1571  0.6356 -0.3098
  -0.1883  0.3756 -0.5035  0.1273 -0.0000 -0.2293  0.0339 -0.3098 -0.6356
  -0.3519 -0.4495 -0.1026  0.4014 -0.7071 -0.0065 -0.0493  0.0000 -0.0000
  -0.2112  0.3334  0.0962  0.2819 -0.0000  0.7338  0.4659 -0.0000  0.0000
  -0.1883  0.3756 -0.5035  0.1273 -0.0000 -0.2293  0.0339  0.3098  0.6356

v =
  -0.1687  0.4192 -0.5986  0.2261  0      -0.5720  0.2433
  -0.4472  0.2255  0.4641 -0.2187  0.0000 -0.4871 -0.4987
  -0.2692  0.4206  0.5024  0.4900 -0.0000  0.2450  0.4451
  -0.3970  0.4003 -0.3923 -0.1305  0       0.6124 -0.3690
  -0.4702 -0.3037 -0.0507 -0.2607 -0.7071  0.0110  0.3407
  -0.3153 -0.5018 -0.1220  0.7128 -0.0000 -0.0162 -0.3544
  -0.4702 -0.3037 -0.0507 -0.2607  0.7071  0.0110  0.3407

Decomposition: 

s =
  1.5849  0       0       0       0       0       0
  0       1.2721  0       0       0       0       0
  0       0       1.1946  0       0       0       0
  0       0       0       0.7996  0       0       0
  0       0       0       0       0.7100  0       0
  0       0       0       0       0       0.5692  0
  0       0       0       0       0       0       0.1977
  0       0       0       0       0       0       0
  0       0       0       0       0       0       0

Spread on the v1 axis

Rank-4 approximation: 

s4 =
  1.5849  0       0       0       0  0  0
  0       1.2721  0       0       0  0  0
  0       0       1.1946  0       0  0  0
  0       0       0       0.7996  0  0  0
  0       0       0       0       0  0  0
  0       0       0       0       0  0  0
  0       0       0       0       0  0  0
  0       0       0       0       0  0  0
  0       0       0       0       0  0  0

Rank-4 approximation: 

u*s4*v'
  -0.0019  0.5985 -0.0148  0.4552  0.7002  0.0102  0.7002
  -0.0728  0.4961  0.6282  0.0745  0.0121 -0.0133  0.0121
   0.0003 -0.0067  0.0052 -0.0013  0.3584  0.7065  0.3584
   0.1980  0.0514  0.0064  0.2199  0.0535 -0.0544  0.0535
  -0.0728  0.4961  0.6282  0.0745  0.0121 -0.0133  0.0121
   0.6337 -0.0602  0.0290  0.5324 -0.0008  0.0003 -0.0008
   0.0003 -0.0067  0.0052 -0.0013  0.3584  0.7065  0.3584
   0.2165  0.2494  0.4367  0.2282 -0.0360  0.0394 -0.0360
   0.6337 -0.0602  0.0290  0.5324 -0.0008  0.0003 -0.0008

Rank-4 approximation: 

u*s4
  -1.1056 -0.1203  0.0207 -0.5558  0  0  0
  -0.4155  0.3748  0.5606  0.1573  0  0  0
  -0.5576 -0.5719 -0.1226  0.3210  0  0  0
  -0.1786  0.1801 -0.1765 -0.0587  0  0  0
  -0.4155  0.3748  0.5606  0.1573  0  0  0
  -0.2984  0.4778 -0.6015  0.1018  0  0  0
  -0.5576 -0.5719 -0.1226  0.3210  0  0  0
  -0.3348  0.4241  0.1149  0.2255  0  0  0
  -0.2984  0.4778 -0.6015  0.1018  0  0  0

Rank-4 approximation: 

s4*v'
  -0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
   0.5333  0.2869  0.5351  0.5092 -0.3863 -0.6384 -0.3863
  -0.7150  0.5544  0.6001 -0.4686 -0.0605 -0.1457 -0.0605
   0.1808 -0.1749  0.3918 -0.1043 -0.2085  0.5700 -0.2085
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0

Rank-2 approximation: 

s2 =
  1.5849  0       0  0  0  0  0
  0       1.2721  0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0
  0       0       0  0  0  0  0

Rank-2 approximation: 

u*s2*v'
   0.1361  0.4673  0.2470  0.3908  0.5563  0.4089  0.5563
   0.2272  0.2703  0.2695  0.3150  0.0815 -0.0571  0.0815
  -0.1457  0.1204 -0.0904 -0.0075  0.4358  0.4628  0.4358
   0.1057  0.1205  0.1239  0.1430  0.0293 -0.0341  0.0293
   0.2272  0.2703  0.2695  0.3150  0.0815 -0.0571  0.0815
   0.2507  0.2412  0.2813  0.3097 -0.0048 -0.1457 -0.0048
  -0.1457  0.1204 -0.0904 -0.0075  0.4358  0.4628  0.4358
   0.2343  0.2454  0.2685  0.3027  0.0286 -0.1073  0.0286
   0.2507  0.2412  0.2813  0.3097 -0.0048 -0.1457 -0.0048
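
How a single entry of a rank-k approximation is formed can be traced by hand. The following is a minimal Python sketch that recomputes the top-left entry (term "baby", document D1) from the first row of u, the first row of v and the singular values printed on the earlier slides. The function `rank_k_entry` is a hypothetical helper, and because the inputs are rounded to four decimals the results are expected to match the slides only to about three decimals.

```python
# First row of u (first 7 of its 9 columns), first row of v, and the
# singular values, all copied from the Decomposition slides.
U_ROW_BABY = [-0.6976, -0.0945, 0.0174, -0.6950, 0.0000, 0.0153, 0.1442]
V_ROW_D1 = [-0.1687, 0.4192, -0.5986, 0.2261, 0.0, -0.5720, 0.2433]
SINGULAR_VALUES = [1.5849, 1.2721, 1.1946, 0.7996, 0.7100, 0.5692, 0.1977]


def rank_k_entry(u_row, v_row, sigma, k):
    """One entry of u * s_k * v', keeping only the k largest singular values."""
    return sum(u_row[i] * sigma[i] * v_row[i] for i in range(k))


full = rank_k_entry(U_ROW_BABY, V_ROW_D1, SINGULAR_VALUES, 7)
rank4 = rank_k_entry(U_ROW_BABY, V_ROW_D1, SINGULAR_VALUES, 4)
rank2 = rank_k_entry(U_ROW_BABY, V_ROW_D1, SINGULAR_VALUES, 2)
print(round(full, 4), round(rank4, 4), round(rank2, 4))
```

With all seven singular values the entry is close to 0 ("baby" does not occur in D1); truncating to rank 2 raises it, which is the smoothing effect the approximation is meant to produce.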

Rank-2 approximation: 

u*s2
  -1.1056 -0.1203  0  0  0  0  0
  -0.4155  0.3748  0  0  0  0  0
  -0.5576 -0.5719  0  0  0  0  0
  -0.1786  0.1801  0  0  0  0  0
  -0.4155  0.3748  0  0  0  0  0
  -0.2984  0.4778  0  0  0  0  0
  -0.5576 -0.5719  0  0  0  0  0
  -0.3348  0.4241  0  0  0  0  0
  -0.2984  0.4778  0  0  0  0  0

Rank-2 approximation: 

s2*v'
  -0.2674 -0.7087 -0.4266 -0.6292 -0.7451 -0.4996 -0.7451
   0.5333  0.2869  0.5351  0.5092 -0.3863 -0.6384 -0.3863
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0
   0       0       0       0       0       0       0

Documents to concepts and terms to concepts: 

>> A(:,1)'*u*s
  -0.4238  0.6784 -0.8541  0.1446 -0.0000 -0.1853  0.0095
>> A(:,1)'*u*s4
  -0.4238  0.6784 -0.8541  0.1446  0  0  0
>> A(:,1)'*u*s2
  -0.4238  0.6784  0  0  0  0  0
>> A(:,2)'*u*s2
  -1.1233  0.3650  0  0  0  0  0
>> A(:,3)'*u*s2
  -0.6762  0.6807  0  0  0  0  0

Documents to concepts and terms to concepts: 

>> A(:,4)'*u*s2
  -0.9972  0.6478  0  0  0  0  0
>> A(:,5)'*u*s2
  -1.1809 -0.4914  0  0  0  0  0
>> A(:,6)'*u*s2
  -0.7918 -0.8121  0  0  0  0  0
>> A(:,7)'*u*s2
  -1.1809 -0.4914  0  0  0  0  0

Cont’d: 

>> (s2*v'*A(1,:)')'
  -1.7523 -0.1530  0  0  0  0  0  0  0
>> (s2*v'*A(2,:)')'
  -0.6585  0.4768  0  0  0  0  0  0  0
>> (s2*v'*A(3,:)')'
  -0.8838 -0.7275  0  0  0  0  0  0  0
>> (s2*v'*A(4,:)')'
  -0.2831  0.2291  0  0  0  0  0  0  0
>> (s2*v'*A(5,:)')'
  -0.6585  0.4768  0  0  0  0  0  0  0

Cont’d: 

>> (s2*v'*A(6,:)')'
  -0.4730  0.6078  0  0  0  0  0  0  0
>> (s2*v'*A(7,:)')'
  -0.8838 -0.7275  0  0  0  0  0  0  0
>> (s2*v'*A(8,:)')'
  -0.5306  0.5395  0  0  0  0  0  0  0
>> (s2*v'*A(9,:)')'
  -0.4730  0.6078  0  0  0  0  0  0  0

Properties: 

A*A'
  1.5471  0.3364  0.5041  0.2025  0.3364  0.2025  0.5041  0.2025  0.2025
  0.3364  0.6728  0       0       0.6728  0       0       0.3364  0
  0.5041  0       1.0082  0       0       0       0.5041  0       0
  0.2025  0       0       0.2025  0       0.2025  0       0.2025  0.2025
  0.3364  0.6728  0       0       0.6728  0       0       0.3364  0
  0.2025  0       0       0.2025  0       0.7066  0       0.2025  0.7066
  0.5041  0       0.5041  0       0       0       1.0082  0       0
  0.2025  0.3364  0       0.2025  0.3364  0.2025  0       0.5389  0.2025
  0.2025  0       0       0.2025  0       0.7066  0       0.2025  0.7066

A'*A
  1.0082  0       0       0.6390  0       0       0
  0       1.0092  0.6728  0.2610  0.4118  0       0.4118
  0       0.6728  1.0092  0.2610  0       0       0
  0.6390  0.2610  0.2610  1.0125  0.3195  0       0.3195
  0       0.4118  0       0.3195  1.0082  0.5041  0.5041
  0       0       0       0       0.5041  1.0082  0.5041
  0       0.4118  0       0.3195  0.5041  0.5041  1.0082

A is the document-term matrix, with the 9 terms as rows and the 7 documents as columns. What is A*A', what is A'*A?

Latent semantic indexing (LSI): 

Dimensionality reduction = identification of hidden (latent) concepts
Query matching in latent space
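
Matching in latent space can be illustrated with the rank-2 document coordinates printed on the "Documents to concepts" slides. The following is a minimal Python sketch, assuming that the two non-zero coordinates of each A(:,j)'*u*s2 row are used as the document's position in concept space and that D1 plays the role of the query. `rank_by_similarity` is a hypothetical helper.

```python
import math

# Rank-2 document coordinates copied from the A(:,j)'*u*s2 output.
DOC_COORDS = {
    "D1": (-0.4238, 0.6784),
    "D2": (-1.1233, 0.3650),
    "D3": (-0.6762, 0.6807),
    "D4": (-0.9972, 0.6478),
    "D5": (-1.1809, -0.4914),
    "D6": (-0.7918, -0.8121),
    "D7": (-1.1809, -0.4914),
}


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_by_similarity(query_doc):
    """Other documents sorted by cosine similarity to `query_doc`, best first."""
    q = DOC_COORDS[query_doc]
    scores = [(cosine(q, xy), name)
              for name, xy in DOC_COORDS.items() if name != query_doc]
    return sorted(scores, reverse=True)


ranking = rank_by_similarity("D1")
for score, name in ranking:
    print(name, round(score, 3))
```

In this space D3 ("child safety at home") ranks closest to D1 ("infant & toddler first aid") although the two titles share none of the nine terms, which is the effect a latent representation is meant to achieve.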

Useful pointers: 

http://lsa.colorado.edu
http://lsi.research.telcordia.com
http://www.cs.utk.edu/~lsi

Readings: 

For October 11: MRS18
For October 18: MRS17, MRS19
For October 25: MRS20
