logging in or signing up 1015 1 Arkwright26 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 69 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 19, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... By: niu86niu0120 (13 month(s) ago) this one is cool Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Authoritative Sources in a Hyperlinked Environment: Authoritative Sources in a Hyperlinked Environment Hui Han CSE dept, PSU 10/15/01Main Idea: Main Idea Return more authoritative information, using the algorithm based on link structure analysis, esp. the relationship between a set of relevant authoritative pages and the set of “hub” pages that join them together in the link structureThe types of queries: The types of queries Specific queries – scarcity problem “Does Netscape support the JDK 1.1 code-signing API” Broad-topic queries – abundance problem “Find information about the Java programming language.” Solution: find a small set of the most “authoritative” relevant pages. Similar-page queries “Find pages ‘similar’ to java.sun.comLinked Based Analysis: Linked Based Analysis Limitations of text based analysis Text-based ranking function Eg. Could www.harvard.edu be recognized as one of the most authoritative pages, since many other webpages contain “harvard” more often. Pages are not sufficiently self – descriptive Usually the term “search engine” doesn’t appear on search engine web pagesLimitations of Basic Link-based Analysis: Limitations of Basic Link-based Analysis Basic model: Of all pages containing the query string, return those with the greatest number of in-links Large number of links are created primarily for navigational purposes Difficulty in finding a balance between relevance and popularity Popular sites like www.yahoo.com would be considered as highly authoritative on any query strings it contained. P Q Conferred AuthorityHub-Authority method: Hub-Authority method A link-based model for the conferral of authority, using the method that consistently identifies relevant, authoritative WWW pages for broad search topics. Difference from Clustering Distinguishing pages related to different meanings or senses of a query termStep1: Constructing a Focused Subgraph of the WWW: Step1: Constructing a Focused Subgraph of the WWW Properties of a ideal collection S of pages S is relatively small S is rich in relevant pages S contains most(or many) of the strongest authorities Construction of S Collect the t highest-ranked pages for the query from a text-based search engine (Altavista, hotbot) as root set R Expand R to base set S a strong authority is quite likely to be pointed to by at least one page in R Keep only transverse linksStep2: Compute Hubs and Authorities: Step2: Compute Hubs and Authorities Authoritative pages should have large in-degree have a considerable overlap in the sets of pages that point to them, since they share a common topic “Hub” pages “pull together”(link to) authorities on a common topicAn Iterative Algorithm: An Iterative Algorithm G : the subgraph induced on the pages in S x(p) : authority weight of Page P y(p): hub weight of Page P Normalize: pS (x(p))2 = 1 pS (y(p))2 = 1 I operation: x(p) = q : (q,p) Ey(p) O operation: y(p) = q : (q,p) Ex(p) Hub and Authority exhibit “mutually reinforcing relationship”An Iterative Algorithm(cont.): An Iterative Algorithm(cont.) {x(p) } is a vector x with authority weight for each page in G {y(p) } is a vector y with hub weight for each page in G Filter out the top c authorities and top c hubsBasic knowledge of Matrix: Basic knowledge of Matrix M: symmetric n*n matrix :vector : a number If for some vector , M = , we say, The set of all such is a subspace of Rn Eigenspace associated with ; These 1(M), 2(M), … are eigenvalues, while 1(M), 2(M), … are eigenvectors i(M) belongs to the subspace of i(M) If we assume |1(M) > 2(M)|, we refer to 1(M) as the principal eigenvector, and all other i(M) as non-principal eigenvector. Convergence Proof of Iterate Procedure: Convergence Proof of Iterate Procedure Theorem1. The sequences x1, x2, x3, … and y1, y2, y3, … converge to x* and y* respectively. Proof: G=(V,E); V={p1, p2, …, pn}; A is the adjacency matrix of graph G; Aij = 1 if (pi, pj) is an edge of G. I & O operations can be written as: x ATy y Ax K loops, So, x (1) AT Ax (0); x(0) = AT z x* … x (k) (AT A)k-1 AT z y* … y (k) (AAT)k z “if is a vector not orthogonal to the principle eigenvector 1(M), the unit vector in the direction of Mk converges to 1(M) as k increases without bound”Convergence Proof of Iterate Procedure(cont.): Convergence Proof of Iterate Procedure(cont.) A is called an orthogonal matrix if AAT = AT A = E. Theorem2: x* is the principal eigenvector of ATA, and y* is the principal eigenvector of AAT. Experiment finds that k=20 is sufficient for the convergence of vectors.Experimental results: Experimental results Most of these authorities do not contain any occurrences of the initial query string , except http://www.roadhead.comSimilar-Page Queries: Similar-Page Queries The strongest authorities in the local region of the link structure near p, can server as a broad-topic summary of the pages related to p. A small difference in initial request to the search engine: “Find t pages pointing to p”, to get R Multiple sets of Hubs and Authorities: Multiple sets of Hubs and Authorities Why? The query string may have several very different meanings. Eg.“java” The string may arise as a term in the context of multiple technical communities. Eg. “randomized algorithms” The string may refer to a highly polarized issue, involving groups that are not likely to link to one another. Eg. “abortion” Idea: The NON-principle eigenvectors of ATA and AAT provide us with a natural way to extract additional densely linked collections of hubs and authorities from the base set S.Multiple sets of Hubs and AuthoritiesExperimental result1: Multiple sets of Hubs and Authorities Experimental result1 For the query “jaguar*”, the strongest collections of authoritative sources concerned the Atari Jaguar product, the NFL football team from Jacksonville, and the automobile.Multiple sets of Hubs and AuthoritiesExperimental result2: Multiple sets of Hubs and Authorities Experimental result2 For the query “randomized algorithms”, none of the strongest collections of hubs and authorities are precisely on the query topic. They include home pages of theoretical computer scientists, compendia of mathesmatical software and pages on wavelets.Multiple sets of Hubs and AuthoritiesExperimental result3: Multiple sets of Hubs and Authorities Experimental result3 For the query “abortion”, there is a natural separation.Conclusion: Conclusion A technique for locating high-quality information related to a broad search topic on the www, based on a structural analysis of the link topology surrounding “authoritative” pages on the topic. Related work. Standing, influence in social networks, scientific citations, etc. Hypertext and WWW rankings … You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
1015 1 Arkwright26 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 69 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 19, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... By: niu86niu0120 (13 month(s) ago) this one is cool Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Authoritative Sources in a Hyperlinked Environment: Authoritative Sources in a Hyperlinked Environment Hui Han CSE dept, PSU 10/15/01Main Idea: Main Idea Return more authoritative information, using the algorithm based on link structure analysis, esp. the relationship between a set of relevant authoritative pages and the set of “hub” pages that join them together in the link structureThe types of queries: The types of queries Specific queries – scarcity problem “Does Netscape support the JDK 1.1 code-signing API” Broad-topic queries – abundance problem “Find information about the Java programming language.” Solution: find a small set of the most “authoritative” relevant pages. Similar-page queries “Find pages ‘similar’ to java.sun.comLinked Based Analysis: Linked Based Analysis Limitations of text based analysis Text-based ranking function Eg. Could www.harvard.edu be recognized as one of the most authoritative pages, since many other webpages contain “harvard” more often. Pages are not sufficiently self – descriptive Usually the term “search engine” doesn’t appear on search engine web pagesLimitations of Basic Link-based Analysis: Limitations of Basic Link-based Analysis Basic model: Of all pages containing the query string, return those with the greatest number of in-links Large number of links are created primarily for navigational purposes Difficulty in finding a balance between relevance and popularity Popular sites like www.yahoo.com would be considered as highly authoritative on any query strings it contained. P Q Conferred AuthorityHub-Authority method: Hub-Authority method A link-based model for the conferral of authority, using the method that consistently identifies relevant, authoritative WWW pages for broad search topics. Difference from Clustering Distinguishing pages related to different meanings or senses of a query termStep1: Constructing a Focused Subgraph of the WWW: Step1: Constructing a Focused Subgraph of the WWW Properties of a ideal collection S of pages S is relatively small S is rich in relevant pages S contains most(or many) of the strongest authorities Construction of S Collect the t highest-ranked pages for the query from a text-based search engine (Altavista, hotbot) as root set R Expand R to base set S a strong authority is quite likely to be pointed to by at least one page in R Keep only transverse linksStep2: Compute Hubs and Authorities: Step2: Compute Hubs and Authorities Authoritative pages should have large in-degree have a considerable overlap in the sets of pages that point to them, since they share a common topic “Hub” pages “pull together”(link to) authorities on a common topicAn Iterative Algorithm: An Iterative Algorithm G : the subgraph induced on the pages in S x(p) : authority weight of Page P y(p): hub weight of Page P Normalize: pS (x(p))2 = 1 pS (y(p))2 = 1 I operation: x(p) = q : (q,p) Ey(p) O operation: y(p) = q : (q,p) Ex(p) Hub and Authority exhibit “mutually reinforcing relationship”An Iterative Algorithm(cont.): An Iterative Algorithm(cont.) {x(p) } is a vector x with authority weight for each page in G {y(p) } is a vector y with hub weight for each page in G Filter out the top c authorities and top c hubsBasic knowledge of Matrix: Basic knowledge of Matrix M: symmetric n*n matrix :vector : a number If for some vector , M = , we say, The set of all such is a subspace of Rn Eigenspace associated with ; These 1(M), 2(M), … are eigenvalues, while 1(M), 2(M), … are eigenvectors i(M) belongs to the subspace of i(M) If we assume |1(M) > 2(M)|, we refer to 1(M) as the principal eigenvector, and all other i(M) as non-principal eigenvector. Convergence Proof of Iterate Procedure: Convergence Proof of Iterate Procedure Theorem1. The sequences x1, x2, x3, … and y1, y2, y3, … converge to x* and y* respectively. Proof: G=(V,E); V={p1, p2, …, pn}; A is the adjacency matrix of graph G; Aij = 1 if (pi, pj) is an edge of G. I & O operations can be written as: x ATy y Ax K loops, So, x (1) AT Ax (0); x(0) = AT z x* … x (k) (AT A)k-1 AT z y* … y (k) (AAT)k z “if is a vector not orthogonal to the principle eigenvector 1(M), the unit vector in the direction of Mk converges to 1(M) as k increases without bound”Convergence Proof of Iterate Procedure(cont.): Convergence Proof of Iterate Procedure(cont.) A is called an orthogonal matrix if AAT = AT A = E. Theorem2: x* is the principal eigenvector of ATA, and y* is the principal eigenvector of AAT. Experiment finds that k=20 is sufficient for the convergence of vectors.Experimental results: Experimental results Most of these authorities do not contain any occurrences of the initial query string , except http://www.roadhead.comSimilar-Page Queries: Similar-Page Queries The strongest authorities in the local region of the link structure near p, can server as a broad-topic summary of the pages related to p. A small difference in initial request to the search engine: “Find t pages pointing to p”, to get R Multiple sets of Hubs and Authorities: Multiple sets of Hubs and Authorities Why? The query string may have several very different meanings. Eg.“java” The string may arise as a term in the context of multiple technical communities. Eg. “randomized algorithms” The string may refer to a highly polarized issue, involving groups that are not likely to link to one another. Eg. “abortion” Idea: The NON-principle eigenvectors of ATA and AAT provide us with a natural way to extract additional densely linked collections of hubs and authorities from the base set S.Multiple sets of Hubs and AuthoritiesExperimental result1: Multiple sets of Hubs and Authorities Experimental result1 For the query “jaguar*”, the strongest collections of authoritative sources concerned the Atari Jaguar product, the NFL football team from Jacksonville, and the automobile.Multiple sets of Hubs and AuthoritiesExperimental result2: Multiple sets of Hubs and Authorities Experimental result2 For the query “randomized algorithms”, none of the strongest collections of hubs and authorities are precisely on the query topic. They include home pages of theoretical computer scientists, compendia of mathesmatical software and pages on wavelets.Multiple sets of Hubs and AuthoritiesExperimental result3: Multiple sets of Hubs and Authorities Experimental result3 For the query “abortion”, there is a natural separation.Conclusion: Conclusion A technique for locating high-quality information related to a broad search topic on the www, based on a structural analysis of the link topology surrounding “authoritative” pages on the topic. Related work. Standing, influence in social networks, scientific citations, etc. Hypertext and WWW rankings …