fast
Dionigi
Category: Education. License: All Rights Reserved. Added: January 22, 2008.

Presentation Transcript

AlltheWeb:
  Torbjørn Kanestrøm
  January 30th, 2003

Agenda:
  - Who is FAST? What do we do?
  - Libraries: relevant projects we have done
  - What is AlltheWeb?
  - Under the Hood: Phrasing & Lemmatization
  - Take a tour of AlltheWeb
      - Simple searches (Web, News, Multimedia, FTP)
      - Advanced Web Search
      - Results Page
  - Q & A

Who is FAST?:
  San Francisco, Tokyo, Boston, Norway, Munich, Rome, London, Paris
  - Fast Search & Transfer (FAST)
  - Founded 1997
  - Public company (Oslo Stock Exchange – June 2001)
  - One of the fastest growing companies in Europe
  - Profitable
  - 200 employees
  - 40+ PhDs
  - 12 offices worldwide

What we do…:

FAST Solutions:
  //TECHNOLOGY
  - Common Technology Platform
  - FAST Solutions

FAST Customers & Partners:
  //BACKGROUND
  - Enterprise
  - Portals
  - Partners
  FAST is the creator of the real-time integrated search and filter technology solutions that are behind the scenes at some of the world's best-known companies with the world's most demanding search problems.

Slide7:
  A few selected projects we have done - Relevant to every librarian

Questia:

Questia – the online library:

Nordic Web Archive:
  - The Nordic Web Archive is a cooperation between the Nordic National Libraries (Finland, Sweden, Denmark, Norway, Iceland).
  - Project started in 2000; datacenter built deep inside a mountain in northern Norway
  - Collecting and archiving web documents of national interest and importance:
      - Everything published in the national domains (.NO, .DK, .FI etc.)
      - Everything written on the web in the respective languages
      - Everything referring to one of the countries (city, company, person, etc.)
  - Continuous project designed to scale indefinitely
  - Available to the research community, not a public site

Elsevier Engineering Information:
  Compendex® is the most comprehensive interdisciplinary engineering database in the world, with almost seven million records referencing 5,000 engineering journals and conference materials dating from 1970. The database is updated weekly.

Scirus.com – the web’s Science search:
  //BUSINESS CASES
  Combining scientific classification of the “deep web” and proprietary publications
  “FAST’s core search technology has enabled us to provide the best scientific search results, period”
    - John Regazzi, Managing Director, Elsevier Science
  [Diagram labels: Web Server XML]
  - 120M web pages
  - 17M Elsevier Science publications
  - Scientific classification
  - Grouping and identification of related articles
  - Leading science index
  - Understanding content
  - Scientific navigation

Slide13:
  What is AlltheWeb?

What is AlltheWeb?:
  - Showcase for FAST technology
  - Test new search features with a real live audience
  - Several million queries per day
  - 40% North America, 30% Europe, and 30% rest of world
  - Integrated interface for searching:
      - 2.1+ billion web pages, PDF docs, MS Word docs, & Flash objects
      - Continuously refreshed news from 5000+ global/local news sources
      - 150 million images and videos
      - 130 million FTP files
      - 2 million MP3 files
  - Targeted at advanced searches

What makes AlltheWeb different?:
  Versatility
  - Searching in 49 languages
  - Six separate catalogues (Web, News, Pictures, Videos, MP3, FTP)
  - Fully customizable front-end (only major search site that is XHTML/CSS compliant)
  Solid Index
  - 2.5 billion web objects (pages, pictures, videos, MP3s, etc.)
  - One of the fastest refresh cycles (every 7 – 14 days)
  Advanced search features
  - Boolean search
  - Embedded content selectors
  - Domain & IP filtering
  - File format and size filtering
  - Much more...

Slide16:
  Under the Hood - Phrasing & Lemmatization

Under the Hood: Phrasing/Anti-Phrasing:
  Phrasing: known phrases are matched as a phrase
  - New York → “New York”
  - Based on common phrases, names, movie names, geographic names, etc.
  - Can detect multiple phrases within the same query
  Anti-Phrasing: remove words irrelevant to the query
  - Who is… What is…
  Combines to create a better query
  - Who is George Bush → “George Bush”
  - What is the age of the earth → “the age of the earth”
  - How do I get to train station in New York → “get to” “train station” in “New York”

Under the Hood: Lemmatization:
  Lemmatization improves recall
  - Literal matching only finds a fraction of the candidates for a query
  - Ratio between base and full forms:
      - English: 2
      - German, French, Spanish: 5 – 10
      - Russian, Polish: 40+
  - Typical cases: singular/plural variation, case marking, etc.
  Stemming vs. Lemmatization
  - Traditional stemming
      - Term is stemmed according to rules, e.g. walking → walk
      - Can easily result in “false” stemmings, e.g. Bobby Browning → Bobby Brown
  - Lemmatization
      - Rewriting of terms is controlled by language-sensitive dictionaries
      - Very comprehensive dictionaries; about 20 “man years”

Slide19:
  Take a Tour

AlltheWeb Home Page:

Simple Search (Web/News):
  - Web- and News Search
  - Picture-, Video- and MP3 Search
  - FTP Search
  - “WebSearch University”

Simple Search (Rich Media):
  - Web- and News Search
  - Picture-, Video- and MP3 Search
  - FTP Search

Simple Search (FTP):
  - Web- and News Search
  - Picture-, Video- and MP3 Search
  - FTP Search

Advanced Web Search:
  - Embedded Content: exclude or include pages based on embedded content on these pages
  - Specific date range and document depth

Advanced Web Search (cont.):
  - File Type: limits results to PDF, MS Word, and Macromedia Flash files
  - Region Filter: limit results to different regions
  - Presentation: how many search results to list per page

The Result Page:
  - Search Bar: click tabs to send the query to other catalogs
  - Query Rewriting: did we rewrite your query? Gives you full control!

Slide27:
  www.AllTheWeb.com
  Has all the advanced search features and functions that you can find on all other major web search engines – combined...
  And we innovate at a faster pace and invest more in R&D than ever before.

Slide28:
  AlltheWeb Q&A
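
Editor's sketch (not part of the original slides): the "Under the Hood" slides describe query rewriting by phrasing and anti-phrasing, and contrast rule-based stemming with dictionary-driven lemmatization. The Python below is a minimal illustration of those ideas under stated assumptions. The phrase list, the anti-phrase list, the lemma dictionary, and all function names are hypothetical and hand-picked to reproduce the slide examples; the slides do not say how FAST actually implemented either feature.

```python
import re

# Hypothetical, hand-picked entries for illustration only; the slides say the
# real lists come from common phrases, names, geographic names, etc.
KNOWN_PHRASES = ["New York", "George Bush", "train station", "get to",
                 "the age of the earth"]
ANTI_PHRASES = ["How do I", "Who is", "What is"]


def _alternation(items):
    # Longest entries first, so a longer phrase wins over a shorter one.
    ordered = sorted(items, key=len, reverse=True)
    return "|".join(re.escape(item) for item in ordered)


ANTI_RE = re.compile(r"^(?:" + _alternation(ANTI_PHRASES) + r")\s+", re.IGNORECASE)
PHRASE_RE = re.compile(r"\b(" + _alternation(KNOWN_PHRASES) + r")\b", re.IGNORECASE)


def rewrite_query(query):
    """Apply anti-phrasing, then phrasing, to a raw query string."""
    text = " ".join(query.split())   # normalise whitespace
    text = ANTI_RE.sub("", text)     # anti-phrasing: drop a leading question stub
    # Phrasing: wrap every known phrase in double quotes.
    return PHRASE_RE.sub(lambda m: '"' + m.group(1) + '"', text)


def naive_stem(term):
    """Rule-based stemming: blindly strip a trailing 'ing'."""
    if len(term) > 5 and term.lower().endswith("ing"):
        return term[:-3]
    return term


# Hypothetical miniature lemma dictionary (full form -> base form).
LEMMA_DICT = {"walking": "walk", "walked": "walk", "walks": "walk"}


def lemmatize(term):
    """Dictionary-controlled rewriting: unknown terms are left untouched."""
    return LEMMA_DICT.get(term.lower(), term)


if __name__ == "__main__":
    print(rewrite_query("How do I get to train station in New York"))
    print(naive_stem("Browning"), "vs", lemmatize("Browning"))
```

The rule-based stemmer turns "Browning" into "Brown", which is the kind of false stemming the slide warns about, while the dictionary lookup leaves the name alone because it has no entry for it.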