Contents:

Introduction
Searching
Search Engine
Components of search engine
How does it work?
Some other Search Engines
Conclusion

The Internet:

An interconnected network of thousands of networks and millions of computers, linking businesses, educational institutions, government agencies, and individuals.

Searching:

A large amount of information makes a site huge and complex, and makes navigation difficult.
Search is the user's lifeline for mastering complex websites.
A search feature is essential for users who revisit a site looking for specific information.

Types of Searching:

A search can be of various types:
Internet search: Search engines such as Yahoo and Infoseek crawl the web, gather web pages or information about them, index them, and retrieve them when a page matches the search term.
Database search: Databases store their information neatly organized into fields, and a search interface is provided for querying them.

Types of Searching contd…:

Intranet search: Search is restricted to a site or a group of sites.
Text search engines store this information in one index and can find words in any field of a record.
Many high-end search engines can also store field information, so searches can be limited to a specific field as well.
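
A minimal sketch of the difference between searching any field and limiting the search to one field. The records and the field names (`title`, `author`, `body`) are made up for illustration, and a plain substring match stands in for a real index.

```python
# Hypothetical records; the field names are illustrative assumptions.
RECORDS = [
    {"title": "Annual report", "author": "Rao", "body": "Sales grew in every region."},
    {"title": "Sales handbook", "author": "Mehta", "body": "Guidelines for the annual review."},
]

def search(records, term, field=None):
    """Return records containing `term`; if `field` is given, look only in that field."""
    term = term.lower()
    hits = []
    for record in records:
        # Either the single requested field, or every field of the record.
        fields = [field] if field else list(record)
        if any(term in record.get(name, "").lower() for name in fields):
            hits.append(record)
    return hits

any_field = search(RECORDS, "annual")                  # matches a title and a body
title_only = search(RECORDS, "annual", field="title")  # matches the title only
```

Limiting the search to `title` drops the record that mentions the term only in its body.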


Parts of a Local Site Search Tool:

Search Indexer
Search Index File
Search Form
Search Engine
Results Listing

Parts of Local Site Search Tool Contd…:

Search Indexer: The program that reads all the documents on the site and creates an index of them. The index is stored in a file, called the index file, where the search engine can find it.
Search Index File: Created by the Search Indexer program, this file stores the data from the site in a special index or database designed for very quick access.
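
One common way to build such an index is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure; the page paths and text are hypothetical, and a JSON string stands in for the index file.

```python
import json
import re
from collections import defaultdict

# Hypothetical site content: page path -> page text.
PAGES = {
    "/index.html": "Welcome to the campus library search page",
    "/hours.html": "Library opening hours and holiday hours",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index

index = build_index(PAGES)

# Stand-in for the index file: sets become sorted lists so they can be stored as JSON.
index_file = json.dumps({word: sorted(paths) for word, paths in index.items()})
```

Looking up a word is then a single dictionary access rather than a scan of every page.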

Parts of Local Site Search Tool Contd…:

Search Form: The HTML interface to the site search tool, where visitors enter their search terms and specify their preferences for the search.
Search Engine: The program (CGI, server module, or separate server) that accepts the request from the form or URL, searches the index, and returns the results page to the server.

Parts of Local Site Search Tool Contd…:

Results Listing: An HTML page listing the pages that contain text matching the search term(s). These are sorted in some kind of relevance order, with the closest match at the top. The format of the listing is often defined by the site search tool, but may be modified in some ways.
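
A rough sketch of a results listing. Relevance here is simply the number of times the query words occur on a page, which is an assumption chosen for illustration; real site search tools use their own scoring. The pages are hypothetical.

```python
import html
import re

# Hypothetical pages: path -> text.
PAGES = {
    "/a.html": "search tools and search forms",
    "/b.html": "site search",
    "/c.html": "contact details",
}

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(pages, query):
    """Return matching page paths, closest match first (most term occurrences)."""
    terms = words(query)
    scored = []
    for path, text in pages.items():
        page_words = words(text)
        score = sum(page_words.count(term) for term in terms)
        if score > 0:
            scored.append((score, path))
    # Highest score first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored]

def results_listing(paths):
    """Render the ranked paths as a simple HTML ordered list."""
    items = "".join(
        f'<li><a href="{html.escape(p)}">{html.escape(p)}</a></li>' for p in paths
    )
    return f"<ol>{items}</ol>"

ranked = rank(PAGES, "search")
listing = results_listing(ranked)
```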


Search Engine:

“A tool designed to search for information on the World Wide Web. The information may consist of web pages, images, information and other types of files.”
Includes external engines like Google, Yahoo, MSN, AOL, Live.


In other words, a search engine:
Is a page on the web connected to a backend program
Allows a user to enter words that characterize the required page
Returns links to pages that match the query

Components of a search engine:

Robot (or Worm or Spider): collects pages and checks for page changes
Indexer: constructs a sophisticated file structure to enable fast page retrieval
Searcher: satisfies user queries
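
The slides do not say how a robot checks for page changes. One possible approach, sketched here as an assumption, is to store a hash of each page and compare it on the next visit. The URLs and page text are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Return a short, stable fingerprint of a page's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """Return URLs whose content is new or differs from the stored fingerprint."""
    return sorted(
        url for url, text in current.items()
        if previous.get(url) != fingerprint(text)
    )

# Fingerprints saved on the last visit (hypothetical pages).
previous = {"/": fingerprint("Welcome"), "/news": fingerprint("Old story")}
# Content fetched on this visit.
current = {"/": "Welcome", "/news": "New story", "/jobs": "We are hiring"}

to_reindex = changed_pages(previous, current)
```

Only the changed and newly found pages need to be passed to the indexer again.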

Spider:

A program that roams the web from link to link, identifying and scanning pages.
It looks for new sites where information is likely to reside.
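
Roaming from link to link can be sketched as a breadth-first crawl. To keep the example self-contained, a small dictionary (`WEB`, hypothetical) stands in for the network; a real spider would fetch each page over HTTP instead.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical in-memory "web": URL -> HTML.
WEB = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/news/1">Story</a>',
    "/news/1": "No links here",
    "/orphan": "Nothing links to this page",
}

def crawl(start, fetch):
    """Visit pages breadth-first from `start`, following each link once."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:      # broken link: nothing to scan
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(page)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

visited = crawl("/", WEB.get)
```

The page `/orphan` is never visited because no other page links to it.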

Indexer:

A database that stores a copy of each web page gathered by a spider.
It could be hierarchical: organized from general to specific topics.
Or alphabetical: contains sources with a focus on a specific topic.

Results given by searcher:

Presented as links
Supposedly ordered by relevance to the query
Some search engines score results
Normally organised in groups of ten per page
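
Grouping results ten to a page amounts to slicing the ranked list. A small sketch, with a hypothetical helper name and made-up links:

```python
def page_of_results(results, page, per_page=10):
    """Return the slice of `results` shown on 1-based page number `page`."""
    start = (page - 1) * per_page
    return results[start:start + per_page]

# 25 hypothetical result links, already in relevance order.
links = [f"/doc{n}.html" for n in range(1, 26)]

first_page = page_of_results(links, 1)   # results 1 to 10
last_page = page_of_results(links, 3)    # results 21 to 25
```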

How do search engines work?:

Search engines run automated programs called ‘bots’ or ‘spiders’.
They crawl your website, and what they find goes into the search engine’s index.
Each search engine has its own way of deciding how to rank pages (its algorithm).
The content the bots have found is passed through this algorithm, and the pages are ranked.

How do pages get into a Search Engine?:

Robot discovery (following links)
Self submission
Payments

Robot Discovery:

Robots visit sites while following links.
The more links, the more visits.
Make sure you don't exclude robots from visiting public pages.
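
Robots are usually excluded through a site's `robots.txt` file. The sketch below uses Python's standard `urllib.robotparser` to check which pages a given set of rules leaves open; the rules, the bot name, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is kept out of /private/ only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Public pages stay reachable; only the private area is excluded.
public_ok = parser.can_fetch("ExampleBot", "https://www.example.com/products.html")
private_ok = parser.can_fetch("ExampleBot", "https://www.example.com/private/notes.html")
```

A rule such as `Disallow: /` would instead shut robots out of the whole site, public pages included.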


Payments:

Some search engines only index paying customers.
The more you pay, the higher you appear in answers to queries.

Self submission:

Register your page with a search engine.
Pay a company to register you with many search engines.
Get registration with many search engines for free!

[Figures: pictorial representations]

Types of Search Engines:

CGI Programs
Server Plug-Ins
Search Servers
Remote Searching

CGI Programs:

The Common Gateway Interface (CGI) standard allows a web server to communicate with external programs. A search engine can run as a CGI program.
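
Under CGI, the web server hands the request to the external program through environment variables such as `QUERY_STRING`, and the program writes headers, a blank line, and the page to standard output. A minimal sketch of that exchange; the parameter name `q` is an assumption, and the actual index lookup is left out.

```python
import html
from urllib.parse import parse_qs

def cgi_response(environ):
    """Build the CGI output: a header, a blank line, then the HTML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("q", [""])[0]          # "q" is a hypothetical form field name
    # Escape the term so user input cannot inject markup into the page.
    body = f"<p>You searched for: {html.escape(term)}</p>"
    return "Content-Type: text/html\r\n\r\n" + body

# A real CGI script would pass os.environ and write the result to stdout.
response = cgi_response({"QUERY_STRING": "q=site+search"})
```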

Server Plug-Ins:

For better data interchange, less overhead, and more flexibility, web server companies have defined APIs (Application Programming Interfaces) for their servers. These allow third-party developers to create modules that run inside the server process.

Search Servers:

Some search engines run as separate servers. The form data is passed as part of the URL, just as with a CGI program, but the search engine application runs as a separate HTTP server, possibly on a different machine. This reduces the load on the main web server.
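
Passing form data as part of the URL can be sketched with the standard `urllib.parse` module. The host, port, path, and parameter names below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of the separate search server.
SEARCH_SERVER = "http://search.example.com:8080/query"

def search_url(terms, page=1):
    """Encode the form fields into the query string of the search server URL."""
    return SEARCH_SERVER + "?" + urlencode({"q": terms, "page": page})

url = search_url("opening hours", page=2)

# What the search server sees when it decodes the request.
parts = urlsplit(url)
params = parse_qs(parts.query)
```

The main web server only serves the form; the request itself goes to the host and port of the search server.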

Remote Searching:

It is also possible to outsource search to a remote site search service. The indexer and search engine run on the remote server. Using a web indexing robot, or spider, the service follows links on the site, reads the pages, and stores every word in the index file on that server. At search time, the form on the site's web page sends a message to the remote search engine, which sends the results back to the site.

Generation of Search Engines:

First-generation search engines
Return results in a schematic order
“On the page” ranking

Generation of Search Engine Contd…:

Second-generation search engines
Organize search results by peer ranking, domain, or site rather than by relevancy
“Off the page” ranking
More reliable in the ranking of results
A web page becomes highly ranked if it is connected to other highly ranked pages
Google derives its results from the behavior and judgment of millions of web users
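
The idea that a page ranks highly when highly ranked pages link to it can be sketched as a simplified PageRank-style iteration. This is an illustration only, not Google's actual algorithm; the link graph is hypothetical and the damping value 0.85 is an assumed, commonly quoted default.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base rank regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = link_rank(LINKS)
```

Page C, with three incoming links, ends up ranked highest. Page A has only one incoming link, but it comes from C, so A still ranks well above B and D.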

Some Search Engines:

AltaVista
Google
HotBot
Lycos
Northern Light
Yahoo

| Search Engine | Number of web pages in database | Percentage of web in database |
| --- | --- | --- |
| Google | 1.2 billion | 57% |
| Yahoo (powered by Google) | 1.2 billion | 57% |
| Lycos | 575 million | 27% |
| HotBot | 500 million | 24% |
| AltaVista | 350 million | 17% |
| Northern Light | 330 million | 16% |
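
As a quick consistency check on the table, dividing each database size by its stated share of the web gives the total web size each row implies. The figures are taken from the table; the calculation is only an illustration.

```python
# (pages in database, fraction of the web), as given in the table.
COVERAGE = {
    "Google": (1.2e9, 0.57),
    "Lycos": (575e6, 0.27),
    "HotBot": (500e6, 0.24),
    "AltaVista": (350e6, 0.17),
    "Northern Light": (330e6, 0.16),
}

# Implied total size of the web = pages indexed / fraction covered.
implied_web_size = {
    name: pages / share for name, (pages, share) in COVERAGE.items()
}

for name, size in implied_web_size.items():
    print(f"{name}: about {size / 1e9:.2f} billion pages")
```

Every row implies a web of roughly 2.1 billion pages, so the two columns agree with each other.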

Conclusion:

Search engines are the mortar of the Internet. Because they are so important, their implementation must be given high priority, with the necessary time allotted for research and development.

Queries?

