Searching the Internet More Effectively


Searching the Internet More Effectively Barnsley 29th February 2012:

Karen Blakeman, RBA Information Services
Slides are available at [email protected]
Twitter: @karenblakeman
This presentation is licensed under a Creative Commons Attribution 3.0 License

How it all started:

Before 1992:
- priced electronic databases, for example Lexis (legal), Nexis (news), technical/scientific data
- print (government Daily Lists, Annual Reports, directories, local newspapers, official statistics)

1992: the Internet can be accessed by anyone, but it was 2-3 years before significant information started appearing on the web

The increase in the amount of data and information led to the development of tools that indexed and searched the content of web pages: Lycos, Excite, AltaVista, Hotbot

How the search tools worked (and still do in part):

- "Crawl" the internet looking for new and updated pages by following links
- Copies of pages and documents are added to a database that is publicly searchable
- Results are sorted according to:
  - how often the words you looked for appear in the page
  - where they appear (words in the title and first few sentences are given higher ranking)
  - many other criteria not disclosed by the search engines

They do not cover:
- password protected sites
- databases or sites where you have to fill in a form to find the information, for example Companies House
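The two disclosed ranking criteria (how often the words appear, and where) can be sketched in a few lines of Python. This is only a toy illustration, not any search engine's real algorithm: the weights, the 30-word "opening" window, and the sample pages and URLs are all assumptions made up for the example.

```python
import re

def score(page, query_terms):
    """Toy relevance score: count occurrences of each query term,
    giving extra weight to the title and the opening words.
    The weights (3, 2, 1) are illustrative assumptions."""
    title_words = re.findall(r"\w+", page["title"].lower())
    body_words = re.findall(r"\w+", page["body"].lower())
    opening = body_words[:30]  # stand-in for "the first few sentences"
    total = 0.0
    for term in query_terms:
        t = term.lower()
        total += 3.0 * title_words.count(t)  # words in the title
        total += 2.0 * opening.count(t)      # bonus for words near the start
        total += 1.0 * body_words.count(t)   # every occurrence in the body
    return total

def rank(pages, query):
    """Return URLs of matching pages, highest score first."""
    terms = query.split()
    scored = [(score(p, terms), p["url"]) for p in pages]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [url for s, url in ordered if s > 0]

# Hypothetical pages, invented for the example.
pages = [
    {"url": "a.example", "title": "Coots",
     "body": "Coots are water birds. Coots dive for food."},
    {"url": "b.example", "title": "Pond life",
     "body": "Many birds live here, including ducks and coots."},
    {"url": "c.example", "title": "Lions",
     "body": "Lions live in prides."},
]

print(rank(pages, "coots"))
```

The page with the term in its title and repeated in its opening text comes first; the page that never mentions the term is dropped.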

Then along came.....:

11 November 1998 (from The Internet Archive)

How was Google different?:

Links (citations) are a major part of ordering search results

Where is Google now?:

2001: Revenues $86,426 thousand, Net Income $10,964 thousand
2011: Revenues $37,905 million, Net Income $9,737 million
2011: 96% of revenues are from advertising

Google is mass market and consumer oriented. Serious researchers wanting reliable, structured search are a minuscule fraction of its customer base.

How Google organises and sorts information:

- Has a primary index of higher "quality" documents and a secondary index
- Only the primary index is searched when running straightforward searches
- The secondary index comes into play with more complex searches and if only a small number of results are found
- “Dear Bing, We Have 10,000 Ranking Signals To Your 1,000. Love, Google”
- Over 200 “signals”, and each may have over 50 variations

How Google ranks and organises your results:

Google personalizes and tailors your results depending on your location, computer/device, browser, past searches, what you have looked at in the past, your +1s, your Google+ account, what you had for breakfast...and anything else it can find by rummaging around in your Google dashboard

To see what's in your dashboard, log in to your Google account and open the Google Dashboard

Also see "Google personalisation: web history isn’t the only problem"


What I see on my screen for a search is not what you’ll see on yours.

Google knows best! :

Search: Hewish mild
- Google decided to change my search to Jewish mild without asking
- Placing a phrase within quote marks ("Hewish mild") will usually force an exact match
- Google automatically looks for variations of your search terms

For 10 days in February 2011: coots = lions:

- Google decides that coots are really lions
- Update on coots vs. lions

Coots = lions:


Three search tricks:

These three techniques can change what Google (and other search engines) decides to give you, and also the order of the results.

- Repeat important search terms: coots coots mating behaviour (found coots)
- Change the order of your terms: mating behaviour coots (found coots)
- Change one of your search terms:
  - coots mating behaviour (found lions)
  - coots courtship behaviour (found coots)
  - coots mating ritual (found coots)
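When a search keeps going wrong, it can help to write out the variations systematically and try each one. The sketch below generates them using the three tricks; the function name `query_variants` and its parameters are invented for this illustration, and it only builds query strings, it does not run any search.

```python
def query_variants(terms, key_term, alternatives):
    """Build alternative phrasings of a query using the three tricks.

    terms        -- the original query as a list of words
    key_term     -- the word that matters most
    alternatives -- dict mapping a word to replacement words to try
    """
    variants = []
    # Trick 1: repeat the important search term.
    variants.append(" ".join([key_term] + terms))
    # Trick 2: change the order (move the key term to the end).
    reordered = [t for t in terms if t != key_term] + [key_term]
    variants.append(" ".join(reordered))
    # Trick 3: change one of the search terms at a time.
    for old, replacements in alternatives.items():
        for new in replacements:
            variants.append(" ".join(new if t == old else t for t in terms))
    return variants

variants = query_variants(
    ["coots", "mating", "behaviour"],
    "coots",
    {"mating": ["courtship"], "behaviour": ["ritual"]},
)
for v in variants:
    print(v)
```

For the coots example this prints the four alternative searches listed above.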

Excluding pages containing words :

- Want to exclude pages containing a term? Place a - (minus sign) before the term
- Use with care, as you may miss important material
- Excluding lions from our bizarre coots search: coots mating behaviour -lions

Coots=lions was an extreme example of how Google can work

We think Google was doing the following:
- assumed a typing error or was running a mobile/smartphone predictive text algorithm (coots=cats)
- ran an automatic variation/synonym search on cats
- used a search frequency rule and found that lions mating behaviour was requested more than cats

Dear Google, stop messing with my search:

Google no longer looks for all of your terms in a page

See what Google sees:

Hover over a result and a "preview" of the page should appear to the right, together with a Cached link. This is Google's copy.

“When you do a multi-term query on Google (even with quoted terms), the algorithm sometimes backs-off from hard ANDing all of the terms.... It’s clear that people will often write long queries (with anywhere from 5 to 10 terms) for which there are no results. Google will then selectively remove the terms that are the lowest frequency to give you some results (rather than none)....Soft AND is a way to reduce the overall frustration and give the searcher something to examine (and with luck, a chance to reformulate their query).”
Dan Russell
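The "soft AND" behaviour described in the quote can be sketched as follows. This is a toy model under stated assumptions, not Google's implementation: the index is a plain dictionary from term to the set of page numbers containing it, "lowest frequency" is taken to mean the term found in the fewest pages, and the sample data is invented.

```python
def soft_and_search(query_terms, index):
    """Toy 'soft AND': require every term, and if nothing matches,
    drop the lowest-frequency term and try again.

    index -- dict mapping a term to the set of page ids containing it
    Returns (terms actually used, matching page ids).
    """
    terms = list(query_terms)
    while terms:
        page_sets = [index.get(t, set()) for t in terms]
        hits = set.intersection(*page_sets)  # hard AND over remaining terms
        if hits:
            return terms, hits
        # No results: remove the term that appears in the fewest pages.
        rarest = min(terms, key=lambda t: len(index.get(t, set())))
        terms.remove(rarest)
    return [], set()

# Hypothetical index, invented for the example.
index = {
    "coots": {1, 2},
    "mating": {1, 2, 3},
    "behaviour": {2, 3},
    "xylophone": set(),
}

used, hits = soft_and_search(["coots", "mating", "behaviour", "xylophone"], index)
print(used, hits)
```

Here the term that matches nothing is silently dropped, which is exactly why the results may not contain every word you typed.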


Verbatim:

- Forces Google to run an exact match search
- Run your search first and then select Verbatim from the left hand menu on your results page
- Cannot be combined with time options in the side bar
- Google: Verbatim for exact match search

Google doing its own thing can be good:


Google's new(ish) social network: Google Plus (Google+)

- Google is trying to force people to create a Google+ profile
- Search Plus Your World (SPYW), also referred to as Search+, is now available and is the default
- Gives priority to content from people in your Google+ network if you are signed in to your account
- (And the next Google killer is….Google!)

Before and after SPYW

Google results side bar:

- These help you focus your search
- They vary depending on the type of search, e.g. web, news, images
- Open up the "more" options to see everything

Google side bars:

Images, Videos, News, Books, Blogs

Google images – not always what you expect:

Search for patent and select the colour red from the side bar (thanks to Arthur Weiss for the example)

Related searches:


Translated foreign pages for a different perspective:

- Google suggests languages from the context of the search, but you can choose your own
- Your search is translated and the results are translated into your language

Problems finding information on a particular site?:

Use Google's site: command

For example, when trying to find information on Reading Borough Council's recycling policy by searching the council's own site

Go to Google and type in recycling policy followed by the site: command and the council's web address

Or if you are interested in all government (central, departmental and local) recycling policies: recycling policy with the site: command set to the government domain as a whole

Combine with date option in the side bar:



LGSearch Google Custom Search Engine (CSE)

Create your own Google custom search engine:

For:
- regularly searched sites
- selected sites on a subject or type of organisation

Cannot include password protected sources or sites where you have to fill in a form to access the information

See also:
- Information on setting up a Google Custom Search Engine (CSE)
- Google's blog on custom search

Looking for a particular type of information, for example statistics, a research report or an expert presentation? Use the filetype: command

For statistics:
  car ownership UK filetype:xls
  car ownership UK filetype:xlsx

For government, research and industry reports:
  UK oil consumption forecasts filetype:pdf

For conference presentations, or when trying to locate an expert:
  renewable energy UK filetype:ppt
  renewable energy UK filetype:pptx

Can combine commands: renewable energy UK filetype:ppt

See also:
- Advanced search screen with more options
- Selected Google Commands
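If you run the same kind of restricted search regularly, combining the commands can be scripted. The sketch below assembles a query from plain terms plus optional site:, filetype: and minus-sign exclusions, and turns it into a Google search URL. The helper names `build_query` and `search_url` are invented for this example, and example.org is a placeholder domain, not a site from the slides.

```python
from urllib.parse import urlencode

def build_query(terms, site=None, filetype=None, exclude=()):
    """Combine plain search terms with Google commands.

    terms    -- the words to search for, as one string
    site     -- optional domain for the site: command
    filetype -- optional extension for the filetype: command
    exclude  -- words to exclude with a leading minus sign
    """
    parts = [terms]
    parts += [f"-{word}" for word in exclude]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)

def search_url(query):
    """Return a Google search URL for the query (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

# example.org is a placeholder domain used only for illustration.
query = build_query("renewable energy UK", site="example.org", filetype="ppt")
print(query)
print(search_url(query))
print(build_query("coots mating behaviour", exclude=["lions"]))
```

Keeping the query text and the URL encoding in separate functions means the same query string can also be pasted into another search engine that supports these commands.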

Google alternatives - Bing and Yahoo:

- Yahoo now uses Bing's database and ranking
- Many of the Advanced Search commands are similar to Google’s, see Search Tools Summary and Comparison
- Most of the interesting developments and features are only available in the US version
- Results tend to be more consumer/retail focused unless using advanced search features
- Coverage is not identical to Google’s and sometimes yields important unique content
- Sometimes more up to date than Google


DuckDuckGo:

- Silly name but a neat little search tool
- No tracking, no “filter bubble”
- Commands: site:, filetype:, and sort:date to sort by date (uses results from Blekko)
- Has its own syntax and keyboard shortcuts

Flickr to search for images:

Use the default search box, Flickr Creative Commons, or the advanced search screen

Statistics :


MySociety :



Local crime and policing information for England and Wales

Professional network:

- For people and companies
- For identifying experts in a field
- Boolean Black Belt-Sourcing/Recruiting


Facebook:

- Personal and business pages relatively easy to find
- No easy way to search content within pages

Local "stuff":

- Web pages, local papers, "what's on", local forums/discussion boards, Facebook pages, Twitter
- Twitter search, Socialmention, Topsy, Icerocket
- Set up 'lists' (can be kept private) and view them on the web, in a desktop program or in a mobile app

My local stuff on Tweetdeck:

Create your own newspaper:


Copyright:

- Always check the copyright of anything that you want to use or incorporate into a document or web page
- Always, always check and double check the copyright of images: they may have a digital watermark and be tracked, e.g. Digimarc
- Creative Commons does not mean you can do what you like with the text/image: there are six licences
- “Open-licencing your images. What it means and how to do it.” Andy Mabbett aka pigsonthewing
- Karen Blakeman's Blog, “Free-to-use images might not be”

Evaluating resources:

- Type of web site, for example .gov, .edu
- Who is really behind the site? Use a domain name register to check
- You do NOT want to see that the domain name is hosted by an organisation that tells you nothing about who is behind the site

Evaluating resources:

- Date of publication, 'last updated'
- Check the text for clues to the publication date
- The stated date for a web page or document may be automatically generated when it is put onto the web site
- After a web site redesign, pages are re-uploaded and are given a new publication date
- Some pages are generated "on the fly" so will always have today's date

Quoting and referencing:

- Make it clear when you are quoting someone else and always quote the source of data
- Give at least the title of the article and URL in the text of a document
- Full reference: author (and/or organisation), title of page/document, URL (web address, do not use shortened URLs), date of publication (if known), date you accessed the document
- Example: George Monbiot, In Praise of Distrust, 27th February 2012, [Accessed 28th February 2012]
- Organisations and publishers may have their own preferred format
- If the information is critical, make a local copy
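The full-reference checklist above can be turned into a small helper that always produces the fields in the same order. This is a sketch of one possible format, not a recognised citation standard: the function name `format_reference`, the field order, the short list of link-shortening hosts and the example.org URL are all assumptions for illustration.

```python
from datetime import date
from urllib.parse import urlparse

# Illustrative (incomplete) list of link-shortening hosts to reject.
SHORTENER_HOSTS = {"bit.ly", "t.co", "tinyurl.com", "goo.gl"}

def _long_date(d):
    """Format a date as e.g. '27 February 2012'."""
    return f"{d.day} {d.strftime('%B %Y')}"

def format_reference(author, title, url, accessed, published=None):
    """Build a reference: author, title, publication date (if known),
    full URL, and the date the document was accessed."""
    if urlparse(url).netloc.lower() in SHORTENER_HOSTS:
        raise ValueError("Use the full URL, not a shortened one")
    parts = [author, title]
    if published is not None:
        parts.append(_long_date(published))
    parts.append(url)
    return ", ".join(parts) + f" [Accessed {_long_date(accessed)}]"

# The URL is a placeholder; the original address is not in the transcript.
reference = format_reference(
    "George Monbiot",
    "In Praise of Distrust",
    "https://example.org/article",
    accessed=date(2012, 2, 28),
    published=date(2012, 2, 27),
)
print(reference)
```

Rejecting shortened URLs at this point enforces the advice in the checklist, since a shortened link hides the real source and may stop working.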

Keeping up to date:

- Inside Search
- Official Google Blog
- Google Scholar Blog
- Search Engine Land
- Search Engine Watch
- Boolean Black Belt-Sourcing/Recruiting
- Karen Blakeman’s Blog
- Phil Bradley's weblog

When are road works not road works? When they are classified as Network Rail bridge works!
CC 3.0 Attribution Non-commercial