Presentation Transcript
Exploring the Deep Web: Exploring the Deep Web University of Utah Government Documents Librarians
Amy Brunvand
Kate Holvoet
Peter Kraus
David Morrison
What is the Deep Web?: What is the Deep Web? The deep Web is the hidden part of the Web, containing a huge volume of content that is inaccessible to conventional search engines, and consequently, to most users.
How big is the Deep Web?: How big is the Deep Web? 550 billion documents
500 times the content of the surface Web
Google has identified 1.2 billion documents
An Internet search typically searches 0.03% (1/3000) of available content.
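As a rough check on how these figures relate, the arithmetic can be sketched in Python. The variable names are illustrative, and the inputs are simply the numbers quoted on the slide, not independent measurements.

```python
# Figures quoted on the slide (not independently verified).
deep_web_docs = 550e9         # 550 billion documents in the deep Web
deep_to_surface_ratio = 500   # deep Web is 500 times the surface Web

# Implied size of the surface Web, in documents.
surface_web_docs = deep_web_docs / deep_to_surface_ratio

# The fraction "1/3000" expressed as a percentage.
one_in_3000_pct = 100 / 3000

print(f"Implied surface Web: {surface_web_docs / 1e9:.1f} billion documents")
print(f"1/3000 as a percentage: {one_in_3000_pct:.3f}%")
```

The implied surface Web of about 1.1 billion documents is in line with the 1.2 billion documents attributed to Google, and 1/3000 works out to roughly 0.033%, which the slide rounds to 0.03%.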
What’s in the Deep Web?: What’s in the Deep Web? Searchable databases
Downloadable files & spreadsheets
Image and multi-media files
Data sets
Various file formats such as .pdf
Lots of government information
Why use the Deep Web?: Why use the Deep Web? Higher quality sources
Selected and organized by subject experts
Dynamic display
Customized data sets
Some data is visual, and not word searchable
Regular search engines miss vast resources available in the Deep Web
Why are we talking about Government Sites in the Deep Web?: Why are we talking about Government Sites in the Deep Web? Governments have the mandate and the capacity to gather information that individuals don’t
Most government information is copyright free
Government information is authoritative
Governments have the financial and human resources to maintain Deep Web sites
The Deep Web for Federal Information: The Deep Web for Federal Information Peter L. Kraus
Federal Documents Librarian
Marriott Library – University of Utah
The Web Today: The Web Today Web sites from the federal government only occupy about 1% of the entire global web. However, they hold 85% of “The Deep Web”.
The content of these web sites includes items in .html or .pdf format (reports, records, data sets, etc.), a diversity of file types with little standardization or uniformity. A common term for this content is “Grey Literature”.
Definition of “Grey Literature”: Definition of “Grey Literature” “That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers”
Growth and Life of Federal Information: Growth and Life of Federal Information On federal web sites the amount of information grew 13-fold between 1992-2003
The average life expectancy of a federal web resource is 4 months (2003)
What can libraries do?: What can libraries do? LOCKSS-DOCS archival project (BYU and UU are members)
Cooperative efforts in specific subject areas (Western Waters Digital Library)
Individual institutional initiatives, such as institutional repositories, that reflect the institution's research productivity (information often funded by federal grants)
The Deep Web for Health and Science Information: The Deep Web for Health and Science Information Amy Brunvand – Government Information Librarian
Marriott Library – University of Utah
Slide25: Finding Naked People - Forsyth, Fleck (1996) (54 citations)
This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin.
http.cs.berkeley.edu/~daf/newo2.ps.Z
Slide26: Graph showing number of citations to “Finding Naked People”
Slide28: Arches National Park : NASA Landsat 7 10/3/99
Slide31: Development and Evaluation of Stitched Sandwich Panels Larry E. Stanley; Daniel O. Adams NASA Langley Research Center NASA/CR-2001-211025 , June 2001; 20010702
….. test panels were produced initially at the University of Utah and later at NASA Langley Research Center……
http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf
Slide37: Marriott Library, Salt Lake City, Utah, United States 9/18/2003 (TerraServer)
Slide39: Utah Seismic Hazards (National Atlas)
The Deep Web for International Information: The Deep Web for International Information Kate Holvoet – Interim Head, Government Documents and Microforms
Marriott Library – University of Utah
International Deep Web Resources: International Deep Web Resources International organizations collect an amazing amount of data
Statistical data is often best organized in database and spreadsheet format
Like the US Government, individual countries post data files and databases
This information may not be available in print sources in schools and libraries
United Nations Official Documents System: United Nations Official Documents System http://documents.un.org/
Why use the ODS?: Why use the ODS? Full-text Official United Nations Documents (1993 -) online, free
Retrospective digitization in process
Highly relevant material for almost any international topic
Timely and authoritative
United Nations Statistical Databases: United Nations Statistical Databases Value of the information:
Authoritative
Comparative
Time series
Compact
Database topics include:
Commodity trade
Demographics
Disability statistics
Social indicators
Statistics on men and women
Slide48: http://unstats.un.org/unsd/databases.htm
Individual Country Statistics: Individual Country Statistics http://www.census.gov/main/www/stat_int.html
Why use this kind of information?: Why use this kind of information? Aggregate statistical sources are often not as up-to-date as the individual countries' own data
Individual countries are often more specific in their indicators than aggregate sources
Information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers
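A minimal sketch of why this happens, assuming a simple crawler that only follows hyperlinks. The sample page, its URLs and the class name `LinkAndFormFinder` are hypothetical, and real search engine crawlers are more sophisticated than this.

```python
from html.parser import HTMLParser


class LinkAndFormFinder(HTMLParser):
    """Collect hyperlink targets and form actions from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []   # URLs a link-following crawler could visit
        self.forms = []   # search forms whose results need a submitted query

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Hypothetical page: one static link plus a search form over a database.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About this database</a>
  <form action="/search" method="get">
    <input type="text" name="country">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

finder = LinkAndFormFinder()
finder.feed(SAMPLE_PAGE)

print("Reachable by following links:", finder.links)
print("Reachable only by submitting a query:", finder.forms)
```

The crawler in this sketch can reach the static page behind the link, but the database records sit behind the form and appear only after someone types a query and submits it.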
Patents, Trademarks and the Deep Web: Patents, Trademarks and the Deep Web Dave Morrison
Documents and Microforms Division
Marriott Library – University of Utah
Slide81: For Further Information USPTO Information Line
800-PTO-9199
Marriott Library, University of Utah
801-581-8394
www.lib.utah.edu/documents
Slide82: Any Questions?
Thanks!: Thanks!