Terra Server SigMod


Presentation Transcript

Slide1: 

Tom Barclay, Microsoft Research
Don Slutz, Microsoft Research
Jim Gray, Microsoft Research

Agenda: 

What is TerraServer?
Why we built it?
How we did it?
User Experiences?
What we have learned?
What we are doing next?

Slide3: 

Size Matters...
Terra Server is one of the Largest Databases on the Web
  2.8 TB DB Today / 2.5 TB User data
  2.4 TB “cooked”
  324 9GB drives
  8 x 440 MHz
  10 GB RAM
  31M Visitors, 621M Pages to Date
  Supports 15M+ hits & 3M+ Pages/day
  Peak 30M+ hits & 6M+ Pages/day
… and it runs on Microsoft BackOffice
  IIS 5.0
  ASP
  Site Server EE 3.0
Early Adopter Reference Site
  SQL Server Internet Demo Application
  Validate VLDB features on new SQL releases
Popular with the Press
  CNN/Headline News
  Major Papers (NY Times, SF Chronicle)
  Popular Press (Siskel & Ebert, Economist)
  Local Press
  Industry Trades
Status
  0.6 TB “in-house” ready to load
http://terraserver.microsoft.com
Launched on the Web 6/24/98 at FED Scalability Day

Project Mission Statement: 

To Demonstrate and Advertise the unique contribution and capabilities of each Terra-Server partner.
Distribute DOQs to a wider audience
  Lower cost of distribution
Demo Scalability of NT and SQL Server
  Increase credibility with large users
Demo scope & quality of Spin-2 imagery
  Open new markets for imagery sales
Demo Scalability of DEC Alphas
  Recognized as superior system vendor
Demonstrate superior automated tape technology to the NT market
  Test large SQL on-line/off-line backups with Networker, SQL 7.0 and Timberwolf

Terra-Server Requirements: 

BIG: 1 TB of data including catalog, temporary space, etc.
PUBLIC: available on the world wide web
INTERESTING: to a wide audience
ACCESSIBLE: using standard browsers (IE, Netscape)
REAL: a LOB application (users can buy imagery)
FREE: cannot require an NDA or payment from a user for access
FAST: usable on low-speed (56 kbps) and high-speed (T-1+) connections
EASY: we do not want a large group to develop, deploy, or maintain the application

The CRADA: 

Defines Microsoft’s and USGS’ roles within the Terra-Server project
Microsoft’s contribution:
  Build an “internet UI” to DOQ data
  Provide e-commerce software for USGS’ use in creating an “Internet Store”
  Run a “robust” web site for 18 months
USGS contribution:
  Deliver entire DOQ asset within a quarter
  Provide technical advice
  Operate and maintain DOQ “electronic store”

TerraServer “Image Tile Server”: 

Concept: User navigates an ‘almost seamless’ image of earth
Coverage:
  USGS: Conterminous U.S., Hawaii (topo maps)
  SPIN-2: US, Europe, Asia
Source Imagery:
  12.0 TB 1 sq meter/pixel Aerial (USGS: 60,000 files, 46 MB B&W to 151 MB Color IR)
  1.0 TB scanned color topographical maps (USGS DRG 1:24000, 1:100000, 1:250000)
Display Imagery:
  273 million 200 x 200 pixel images, 7 layer image pyramid
Nav Tools:
  1.8 million place names
  “Click-on” Coverage map
  Latitude & Longitude

Image Data: 

USGS “DOQ”: 12 TB
  70% U.S. Coverage
  100% U.S. Coverage by 2001
DRG Topo Maps: 1 TB
  100% U.S. Coverage
  Added Jan ‘00
  Hosted on 2nd DB server
Spin-2: 500 GB
  WorldWide
  LoB App
  Hosted on Separate Web site Dec ‘00

Logical Schema: 

Logical Schema External Link External Group External Geo Famous Category Famous Place

Software Architecture: 

Software Architecture (diagram; components by tier)
Web Client: IE 3.0/4.0, Netscape 4.0 (HTML, Java Viewer)
The Internet
Terra-Server Web Site (Image Delivery Application): Internet Information Server 4.0, Terra-Server Active Server Pages, Active Data Object, ODBC
Terra-Server DB: SQL Server 7.0, Terra-Server Stored Procedures
Image Commerce Site(s) (SPIN-2/USGS Store): Microsoft Site Server EE 3.0, Active Server Pages, SQL Server 7.0
Diagram annotations: 22, 9 App, 13 Load, 22, 13, 56 (30 Img) (12 Place)
Key: Standard Software, Custom Software, Comm Protocols

Slide11: 

Hardware Configuration
6 x Compaq ProLiant 5500 (4 x 200 MHz, 512 MB RAM, 20 GB RAID 5), connected to the Web
COMPAQ ProLiant 8500 (4 GB RAM, 1.5 TB “Classic Raid 5”): DRG, DOQ

Slide12: 

TerraServer Activity

TerraServer Statistics: 

TerraServer Statistics

TerraServer V2.0 Statistics 2.5 Months into Year 2: 

TerraServer V2.0 launched on 6/24/98 and delivered service thru 9/22/99. TerraServer V3.0 launched on 9/22/99. V3.0 has a significantly different database format and application structure, so the statistics cannot be easily merged.

TerraServer Database Server: 

TerraServer Database Server

Site Configuration: 

6 x Compaq ProLiant 5500 (4 x 200 MHz, 512 MB RAM, 20 GB RAID 5), connected to the Web
Enterprise Storage Array
  324 9 GB Seagate Disks
  7 HSZ70 Ultra-SCSI Dual redundant Controllers
  28 11-Disk Raid 5 Stripe sets
  4 NTFS Stripe Sets (600 GB)

File System Config: 

Use StorageWorks to form 28 RAID5 sets
  Each RAID set has 11 disks (16 spare drives)
Use NTFS to form 4 595 GB NT volumes
  Each striped over 7 RAID sets on 7 controllers
DB is a File Group of 80 20,000 MB files (1.5 TB)
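
The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the conversion to TB assumes binary units (1 TB = 1024 x 1024 MB), which the slide does not state.

```python
# Figures taken from the slide
RAID_SETS = 28
DISKS_PER_SET = 11
SPARE_DISKS = 16
DISK_GB = 9
SETS_PER_VOLUME = 7
DB_FILES = 80
DB_FILE_MB = 20_000

# 28 sets of 11 disks plus the spares
total_disks = RAID_SETS * DISKS_PER_SET + SPARE_DISKS

# RAID 5 spends one disk's worth of capacity per set on parity
usable_gb_per_set = (DISKS_PER_SET - 1) * DISK_GB

# Each NT volume is striped over 7 RAID sets
raw_gb_per_volume = SETS_PER_VOLUME * usable_gb_per_set

# Assumption: binary units, 1 TB = 1024 * 1024 MB
db_tb = DB_FILES * DB_FILE_MB / 1024 / 1024

print(total_disks)        # 324
print(usable_gb_per_set)  # 90
print(raw_gb_per_volume)  # 630
print(f"{db_tb:.2f}")     # 1.53
```

The disk count agrees with the 324 drives quoted elsewhere in the deck, and 80 files of 20,000 MB come to roughly the 1.5 TB stated. The 630 GB per volume is a nominal figure; the 595 GB on the slide is presumably what remained after formatting and unit conversion.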

Image Update Process: 

Image Update Process (diagram)
Terra Cutter: Extracts New Tiles, Merge with Existing Tiles, Insert Hi-Res Tiles
Terra Scale: Read Hi-Res, Sub-Sample To Lo-Res
TerraServer SQL DBMS
Cut & Load Scheduling System
Administrative Web Site (Active Server Pages): TerraServer Administrator, Monitor Progress
Input: 30 GB tape, 45 MB to 151 MB files, BIL / GeoTIFF formats
Input: 30 GB tape, 1 GB files, TIFF format
Rates: 6 Mb/min Db Insert, 630 Mb/min Image proc.
Rates: 3.8 Mb/min Db Insert, 13.7 Mb/min Image proc.
350 GB staging Area
Compaq 8500

Merging Tile Process (UTM Projection): 

[Diagram: tile grid within a UTM zone, with the Meridian at easting 500,000 and the easting axis running from 0 (min) to 1,000,000 (max); tiles numbered 1-6, 13-18, 25-30, 37-42]

Image Load Statistics: 

Read approximately six 28 GB tapes per day (data uncompressed)
Load 4 to 5 tapes per day (10 GB compressed / day)
Load remotely, 30 miles from data center (via back-channel connection)

SQL 7 Backup Features: 

Fast Online Backup
  Under Load
  Minimal Impact
  Just the Data
Backup Part of the Database
Minimize Recovery Time
  Differential Backups, Log Backups
  Restore Only Damaged Files

Backup Strategy: 

Weekly Full Backups
  Full SQL Backup to 8 drives on STK 9710
  Recycle tapes after 3 backups
  Backup is “Online”, within 8 hours
  Runs Sunday morning at 9:00 am
  Backup Runs in 10 hours
Database set to “Truncate Log on Checkpoint”, thus SQL logs are not backed up
  Note, this is “NOT” what you would do in an e-commerce situation
Restore time was 12 hours (8 drives)

What We Have Learned: 

Relational DBMS is the way to handle millions of image tiles/files
  Multiple indices to data
  Better concurrency than file system
Alternative to exotic compression techniques for large image data-sets
  No client-side controls/viewers required
  Minimal data transfer to client, satisfactory in low-bandwidth conditions
  Simpler to update small tiles than large files

What We Have Learned: 

Intra-network bandwidth is the bottleneck
  Bandwidth between DB Server and Web Server limited by h/w configuration limits
  Encourages a scale out approach as we add data
The application is interesting
  30,000 daily visitors
  600,000 unique visitors per month
  Working with government agencies to host TerraServer permanently

What We Have Learned: 

Users look at data where they live and when they are at work
  TerraServer peak usage is 8am in each time zone where we have imagery
  Weekend volume is half the volume of Mondays & Tuesdays
TerraServer pages build faster the closer to Redmond you are
  Cannot control topology of public internet
  Number of network hops dominates perceived performance
Lots of imagery data becoming available
  End of cold war reducing restrictions on data distribution
  Standard resolution moved from 5m / pixel to 1m / pixel
  “Retired spies” are launching satellites rapidly
Encourages a distributed, multi-web site design

Operation Statistics: 

Year 1 Through 18 Months
  Down 30 hours in July (hardware stop, auto restart failed, operations failure)
  Down 26 hours in September (Backplane failure, I/O Bus failure)

Future Plans and Ideas “TerraServer V4.0”: 

More data…
  Complete DOQ coverage of “lower 48”
  Digital Elevation Model (hi-res shaded relief)
  National hydrology data-set
“Cool” Features
  Direct links to on-line Gauging station feeds
Focus on Availability
  Windows 2000 Data Center Cluster (N+1 cluster)
  SQL Server 2000 “cluster failover” model

Slide28: 

TerraServer “N+1” 3 x 2TB Cluster
[Rack layout diagram: racks labeled 1850, 6400r and 8500]
Configuration Summary
  4 Server Racks
  15 Disk Storage Racks (ESA 12000)
  8 Reserved
  3 ESL9000 or ATL P3000 mods
  27 Total Rack space

Slide29: 

TerraServer N(3) + 1 Cluster

Thank You!: 

Thank You!

TerraServer V3.0 Bandwidth Calculations: 

TerraServer V3.0 Bandwidth Calculations

Logical Schema: 

Lookup by Theme, Scale, X, Y, and Zone
Lookups are fast
  Indices are in DRAM (auto-magically by SQL)
  SQL manages all the tiles and indices
  Images are brought in on demand
Index on
  image, place, type
  image, state, type
  image, state, country, type
  image, place, state, type
  image, place, country, type
  all lookups are fast
Schema diagram labels: Country Name State Name Place Name Place Type Gazetteer SourceMeta Image Image Search Scale Job Load Job External Link External Group Small Place Name Pyramid Imagery Load Mgmt Image Type Search Job Search Dest Famous Category Famous Place

Sensitive Area Functionality: 

Database records have a DisplayStatus field:
  0 = Image available and can be displayed
  2 = Image present but unavailable (Red Unavailable GIF/link)
  5 = Image superseded by a new image tile
Takes effect immediately in search/display of image tiles
Created initially to support demands of SOVINFORMSPUTNIK (Russian satellite image provider)
Used by load system to replace a tile in the database with a “better” tile without locking any rows
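
A minimal in-memory sketch of how the DisplayStatus codes could drive what a viewer sees. The three values come from the slide; the class, function and key names are hypothetical, and a Python list stands in for the database table, so nothing here models SQL Server row locking.

```python
from enum import IntEnum


class DisplayStatus(IntEnum):
    AVAILABLE = 0    # image available and can be displayed
    UNAVAILABLE = 2  # image present but unavailable
    SUPERSEDED = 5   # image superseded by a new image tile


def visible_tiles(tiles):
    """Return only the tiles a viewer is allowed to see."""
    return [t for t in tiles if t["status"] == DisplayStatus.AVAILABLE]


def supersede(tiles, old_id, new_tile):
    """Add new_tile as AVAILABLE and flag the tile it replaces as SUPERSEDED."""
    tiles.append(dict(new_tile, status=DisplayStatus.AVAILABLE))
    for t in tiles:
        if t["id"] == old_id:
            t["status"] = DisplayStatus.SUPERSEDED


# Hypothetical tile rows for illustration only
tiles = [
    {"id": 1, "status": DisplayStatus.AVAILABLE},
    {"id": 2, "status": DisplayStatus.UNAVAILABLE},
]
supersede(tiles, old_id=1, new_tile={"id": 3})
print([t["id"] for t in visible_tiles(tiles)])  # [3]
```

The old row is never deleted or overwritten; readers simply stop selecting it once its status changes, which is one plausible reading of how a tile can be replaced while the viewing application stays live.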

TerraServer Administrator Web Site: 

Accessible by Microsoft Operators
Web browser forms to:
  Edit Famous Places list
  Edit Imagery (Modify Image Status fields)
  Define new TerraServer Administrators
  Monitor image loads
  Monitor image scaling (creating the image pyramid)
  Monitor the load of external Image links

Management & Maintenance: 

Backup and Recovery
  Using Legato Networker integrated with SQL Server Backup/Restore Utility
  Tested Seagate Backup Exec
  Tested Cheyenne (CA) ArcServe
SQL Server Enterprise Manager
  DBA Maintenance
  SQL Performance Monitor

TerraServer Backup Factoids: 

Offline Backup TerraServer Backup Factoids

TerraServer Backup Factoids: 

TerraServer Backup Factoids On-Line Backup

Electronic Commerce: 

USGS Store
  Built by USGS
  MSCS V2.0 Based
  Standard Shopping Basket approach
  Purchase Digital Ortho Quads used by MS to build TerraServer
  Pricing subject to quantity
  Image you were viewing given away for free (public domain data)
Spin-2 Store
  Microsoft SP built
  MSCS V3.0 Based
  Buy Small, Medium, Large Digital image
  Can get Photographic print thru SPIN-2 relationship with Kodak
  Digital images are “sized” to make photographic prints look good
Microsoft does not collect or share in the revenues generated by TerraServer image sales!

How We Did It: 

Edited big images into small “tiles”
Sub-sampled tiles to create zoom levels
Tile sizes map to Lat/Lon system
Unique ID assigned to each Tile location
  ID clusters adjacent tiles onto same DB page
Wrote Load Management program
  Runs image cutting job
  Loads meta and image data into SQL
  Multiple Loaders can run in parallel

TerraServer Grid System: 

Search Cell System
  Grid spans latitude -90º to 90º (cells -9000 to 9000) and longitude -180º to 180º (cells -18000 to 18000)
  .01 Longitude x .01 Latitude “Cell”
  18000² total Search Cells
  Each Search Cell mapped to best Tile per data Theme
  Place/Address tables have foreign key to Search Cell rows
Physical Tile Grid System
  200 pixel by 200 pixel GIF or Jpeg Tiles
  Five field primary key: X, Y offset into image scene (Z), for each Theme (T) and multiple resolutions (S)
  Base (highest) resolution X & Y values iterate from 1
  Each lower resolution 2x from Base resolution
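
As a rough illustration of the search cell idea, the sketch below maps a latitude/longitude pair to a 0.01 degree cell. The cell size and the -18000..18000 / -9000..9000 ranges come from the slide; the function name, the choice to index each cell by its lower-left corner, and the clamping at the +90/+180 edge are assumptions.

```python
import math

CELLS_PER_DEGREE = 100  # 0.01 degree cells, from the slide


def search_cell(lat, lon):
    """Return (cell_x, cell_y) for a point, indexed by the cell's lower-left corner."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    cell_x = math.floor(lon * CELLS_PER_DEGREE)
    cell_y = math.floor(lat * CELLS_PER_DEGREE)
    # Assumption: points exactly on the +180 / +90 edge fall in the last cell
    cell_x = min(cell_x, 180 * CELLS_PER_DEGREE - 1)
    cell_y = min(cell_y, 90 * CELLS_PER_DEGREE - 1)
    return cell_x, cell_y


print(search_cell(47.645, -122.125))
```

A place or address row would then carry this pair as its foreign key into the search cell table, and the cell row would point at the best tile for each theme.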

USGS DOQ / DRG Editing Process: 

Input
  1 Quadrangle (7.5’ x 7.5’)
  1 “DOQ QUAD” (DOQQ) Photo (3.75’ x 3.75’), 48 MB - 150 MB
  Origin Point
Process
  Input image cut into 200m x 200m Jpeg “tiles”
  Merge Pixels with overlapping tiles found in database
  4 higher resolution 200x200 tiles sub-sampled 2:1 to form each lower resolution tile
Summary
  UTM Projection
  200x200 images
  Seamless mosaic
  7 level pyramid
  Jpeg format
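
The pyramid step, where four higher resolution tiles are sub-sampled 2:1 into one lower resolution tile, can be sketched as follows. Tiles are plain lists of grayscale rows; averaging each 2x2 block is an assumption (the slide only says "sub-sampled 2:1"), and all function names are illustrative.

```python
def merge_four(ul, ur, ll, lr):
    """Arrange four equally sized tiles as a 2x2 block (upper-left, upper-right, lower-left, lower-right)."""
    top = [a + b for a, b in zip(ul, ur)]
    bottom = [a + b for a, b in zip(ll, lr)]
    return top + bottom


def subsample_2to1(pixels):
    """Halve width and height by averaging each 2x2 block of pixels (assumed method)."""
    height, width = len(pixels), len(pixels[0])
    return [
        [
            (pixels[r][c] + pixels[r][c + 1] + pixels[r + 1][c] + pixels[r + 1][c + 1]) // 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def lower_resolution_tile(ul, ur, ll, lr):
    """Build one lower resolution tile from four higher resolution tiles."""
    return subsample_2to1(merge_four(ul, ur, ll, lr))


def pyramid_resolutions(base_m_per_pixel=1, levels=7):
    """Meters per pixel at each level when every level halves the resolution."""
    return [base_m_per_pixel * 2 ** i for i in range(levels)]


def flat_tile(value, size=200):
    """Helper for the demo: a size x size tile filled with one gray value."""
    return [[value] * size for _ in range(size)]


tile = lower_resolution_tile(flat_tile(10), flat_tile(20), flat_tile(30), flat_tile(40))
print(len(tile), len(tile[0]))
print(pyramid_resolutions())
```

The output tile keeps the 200 x 200 size while covering four times the ground area, which is what lets every pyramid level use the same tile dimensions.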

USGS Grid System: 

USGS DOQ ortho-rectified to UTM projection
World divided into 60 “zones”
Zone logically is 1,000,000 meters by 10,000,000 meters
USGS tiles are uniform size, 200 x 200 pixels, relative to zone boundary (left and bottom edges)
  USGS X = int(LowerLeftEasting / (200 * MetersPerPixel))
  USGS Y = int(LowerLeftNorthing / (200 * MetersPerPixel))
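
The two formulas above translate directly into code. This is a small sketch: the function and parameter names are illustrative, and the sample coordinates are made up to show the arithmetic.

```python
TILE_PIXELS = 200  # tile edge in pixels, from the slide


def usgs_tile_xy(lower_left_easting, lower_left_northing, meters_per_pixel):
    """Return the (X, Y) tile indices for a tile's lower-left UTM corner, in meters."""
    tile_meters = TILE_PIXELS * meters_per_pixel
    x = int(lower_left_easting / tile_meters)
    y = int(lower_left_northing / tile_meters)
    return x, y


# At 1 m/pixel a tile covers 200 m x 200 m of ground
print(usgs_tile_xy(500_000, 5_000_000, 1))  # (2500, 25000)
# At 2 m/pixel the same corner lands in a tile twice as wide
print(usgs_tile_xy(500_000, 5_000_000, 2))  # (1250, 12500)
```

Because eastings and northings within a zone are non-negative, truncation with int() behaves the same as rounding down, so every point inside a tile maps to the same index.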

SPIN-2 Relationship: 

Microsoft’s contribution:
  Build an “internet UI” to SPIN-2 data
  Build an “electronic store” for USGS’ use selling/distributing DOQ data
  Run a “robust” web site for 18 months
  Provide “TerraServer web site” S/W at the end of the project
SPIN-2 contribution:
  Deliver 2 square T-meters of 2m resolution satellite imagery
  Provide technical advice
  Operate and maintain SPIN-2 “electronic store”

Spin-2 Image Editing: 

Re-sampled to 2 meter
Image aligned to left corner of grid system
Non-image squares (all white) are discarded
200x200 Tiles extracted
Browse Thumb
Merged with related tiles in File System
Base tiles loaded into Db
Image Pyramid created from tiles in Db

Spin-2 Meta Data : 

Fields, in order:
  Scene name
  City*
  State*
  Country
  Number of Columns
  Number of Rows
  Shooting Height
  Height of Sun
  Date of survey (mm/dd/yyyy)
  Time of survey (GMT) (hr:mn:ss)
  Upper Left Latitude
  Upper Left Longitude
  Lower Right Latitude
  Lower Right Longitude
  Camera System*
  Pixel size*
  Copyright*
  # of Physical Files
  Physical Filename
  X Pixel offset from top-left
  Y Pixel offset from top-left
Last 3 fields repeat once per physical file
* Field is not required; if not present, then a blank field is present
Semi-colon delimited fields, ASCII encoding
1 record per line
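
A record in this layout could be parsed roughly as shown below. The field order, the semicolon delimiter and the blank optional fields follow the slide; the dictionary keys and function name are made up for this sketch, and it assumes the last three fields repeat once per physical file with no trailing delimiter.

```python
# Illustrative key names for the 18 fixed fields, in slide order
FIXED_FIELDS = [
    "scene_name", "city", "state", "country", "columns", "rows",
    "shooting_height", "sun_height", "survey_date", "survey_time",
    "upper_left_lat", "upper_left_lon", "lower_right_lat", "lower_right_lon",
    "camera_system", "pixel_size", "copyright", "physical_file_count",
]


def parse_record(line):
    """Parse one semicolon-delimited metadata line into a dict."""
    parts = line.rstrip("\r\n").split(";")
    fixed = len(FIXED_FIELDS)
    if len(parts) < fixed:
        raise ValueError("record has too few fields")
    record = dict(zip(FIXED_FIELDS, parts[:fixed]))
    count = int(record["physical_file_count"])
    rest = parts[fixed:]
    if len(rest) != 3 * count:
        raise ValueError("file entries do not match the declared file count")
    # Each physical file contributes a filename and its X/Y pixel offsets
    record["files"] = [
        {"filename": rest[i], "x_offset": int(rest[i + 1]), "y_offset": int(rest[i + 2])}
        for i in range(0, len(rest), 3)
    ]
    return record


# Made-up sample line; optional fields (city, state, camera, pixel size, copyright) left blank
sample = ("SCENE-A;;;US;4000;3000;220;45;01/02/1998;10:30:00;"
          "47.7;-122.4;47.5;-122.1;;;;2;a.tif;0;0;b.tif;2000;0")
record = parse_record(sample)
print(record["scene_name"], len(record["files"]))
```

Values other than the file count and offsets are left as strings, since the slide does not give units or numeric formats for them.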

Image Delivery and Load: 

Image Delivery and Load (diagram)
Terra Cutter: Reads source image files, creates 200 x 200 tiles, and inserts into SQL DB.
Terra Scale: Reads TerraCutter tiles, re-samples low resolution tiles, and inserts into SQL DB.
Hardware labels: DLT Tape; \Drop’N’; 384 4.3 GB Drives; 1.25 TB ESA; 3 Alpha Server 4100; NT Backup; Enterprise Storage Array; 9 HSZ70 Ultra-SCSI Dual redundant Controllers; 324 9.1 Seagate Disks; 9710 TimberWolf; 4 x Intel 4x200 MHz; Clariion 110 9GB 1 TB; EMC Symmetrix 800 GB; TimberWolf 9714; DLT Tape “tar”

Things we did right...: 

Simple X, Y, Z-Grid navigation system
Used ImgStatus to control logical “presence” of the image in the app
“Stitching tiles together” from multiple input images to form seamless mosaic
Offering two forms of seamless: time based (SPIN-2) and theme based (DOQ)
Using a fixed tile size on DOQ data
Can dynamically load data into tables while viewing application is active

Things we would do differently... and doing in V3: 

Square Tiles, power of 2 size (256x256)
Power of 2 zoom levels (2:1, 4:1, 8:1, etc.)
Uniform tile size on each zoom (variable ground size per tile)
Search system independent of tile size
Support multiple projection systems
Support multiple databases per web site
Support inter-web site search and cross linking

Future Plans: 

Integrate with Encarta Online
Add additional data sets
  Digital Raster Graphics (Topo maps)
  World Relief (1 km)
Interconnect with other TerraServer “sites”
Layered Maps
  Allow Topo maps to overlay DOQ, DEM, ...
Integrate with other apps and services

TerraServer V3.0 Major Features: 

Tiling Enhancements:
  Fixed Size Tiles (200 x 200)
  Fixed Resolutions (1/1024 m -> 4096 m)
  Multi-Scene per data theme
  Grayscale, Natural Color, ColorMap (GIF)
New User Interface
  Based on Encarta On-line
  Integrated links to/from Encarta Online articles
  More screen area dedicated to imagery

TerraServer V3.0 New Data Sets: 

New Data Sets
  USGS “DRG” (Topographic maps, U.S.)
  1KM seamless World photo (natural color)
  And can add others...
Layered Maps
  Host Java applet developed by UC Berkeley Digital Library project
  Can overlay different data-sets that share a common projection, e.g. DOQ & DRG

Example DRG : 

Example DRG

TerraServer V3.0 Improved Search & Indexing: 

Improved Searching
  Address search (U.S. addresses only) [on-hold]
  Simple, one field find (replaces 4 field page)
  Advanced Search for exotic searches (4 field find place, latitude & longitude)
External Links
  Display geo-referenced links to external web sites from imagery pages (Encarta articles, MS business partners, others)

TerraServer V3.0 Simpler Image Loading: 

Single program, TerraCutter, to process complete input tape (job)
  Restartable, detects & avoids duplicate runs
  Stitches together image through database tiles
Single program, TerraScale, to create image pyramid from tiles in database
Requires less temporary disk space

TerraServer V3.0 Schedule: 

March 26, 1999: 1.1 TB
  Encarta Online “look”
  Simplified search on home page (find, coverage map, famous places)
  Links to Encarta articles from image pages
September 30, 1999 (Jul 30): 1.1 TB
  Re-load DOQ & SPIN-2 data-set
  New Tiling scheme
November 15, 1999: 2.0 TB [Dec 15 1999]
  DRG data-set (topo maps)
  Layered Maps