disaster 2001

Uploaded from authorPOINT
Views:
 
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Progress Database Repair & Recovery: 

Progress Database Repair andamp; Recovery Dan Foreman United Systems, Inc. Email: danf@prodb.com

Introduction- Dan Foreman: 

Introduction- Dan Foreman Progress user since 1984 (V2.1) Guest speaker at every USA Progress Users Conference from 1990 to 1998 Quest speaker at QAD Conference: 1997, 1999-2001 Author of: Progress Performance Tuning Guide Progress Database Administration Guide (Nov 2001) Progress Virtual System Tables Progress V9 Database Administration Jumpstart A paper purchase now allows free online access ProMonitor - Performance Monitoring Tool ProDandamp;L - Accelerated Dump/Load Utility

Introduction - Who Are We?: 

Introduction - Who Are We? BravePoint Inc. (www.BravePoint.com) Formerly United Systems, Inc. 125+ Employees Three of us have used Progress since 1984 Progress Service Provider Performance Tuning Database Administration and Recovery Progress Training Internet, Webspeed, Java Custom Programming Services

Introduction - Who Are You: 

Introduction - Who Are You Progress Version: V6, V7, V8.2, V8.3, V9 Unix? NT? Is there anything else? Largest Single Database? Highest User Count?

Special Request: 

Special Request Phasers on Stun Isn't it interesting that the Microsoft spell checker can’t find the word 'Phaser' Cell Phones on Mute Please!

Goals: 

Goals Can I teach you database brain surgery in only one hour?

Preventative Maintenance: 

Preventative Maintenance Backups (yes, I know you’ve heard of them but have you tested them recently?) Test your Entire Recovery Plan Warm Spare Database - A database on another machine with a recent copy of the production DB This is easy to do in Progress...covered soon Unix: don’t logon as root unless you really need to Use O/S security to protect the DB, BI, and AI files from accidental/casual/intentional deletion

Preventative Maintenance: 

Preventative Maintenance Unix: Don’t use kill -9 to terminate a Progress session; You might bring the database DOWN! if you kill a session that is holding a Latch Always have an up-to-date Structure (.st) file available Make sure your Sequences are not about to overflow (at approx. 2 billion)

Preventative Maintenance: 

Preventative Maintenance Monitor the BI file High Water Mark to avoid the 2gb size limit bug which will corrupt your DB (_dbstatus-bisize Virtual System Table); not using VSTs? Turn Them On! Monitor 'Delinquent' Transactions (transactions longer than 30-60 minutes) Monitor Large Transactions (-L Overflow) longtrx4.p Progress program on the BravePoint website to Automatically disconnect Delinquent Transactions Use the V8.3 -bithold parameter as an extra safeguard; Set to 999 or less; Set even in V9 V9 supports Terabyte sized BI Files

Preventative Maintenance: 

Preventative Maintenance Monitor the database High Water Mark to avoid growing into the Variable Length Extent Could hit the 2gb size limit Unbuffered I/O to the Variable Extent Sync call issued each time the Extent grows Size is requested from the OS in increasingly larger chunks (16-128 DB Blocks) Could run out of disk space

Quiz Question: 

Quiz Question Who are the Smartest People in the Room?

Answer : 

Answer Those running After Imaging on their Mission Critical Databases Or stated in another way, if you’re not using AI, you shouldn’t be responsible for your company’s databases If your company can’t afford to lose a day’s worth of data you Absolutely need After Imaging

After Imaging: 

After Imaging Who is currently using After Imaging? If no, why not? The PSC documentation says it offers protection against media failure Disk fails 5 minutes before the backup starts on the final day of your year end close No paper trail Ouch! Time to work on your resume (C.V.) After Image File(s) + Last (Good) Backup = State of DB at time of crash

After Imaging - Why Use It?: 

After Imaging - Why Use It? But you say…'I have disk mirroring (also known as RAID 1) so I’m protected against a disk failure' BUT Mirroring does NOT protect against all database evils

After Imaging - Why Use It?: 

After Imaging - Why Use It? True Horror Story #1 A DBA FTP’d a test database into the directory where the production database resided... unfortunately they had the same name Disk Mirroring worked just fine….. After Imaging would have probably saved the day

After Imaging - Why Use It?: 

After Imaging - Why Use It? True Horror Story #2 A user ran an archiving program on live data that wasn’t ready to be archived Once again the mirroring performed perfectly After Imaging might have improved the situation as it is possible to Roll Forward to a specific point in time

After Imaging - Why Use It?: 

After Imaging - Why Use It? True Horror Story #3 BI file hit the 2GB bug (true even with MV BI files) @ 4PM on a busy day (300+ users) 10GB Database was corrupted Progress Tech Support Recommendation: dump andamp; load or restore from backup which meant substantial down time or data loss Fortunately the customer called me and I was able to temporarily patch the database I had recommended AI to this customer over one year prior to this event

After Imaging - Why Use It?: 

After Imaging - Why Use It? Avoid the Problems related to probkup online Transaction Activity is Frozen while the BI File is Backed Up The Buffer Cache is 'polluted' with Database Blocks that may not be Commonly Accessed The Overhead of Running a Streaming Tape Drive on the Same Machine as the DB Possible Solution Backup the AI Files Restore/Copy to Another System (or disk) Apply the AI Files (Roll Forward) Backup the Replicated Database

After Imaging - Why Use It?: 

After Imaging - Why Use It? Easy generation of a Warm Spare DB A Warm Spare DB is: A standby database On another machine That can be brought online quickly in case of failure to the production system It’s ‘warm’ because it is not 100% current A HOT spare is not possible using AI but is possible using Replication Triggers

Online Backup  Warm Spare: 

Online Backup  Warm Spare Production System Warm Spare Production DB Warm Spare DB AI Files

After Imaging - Why Use It?: 

After Imaging - Why Use It? Easy generation of a Report Server DB A Report Server DB is: A database on another system Used for reporting only To relieve the production system of the load imposed by reporting Doesn’t require same level of hardware or Progress license

Preventative Maintenance: 

Preventative Maintenance Corruption Checks proutil dbanalys Blocks, Records, andamp; Indexes probkup/procopy Blocks only proutil dbrpr Blocks and Records proutil dbscan Same as dbrpr (cmd line) proutil idxfix Indexes andlt;-andgt; Records Run periodically to make sure you don’t have hidden or unreported corruption

Log File Checking: 

Log File Checking Check the Database log (.lg) file for errors DAILY. Look for words such as: kill* drastic warn* error system abnormal exceed* fail* wrong unexpected* invalid died damage* dead overflow* violation fatal insufficient missing disappear* corrupt* allow* attempt* cannot enough illegal impossible increase unknown unable stop* beyond There are programs on the our Web Site to assist with log file checking or products such as ProMonitor

1124 Errors: 

1124 Errors SYSTEM ERROR: wrong dbkey in block….. Indicates (with 99.9999% probability) a hardware problem Don’t limit your search to disks; also consider: Disk Controllers, SCSI termination, RAM (parity errors), Firmware, etc. Don’t Let the Hardware Technician Blame Progress or the Application Don’t let the Hardware Technician escape without solving the problem - every component might need to be swapped out

Common Problems: 

Common Problems Corrupt Blocks Running out of disk space Deleted/Damaged Portion of the Database Database brought down by the Watchdog or the Broker itself Upgrade Progress (V7.3E with Patches, V8.2C with Patches) Use Loopback Client connections (-H andamp; -S)

Corrupt Blocks: 

Corrupt Blocks What Kind of Block? Index or Data Block Type 2 (Index - IX) or 3 (Record Mgr - RM) If IX block, try rebuilding the indexes If RM block, consider the following Reformat a block as a Free block (proutil dbrpr) Replace the block with the same block from another DB (probably restored from a backup) proutil dbrpr 5. Dump Block (from the good database) 4. Load Block (into the bad database)

Emergency Dump: 

Emergency Dump ‘Front and Back’ 4GL Dump for each customer by cust-num (until you hit the bad spot) for each customer by cust-num descending If the Primary Index is Damaged, try Dumping using a non-Primary Index RECID Dump Doesn’t require an Index Very Slow on a Large Database Last Resort

Miscellaneous Hints: 

Miscellaneous Hints If you can’t get into the database with -F or any other way, try the Read Only (-RO) option

Deleted Extents: 

Deleted Extents First, backup the remaining pieces of the database This may seem like a useless step but if your backup is defective you may need to repair the broken DB and that’s difficult if it’s deleted The Backup gives you time to: Prepare a plan of action Call outside resources (like me) for help Calm down Lock your door Prepare a new Resume (C.V.)

Deleted AI Extents: 

Deleted AI Extents If an AI Extent is Deleted, simply disable AI and... What? You’re not running AI? BAD IDEA! Disable AI (rfutil aimage end) Fix the problem that caused the lost Extent Recreate the Extent with prostrct add Restart AI (rfutil aimage begin) If this doesn’t work, go to the next slide

Deleted AI Extents: 

Deleted AI Extents Disable AI with rfutil aimage end. You may get an error message regarding the missing AI Extent but typically AI is still disabled Truncate the BI file with proutil truncate bi. You may get an error message regarding the missing AI Extent but typically the BI file is still truncated Remove all AI Extents with prostrct remove Recreate the original AI Extents with prostrct add Restart After Imaging with rfutil aimage begin Reformat the truncated BI file with proutil bigrow

Deleted BI Extents: 

Deleted BI Extents Force Access with -F V8.2 and later -F only forces access on proutil truncate bi If you Force Access, consider the DB damaged! Forcing Access THROWS AWAY the BI file Forcing Access sets the ‘Tainted Flag’ Even if you fix the Tainted Flag, consider the DB damaged! Dump andamp; Load (this is if AI is not enabled)

Deleted DB Extents: 

Deleted DB Extents Restore the the DB and BI from Backup Apply the AI files Re-enable AI BI Grow Done! Still not running AI?, next slide please

Deleted DB Extents: 

Deleted DB Extents Use prostrct unlock if the deleted Extent was Empty (above the High Water Mark) prostrct unlock will recreate missing Extents However unlock also changes the time stamps on the AI files and they can’t be used any longer

Deleted DB Extents: 

Deleted DB Extents Backup Extent Substitute This technique is for Extents that contain Data except the .d1 Extent (.d1 contains Master Block) Restore a copy of the deleted Extent from a Backup The Extent’s ‘Last Opened’ time stamps won’t match Use prostrct unlock to sync the time stamps (broken in V8.2 and V9) The data in the Extent might not match but… Use the -miracle option to re-create the Data 

Deleted DB Extents: 

Deleted DB Extents If the database Broker is still running: DON’T Shutdown the Database That ‘closes’ the database extents and you won’t be able to re-open them If a Client is still attached to the DB and that Client can access the Progress Editor, simply Dump the Database from the Dictionary Even if they can’t get into the Editor, put dict.p (renamed as a menu item) into their PROPATH

Deleted DB Extents - Unix: 

Deleted DB Extents - Unix Warm Boot the System Immediately Don’t Shut Down the DB First When Unix fsck runs, it will *probably* recover the deleted Extent Why? A file is not finally deleted until every process that has it open is gone (the Broker still has it open)

Overlay Database: 

Overlay Database .db File is lost Create a Void Multi-volume Structure that matches the problem Database (hopefully you have a current .st file Take the .db from the Void Structure and use it for the problem DB Use prostrct unlock to Sync the Extent Timestamps The above steps apply if you are using probkup otherwise restore the .db from 3d party backup

Overlay Database: 

Overlay Database On V9 it is much easier to restore the .db file prostrct builddb Requires an up-to-date Structure File (remember that from the Preventative Maintenance list?)

Disk Full: 

Disk Full Use prostrct repair to relocate Extents to a location with more space Copy the Extent Create a new Structure File (.st) that reflect the current location of the Database Extents (one good reason to have a current one) Run prostrct repair new.st Done! There is a bug in prostrct repair. Don’t misspell the name of the Structure File or your .db file will vanish forever (fixed in V8.3C)

BI Disk Full: 

BI Disk Full Do Not Run out of Space on the BI Disk if: You can’t make more space for the BI file You can’t relocate the BI file To Perform Crash Recovery, the BI file must grow If there is no space for the BI file to grow, there is no Crash Recovery Force Access (-F) is the only option (if you don’t have AI enabled)

Progress HA Challenges: 

Progress HA Challenges Online Backups probkup online AI Backup (already covered) Split Mirror Multi-DB backups cannot be restored to exactly the same point in time unless you are using Two Phase Commit

Online Backup Options: 

Online Backup Options probkup online Transaction Freeze during BI Backup Usually High I/O Volume Only DB and BI are Backed Up Performs an AI Extent Switch If backup to disk, backup file can’t be greater 2gb (except on NT) until V9.1B Use the Volume Size (-vs) Parameter if backup file is greater than 2gb

Online Backup Options: 

Online Backup Options Split Mirror Backup Quiet the DB with proquiet enable Break the Disk Mirror Unquiet the DB with proquiet disable Backup the Offline Mirror Resync the Mirrors

Online Backup Options: 

Online Backup Options Split Mirror Backup Database is vulnerable during the backup unless you have triple mirroring or similar feature Resynchronization of the Disks can be a big Performance Problem Still Can’t Backup (and restore) Multiple DBs Simultaneously

Progress HA Challenges: 

Progress HA Challenges Schema Changes. It is not possible to make Schema changes online, even in V9.1 Adding New Extents. It is not possible to add Extents online, even in V9.1 The BI Notes counter is limited to 2 billion until V8.3D and V9.1C Promon/VST Statistics. The numbers recorded by promon are limited to 4 billion Updating the Progress Version. It is necessary to shutdown the DB to change Progress versions The DB log (.lg) file is limited to 2gb. A system that runs for months could approach this limit. There is no way to ‘truncate’ the log file online.

Progress HA Challenges: 

Progress HA Challenges Database Reorg. In V8 is possible to build indexes online. In V9 it is possible move tables (and indexes) from one Area to another but the tablemove option has the following disadvantages: The table being moved is EXCLUSIVEly locked for the duration of the move. Because the RECIDs will be different in the new Area, all indexes on the table must be rebuilt. The table is moved in one transaction. This means that the BI file can grow quite large. If AI is enabled, the notes generated by the table move is written to the AI file the same way that other transaction related activities are

Sources of Help: 

Sources of Help Progress Documentation Progress Database Administration Guide dba@peg.com Progress Knowledgebase (techweb.progress.com) My Home Number: 541-754-2116 My Mobile Number is: 541-829-7741 For those weekend emergencies when you need expert assistance This is not a free call

Slide49: