A Brave New Frontier: Testing Live Production Applications
Dr Kelvin Ross, Steve Woodyatt, Dr Steven Butler
SMART Testing Technologies Pty Ltd
Presentation at: AsiaSTAR2004, Canberra, Australia, 7 Sep 2004


Roadmap:

- Avoiding Production Problems
- Testing for Service Level Management
- Case Study
- Considerations Unique to Production Testing
- Information for SLM
- Implementation Choices
- Wrap-Up

Why Test On Production:

Despite best efforts to test an application before deployment, post-deployment problems still occur frequently:
- Server offline
- No response
- Functions not available
- Incorrect response
- Slow response
- Security breach
- Data out-of-date

The user experience:

What will the user experience in dealing with our application?

E.g. airline reservation business process (information flow):
- Search for flights
- Make a reservation
- Pay with credit card
- Obtain electronic ticket reservation code
- Confirmation by email with matching details
- Reservation details reported in frequent flyer

Distributed Architecture:

[Diagram: distributed architecture. Labels: Airline, External Systems, Email Gateway, Remote Prices, Web Application, Mainframe, ERP, Payment Gateway, Payment, Internet, Email, Firewall]



Service Level Management:

Service Level Management (SLM): “set of people and systems that allows the organisation to ensure that SLAs are being met and that the necessary resources are being provided efficiently”

Service Level Agreement (SLA): “contracts between service providers and customers that define the services provided, the metrics associated with these services, acceptable and unacceptable service levels, liabilities on the part of the service provider and the customer, and actions to be taken in specific circumstances”

Definitions from IEC, “Service Level Management” tutorial.

SLM in the context of ITIL:

SLM in the context of ITIL


SLA KPIs:

- Availability
  - End-to-end, not just components
  - No. and duration of outages, total uptime/downtime
- Security
  - Exposure
  - No. of breaches
  - Vulnerabilities detected
  - Viruses
- Accuracy
  - Correct results
  - Processes followed
- Performance
  - Responsiveness
  - Response time for web request
  - Data transfer / throughput
  - MTTR
  - No. of incidents
  - Service degradation
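As a rough illustration of the availability KPIs above (number and duration of outages, total uptime/downtime), the sketch below derives them from a list of outage intervals. The function name, the dictionary keys and the assumption that outages do not overlap are illustrative choices, not part of the presentation.

```python
from datetime import datetime, timedelta


def availability_kpis(period_start, period_end, outages):
    """Summarise outages, given as (start, end) datetime pairs, over a period.

    Assumes the outage intervals do not overlap each other.
    """
    downtime = timedelta()
    outage_count = 0
    for start, end in outages:
        # Clip each outage to the reporting period before adding it up.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            downtime += clipped_end - clipped_start
            outage_count += 1
    period = period_end - period_start
    uptime = period - downtime
    return {
        "outage_count": outage_count,
        "total_downtime": downtime,
        "total_uptime": uptime,
        # timedelta / timedelta yields a plain float ratio.
        "availability_pct": 100.0 * (uptime / period),
    }


# Hypothetical reporting period and outages, for illustration only.
september = availability_kpis(
    datetime(2004, 9, 1),
    datetime(2004, 10, 1),
    [
        (datetime(2004, 9, 3, 2, 0), datetime(2004, 9, 3, 2, 30)),
        (datetime(2004, 9, 20, 14, 0), datetime(2004, 9, 20, 15, 30)),
    ],
)
```

For an end-to-end figure, the outage intervals would come from failed end-to-end test runs rather than from individual component monitors.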


Approaches:

- Passive: listen in to transactions and analyse logs
- Active: transactions are synthesised
- End User / End-to-End: observes user experience
  - Tools: Topaz, NetIQ, …; SMART Cat, Topaz, Keynote, Netmechanic, …
- Component: focuses on servers and backend processes
  - Tools: Web Trends, …; HP OpenView, IBM Tivoli, CA Unicenter, BMC Patrol, …

Business Process Auditing (BPA):

Business Process Auditing:
- Performance
- Availability
- Security
- Functionality
- Correctness
- Accuracy
- Completion
- Reporting
- Alerting
- Fault Diagnosis & Remedy

Service Level Management:
- ✓ Automated
- ✓ Real-time

Post-Deployment Testing and SLM:

- Testing can be used to synthesise business transactions
- Interact with system through various interfaces
- Collect and report metrics
- Transfer of technology predominantly used pre-deployment


Problems Detected:

- Problems detected
  - End-to-end processes not available
  - Responses slow
  - Incorrect data
- Problems not detected
  - Issues localised to individual clients
  - Actual response times to all clients

Who Owns Production Testing:

- The testing group?
- The support group?
- The operations group?
- The application owners?
- Marketing?
- Marriage of skills and technology required for efficiency
- “We don’t call that testing” syndrome

Which applications most benefit:

- Those with real-time dependence for completion of vital business processes
- High risk & dependence
  - Financial
  - Market reputation
  - Probity, accountability and liability
- Potentially unreliable or difficult to manage technology dependencies
  - Increasingly complex linkages
  - Distributed application architectures
  - History of failure, problems
- Risk assessment: SEVERITY x LIKELIHOOD




Objectives:

BPA planning checklist:
- What are the critical business processes?
- Who are the users?
- What is the user experience?
- How can success be determined?
- How can the test be automated?

Airline Reservation Case Study:

- Critical business processes
  - Search available flights
  - Make online booking
  - Change booking
  - Cancel booking
  - Etc.
- Users
  - Consumers
  - Travel agents
  - Call centre

Airline Reservation Case Study:

- What is the user experience? E.g. search for flights:
  - Available: function accessible, response returned
  - Correct: correct flights, source and destination, time, etc.
  - Complete: no missing flights with available seats
  - Responsive: with tolerable response times
- How can success be determined?
  - What is the source of truth?

Airline Reservation Case Study:

Choose what to monitor based on risk: SEVERITY x LIKELIHOOD
- Previous operational reliability problems, complex dynamic behaviour
- What was previously tested and will continue to function
- Are there problems with distributed components continuing to run appropriately? E.g. Tuxedo services, LDAP authentication, payment gateway not accessible
- Are there problems with timely propagation/retrieval of data? E.g. flight data not retrieved consistently, bookings not updated in a timely manner
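The SEVERITY x LIKELIHOOD rule above can be sketched as a simple ranking. The 1 to 5 scales and every score below are hypothetical values chosen for illustration; only the candidate names come from the slide.

```python
def risk_score(severity, likelihood):
    """Risk as the product of severity and likelihood (both on a 1-5 scale here)."""
    return severity * likelihood


def rank_by_risk(candidates):
    """Return candidates ordered from highest to lowest risk score."""
    return sorted(
        candidates,
        key=lambda c: risk_score(c["severity"], c["likelihood"]),
        reverse=True,
    )


# Hypothetical scores for the failure modes named on the slide.
CANDIDATES = [
    {"name": "payment gateway not accessible", "severity": 5, "likelihood": 3},
    {"name": "flight data not retrieved consistently", "severity": 4, "likelihood": 4},
    {"name": "LDAP authentication unavailable", "severity": 5, "likelihood": 2},
    {"name": "bookings not updated in timely manner", "severity": 3, "likelihood": 3},
]

ranked = rank_by_risk(CANDIDATES)
```

The highest-ranked items are the first candidates for scheduled production tests; low-scoring items may not justify the cost of monitoring.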

Test Frameworks:

Outcomes have to be reported at business level, not application object level.

- Object level – too low level for audience
    getURL search.jsp
    saveForm, submitflight
    setParam, submitflight, startime, 200412011100
    …
    submitForm, submitflight
- Business level – appropriate for audience
    searchFlight, return, 20041201110000, SYD, …
- “Action Word” approaches recommended
  - See Carl Nagle or Hans Buwalda’s work
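A minimal sketch of the action-word idea follows: one business-level step expands into the object-level actions, and only the business-level outcome is reported. The stub drivers, the `run_step` helper and the report format are assumptions made for illustration; they are not the API of any particular framework, and the parameters elided with "…" on the slide are left out.

```python
calls = []  # object-level actions are recorded here, hidden from business reports


def get_url(page):
    calls.append(("getURL", page))


def save_form(form):
    calls.append(("saveForm", form))


def set_param(form, name, value):
    calls.append(("setParam", form, name, value))


def submit_form(form):
    calls.append(("submitForm", form))


def search_flight(trip_type, depart_time, origin):
    """Business-level action word built from object-level actions.

    The slide elides the remaining setParam calls, so only the
    departure time is set in this sketch.
    """
    get_url("search.jsp")
    save_form("submitflight")
    set_param("submitflight", "startime", depart_time)
    submit_form("submitflight")


ACTION_WORDS = {"searchFlight": search_flight}


def run_step(line):
    """Run one comma-separated business-level step and report its outcome."""
    word, *args = [part.strip() for part in line.split(",")]
    before = len(calls)
    try:
        ACTION_WORDS[word](*args)
        outcome = "PASS"
    except Exception:
        # Unknown action words and driver errors are both reported as failures.
        outcome = "FAIL"
    return {
        "step": word,
        "args": args,
        "outcome": outcome,
        "object_actions": len(calls) - before,
    }


report = run_step("searchFlight, return, 200412011100, SYD")
```

The report carries the action word and its outcome, which is what a business or application manager needs; the recorded object-level calls remain available for debugging.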

Dynamic behaviour:

- A fixed step such as
    searchFlight, return, 200412011100, SYD, …
  won’t remain useful for long, as production data is dynamic
- Dynamic input data
    searchFlight
      Type = return
      DepartTime = today()@10am + 1 month
      ReturnTime = today()@10am + 1 month + 5 days
      Depart = Sydney
      Arrive = Melbourne
- May even want to randomise data
  - Vary depart and arrive on successive runs
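The dynamic inputs above could be computed roughly as follows. This is a sketch only: `add_months`, `build_search_flight` and the `CITIES` pool are hypothetical names, and clamping to the last day of a shorter month is one possible reading of "+ 1 month".

```python
import calendar
import random
from datetime import date, datetime, time, timedelta

CITIES = ["Sydney", "Melbourne", "Brisbane", "Canberra"]  # hypothetical route pool


def add_months(moment, months):
    """Shift a datetime by whole months, clamping to the month's last day."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def build_search_flight(today, rng=None):
    """Build searchFlight inputs relative to the day the test runs."""
    depart_time = add_months(datetime.combine(today, time(10, 0)), 1)
    return_time = depart_time + timedelta(days=5)
    if rng is None:
        depart, arrive = "Sydney", "Melbourne"
    else:
        # Randomised variant: two distinct cities on each run.
        depart, arrive = rng.sample(CITIES, 2)
    return {
        "Type": "return",
        "DepartTime": depart_time,
        "ReturnTime": return_time,
        "Depart": depart,
        "Arrive": arrive,
    }


params = build_search_flight(date.today())
varied = build_search_flight(date.today(), random.Random())
```

Passing a seeded `random.Random` makes a randomised run reproducible, which helps when a failure has to be replayed.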

The Test Oracle:

Mechanisms for determining correct response:
- Get any response
- Get a response containing predefined expected values
- Expected values are checked using an oracle
  - E.g. formula determining whether valid date returned
- Results are compared to reference data
  - 3rd party data feed
  - Trusted internal source, e.g. mainframe
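The four mechanisms above form a ladder of increasingly strict checks, sketched below. The response is modelled as a plain dictionary, and the field names (`depart`, `origin`, `price`), the timestamp format and the price tolerance are assumptions for illustration.

```python
from datetime import datetime


def got_any_response(response):
    """Weakest check: something non-empty came back."""
    return bool(response)


def has_expected_values(response, expected):
    """The response contains each predefined expected value."""
    return all(response.get(key) == value for key, value in expected.items())


def departure_date_is_valid(response, requested_depart):
    """Oracle formula: the returned departure parses and falls on the requested day."""
    try:
        depart = datetime.strptime(response["depart"], "%Y%m%d%H%M")
    except (KeyError, ValueError):
        return False
    return depart.date() == requested_depart.date()


def matches_reference(response, reference_price, tolerance=0.01):
    """Compare against reference data, e.g. a 3rd party feed or the mainframe."""
    return abs(response["price"] - reference_price) <= tolerance


# Hypothetical response, for illustration only.
response = {"depart": "200412011100", "origin": "SYD", "price": 199.0}
checks = {
    "any_response": got_any_response(response),
    "expected_values": has_expected_values(response, {"origin": "SYD"}),
    "valid_date": departure_date_is_valid(response, datetime(2004, 12, 1, 10, 0)),
    "matches_reference": matches_reference(response, 199.0),
}
```

Stricter checks catch more of the "incorrect response" and "data out-of-date" problems, but they also depend on keeping the oracle or reference source itself trustworthy.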

3rd Party Reference:

Trending against price data:
- ✗ Prices frozen
- ✓ Price trend matches

Airline Reservation Case Study:

Airline Reservation Case Study Verification failures for searchFlight response



Scheduling the test:

- How often?
  - 1 minute, 5 minutes, hourly, daily, weekly
  - Depends on how quickly support can respond
- What business hours?
  - 24x7, 9 to 5, higher frequency at certain events
- What about scheduled outages?
  - Planned outages, public holidays
- Coordinating tests
  - Locking to prevent simultaneous tests
  - E.g. don’t check prices or submit orders unless logged in
  - Semaphores
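One way to picture the locking point above is a semaphore that lets a scheduled test skip its run when another test already holds the shared resource. This sketch uses an in-process `threading.Semaphore`; coordinating separate processes or agents would need a shared mechanism such as a lock file or database row, which is not shown. The names `booking_lock` and `run_exclusive` are hypothetical.

```python
import threading

# One permit: at most one test may use the shared test account at a time.
booking_lock = threading.Semaphore(1)


def run_exclusive(test, lock=booking_lock):
    """Run test() only if no other test holds the lock; otherwise skip this run."""
    if not lock.acquire(blocking=False):
        return "skipped"
    try:
        return test()
    finally:
        # Always release, even if the test raises, so later runs are not blocked.
        lock.release()


def make_booking_test():
    return "passed"


first = run_exclusive(make_booking_test)
# A test triggered while another is still running is skipped, not queued.
overlapping = run_exclusive(lambda: run_exclusive(make_booking_test))
```

Skipping rather than waiting keeps a slow run from delaying every later scheduled run; a blocking acquire would be the alternative when every run must execute.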

Sensitive Data:

- Frequently there may be sensitive information stored in scripts and test logs
  - Logins and passwords
  - Credit card ids
  - Personal details, e.g. phone numbers, ABNs, etc.
- Where possible, avoid
  - Use dummy accounts
  - Don’t log sensitive information
  - Can be difficult to control, e.g. a failure may save a screenshot that then displays credentials
- Use encryption
  - Sensitive data is stored encrypted, but the test engine still requires the key in order to send it
  - At least it is obfuscated
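The "don’t log sensitive information" point above can be approximated by scrubbing messages before they reach the test log. The patterns below are illustrative assumptions and deliberately crude: they cover `password=`-style pairs and digit runs that look like card numbers, and they would also redact any other 13 to 19 digit number.

```python
import re

# key=value or key: value pairs for common password field names (assumed names).
PASSWORD_RE = re.compile(r"(password|passwd|pwd)\s*[=:]\s*\S+", re.IGNORECASE)
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(message):
    """Mask passwords and card-like numbers in a log message."""
    message = PASSWORD_RE.sub(r"\1=[REDACTED]", message)
    return CARD_RE.sub("[REDACTED CARD]", message)


safe = redact("login user=tester password=s3cret card=4111 1111 1111 1111 ok")
```

Text scrubbing does nothing for the screenshot case mentioned on the slide, so dummy accounts remain the safer first line of defence.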

Where tests should be run from:

- Many tools allow tests to be run from multiple locations
  - Simulate users of different geographies
  - Different connection speeds to report on a variety of user experiences
- Inside/outside firewall
  - Probably the largest concern
  - Consumer users outside, corporate users inside
- To provide end-to-end scenarios, may need a combination
  - Scenario initiated internally, and end results are propagated to external, or vice versa
  - External view of web may be verified using Test Oracle data that is internal
  - Agents may be deployed internal and external to run tests

Problems to Avoid:

Need to be aware of impact of testing:
- Performance hits
- Volatile features
- Intrusive tests
- Biased results
- Compliance restrictions
- Impact on business KPIs

Taking measurements may distort the system being measured.

Minimising the Effect of Transactions:

- Cost of transaction
  - Financial: purchasing a flight may incur a credit card merchant fee
  - Resource: seats unavailable until refund provided; searching places additional load on resource pool
- Reversing the transaction
  - Providing a refund; merchant fee may still apply
- What if the transaction is incomplete?
  - What happens if the refund process doesn’t occur/complete?
- Compliance issues
  - Corporate
  - Legislative

Managing the Test Impact:

- Modifications to the application under test to clean up data or control test effects
  - Manual fallback may be a convenient option
- Test objects
  - Dummy frequent flyer accounts
  - Dummy cost centres
- Testing the tests
  - Access to test environment pre-deployment
  - Endurance test that can be part of application test strategy
  - Transfer of load, stress and endurance test scripts



Effective Reporting:

- Who are the users of the reports? Different expectations on presentation/content
  - Business/Application Manager
  - Operations
  - Development
  - Support
  - SLM
- How do they access reports?
  - Web, email, thick client
- Which reports are real-time or batched?
- Is data summarised, or is original data accessible?

Historic Reporting:

- Service level reports
- Trends
- Progress
- Post mortem analysis

Counts:
  Count = 525
  Pass = 513 (97.71%)
  Fail = 12 (2.29%)

Latency:
  Min = 4.339 sec
  Avg = 8.253 sec
  Max = 87.708 sec
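A summary like the counts and latency figures above can be produced from stored test outcomes with a few lines. The record shape, a list of (passed, latency in seconds) pairs, and the function name are assumptions for this sketch.

```python
def summarise(results):
    """Summarise test runs given as (passed, latency_seconds) pairs."""
    if not results:
        raise ValueError("no results to summarise")
    count = len(results)
    passed = sum(1 for ok, _ in results if ok)
    failed = count - passed
    latencies = [latency for _, latency in results]
    return {
        "count": count,
        "pass": passed,
        "fail": failed,
        # Percentages rounded to two decimals, as on the slide.
        "pass_pct": round(100.0 * passed / count, 2),
        "fail_pct": round(100.0 * failed / count, 2),
        "latency_min": min(latencies),
        "latency_avg": sum(latencies) / count,
        "latency_max": max(latencies),
    }


# Hypothetical sample, for illustration only.
sample = summarise([(True, 4.5), (True, 6.0), (False, 30.0), (True, 7.5)])
```

Applied to 513 passes out of 525 runs, the same arithmetic gives the 97.71% and 2.29% shown on the slide.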

Realtime Reporting:

- Alerts
- Current status
- Diagnosis

Diagnosing root cause and remedies:

- Accessing fault and failure data for multiple components
- Pinpoint failures
- Correlation is a skill: manual, expert analysis required
- Variety of support:
  - Saved actual results
  - Unattended collection for debugging
  - Correlation with component performance analysis
  - Automated correlation with component failure modes
  - Sophisticated “expert system”: rules that correlate tested events to arrive at diagnosis of root cause(s)
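The rule-based end of that spectrum can be sketched very simply: each rule maps a set of failed tests to a suspected root cause, and the most specific matching rule wins. The test names, the rules and the diagnoses below are hypothetical; a real rule base would be derived from the application's actual failure modes.

```python
# Each rule: (tests that must all have failed, suspected root cause).
RULES = [
    (frozenset({"searchFlight", "makeBooking", "payment"}),
     "web application or network unreachable"),
    (frozenset({"payment"}), "payment gateway not accessible"),
    (frozenset({"searchFlight"}), "flight data not retrieved"),
]


def diagnose(failed_tests):
    """Return the diagnosis of the most specific rule matched by the failures."""
    failed = set(failed_tests)
    # Try rules that require more failures first, so broad outages are not
    # misreported as a single-component problem.
    for required, diagnosis in sorted(RULES, key=lambda rule: len(rule[0]), reverse=True):
        if required <= failed:
            return diagnosis
    return None


suspect = diagnose({"payment"})
```

A rule table like this only narrows the search; as the slide notes, correlating it with component-level data still takes expert analysis.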

Fault Analysis:

Fault Analysis



Tool Requirements:

Evaluation checklist:
- Test script can interact with a variety of systems: GUI, terminal, APIs, HTTP, SOAP, POP/SMTP, etc.
- Test script can respond to dynamic behaviour
- Agents can be deployed inside/outside the WAN
- Ability to control frequency
- Time-based functions can be used to control execution
- Functions available for data manipulation for dynamic responses (time, extraction, etc.)
- Inter-process coordination between tests using locking/semaphores
- Test steps can be reported on business process steps; object actions can be hidden in reports
- Test outcomes saved to repository for later analysis
- Ability to export data for other purposes, e.g. trending, visualisation, etc.
- Reporting capability on stored data
- Online ability to drill into test data for problem diagnosis
- Alerting mechanisms to email, SMS, online dashboards
- Alerting can be controlled, i.e. escalation, filtering

Apply a weighting to each criterion according to need.
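Weighting the checklist could look like the sketch below, which computes a weighted average rating per tool. The criteria subset, weights, rating scale (0 to 5) and tool names are hypothetical and chosen only to show the arithmetic.

```python
def weighted_score(weights, ratings):
    """Weighted average of ratings; criteria a tool is not rated on count as 0."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings.get(c, 0) for c in weights) / total_weight


# Hypothetical weights (importance) and ratings (0-5) for two candidate tools.
WEIGHTS = {"protocol coverage": 3, "alerting": 2, "reporting": 1}
RATINGS = {
    "Tool A": {"protocol coverage": 4, "alerting": 2, "reporting": 5},
    "Tool B": {"protocol coverage": 3, "alerting": 5, "reporting": 3},
}

scores = {tool: weighted_score(WEIGHTS, r) for tool, r in RATINGS.items()}
best = max(scores, key=scores.get)
```

Changing the weights changes the ranking, which is the point of the slide's advice: the weighting should reflect the organisation's own needs rather than a generic feature count.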

Implementation Choices:

- Available commercial tools/services
  - SmartTestTech – SMARTCat
  - Mercury – Topaz
  - Compuware – Vantage
  - Keynote
  - To a lesser extent, enterprise monitoring tools: BMC Patrol, Tivoli, HP OpenView
- Home brew tools
  - Extensive support for testing protocols in open source frameworks
  - E.g. Java/JUnit, .NET/NUnit, Perl/Ruby/Python
- Extend existing in-house regression test suites
  - Automated scripts may be adapted: Robot, QARun, WinRunner, Silk
  - Post results to database
  - Provide reporting capability, e.g. Crystal Reports, Cognos, etc.




Wrap-up:

- Strong business case
  - Benefit in bringing testing to the production world
  - Small percentage availability increase translates to large $
  - Manages reputational risk with user base
- Large investment in SLM
  - SLAs very ad hoc and not measured
  - Uses tests to provide SLM reports to Business / Application Managers
- Leveraging the investment in test resources
  - Protects overall investment

Questions & Answers:

Contact details:
Dr Kelvin Ross
SMART Testing Technologies Pty Ltd
PO Box 131, West Burleigh, Q4219 AUSTRALIA
Ph: +61 7 5522 5131
Email:
