Performance and Reliability 101
Brent Cromarty
Ping Identity
bcromarty@pingidentity.com
A little about me
• Like
  • Long walks on the beach
  • Red wine
• Dislike
  • Mean people
  • Early mornings
• I encourage questions throughout the presentation
  • Although I may hold off if I am going to address the question with material in a later slide
  • You may have to ask again
OK Seriously…
• Spent the bulk of my career (14 years) at SAP
  • By way of the Business Objects acquisition
  • By way of the Crystal Decisions acquisition
  • By way of Seagate Software IMG
• 5 years' experience in customer support
  • Discovered impatience and a dislike for people
• 9 years of performance and reliability (P&R) testing for Crystal Reports in the Crystal/Business Objects Enterprise product
• Currently in my second year with Ping Identity
Why are we here?
• Types of testing that make up P&R
• Design
  • What is the goal of each test type?
  • What does it prove or disprove?
• Execution
  • How is the test run?
• Results analysis
  • How to determine whether the test passed or failed
• Best practices (tips/tricks/suggestions/filler)
• Suggestions for root cause analysis
So… What are these test types that I speak of?
• Types of P&R tests
  • Load
  • Scalability
  • Endurance
  • Stress
  • Reliability
Load Testing
• The performance equivalent of a functional "smoke" test
• A functional test/workflow executed under "load"
  • Typically "load" is in the form of concurrent users
• Executed with a load generator tool
  • LoadRunner, JMeter, QALoad, Grinder, etc.
• Does the component stand up?
  • Does the test pass functionally? For all users?
  • Does it crash? Does the system grind to a halt?
• Metrics to consider (a minimal sketch follows this slide)
  • Response time (average, 90th percentile, min, max)
  • Throughput
  • CPU and memory utilization on the target system
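To make those metrics concrete, here is a minimal load-test sketch using only Python's standard library. It is not a replacement for a real load generator like JMeter or LoadRunner (no ramp-up, think time, or reporting); the URL, user count, and request count are hypothetical placeholders.

```python
import time
import statistics
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/health"  # hypothetical endpoint under test
USERS = 25                            # concurrent "users" (assumed)
REQUESTS_PER_USER = 40

def one_request(_):
    """Time a single request; treat any exception or non-200 as an error."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return time.perf_counter() - start, ok

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=USERS) as pool:
    results = list(pool.map(one_request, range(USERS * REQUESTS_PER_USER)))
elapsed = time.perf_counter() - t0

times = sorted(r[0] for r in results)
errors = sum(1 for _, ok in results if not ok)

print(f"requests:   {len(times)}  errors: {errors}")
print(f"throughput: {len(times) / elapsed:.1f} req/s")
print(f"avg: {statistics.mean(times) * 1000:.1f} ms  "
      f"p90: {times[int(len(times) * 0.9)] * 1000:.1f} ms  "
      f"min: {times[0] * 1000:.1f} ms  max: {times[-1] * 1000:.1f} ms")
```

CPU and memory on the target system still need to be captured separately (e.g., with the OS's own tools), since the load generator only sees the client side.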
Scalability Testing
• Executed as a series of load tests
• Workload scalability
  • Vary the user load from test to test
• Resource scalability
  • Vary the resources from test to test
• Functional success
  • If the error rate is too high, the scalability results are meaningless
• How does performance change from test to test? (a sweep sketch follows this slide)
  • Response time (average, 90th percentile, min, max)
  • Throughput
  • CPU and memory utilization on the target system
• Do not discount single-user performance
  • A system can exhibit linear scalability but still perform poorly
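A workload-scalability sweep is just the previous load test run in a loop with a stepped user count. A sketch, again with a hypothetical endpoint and an assumed 1% error-rate cutoff:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/health"  # hypothetical endpoint under test

def run_load_test(users, requests_per_user=40):
    """Run one fixed-load test; return (throughput, error_rate)."""
    def one_request(_):
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                return resp.status == 200
        except Exception:
            return False

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        oks = list(pool.map(one_request, range(users * requests_per_user)))
    elapsed = time.perf_counter() - t0
    return len(oks) / elapsed, 1 - sum(oks) / len(oks)

# Step the user load and watch whether throughput tracks it linearly.
for users in (5, 10, 20, 40, 80):
    throughput, error_rate = run_load_test(users)
    if error_rate > 0.01:  # past this point the numbers are meaningless
        print(f"{users} users: error rate {error_rate:.1%} -- stopping")
        break
    print(f"{users} users: {throughput:.1f} req/s")
```

If doubling the users stops doubling the throughput, correlate that knee with the target system's CPU and memory to see which resource saturated first.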
Endurance Testing
• Also known as "soak" testing
• A load test executed over an extended duration
  • Typically overnight or over the weekend
• Proves "reliability" of the system
• Consistency of functional results
  • Is the very first result the same as the very last, and all those in between?
  • Depending on requirements, an error rate > 0 can be acceptable
• Consistency of performance
  • Does response time or throughput degrade over time?
• Consistency of resource utilization (a monitoring sketch follows this slide)
  • Are we leaking memory?
  • How does CPU usage look over the duration?
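One way to watch for leaks during a soak run is a sidecar loop that samples the target process alongside a probe request. A sketch assuming the third-party psutil package is installed and that you know the server's PID; the endpoint, PID, and durations are placeholders:

```python
import time
import urllib.request
import psutil  # assumed installed: pip install psutil

URL = "http://localhost:8080/health"  # hypothetical endpoint
SERVER_PID = 12345                    # hypothetical PID of the process under test
DURATION_S = 8 * 3600                 # "overnight"
SAMPLE_EVERY_S = 60

server = psutil.Process(SERVER_PID)
end = time.time() + DURATION_S

while time.time() < end:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    latency_ms = (time.perf_counter() - start) * 1000
    rss_mb = server.memory_info().rss / 1024 / 1024
    cpu_pct = server.cpu_percent(interval=None)  # % since the previous sample
    # A steadily climbing RSS column in this log is the classic leak signature.
    print(f"{time.strftime('%H:%M:%S')} ok={ok} "
          f"latency={latency_ms:.1f}ms rss={rss_mb:.1f}MB cpu={cpu_pct:.0f}%")
    time.sleep(SAMPLE_EVERY_S)
```

Comparing the first hour of this log against the last directly answers all three consistency questions above.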
Stress Testing
• Often mistakenly referred to as "load" testing
• Best thought of as "extreme" load testing
• Tests the resiliency of the system when pushed beyond its limits
  • 150% to 200% of the "nominal" load for the system
  • Half the system resources suggested for a given load
    • CPUs, memory, network bandwidth, etc.
• Looking for "graceful failure" (a server-side sketch follows this slide)
  • Best: system returns "Too Busy"
  • Better: effective error messaging so that users know the system is maxed out
  • Acceptable: system slows down, maybe some requests time out
  • Bad: crash
  • Worst: unpredictable results, misleading error messages
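For contrast, here is what the "best" outcome can look like from the server side: a sketch that caps in-flight work with a semaphore and returns an explicit 503 "Too Busy" instead of queueing until the process falls over. The handler, port, and capacity are illustrative, not a recommended production design.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 50  # assumed nominal capacity of this toy server
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not slots.acquire(blocking=False):
            # Graceful failure: a clear "Too Busy" rather than a crash
            # or a misleading error further down the stack.
            self.send_error(503, "Too Busy - server is at capacity")
            return
        try:
            time.sleep(0.05)  # stand-in for real work
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        finally:
            slots.release()

ThreadingHTTPServer(("", 8080), Handler).serve_forever()
```

Driving this with the earlier load sketch at 2x MAX_IN_FLIGHT users should produce a clean mix of 200s and 503s, never a hang or a crash.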
Reliability Testing
• Negative-condition load testing
• Tests resiliency under error conditions (a minimal sketch follows this slide)
  • Error-condition code paths typically don't get the same coverage as the "happy path"
• Is the system consistent under constant error conditions?
  • Are results consistent and predictable over time?
• Consistency of resource utilization
  • Error conditions are notorious for resource leaks
• Security tests
  • e.g., Denial of Service
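A simple way to exercise an error path under sustained load is to hammer an endpoint with input that should always fail and verify the responses stay predictable. The endpoint, payload, and request count below are hypothetical:

```python
import urllib.error
import urllib.request
from collections import Counter

URL = "http://localhost:8080/login"  # hypothetical endpoint
BAD_BODY = b'{"user": "nosuchuser", "password": "wrong"}'  # always-failing input

codes = Counter()
for _ in range(10_000):
    req = urllib.request.Request(
        URL, data=BAD_BODY, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            codes[resp.status] += 1
    except urllib.error.HTTPError as e:
        codes[e.code] += 1        # the expected error path, e.g. 401
    except Exception:
        codes["transport"] += 1   # timeouts, resets: watch for drift over time

print(codes)  # a healthy system shows one dominant, predictable code
```

If the distribution drifts (401s giving way to 500s or transport errors as the run goes on), the error path is probably leaking a resource; pair this with the soak-style memory monitoring from the endurance slide.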
Random Suggestions (Time Filler)
• Choose workflows that fit the "80/20 rule"
  • Some workflows need P&R testing, others don't. Choose wisely.
• Use sufficient hardware for your load generator application
  • Size your client hardware like you would your target system
• Don't use "intrusive" validation in your test cases
  • Heavy test validation will slow down your test and affect concurrency
• Avoid "intrusive" monitoring when possible
• Beware of logging
  • Logging is useful, but can kill performance
• Visualize your results (a plotting sketch follows this slide)
  • A picture is worth a thousand words. Who doesn't like charts?
  • Include context (resource utilization of the systems under test)
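A sketch of the "picture plus context" idea, assuming matplotlib is installed. The numbers are invented purely for illustration; the point is plotting throughput and target-system CPU on the same chart so the knee explains itself:

```python
import matplotlib.pyplot as plt  # assumed installed: pip install matplotlib

# Hypothetical results from a scalability sweep (illustrative only).
users      = [5, 10, 20, 40, 80]
throughput = [48, 95, 180, 210, 205]  # req/s
cpu_pct    = [12, 24, 51, 88, 97]     # target-system CPU: the "context"

fig, ax1 = plt.subplots()
ax1.plot(users, throughput, marker="o", label="throughput (req/s)")
ax1.set_xlabel("concurrent users")
ax1.set_ylabel("throughput (req/s)")

ax2 = ax1.twinx()  # resource utilization on a second axis
ax2.plot(users, cpu_pct, marker="s", color="tab:red", label="target CPU %")
ax2.set_ylabel("target CPU (%)")

fig.legend(loc="upper left")
plt.title("Throughput flattens as target CPU saturates")
plt.savefig("scalability.png")
```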
So what do I do if I think there is a problem?
• Too slow?
  • Is your system tuned?
    • Ensure you have not configured a bottleneck in your deployment
  • Try a profiling tool
    • Can show which areas of the code are taking the most time
  • Add some lightweight logging to code (a timing sketch follows this slide)
    • Add "timing code" to log out elapsed time in functions/paths
  • Use a stack dumping utility
    • Repeated stack dumps can show where you are "stuck"
• Using too much memory or leaking?
  • Try a profiling tool
    • Can show which objects and allocations dominate memory use
  • Add "size" logging for container classes
    • Can show you if your containers are growing unbounded
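"Timing code" can be as small as a decorator that logs elapsed wall time per call. A minimal sketch; the decorated function is a hypothetical stand-in for whatever path you suspect:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("timing")

def timed(fn):
    """Log elapsed wall time for each call -- cheap enough to leave on in tests."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s took %.2f ms", fn.__name__, elapsed_ms)
    return wrapper

@timed
def render_report(report_id):  # hypothetical suspect function
    time.sleep(0.12)           # stand-in for real work
    return f"report {report_id}"

render_report(42)  # logs something like: render_report took 120.xx ms
```

The same finally-block pattern works for logging container sizes (e.g., len() of a cache) on each pass, which is the quickest way to spot unbounded growth without a profiler.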