Thursday 7 January 2016

Performance testing - a real-world experience and day-zero considerations

Welcome to 2016. Why not kick off the new year with an old story? Some years back I was a test manager on a large project aiming at launching a business-critical system. The test strategy recommended a few structured performance test activities aimed at proving that the system would actually be able to deal with the expected number of users, that the morning log-on peak could be handled, and that system resources would be freed up as users logged off.

All of these recommendations were approved by project management, and a separate team was set up to design, implement and execute the necessary tests. This would be done late in the project, which is entirely normal: do the testing at a point where the software is sufficiently mature that the results can actually be trusted and used for a go/no-go decision. So far, so good.

Since the project would replace an existing solution, we didn't have to guess too much about user behaviour for normal use. We could simply look in the existing log files and find the patterns. A large part of the testing was designed around this knowledge.
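
As an illustration of that kind of log mining, here is a minimal Python sketch that bins historical log-on timestamps into a time-of-day load profile. The log format, field positions and file name are assumptions for the example, not the project's actual logs.

```python
# Minimal sketch: derive a log-on profile from historical access logs.
# Assumes one line per log-on, e.g. "2015-03-02T08:07:41 user123 LOGIN"
# (the format and file name are illustrative, not the project's real logs).
from collections import Counter
from datetime import datetime

def logon_profile(log_path, bin_minutes=15):
    """Count log-ons per time-of-day bin to shape a load test scenario."""
    bins = Counter()
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3 or parts[2] != "LOGIN":
                continue  # skip anything that is not a log-on event
            ts = datetime.fromisoformat(parts[0])
            bins[(ts.hour * 60 + ts.minute) // bin_minutes] += 1
    return bins

if __name__ == "__main__":
    for bucket, count in sorted(logon_profile("access.log").items()):
        hour, minute = divmod(bucket * 15, 60)
        print(f"{hour:02d}:{minute:02d}  {count} log-ons")
```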

Then we consulted the implementation team to figure out how they expected to roll out the solution to the organisation. We came back with the answer: a "big bang" implementation. There were no alternatives, so we needed that as a scenario too. How would the solution scale and behave on the first day, when everybody had an email in their inbox saying "please log on to this brand new and super good system"?

No problems so far. The organisation was spread across two time zones, which took some of the expected peak load off, so we didn't have to test the cruel "100% of users at the same time" scenario. Emails to different parts of the organisation could be sent out to groups of users at, say, 10-15 minute intervals to avoid a tidal wave of concurrent log-ons (a schedule along the lines of the sketch below). A good, pragmatic idea; the project agreed to it and the implementation team executed it.
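
The staggering itself is simple to express. Here is a minimal sketch of such an invitation schedule, with made-up group names, sizes and a 15-minute interval rather than the project's real figures:

```python
# Minimal sketch of a staggered roll-out: invite one group at a time,
# 15 minutes apart, instead of mailing everyone at once.
# Group names, sizes and the interval are illustrative assumptions.
from datetime import datetime, timedelta

def invitation_waves(groups, start, interval_minutes=15):
    """Yield (send_time, group_name, user_count) for each invitation wave."""
    for i, (name, users) in enumerate(groups):
        yield start + timedelta(minutes=i * interval_minutes), name, users

groups = [("Region East", 400), ("Region West", 350), ("Head office", 250)]
for when, name, users in invitation_waves(groups, datetime(2016, 1, 7, 8, 0)):
    print(f"{when:%H:%M}  invite {name} ({users} users)")
```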

The one thing we didn't take into account was how organisations, and especially middle management, work. Middle managers tend to send a lot of mails around these days, in ways not always known or controlled by a project like ours. So in the real world we succeeded with our performance testing but failed on day zero.

As soon as middle managers started to receive the "Important information - log on to this new system" email, they did what they always do with this kind of information: they passed it on. Not only within their own part of the organisation but across it, using mail groups that would hit 30, 50 or 100 people at a time. They were used to this in their daily operational life, and to them this was just another operational morning.

The result was that the log-on peaks were completely different from what we had expected and planned - and tested. Not to the extent of a complete meltdown, but there were short outages during the first couple of hours - and of course some angry and concerned users who needed feedback and assurance that they could trust a system that was mission critical for them.
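
To illustrate why the forwarded mails changed the picture so much, here is a toy comparison, with invented numbers, of the worst minute of log-ons under a staggered schedule versus the same users arriving in forwarded-mail bursts:

```python
# Toy illustration (all numbers invented): the same 1000 users arriving under
# the planned staggered schedule versus in forwarded-mail bursts, and how that
# changes the worst minute of concurrent log-ons.
from collections import Counter

def peak_per_minute(arrival_minutes):
    """Highest number of log-ons landing in any single minute."""
    return max(Counter(arrival_minutes).values())

# Planned: 10 waves of 100 users, each wave spread over its own 15-minute slot.
planned = [wave * 15 + (i % 15) for wave in range(10) for i in range(100)]

# Actual: forwarded mails pull each group of 100 into a two-minute burst.
actual = [burst * 5 + (i % 2) for burst in range(10) for i in range(100)]

print("planned peak log-ons/minute:", peak_per_minute(planned))  # 7
print("actual peak log-ons/minute: ", peak_per_minute(actual))   # 50
```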

Lesson learned: think a bit outside the box. It is not always the worst-case scenario that hits you, but it can be closer to it than you might think. Even when you have a lot of knowledge to build on, always treat performance testing for day-zero scenarios as something truly special. First impressions last, especially for real-life users.
