I was working on the fourth part of the online banking series when this article caught my eye: Singapore Bank Suffers Massive IT Failure
The fact that there was an IT failure at a leading bank (DBS Bank) is by itself not entirely surprising, considering the complexity of the IT organisation in most large financial services institutions. But the fact that it took seven hours and more to set it right, even with a possible virtual band-aid, was surprising indeed, especially when one considers the quality of IT management at this bank's disposal. I should know, as I have worked closely with one of the senior IT executives and know first-hand the quality of systems, processes and people he had put together at his previous organisation.
So why does it take so long to identify root causes and peripheral causes, and to effect a quick band-aid solution so the show can go on? Root cause analysis, evaluating permanent solution options and regression testing all take time in complex, interdependent environments, and perhaps those can wait a few days. But how an organisation identifies quick wins is a key question, and a process many of us haven't as yet perfected.
How do such outages occur? The most common reason I have encountered is performance and scalability. It is a rare functional usage path that would not have been tested several times over, which makes outages due to standard functional usage almost a non-starter. But systems can rarely be exhaustively tested for scalability, performance and stress, primarily because usage patterns, volumes and demographics are changing so rapidly.
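To make the point concrete, here is a minimal sketch of the kind of stress test the paragraph alludes to: ramp up concurrency against a stand-in service and watch how throughput behaves. Everything here is illustrative (the transaction handler, timings and worker counts are invented for the example, not drawn from any bank's actual stack), and a real test would ramp far beyond yesterday's observed load, since that is precisely the pattern that changes.

```python
import concurrent.futures
import random
import time


def handle_transaction(txn_id: int) -> int:
    """Stand-in for a back-end banking transaction.

    The sleep simulates service latency; in a real test this would be
    an actual call into the system under test.
    """
    time.sleep(0.001 + random.uniform(0, 0.002))
    return txn_id


def stress_test(concurrency: int, requests: int) -> float:
    """Fire `requests` transactions with `concurrency` workers.

    Returns observed throughput in requests per second.
    """
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Drain the iterator so we wait for every transaction to finish.
        list(pool.map(handle_transaction, range(requests)))
    elapsed = time.perf_counter() - start
    return requests / elapsed


if __name__ == "__main__":
    # Ramp concurrency the way real usage ramps: the load profile that
    # was comfortable last quarter is not the one that breaks you.
    for workers in (1, 8, 32):
        print(f"{workers:>3} workers: {stress_test(workers, 200):8.1f} req/s")
```

The hard part in practice is not the harness but the model of demand: a test like this only protects you if the concurrency levels and usage mix it exercises keep pace with how customers actually use the system.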
I have often been asked: how can outages occur when banks spend such huge sums on information technology systems and platforms, and when a significant portion of that sum is often spent on maintenance as opposed to fresh projects? The answer is quite simple: no one is perfect, and even the most experienced driver can make a mistake and accidentally injure someone. IT systems are often like that: it is next to impossible to test every possible eventuality before systems are deployed into production.
The complexity is often enormous. Most full-service banks in the US$100-300 billion assets under management range would have 1,500+ applications, intertwined with information packets in a state of continuous flux, and databases overflowing with data that needs to be constantly verified for accuracy, indexed for access and protected for security.
So why are performance, stress and scalability so often a problem? In my experience, this comes down to three reasons: changes or additions to usage patterns causing increased stress (previously little-stressed areas are especially susceptible); investment in infrastructure getting ignored in pursuit of better, more modern applications; and the sheer complexity of database and information management, systems access and security.