Banks gain ground after customers’ confidence hits rock bottom
The global financial crisis devastated the reputation of the UK banking industry, and it is not hard to understand why public trust in banks is at a low ebb. Since 2008, there have been at least five major scandals involving one or more banks operating in the UK, writes Peter Duffy
Along with the reputational damage and the massive fines imposed by regulators in the fallout from the financial crisis, banks face further reputational and financial costs as a result of their IT failures.
To restore trust, UK banks need to demonstrate to the Financial Conduct Authority and their customers that they are taking greater care: ensuring robust risk and service management practices and assuring end-user performance. A strong capacity planning capability is key to this.
Banks must consider a wider range of platforms in their capacity planning than companies in other sectors, where environments are typically more standardized and built on a smaller set of platforms.
In such complex operating environments, where demand for different services can be volatile and high levels of change are a way of life, maintaining an accurate handle on capacity headroom and potential issues is a real challenge.
If a disk on a database server fills up and transactions can no longer be processed, the repercussions can be substantial. The common reaction is to throw money at the issue, adding resources and turning to things like additional storage and virtualization to solve the problem.
This can offer a quick fix, but using new technologies like predictive capacity analytics can help banks make smarter decisions, and address any potential IT issues well before they impact service performance and customer perception.
Predictive capacity planning can help to manage the challenges associated with initiatives such as M&A, growth in business demand and consolidation of existing services and platforms. Banks need to ensure stability of service by managing assets in a smarter way across the IT landscape. Specifically, by:
- Improving the management of operational risk
- Reducing service-impacting incidents on critical systems
From firefighting to fire prevention
Had the banks that experienced outages over the past few years better understood how their IT estate was performing ahead of time, with advance insight into factors that might affect performance in the future, disaster might have been avoided.
Not every outage can be prevented, but by leveraging the predictive trending provided by the latest capacity analytics solutions, many outages and performance degradations can be avoided – no fortune-teller needed. And for those outages that can’t be avoided, the same technology can be used to assess and de-risk disaster recovery plans.
Treat the symptoms, not the illness
IT outages don’t usually happen out of the blue. Like an illness, infrastructure failure typically shows symptoms long before the system goes down.
Most banks have fairly standard tools that set threshold limits on resource consumption in real time; if usage breaches a threshold, the tool sends an alert so that immediate action can be taken before the problem cascades.
Setting thresholds throws up two issues that are worth noting. Firstly, at what level should a threshold be set? A setting of 70% might be relevant for some systems and resources, but absolutely useless for others (for example, database servers typically grab all the memory they can, and so usage of 95% or more is not uncommon – therefore, how do you detect problems here with a threshold crossing?).
Secondly, setting a threshold is a reactive solution, as the bank would only discover and take action when a threshold is about to be breached and an issue is imminent. The bank would find itself on the back foot, meaning it would be in a disadvantaged negotiating position if it had to approach a vendor to acquire more storage, say, at the eleventh hour. (And if your only tool is threshold alerts, the only real solution to this is to set your thresholds lower in order to give you a longer response time – but this leaves you dealing with many more false positives as systems cross thresholds during normal operations, and ultimately leads to overworked IT teams ignoring threshold alerts altogether.)
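Both issues can be seen in a minimal sketch of static threshold alerting. The metric values and the 70% level below are purely illustrative:

```python
# Static threshold alerting, in miniature. One fixed level is applied
# to every sample, regardless of what "normal" looks like for the system.
CPU_THRESHOLD = 70.0  # percent - the "right" level varies by system

def should_alert(usage_percent: float, threshold: float = CPU_THRESHOLD) -> bool:
    """Fire an alert once usage crosses the fixed line."""
    return usage_percent >= threshold

# A web tier briefly spiking past 70% during a batch window raises a
# false positive, while a database server that routinely sits at 95%
# memory would alert constantly at any "sensible" static level.
samples = [62.0, 71.5, 68.0, 96.0]
alerts = [s for s in samples if should_alert(s)]
```

Nothing here looks ahead: the alert fires only once the breach is already happening, which is exactly the reactive posture described above.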
This is where predictive analytics come in. Banks need to adopt a longer range proactive strategy that helps them avoid problems in the first place. Predictive capacity analytics can identify the underlying, fluctuating usage patterns across infrastructure resources. For instance, distinguishing between patterns of behaviour during busy business hours and batch processing at night. By applying advanced algorithms to accurately extrapolate these trends and patterns, predictive analytics enhances visibility and control, meaning banks can take proactive action to treat the symptoms before any illness strikes.
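The principle can be illustrated with a deliberately simple sketch: fit a straight line to recent daily disk usage and extrapolate when the trend hits capacity. Commercial capacity analytics products use far richer models (seasonality, business calendars, workload patterns); the figures below are invented for illustration:

```python
# Least-squares linear trend on daily usage, extrapolated to the point
# where the fitted line reaches full capacity.
def days_until_full(daily_usage_gb, capacity_gb):
    """Return estimated days from today until capacity is exhausted,
    or None if usage is flat or shrinking."""
    n = len(daily_usage_gb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_usage_gb) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_usage_gb))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den                     # growth in GB per day
    if slope <= 0:
        return None                       # no growth trend to project
    intercept = y_mean - slope * x_mean
    # Solve capacity = slope * day + intercept, relative to today (day n-1)
    return (capacity_gb - intercept) / slope - (n - 1)

# A volume growing roughly 10 GB/day towards a 1000 GB limit:
usage = [700, 710, 719, 731, 740]
remaining = days_until_full(usage, 1000)  # roughly 26 days of headroom
```

With weeks of warning rather than minutes, the bank can plan procurement and change windows on its own terms instead of negotiating at the eleventh hour.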
Map it out
Bank infrastructures can be very complex and dynamic, with users spread across geographies, time zones, cultures and languages. Such complexity can translate into disaster when an outage strikes.
In such complex environments, establishing the cause of any failure, and the area responsible for providing resolution, can be resource intensive and time consuming, resulting in loss of earnings and reputational damage.
Once identified, any fix needs to be tested and implemented with consideration given to other running services that have remained operational.
Understanding the links between nodes, in both logical and physical terms, is essential to this process and an absolute necessity before giving stakeholders the green light.
Similarly, new or changed services being introduced should be mapped out, giving the bank’s IT team time to confidently assess the impact the new service will have on existing capacity levels and to lay down plans for dealing with growth and/or failure in a controlled manner. This provides for faster, more efficient troubleshooting.
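At its simplest, such a map is a directed graph of what depends on what, which can be walked to find everything downstream of a failed component. The topology below is entirely made up for illustration:

```python
# Toy dependency map: each component lists the services that depend on it.
from collections import deque

dependents = {
    "storage-array-1": ["db-cluster"],
    "db-cluster": ["payments-api", "reporting"],
    "payments-api": ["mobile-banking", "online-banking"],
}

def impacted_by(failed: str) -> set:
    """Breadth-first walk collecting every service downstream of a failure."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

impacted = impacted_by("storage-array-1")
```

A single storage failure here ripples out to five services, including both customer-facing banking channels, which is exactly the kind of blast-radius question a dependency map answers before a change is approved.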
Ensure smooth sailing
Managing day-to-day IT operations at a bank is like steering a large ship; some days, the trip can be smooth sailing, while other days are fraught with stress and choppy seas to navigate. Why does the website slow down at 10:00 am every Tuesday? How can the bank support the roll-out of a new service without adding more hardware?
Having the analytics needed to accurately model and understand the complex inter-relationships at play across the environment, with plenty of advance visibility of any potential hazards, helps to cut through those choppy seas and navigate a steady course. Only then can banks regain the confidence of regulators and their customers.