How machine learning can power your business
An unprecedented volume of data is being generated across the globe, with an estimated 2.5 quintillion (10^18) bytes produced each day at our current pace.
The variety of formats in which this data is being produced, and its structural complexity, are also on the rise. Collectively, these factors are driving demand among institutions for advanced analytics to generate actionable insights.
Enter machine learning. At the most basic level, machine learning encompasses the use of computational algorithms more advanced than the analytics methods (data mining approaches, for example) traditionally employed to deliver insights into large datasets. Machine learning techniques are firmly rooted in the science of statistics and have valuable applications not least in financial services. These applications include detecting financial crime, predicting loan repayment defaults, and providing personalised customer engagement.
While machine learning is undeniably a powerful data analytics tool, the key question a business must address is how to get the best out of these new methods while avoiding potential pitfalls. Without a well-considered business case, machine learning projects can yield disappointing results. It is tempting to think that machine learning can solve any business question, but for many projects more traditional and less technically challenging analytical methods, such as statistical modelling, can provide a comparable level of insight.
To best understand the value of machine learning, and how to use it, one needs to know the kinds of programmes that are best suited to these techniques and the data required to allow them to deliver optimal results. Machine learning algorithms provide the best tools for answering a number of general questions, so let’s examine four of them: ‘yes or no,’ ‘how much,’ ‘where does this belong’ and ‘what to suggest.’
Classification: Yes or no?
Decision tree-based algorithms are used as the basis for tackling many classification and regression problems. Classification and regression tree (CART) is the catch-all term used to describe the use of decision trees to solve such problems. These methods typically have the benefit of allowing the importance of a variable to be measured and the decision tree to be visually charted, making decisions easy to understand and explain.
This interpretability is of key importance to the data science team when solving business challenges, as it allows the model to be easily understood from a non-technical perspective. Machine learning models that can be intuitively explained are key to ensuring that data science teams and stakeholders work in parallel, rather than diverging.
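To make this concrete, here is a minimal sketch of a ‘yes or no’ classifier built with a decision tree in scikit-learn. The transaction features, the rule used to generate the synthetic labels and the thresholds are all assumptions made purely for illustration, not real anti-money laundering data or rules.

```python
# A minimal sketch of a decision-tree classifier ("yes or no") using scikit-learn.
# All features, thresholds and labels below are synthetic and purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical transaction features: amount, number of transfers, account age (years)
X = rng.normal(loc=[500, 5, 4], scale=[300, 3, 2], size=(1000, 3))
# Hypothetical label: 1 = suspicious, 0 = not suspicious
y = ((X[:, 0] > 800) & (X[:, 1] > 6)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Variable importance and a readable chart of the decision rules: the properties
# that make these models easy to explain to non-technical stakeholders.
feature_names = ["amount", "n_transfers", "account_age"]
print(dict(zip(feature_names, tree.feature_importances_)))
print(export_text(tree, feature_names=feature_names))
```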
Classification algorithms have been proposed as a key element in countering global money laundering. Typically, however, money laundering is not an individual activity, but rather involves groups of people working together as a collective. Savage et al. (2017) recently proposed a machine learning approach to this problem that combines network analysis, which examines the groups of people and accounts potentially involved in money laundering activity, with a classification algorithm. An evaluation of this combined approach concluded that it correctly detects suspicious activity with a low rate of false positives, and hence has high potential for real-world deployment.
Regression: How much?
Regression algorithms, on the other hand, predict a numerical value.
The simplest type of regression is linear regression, which relates two variables by using the value of one to predict the value of the other. An example of simple linear regression is predicting the salary of an employee based on their age. To set up the algorithm, you need historical data containing values of both variables. Although regression can be performed with relatively little data, the quality of the output will generally increase with the amount of training data available, so it is best practice to use regression when large training datasets are available.
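As a sketch of the salary example above, the snippet below fits a one-variable linear regression with scikit-learn and uses it to predict the salary of a new employee; the age and salary figures are invented purely for illustration.

```python
# A minimal sketch of simple linear regression: predicting salary from age.
# The ages and salaries below are synthetic, illustrative values, not real data.
import numpy as np
from sklearn.linear_model import LinearRegression

ages = np.array([[22], [28], [35], [41], [47], [53], [60]])              # predictor
salaries = np.array([24000, 31000, 40000, 46000, 52000, 58000, 61000])  # target

model = LinearRegression()
model.fit(ages, salaries)  # learn the line of best fit from the historical data

# Predict the salary of a hypothetical 38-year-old employee
print(model.predict([[38]]))
print(model.coef_, model.intercept_)  # slope and intercept of the fitted line
```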
Regression has been used extensively for stress testing in financial institutions. Regression is often well suited to analyses that require extrapolation to unknown situations. In stress testing, this means predicting how financial institutions will cope under extreme market conditions: regression allows historical data from non-extreme market conditions to be extrapolated to extreme ones.
An example of the use of regression for stress testing is provided by the Federal Reserve Bank of New York, which uses linear regression models to estimate possible future capital shortfalls. These calculations form part of the bank’s balance sheet projections that feed into a wider banking stress test.
In the case of stress testing specifically, the ability to communicate a model’s results, methods and assumptions to senior business stakeholders is of paramount importance. Given the increased difficulty of building persuasive arguments with more sophisticated models, linear models are typically preferred.
Clustering: Where does this belong?
Clustering algorithms seek to group data points into distinct clusters based on their features. A successful clustering method will provide support for specific hypotheses, such as the data being separable into high-risk and low-risk groups.
Clustering is useful in exploratory analysis because it can automatically identify structure within data. In situations where it is either impossible or impractical for a human to identify trends in the data, clustering algorithms can provide initial insights that can then be used to test individual hypotheses. For instance, clustering methods can be applied straightforwardly to group customers into segments, which can offer insights that in turn prompt further analysis.
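The sketch below illustrates this kind of exploratory customer segmentation with k-means in scikit-learn; the features, the synthetic data and the choice of three clusters are assumptions made purely for illustration.

```python
# A minimal sketch of clustering for exploratory customer segmentation with k-means.
# The features and the number of clusters are illustrative assumptions only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic customer data: [average balance, transactions per month]
customers = np.vstack([
    rng.normal([2_000, 10], [500, 3], size=(100, 2)),     # low balance, low activity
    rng.normal([20_000, 40], [4_000, 8], size=(100, 2)),  # high balance, high activity
    rng.normal([8_000, 80], [2_000, 10], size=(100, 2)),  # mid balance, very active
])

# Scale the features so that balance does not dominate the distance calculation
scaled = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
print(kmeans.labels_[:10])      # cluster assignment for the first ten customers
print(kmeans.cluster_centers_)  # centroids in the scaled feature space
```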
This makes clustering an effective technique for anomaly detection. In financial services, this is useful in anti-money laundering to identify unusual or fraudulent transactions. Citibank, for example, have entered into a strategic partnership with Feedzai, a machine learning solutions business, to provide real-time fraud risk management. Feedzai’s solution transforms data streams to create risk profiles for fraud detection, using machine learning to process client transactions automatically. Feedzai is able to do this in millisecond timescales, providing Citibank with a rapid and powerful fraud detection product.
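One simple way of using clustering for anomaly detection, sketched below as a generic illustration rather than a description of Feedzai’s product, is to flag transactions that lie unusually far from every cluster centroid.

```python
# A minimal sketch of clustering-based anomaly detection: transactions that sit far
# from every cluster centroid are flagged for review. Generic illustration only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic transactions: [amount, hour of day]
normal = rng.normal([80, 14], [30, 4], size=(500, 2))
unusual = np.array([[5_000, 3], [7_500, 2]])  # large late-night transfers
transactions = np.vstack([normal, unusual])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(transactions)

# Distance from each transaction to its nearest centroid
distances = np.min(kmeans.transform(transactions), axis=1)

# Flag the most isolated transactions (here, the top 1%) as anomalies
threshold = np.quantile(distances, 0.99)
print(np.where(distances > threshold)[0])  # indices of the flagged transactions
```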
Collaborative filtering: What to suggest?
Recommendation engines are all around us, from Amazon purchasing suggestions through to Facebook friend recommendations. While recommendation engines have been slow to gain traction in the financial services sector, they have the potential to change the way that portfolios are optimised and how products are cross-sold.
Collaborative filtering is the most common machine learning method underlying recommendation engines. The name derives from the idea that data from many similar users can be combined to generate product recommendations to an individual customer in the way that real-world friends would collectively recommend purchases to one another.
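A toy sketch of the idea follows: customers are compared according to the products they already hold, and a product is suggested based on what the most similar customers hold. The product names and the holdings matrix are invented purely for illustration.

```python
# A minimal sketch of user-based collaborative filtering on a tiny, made-up
# customer-product matrix (1 = holds the product, 0 = does not).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

products = ["equity_fund", "bond_fund", "fx_product", "commodity_fund"]
# Rows are customers, columns are the products above (illustrative data only)
holdings = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
])

# Similarity between customers, based on the products they already hold
similarity = cosine_similarity(holdings)

# Score unseen products for customer 0 by weighting other customers' holdings
# by how similar those customers are to customer 0
target = 0
scores = similarity[target] @ holdings.astype(float)
scores[holdings[target] == 1] = -np.inf  # do not recommend products already held

print(products[int(np.argmax(scores))])  # top suggestion for customer 0
```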
InCube is currently developing bespoke recommendation engines for private banks. These engines use several AI techniques, including collaborative filtering, to recommend products for clients to add to their existing portfolios.
Safeguarding your reputation: Ethics of machine learning
All these different algorithms can be potentially transformative and offer techniques for harnessing the enormous power of data to detect crime or to provide a better service to customers. Some are still in development stages, while others are up and running. In all cases businesses need to understand the existing structure of their data and appreciate how it maps onto the different algorithmic techniques available if they are to get the best results.
However, caution is most certainly required. In recent years, some well-designed machine learning projects have led to unintended reputational damage for institutions because their work was judged “unethical”, whether through the use of sensitive personal data or the automation of key personal decisions without the necessary degree of human oversight.
Microsoft’s Twitter chatbot, Tay, released in 2016, is a case in point. Tay was designed to mimic the speech patterns of a ‘typical’ millennial by learning from the Twitter conversations in which it engaged. This ultimately proved to be to Tay’s detriment, as it was soon making derogatory comments.
With advanced analytics techniques becoming a key competitive advantage in the digital age, businesses must quickly learn to adapt to such new ethical concerns. The best defence against ethical breaches is to ensure that all members of a machine learning project understand their responsibilities and are sufficiently informed and empowered to raise concerns.
It is no longer acceptable for data scientists to behave like the naïve, impartial computers they programme; instead, it is imperative that they properly reflect upon the impacts of adding sensitive information to their models. Likewise, stakeholders and managers cannot treat their data science team as a black box that will simply return an insight within a few weeks of the initial project briefing. They too must take an active interest in what data is being used, and how this could be viewed by an external audience.
Machine learning can be a powerful addition to any data analytics tool kit, but it demands careful planning and a high-level understanding by all stakeholders of the techniques involved. It is also important that data scientists and organisational stakeholders bear ethical considerations in mind. Adherence to best practices can mitigate these challenges and harness the power of machine learning to generate enormous value for the business.
By Jibran Ahmed, managing principal, capital markets at Capco