Breakthrough of Modern Data Visualization Tools in the Banking Industry
Data visualization has been heavily used in the banking industry for years and is an inseparable part of the analytical processes in risk and reporting departments. Yet existing banking solutions for financial and risk management have rarely automated it. Producing visualization content has always been relatively slow, because it depends on data drawn from various sources and extracted and processed with different techniques. Old-fashioned tools such as Excel are still heavily used in the banking industry and remain the main instrument for producing meaningful graphs and charts from large spreadsheets of data.
Recently, the banking world has increasingly been talking about new web-based solutions and software platforms for data visualization that allow the export, processing, modelling and analysis of large databases. One of the best ways to present data is to use modern data visualization components, which are very efficient at delivering the key message and identifying the performance indicators in the analyzed data.
Imagine that you are the chief risk officer of a bank that disbursed sixty-four thousand loans in the second quarter of 2017. Today, a year and a quarter after the initial disbursal, there is a clearer picture of what portion of the loans is bad. The bad rate is around 2.38%, or 1,524 loans out of a total of sixty-four thousand disbursed loans.
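The headline bad rate follows directly from the two counts quoted above. As a quick check, here is a minimal Python sketch; the variable names are illustrative and not part of any banking system.

```python
# Portfolio counts quoted in the text.
total_loans = 64_000
bad_loans = 1_524

# Bad rate = bad loans divided by all disbursed loans.
bad_rate = bad_loans / total_loans
print(f"Bad rate: {bad_rate:.2%}")  # Bad rate: 2.38%
```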
Before any multivariate analysis or credit scoring is conducted, the bad rate should be analyzed across several variables. Experience shows that the borrower’s age is usually an important distinguishing factor for bad rates. Accordingly, the loans can be divided by borrower age, as shown in the table below.
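One way to build such a table is to assign each loan to an age bucket and count good and bad loans per bucket. The sketch below is only an illustration: the bucket edges, labels and sample records are assumptions made for the example, not the buckets or data used in this analysis.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket edges in years; the buckets used in the article may differ.
EDGES = [21, 30, 40, 50, 60]
LABELS = ["<21", "21-29", "30-39", "40-49", "50-59", "60+"]

def age_bucket(age):
    """Return the label of the bucket that contains `age`."""
    return LABELS[bisect_right(EDGES, age)]

def bad_rate_by_bucket(loans):
    """Map (age, is_bad) pairs to {label: (total, bad, bad_rate)}."""
    totals, bads = Counter(), Counter()
    for age, is_bad in loans:
        label = age_bucket(age)
        totals[label] += 1
        bads[label] += int(is_bad)
    # Skip empty buckets so the bad rate is always defined.
    return {
        label: (totals[label], bads[label], bads[label] / totals[label])
        for label in LABELS
        if totals[label]
    }

# Made-up sample records, for illustration only.
sample = [(19, True), (25, False), (27, True), (34, False), (45, False), (63, False)]
table = bad_rate_by_bucket(sample)
for label, (total, bad, rate) in table.items():
    print(f"{label:>6}  total={total}  bad={bad}  bad_rate={rate:.1%}")
```

Keeping the bucket edges in one list makes it easy to try coarser or finer buckets before settling on a final split.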
Using that table, we can create a histogram and zoom in on the area of interest.
The following trends can be noticed:
- Age often displays a notable pattern for most products. The distribution of loans across age groups follows a relatively smooth, approximately normal curve.
- The largest percentage of bad loans is in the 42/45 years bucket, and the largest amount of disbursements also falls in this bucket. This does not necessarily mean that the risk is highest there. The raw numbers do not provide enough information, so a normalized plot needs to be created.
- There is very little data in the buckets above 60. While developing a model, we should apply sound business knowledge to compensate for this scarce data. For example, we may know that loans to borrowers above 60 can be very high risk, but here we do not have enough data to validate this hypothesis. In a situation like this, a high risk weight should be assigned to supplement the missing evidence.
Normalized Plot
The normalized plot can be constructed easily by scaling each age group to 100% and then overlaying the percentages of good and bad records. The table can be extended to provide the values for the normalized plot, as can be seen below.
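The scaling step can be sketched in a few lines of Python. The counts below are made up for illustration and the function name is hypothetical; only the idea of rescaling each bucket so that good and bad shares sum to 100% comes from the text.

```python
def normalize(counts):
    """Map {bucket: (good, bad)} to {bucket: (good_pct, bad_pct)}, each pair summing to 100."""
    shares = {}
    for bucket, (good, bad) in counts.items():
        total = good + bad
        if total == 0:
            continue  # an empty bucket has no defined share
        shares[bucket] = (100 * good / total, 100 * bad / total)
    return shares

# Made-up counts, for illustration only.
counts = {"21-29": (950, 50), "30-39": (1940, 60), "60+": (9, 1)}
shares = normalize(counts)
for bucket, (good_pct, bad_pct) in shares.items():
    print(f"{bucket:>6}  good {good_pct:5.1f}%  bad {bad_pct:4.1f}%")
```

In this made-up sample the "60+" bucket shows the highest bad share, yet it rests on only ten loans. The normalized view hides that sample size, so it is worth reading it next to the raw counts.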
These plots differ from the original frequency plot and present the information differently. The following conclusions can be drawn from the plot:
- There is a correlation between age group and bad rate. As borrowers get older, they are less likely to default on their loans.
- The buckets <21 and >60 years have scarce data, and this scarcity cannot be seen in the normalized plot. If we use the rule of thumb that a minimum of 10 records (of both good and bad) is required for the information to be considered statistically significant, then the data in these buckets is insufficient to draw conclusions.
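The rule of thumb from the last point is easy to automate. The sketch below reads it as requiring at least 10 good and at least 10 bad records in each bucket, which is one interpretation of the rule. The function name and the counts are hypothetical.

```python
MIN_RECORDS = 10  # rule-of-thumb threshold quoted in the text

def sparse_buckets(counts, min_records=MIN_RECORDS):
    """Return labels of buckets with too few good or too few bad records.

    counts: {bucket_label: (good_count, bad_count)}
    """
    return [
        label
        for label, (good, bad) in counts.items()
        if good < min_records or bad < min_records
    ]

# Made-up counts, for illustration only.
counts = {"<21": (40, 3), "30-39": (1940, 60), ">60": (9, 1)}
flagged = sparse_buckets(counts)
print(flagged)  # ['<21', '>60']
```

Buckets flagged this way are candidates for merging with a neighbouring bucket or for a risk weight set by business judgment rather than by the observed bad rate.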
It can be concluded that data visualization is a crucial part of the modelling process, and one that is greatly facilitated by big data, data analysis tools and technologies.