IHS Markit: FRTB - Sparking New Approaches for Big Data Analytics
Posted: 21 March 2017 | Author: Paul Jones | Source: IHS Markit
The introduction of the Basel Committee’s Fundamental Review of the Trading Book (FRTB) standards involves a comprehensive overhaul of banks’ market risk capital frameworks. The move from value-at-risk (VaR) to scaled expected shortfall (ES) in order to capture tail risk will significantly increase the number and complexity of the capital calculations that banks need to undertake, as well as the sheer volume of data to be managed.
From a computational perspective, this means that P&L vectors need to be generated per risk class, per liquidity horizon and per set of risk factors. Even after the redundant permutations are removed, this brings the total number of P&L runs to 63 (some of which can be performed weekly), compared with just two (VaR and stressed VaR) under the current approach.
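As an illustrative sketch only, the combinatorics behind these runs can be enumerated in a few lines of Python. The risk-set labels and the horizon-applicability map below are hypothetical placeholders rather than the actual FRTB rules, so this toy count differs from the 63 quoted above:

```python
from itertools import product

# Hypothetical, illustrative parameters -- not the actual FRTB rule set.
RISK_SETS = ["reduced/current", "reduced/stressed", "full/current"]
RISK_CLASSES = ["all", "IR", "EQ", "FX", "COMM", "CS"]
HORIZONS = [10, 20, 40, 60, 120]  # liquidity horizons in days

# Assumed longest horizon relevant to each risk class; runs beyond a
# class's longest horizon are empty and can be skipped as redundant.
MAX_HORIZON = {"all": 120, "IR": 60, "EQ": 60, "FX": 40, "COMM": 120, "CS": 120}

all_runs = list(product(RISK_SETS, RISK_CLASSES, HORIZONS))
runs = [(s, c, h) for s, c, h in all_runs if h <= MAX_HORIZON[c]]

print(f"naive permutations: {len(all_runs)}, after deduplication: {len(runs)}")
```

With the real FRTB horizon tables and deduplication rules, the same style of enumeration yields the 63 runs cited above.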
Firms are faced with the challenge of performing a significantly increased range of FRTB capital calculations at scale while also managing their costs and risk. The question is: are banks’ current IT risk infrastructures up to the task ahead?
If banks want to achieve proactive and intraday risk management while also effectively managing their capital over the long term, they will require a high-performing IT infrastructure that can handle the intensive calculations involved. However, many banks today rely on technologies such as relational databases and in-memory data grids (IMDGs) to conduct risk analytics, aggregation and capital calculations.
IMDGs work by replicating data or logging updates across machines. This requires copying large amounts of data over the cluster network, which has a far lower bandwidth than that of RAM. As a result, IMDGs incur substantial storage overheads, are sub-optimal when applied to pure analytics use cases, such as FRTB analytics, and are expensive to run.
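A back-of-envelope comparison, using assumed and purely illustrative bandwidth figures, shows why shipping data over the cluster network is so much costlier than working in local memory:

```python
# Illustrative figures only: actual bandwidths vary widely by hardware.
dataset_gb = 100.0    # e.g. P&L vectors to replicate across the grid
ram_bw_gb_s = 20.0    # assumed per-node memory bandwidth (GB/s)
net_bw_gb_s = 1.25    # assumed 10 GbE network link, ~1.25 GB/s

copy_in_ram_s = dataset_gb / ram_bw_gb_s    # time to copy locally in RAM
copy_over_net_s = dataset_gb / net_bw_gb_s  # time to replicate over the network

print(f"RAM copy: {copy_in_ram_s:.0f}s, network replication: {copy_over_net_s:.0f}s")
```

Under these assumptions, replicating the working set over the network is an order of magnitude slower than touching it in memory, before counting the duplicated storage it leaves behind.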
In short, banks’ legacy IT architectures will need a significant overhaul when it comes to FRTB and firms are looking for alternative options. One of those options is Apache Spark, an open source processing engine built around speed, ease of use and sophisticated analytics.
Spark has a distributed programming model based on an in-memory data abstraction called Resilient Distributed Datasets (RDDs), which is purpose-built for fast analytics. RDDs are immutable, support coarse-grained transformations and keep track of which transformations have been applied to them. Immutability rules out a large class of problems caused by concurrent updates from multiple threads, while the recorded lineage can be used to reconstruct an RDD if it is lost. As a result, checkpointing requirements in Spark are low, and caching, sharing and replication become easy. These are significant design wins. There are other advantages over IMDGs too:
- Memory optimisation: IMDGs require the entire working set to be held in memory and are limited by the physical memory available. Spark can spill to disk when portfolios do not fit into memory, making it far more scalable and resource efficient.
- Efficient joins: IMDGs have fixed cubes and cannot perform joins across datasets, which limits flexibility. Spark supports joining multiple datasets natively, allowing reporting against different hierarchies and analytics using other reference data without a new cube or additional memory. Joins also perform well, because Spark broadcasts smaller datasets behind the scenes using a peer-to-peer, BitTorrent-like protocol.
- Polyglot analytics: Spark supports custom aggregations and analytics implemented in a variety of languages (Python, Scala, Java or R), compared with the limited SQL or OLAP expressions possible with IMDGs.
- Multi-tenant support: Spark supports dynamic resource allocation, resource management, queues and quotas, allowing multiple users and processes, such as operations reporting, decision support, what-if analysis and backtesting, to run on the same cluster.
- Frugal hardware requirements: The immutable nature of RDDs enables Spark to scale and provide fault tolerance efficiently. A Spark cluster is highly available without the need for Active-Active hardware.
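The immutability and lineage ideas behind these properties can be sketched in plain Python. This is a toy model for illustration, not the Spark API: a dataset records the coarse-grained transformations applied to it, so a lost result can be recomputed from its source rather than restored from a checkpoint:

```python
class ToyRDD:
    """Toy model of an RDD: immutable data plus a recorded lineage."""

    def __init__(self, data, lineage=()):
        self._data = tuple(data)  # immutable: transformations never mutate in place
        self.lineage = lineage    # coarse-grained ops applied so far

    def map(self, fn):
        # Returns a NEW dataset; the operation itself is appended to the lineage.
        return ToyRDD((fn(x) for x in self._data),
                      lineage=self.lineage + (("map", fn),))

    def collect(self):
        return list(self._data)


def recompute(source, lineage):
    """Rebuild a lost dataset by replaying its lineage against the source data."""
    data = list(source)
    for op, fn in lineage:
        if op == "map":
            data = [fn(x) for x in data]
    return data


pnl = ToyRDD([1.0, -2.0, 3.0])
scaled = pnl.map(lambda x: x * 2)
# If 'scaled' were lost, its lineage lets us rebuild it from the source data
# instead of reading a checkpoint:
rebuilt = recompute([1.0, -2.0, 3.0], scaled.lineage)
print(rebuilt)  # [2.0, -4.0, 6.0]
```

Because each dataset is immutable and its derivation is recorded, fault tolerance falls out of the design: any partition can be regenerated on any node, which is what lets a Spark cluster stay highly available without Active-Active hardware.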
In fact, our own studies have demonstrated many of these capabilities, highlighting the power of Spark in terms of performance, scalability and flexibility. For example, we recently completed a proof of concept with a European bank, which showed that our capital analytics and aggregation engine can compute the FRTB capital charges for both the internal models approach (IMA) and the standardised approach (SA) in single-digit seconds. This was based on a portfolio of one million trades with 9 million sensitivities and 18 million P&L vectors, running on hardware costing just USD20k.
As one of the most active projects in the Apache Software Foundation, Spark benefits from thousands of contributors continuously enhancing the platform. In fact, we have seen a 20% year-on-year improvement in Spark aggregation performance since we started building our solutions on the platform in 2016. We're excited to see the improvements that are bound to come in the year ahead!