Fund Operations Data Challenges
The success of any algorithm depends on the nature and reliability of the data it processes. Because data are often unreliable, success depends on recognizing the inherent weaknesses of the data and then developing creative approaches to overcome them. Applying AI to asset management operations presents several challenges:
- No Reliable Count of Errors. The foundation for any predictive algorithm is an accurate count of the events it is predicting. Unfortunately, the asset management industry does not systematically track errors, largely because errors below one cent of NAV are tolerated. Since most errors are small, the majority go untracked, and even the larger errors are often not recorded.
- Multi-Party Causality. Any trade in the industry must be coordinated with multiple counterparties and then processed by service providers, so an error could have been introduced anywhere along the chain. This makes identifying the root cause of a break or error difficult.
- Threshold Breaks and False Positives. The industry has come to rely on oversight that flags transactions breaching a threshold (e.g., 5 basis points or 1.5 standard deviations from a historical mean). Because the error rate, even when known, is so low, the false positive rate on threshold breaks is extremely high. Most breaks turn out to be false positives, producing an ‘eye glaze syndrome’ among those overseeing them.
To address these challenges, OnCorps set out on an ambitious plan to track millions of transactions across multiple asset managers.
Reducing false positives has proven a difficult challenge. We have applied classification-based algorithms to isolate combinations of events that raise the prevalence rate of true errors among flagged items. In addition, we are developing behavioral AI algorithms that sense “eye glaze syndrome,” in which analysts clear exceptions too quickly because they see too many false positives. These algorithms have been shown to reduce the odds of missing an error by 90 percent.
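One way such a behavioral signal could be computed is sketched below: each analyst's exception clearances are compared against that analyst's own typical resolve time, and unusually fast clearances are flagged. The column names and the 20-percent cutoff are illustrative assumptions, not the production logic.

```python
import pandas as pd

# Illustrative exception-clearance log; column names are assumptions.
clearances = pd.DataFrame({
    "analyst": ["a1", "a1", "a1", "a2", "a2", "a2"],
    "resolve_seconds": [45, 50, 3, 120, 110, 115],
})

# Baseline resolve time per analyst (median is robust to outliers).
baseline = clearances.groupby("analyst")["resolve_seconds"].transform("median")

# Flag clearances far faster than the analyst's own baseline --
# one possible proxy for "eye glaze" rubber-stamping.
clearances["possible_eye_glaze"] = clearances["resolve_seconds"] < 0.2 * baseline

print(clearances)
```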
Data sourcing. Data sourcing presents two challenges. The first is conclusively identifying the state of the data: for example, is the price of a security being validated against a comparison price captured at exactly the same time? Differences in the state of data contribute to a number of errors at asset managers. The second challenge is accessing the data from the right source. As data are transposed, both manually and electronically, from one system to another, there is potential for errors.
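The sketch below illustrates the first challenge: a price comparison is only meaningful if both feeds reflect the same point in time. The feed schemas, security identifiers, and timestamps are illustrative assumptions.

```python
import pandas as pd

# Two illustrative price feeds for the same securities; schemas are assumptions.
feed_a = pd.DataFrame({
    "security_id": ["XYZ", "ABC"],
    "price": [101.25, 55.10],
    "as_of": pd.to_datetime(["2023-03-01 16:00", "2023-03-01 16:00"]),
})
feed_b = pd.DataFrame({
    "security_id": ["XYZ", "ABC"],
    "price": [101.40, 55.10],
    "as_of": pd.to_datetime(["2023-03-01 16:00", "2023-03-01 15:45"]),
})

merged = feed_a.merge(feed_b, on="security_id", suffixes=("_a", "_b"))

# A price difference is only a valid signal if both feeds share the same state.
merged["same_state"] = merged["as_of_a"] == merged["as_of_b"]
merged["price_diff"] = (merged["price_a"] - merged["price_b"]).abs()

# Flag comparisons made against stale or mismatched snapshots.
print(merged[["security_id", "same_state", "price_diff"]])
```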
Threshold breaks. Since manually overseeing every transaction is impossible, firms apply threshold breaks to reduce the number of transactions needing review. A threshold is set (e.g., 5 basis points or 1.5 standard deviations from a 90-day rolling mean). The problem with this method is that it produces too many alerts when markets are volatile. Ironically, volatility is precisely the condition under which genuine errors are most likely to occur. The method produces so many exceptions that it becomes difficult to separate the true positives from the false positives. The chart below illustrates the threshold break problem during volatile markets.
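A minimal sketch of the same effect is shown below: a 1.5-standard-deviation rule against a 90-day rolling mean, as in the example above, applied to simulated daily returns. The simulated volatilities and period lengths are illustrative assumptions; the point is simply that the break rate jumps once the volatile regime begins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated daily returns: a calm regime followed by a volatile one.
returns = pd.Series(np.concatenate([
    rng.normal(0.0, 0.002, 250),   # low-volatility period
    rng.normal(0.0, 0.010, 60),    # high-volatility period
]))

# 90-day rolling mean and standard deviation, as in the text's example.
mean_90 = returns.rolling(90).mean()
std_90 = returns.rolling(90).std()

# Flag any observation more than 1.5 standard deviations from the rolling mean.
breaks = (returns - mean_90).abs() > 1.5 * std_90

# Break rates per regime: none of these simulated returns are actual errors,
# yet the volatile period produces a much higher alert rate.
print("break rate, calm period:    ", round(breaks.iloc[90:250].mean(), 3))
print("break rate, volatile period:", round(breaks.iloc[250:].mean(), 3))
```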
High false positives. As illustrated below, any dataset of transactions may possess a small (albeit unknown) prevalence rate of errors. If the threshold is set loosely enough to catch most true positives, the false discovery rate will be large: because true errors are so rare, even a modest false positive rate produces far more false alarms than real errors. This phenomenon is well studied in medical testing, where wide screening for low-prevalence diseases is discouraged because of the false positive rate. We have measured the mean resolve times of analysts searching for errors and have identified “eye glaze syndrome” as a by-product of exposure to high false positives.
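The arithmetic behind this point can be seen in a few lines. The prevalence, sensitivity, and false positive figures below are illustrative assumptions, not measured rates; any numbers in this range produce the same conclusion.

```python
# Illustrative figures -- assumptions, not measured rates.
prevalence = 0.001          # 1 in 1,000 transactions contains a true error
sensitivity = 0.95          # share of true errors the threshold catches
false_positive_rate = 0.10  # share of clean transactions that still break the threshold

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * false_positive_rate

# Precision: the share of flagged transactions that are real errors.
precision = true_positives / (true_positives + false_positives)
false_discovery_rate = 1 - precision

print(f"precision:            {precision:.3%}")   # under 1% of breaks are true errors
print(f"false discovery rate: {false_discovery_rate:.3%}")
```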
Building a Reliable Dataset
In order to predict and preempt the root causes of errors, it is critical to develop a reliable count of errors under different conditions. To accomplish this, we have started multi-year projects with major asset managers in which we track transactions, anomalies, and root causes. Most asset managers and their service providers conduct only periodic checks on their transactions, more commonly referred to as trial balances. We have set up two major algorithms that scan every single transaction posted to our clients' sub-ledgers and general ledgers. The chart below illustrates the daily checks our algorithms make to ensure that postings to the sub-ledgers sum to the same figures as their corresponding general ledger controlling accounts. When the amounts do not match, the system automatically flags the transaction or entry and alerts the analyst if needed.
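A minimal sketch of such a summation check appears below: postings are totalled per controlling account and compared to the general ledger balance, with mismatches flagged. The schema, account numbers, and half-cent tolerance are illustrative assumptions.

```python
import pandas as pd

# Illustrative sub-ledger postings and GL controlling balances; schemas are assumptions.
subledger = pd.DataFrame({
    "control_account": ["1200", "1200", "1300", "1300"],
    "amount": [500.00, 250.00, 1000.00, -100.00],
})
general_ledger = pd.DataFrame({
    "control_account": ["1200", "1300"],
    "balance": [750.00, 905.00],   # account 1300 is deliberately off by 5.00
})

# Sum postings per controlling account and compare to the GL balance.
totals = subledger.groupby("control_account", as_index=False)["amount"].sum()
check = totals.merge(general_ledger, on="control_account")
check["difference"] = check["amount"] - check["balance"]

# Flag any account whose postings do not sum to the GL controlling balance.
check["flag"] = check["difference"].abs() > 0.005
print(check)
```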
In addition to the algorithms that check summations and reconcile them against the general ledger, we apply an unsupervised algorithm that tracks and classifies postings. The isolation forest stochastically builds hundreds of thousands of random trees; transactions that are isolated in unusually few splits are flagged as anomalous. This method has been used successfully to detect fraud, and we have found it very useful for tackling transaction errors.
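The sketch below shows an isolation forest applied to posting-level features using scikit-learn. The features (posting amount and hours after cutoff), the simulated data, and the contamination setting are illustrative assumptions rather than the production model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Illustrative posting features: amount and hours-after-cutoff, mostly routine
# values plus a handful of unusual postings.
routine = np.column_stack([rng.normal(1_000, 200, 500), rng.normal(1.0, 0.5, 500)])
unusual = np.array([[25_000, 9.0], [18_000, 7.5], [30_000, 12.0]])
postings = np.vstack([routine, unusual])

# The isolation forest builds many random trees; points isolated with
# unusually short paths receive low scores and are labelled -1 (anomalous).
model = IsolationForest(n_estimators=500, contamination=0.01, random_state=0)
labels = model.fit_predict(postings)

print("postings flagged as anomalous:", int((labels == -1).sum()))
```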
The daily scanning our algorithms perform establishes a solid count for the denominator needed to determine the odds of an error. The summation checks and isolation forests then improve our ability to identify the variables that likely increase those odds. The final step is to establish whether these anomalies are true or false positives.
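The sketch below shows how a reliable denominator turns into odds: a base error rate over all scanned transactions, a conditional rate under a candidate condition, and the resulting lift. All figures are illustrative assumptions, not measured results.

```python
# Illustrative counts -- assumptions, not measured figures.
transactions_scanned = 2_000_000   # the denominator from daily scanning
confirmed_errors = 400             # anomalies confirmed as true errors

base_rate = confirmed_errors / transactions_scanned
print(f"base error rate:  {base_rate:.5%}")

# Conditional rate for a hypothetical risk variable, e.g. postings on volatile days.
volatile_day_txns = 300_000
volatile_day_errors = 180
conditional_rate = volatile_day_errors / volatile_day_txns

# Lift: how much the condition multiplies the odds of an error.
print(f"conditional rate: {conditional_rate:.5%}")
print(f"lift: {conditional_rate / base_rate:.1f}x")
```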