Random Thoughts and Observations on Data Science and Beyond

NeurIPS 2020, Finance and Business Analytics.


First Published on December 13, 2020

NeurIPS is one of the largest, oldest and most renowned AI conferences. I attended 30+ talks, posters, tutorials, panel discussions and expo sessions in the main conference and the Machine Learning for Economic Policy workshop at NeurIPS 2020 (6th – 12th December 2020). Many important incremental research results were presented at NeurIPS 2020 (including the paper on GPT-3). In this article, however, I am going to share my thoughts about the state of the art in the adoption of AI/ML in Finance and Business Analytics in general, and about the single common theme in these favorable developments and a major breakthrough that was described in the fantastic keynote by Christopher Bishop.

The following are impressive:

1. The use of unstructured customer responses to hypothetical potential losses for risk tolerance assessments using NLP and Machine Learning (as mentioned in the keynote by Professor Michael Kearns).

2. The use of surrogates built using Machine Learning as mentioned by Professor Susan Athey.

3. tspDB: a new time-series forecasting approach that claims to outperform Amazon DeepAR and Facebook Prophet.

4. The use of the LMDiff tool for assessing language models. (While this is generic, it could have a big impact in Finance and Business Analytics in general.)

I am also excited about:

A. There are early signs of a potentially major breakthrough in the risk assessment of customers based on the use of networks. Some high-level details were mentioned by Alipay Ant. This seems like a fundamentally different approach to using non-numeric data (and not an attempt to beat traditional Statistics by tweaking some under-developed heuristic). Given the lack of details (the work hasn’t been open-sourced and it was presented in an Expo session), it’s hard to be sure, but the approach looked very compelling to me. I hope Alipay Ant open-source their work.

B. Possible improvements in efficiency for payment settlements. Very interesting initial work seems to have been done in “Estimating Policy Functions in Payment Systems using Reinforcement Learning” by P. Castro, A. Desai, H. Du, R. Garratt and F. Rivadeneyra.

C. Use of open-sourced daytime satellite images and other surveys (with mostly non-numeric data) in the prediction of the Asset Wealth Index. See Yeh, C., Perez, A., Driscoll, A. et al., “Using publicly available satellite imagery and deep learning to understand economic well-being in Africa”.

I am disappointed by how little progress there has been, based on what I saw (happy to be corrected), in:

I. Causal Inference: It doesn’t seem ready for use by practitioners just yet. This is the sense I get from my conversations at the conference; it is not an independent, thoroughly researched assessment.

II. Entity extraction from text documents (for example, from SEC documents).

III. Link prediction between nodes in a graph, for analyzing and using unstructured data better.

In his keynote, Christopher Bishop said that in his 35 years of work in Machine Learning the best time is now! He described how his joint work with multiple hospitals is being taken to production in the UK in a few weeks. The problem being addressed is assisting doctors in segmenting tumors in scans, something that would typically take 20 minutes to a few hours. The system produces a first candidate segmentation that is highly accurate and that the doctor can correct if there are minor errors, saving precious time and making the doctor available for other work. As Christopher Bishop said, radiologists assisted by ML will replace radiologists without ML assistance. The solution is robust to variations across hospitals (different ML models)! It’s notable that this problem statement is more realistic from an applied perspective than, say, the attempts at equaling or even outperforming doctors that have spectacularly failed, and better than some relatively less ambitious AI projects on self-driving cars and knee MRI scans, which are a long way from production (I am happy to be corrected if these have been deployed in production at scale). It’s more realistic because it is narrower (the pursuit of pure research demands that the broader endeavors be undertaken, but these are not of interest to the practitioner in the near future). I believe the problem statement is narrower because of inter-disciplinary collaboration, which is so often inadequate.

AI/ML for Finance and Business Analytics will need to approach the formulation of problem statements in a similar manner. Genuine inter-disciplinary work will be essential. Unfortunately, Computer Scientists and Finance professionals don’t always seem to have mutual respect for each other’s professions. This could be seen during at least two panel discussions at NeurIPS 2020 and is also what I have observed over a few years. This will need to change as soon as possible to create high-impact breakthroughs (of the kind that will be seen in a few weeks in AI/ML for Healthcare) in AI/ML for Finance and Business Analytics in general!

Aniruddha M Godbole has interests in Computer Science, Applied Statistics and Finance. These are his personal views. This was first published at https://www.linkedin.com/pulse/neurips-2020-finance-business-analytics-aniruddha-godbole/