# Intensive School in ML/AI

## The School

**Dates:** 27–31 March 2019

**Venue:** Christ Church, Oxford

The delegates joined our unique School in AI and ML. Following an introduction to data science and a refresher on Python programming at Level39, Canary Wharf, they went up to Oxford to study the theory and practise machine learning on real financial examples.

The delegates networked with other data scientists, fintech industry leaders, and Oxford academics.

They reviewed, learned, and mastered:

- probability and statistics
- linear regression methods
- dimensionality reduction
- unsupervised machine learning
- bias-variance tradeoff
- model and feature selection
- classification
- neural nets
- deep learning
- recurrent neural networks, including LSTM
- reinforcement learning
- current frontiers in AI and ML

## Schedule

__Day 1__

08:30 – 09:00 *Registration and welcome*

09:00 – 10:00 Lecture 1: Introduction to data science

10:00 – 10:30 Tutorial 1

10:30 – 11:00 *Coffee break*

11:00 – 12:00 Lecture 2: Probability theory

12:00 – 12:30 Tutorial 2

12:30 – 13:30 *Lunch*

13:30 – 14:30 Lecture 3: Linear algebra

14:30 – 15:00 Tutorial 3

15:00 – 15:30 *Coffee break*

15:30 – 16:30 Lecture 4: Optimization theory

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 *Tour of Christ Church*

19:30 – 21:00 *Dinner at the Dining Hall*

__Day 2__

08:30 – 09:00 *Registration and welcome*

09:00 – 10:00 Lecture 1: Statistical inference and estimation theory

10:00 – 10:30 Tutorial 1

10:30 – 11:00 *Coffee break*

11:00 – 12:00 Lecture 2: Linear regression

12:00 – 12:30 Tutorial 2

12:30 – 13:30 *Lunch*

13:30 – 14:30 Lecture 3: Debugging linear regression

14:30 – 15:00 Tutorial 3

15:00 – 15:30 *Coffee break*

15:30 – 16:30 Lecture 4: Principal Components Analysis (PCA) and dimensionality reduction

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 *Tour of Oxford City Centre*

19:30 – 21:00 *Dinner at the Dining Hall*

__Day 3__

08:30 – 09:00 *Registration and welcome*

09:00 – 10:00 Lecture 1: From statistics to supervised Machine Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 *Coffee break*

11:00 – 12:00 Lecture 2: Model and feature selection

12:00 – 12:30 Tutorial 2

12:30 – 13:30 *Lunch*

13:30 – 14:30 Lecture 3: Classification methods

14:30 – 15:00 Tutorial 3

15:00 – 15:30 *Coffee break*

15:30 – 16:30 Lecture 4: Unsupervised Machine Learning

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 *Break*

19:30 – 21:00 *Dinner at the Dining Hall*

__Day 4__

08:30 – 09:00 *Registration and welcome*

09:00 – 10:00 Lecture 1: Deep Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 *Coffee break*

11:00 – 12:00 Lecture 2: Recurrent Neural Networks

12:00 – 12:30 Tutorial 2

12:30 – 13:30 *Lunch*

13:30 – 14:30 Lecture 3: Applications of Neural Networks in finance

14:30 – 15:00 Tutorial 3

15:00 – 15:30 *Coffee break*

15:30 – 16:30 Lecture 4: Prediction from financial time series

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 *Visit to Oxford’s historic pub The Eagle & Child, home of The Inklings*

19:30 – 21:00 *Dinner at the Dining Hall*

__Day 5__

08:30 – 09:00 *Registration and welcome*

09:00 – 10:00 Lecture 1: Reinforcement Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 *Coffee break*

11:00 – 12:00 Lecture 2: Inverse Reinforcement Learning

12:00 – 12:30 Tutorial 2

12:30 – 13:30 *Lunch*

13:30 – 14:30 Lecture 3: Deep Reinforcement Learning

14:30 – 15:00 Tutorial 3

15:00 – 15:30 *Coffee break*

15:30 – 16:30 Lecture 4: Particle filtering

16:30 – 17:00 Tutorial 4

17:00 – 18:00 *Graduation and leaving drinks at the Buttery*

## The Course

## Course Design

The course was designed for **practitioners by practitioners** with a mathematical and computational background. It was meant to build a **solid theoretical foundation**, which is vital for understanding data science and machine learning.

The emphasis, however, was not on theory but on getting results in **practice**. As George Pólya put it, mathematics is not a spectator sport!

For this very reason, we used **active learning**. Practical exercises were provided in unassessed tutorials and **Jupyter-based laboratory sessions**.

## Overview

There were no formal prerequisites for the course, although some familiarity with **linear algebra**, **probability**, and **optimization theory** was a plus. We refreshed the delegates’ memory whenever needed.

Those who wished to read up on AI/ML before starting the course were recommended *Intelligent Data Analysis* by Berthold and Hand, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction* by Hastie, Tibshirani, and Friedman, and *Deep Learning* by Goodfellow, Bengio, and Courville. Some of these books are freely available online.

The course began with a review of **Python** programming, visualization, and libraries.

We reviewed the **probability** and **statistics** needed for machine learning, examined **linear regression** methods as a basic example of a supervised machine learning technique, and considered **dimensionality reduction**, **unsupervised machine learning**, the **bias-variance tradeoff**, **model** and **feature selection**, and **classification**, before focussing on **neural nets** and **deep learning**, the emphasis of this course.

Deep learning architectures such as **deep neural networks**, **deep belief networks**, and **recurrent neural networks** have been applied to many fields, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, and board game programmes, where they have produced results **comparable** to and, in some cases, **superior** to those of human experts.

## Syllabus

**Introduction to data science**

- Data, information, knowledge, understanding, wisdom
- Analysis and synthesis
- Data analysis and data science
- The process of data science
- Artificial Intelligence and Machine Learning
- The language of Machine Learning
- Machine Learning and statistics


**Probability theory**

- Random experiment and the sample space
- The classical interpretation of probability
- The frequentist interpretation of probability
- Bayesian interpretation of probability
- The axiomatic interpretation of probability
- Kolmogorov’s axiomatization
- Conditional probability
- The Law of Total probability
- Bayes’s theorem
- Random variables
- Expectations
- Variances
- Covariances and correlations
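The flavour of the tutorial exercises on conditional probability can be sketched in a few lines of Python. The numbers below are hypothetical, chosen only to make the arithmetic of the law of total probability and Bayes’s theorem transparent:

```python
# Hypothetical diagnostic-test numbers (not taken from the course materials):
# P(D) = 0.01 (prior), P(+|D) = 0.99 (sensitivity), P(+|H) = 0.05 (false positives)
p_d = 0.01
p_pos_given_d = 0.99
p_pos_given_h = 0.05

# Law of total probability: P(+) = P(+|D) P(D) + P(+|H) P(H)
p_pos = p_pos_given_d * p_d + p_pos_given_h * (1 - p_d)

# Bayes's theorem: P(D|+) = P(+|D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 4))  # 0.1667: a positive test is far from conclusive
```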


**Linear algebra**

- Vectors and matrices
- Matrix multiplication
- Inverse matrices
- Independence, basis, and dimension
- The four fundamental spaces
- Orthogonal vectors
- Eigenvalues and eigenvectors
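In the labs, material like the above was exercised in NumPy. A minimal sketch of the eigenvalue topic (the matrix is an arbitrary illustrative example) might look like this:

```python
import numpy as np

# Illustrative symmetric matrix; eigh is NumPy's symmetric-matrix eigensolver
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each column v of `eigenvectors` satisfies the defining equation A v = lambda v
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)  # [1. 3.]
```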


**Optimization theory**

- The optimization problem
- Optimization in one dimension
- Optimization in multiple dimensions
- Grid search
- Gradient-based optimization
- Vector calculus
- Quasi-Newton methods
- Gradient descent (stochastic, batch)
- Evolutionary optimization
- Optimization in practice
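Gradient descent, the workhorse of the later deep learning material, can be sketched on a one-dimensional toy function (the function, starting point, and learning rate below are illustrative, not from the course notes):

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is f'(x) = 2 (x - 3);
# the minimiser is x = 3.
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0              # illustrative starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * grad(x)  # step against the gradient

print(round(x, 6))  # converges to 3.0
```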


**Statistical inference and estimation theory**

- Point estimation
- Maximum likelihood estimation
- Loss functions
- Bias-variance tradeoff (dilemma)
- Standard error
- Fisher information
- Cramér-Rao lower bound (CRLB)
- Consistency
- Hypothesis testing
- p-values
- Bayesian estimation
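The bias-variance material above can be illustrated with the classic example of the normal variance estimator; the synthetic data and seed below are illustrative:

```python
import numpy as np

# Maximum likelihood for an i.i.d. normal sample: the MLE of the mean is the
# sample mean; the MLE of the variance divides by n and is therefore biased.
rng = np.random.default_rng(6)  # illustrative synthetic data
sample = rng.normal(loc=5.0, scale=2.0, size=1000)

mu_hat = sample.mean()                        # MLE of the mean
sigma2_mle = np.mean((sample - mu_hat) ** 2)  # biased MLE of the variance
n = len(sample)
sigma2_unbiased = sigma2_mle * n / (n - 1)    # the usual bias correction
```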


**Linear regression**

- Linear regression model in matrix form
- Disturbance versus residual
- The Ordinary Least Squares (OLS) approach
- Relationship with maximum likelihood estimation
- The geometry of linear regression
- The orthogonal projection with/without the intercept
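The matrix form of OLS and its orthogonal-projection geometry fit in a few lines of NumPy; the synthetic regression below (true intercept 1.5, slope 2.0) is an illustrative stand-in for the course data:

```python
import numpy as np

# OLS in matrix form: beta_hat = (X'X)^{-1} X'y (illustrative synthetic data)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + 0.1 * rng.normal(size=n)  # true intercept 1.5, slope 2.0

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations

# The geometry: residuals are orthogonal to the column space of X
residuals = y - X @ beta_hat
assert np.allclose(X.T @ residuals, 0.0, atol=1e-8)
```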


**Debugging linear regression**

- Variance partitioning
- Coefficient of determination (R²)
- An estimation theory view of linear regression
- How many data points do we need?
- Chi-squared distribution
- Cochran’s theorem
- Degrees of freedom
- Student’s t-distribution test
- Adjusted coefficient of determination
- The F-statistic
- The p-value
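The variance-partitioning idea behind R² is a one-liner once the sums of squares are in hand; the observed and fitted values below are purely illustrative:

```python
import numpy as np

# Variance partitioning and R^2 = 1 - SS_res / SS_tot (illustrative numbers)
y = np.array([1.0, 2.0, 3.0, 4.0])       # observed values
y_hat = np.array([1.1, 1.9, 3.2, 3.8])   # fitted values

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
print(round(r_squared, 2))  # 0.98
```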


**Principal Components Analysis (PCA) and dimensionality reduction**

- Modelling data as a random variable
- The covariance matrix
- Key properties of covariance matrices
- The sample covariance matrix
- The sample covariance matrix is unbiased
- The correlation matrix
- The sample correlation matrix
- Centred data
- The application of PCA
- The interpretation of PCA
- The advantages of PCA
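PCA as taught here reduces to an eigendecomposition of the sample covariance matrix. A minimal sketch, on synthetic data whose third column is almost a copy of the first (so two components suffice), might read:

```python
import numpy as np

# PCA via eigendecomposition of the sample covariance matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=500)  # nearly redundant column

Xc = X - X.mean(axis=0)                  # centre the data
cov = (Xc.T @ Xc) / (len(X) - 1)         # unbiased sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]    # sort by explained variance, descending
explained = eigenvalues[order] / eigenvalues.sum()

# Dimensionality reduction: project onto the first two principal components
scores = Xc @ eigenvectors[:, order[:2]]
```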


**From statistics to supervised Machine Learning**

- The supervised machine learning problem
- Train/test error and overfitting
- Underfitting
- The bias-variance tradeoff revisited
- Multicollinearity
- Polynomial regression


**Model and feature selection**

- Model selection and averaging, drop-out
- Cross-validation: leave-N-out, K-fold
- Cross-validation for time series
- Sliding window for time series
- Bootstrap
- Ridge regression, L2 regularization
- Derivation and statistical properties of the ridge estimator
- Examples of ridge regression
- LASSO regression, L1 regularization
- Examples of LASSO regression
- Applications to market impact models
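The closed-form ridge estimator, and its shrinkage effect relative to OLS, can be sketched as follows (synthetic data; the penalty value `alpha = 10.0` is illustrative):

```python
import numpy as np

# Ridge regression in closed form: beta = (X'X + alpha I)^{-1} X'y
rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def ridge(X, y, alpha):
    """L2-regularized least squares; alpha = 0 recovers OLS."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, alpha=0.0)
beta_l2 = ridge(X, y, alpha=10.0)

# The L2 penalty shrinks the coefficient vector towards zero
assert np.linalg.norm(beta_l2) < np.linalg.norm(beta_ols)
```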


**Classification methods**

- The classification problem
- Generalized Linear Models (GLM)
- Logistic regression as an example of a GLM
- Odds
- Evaluation of a classification model: confusion matrix
- Evaluation of a classification model: ROC chart
- Decision tree models
- Random forests
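Evaluating a classifier via the confusion matrix is mechanical once predictions are in hand; the labels below are hypothetical, chosen only to populate all four cells:

```python
import numpy as np

# Confusion-matrix counts for a binary classifier (hypothetical labels)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found
print(tp, tn, fp, fn)       # 3 3 1 1
```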


**Unsupervised Machine Learning**

- K-means clustering
- Enhancements to K-means: K-means++
- Automatic specification of number of clusters
- Partitioning Around Medoids (PAM)
- Hierarchical clustering
- Agglomerative hierarchical clustering
- Divisive hierarchical clustering
- Linkage methods
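A bare-bones K-means (Lloyd’s algorithm) fits in a dozen lines. The two well-separated synthetic blobs and the deterministic initialization below are illustrative simplifications:

```python
import numpy as np

# Lloyd's algorithm on two well-separated synthetic blobs
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])

k = 2
centroids = X[[0, 50]].copy()  # deterministic init: one point from each blob
for _ in range(10):
    # assignment step: each point joins its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # update step: move each centroid to the mean of its cluster
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```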


**Deep Learning**

- The perceptron
- Feed-forward Neural Networks
- Other networks
- Convergence results
- Stochastic gradient descent
- Variants of the stochastic gradient descent
- Weight decay scheduling
- Weight initialization
- A geometric approach to model interpretability
- A statistical approach to model interpretability
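The syllabus opens with the perceptron, whose mistake-driven learning rule can be sketched on a toy linearly separable problem (the logical-AND data and epoch count are illustrative):

```python
import numpy as np

# The perceptron learning rule on a linearly separable toy problem (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(2)  # weights
b = 0.0          # bias
for _ in range(20):  # epochs; 20 is ample for this tiny problem
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        # update only on mistakes: w <- w + (y - pred) x, b <- b + (y - pred)
        w += (yi - pred) * xi
        b += yi - pred

preds = np.array([1 if xi @ w + b > 0 else 0 for xi in X])
print(preds)  # [0 0 0 1]
```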


**Recurrent Neural Networks**

- Autoregressive Neural Networks
- Gated Recurrent Networks
- Long Short-Term Memory (LSTM) Networks
- Semi-parametric Neural Networks for panel data
- Hybrid GARCH-ML models


**Applications of Neural Networks in finance**

- Machine Learning in algorithmic finance
- Momentum strategies
- Predicting portfolio returns
- Data preparation
- Price impact models
- Limit Order Book updates
- Adverse selection
- Predictive performance comparisons


**Prediction from financial time series**

- Time series data
- Classical time series analysis
- Autoregressive and moving average processes
- Stationarity
- Parametric tests
- In-sample diagnostics
- Time series cross-validation
- Predicting events
- Entropy
- Confusion matrix
- ROC chart
- Performance terminology
- Practical prediction issues
- Confusion matrices with oversampling
- ROC curves with oversampling
- Kernel regression
- Kernel estimators
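The autoregressive material above can be sketched by simulating an AR(1) process and recovering its coefficient by least squares (the coefficient, sample size, and seed are illustrative):

```python
import numpy as np

# Simulate x_t = phi x_{t-1} + eps_t and recover phi by regressing x_t on x_{t-1}
rng = np.random.default_rng(5)
phi_true = 0.8  # |phi| < 1, so the process is stationary
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Least-squares estimate: sum(x_{t-1} x_t) / sum(x_{t-1}^2)
phi_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(phi_hat)  # close to phi_true
```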


**Reinforcement Learning**

- Inverse Reinforcement Learning
- Reinforcement Learning and Inverse Reinforcement Learning: differences and similarities
- Inverse Reinforcement Learning and Imitation Learning
- Constraints-based IRL
- Maximum entropy IRL
- IRL for linear-quadratic-Gaussian regulation
- Examples of IRL problems in finance


**Deep Reinforcement Learning**

- Combination of Reinforcement Learning techniques with supervised learning approaches of Deep Neural Networks
- Combination of Reinforcement Learning techniques with unsupervised learning approaches of Deep Neural Networks
- Deep Reinforcement Learning techniques for partially observable Markov decision processes
- State of the art in Deep Reinforcement Learning


**Particle filtering**

- State-space models
- Particle filtering methods
- Applying the particle filter to stochastic volatility model with leverage and jumps
- The Kalman filter
- Some examples of linear-Gaussian state-space models: the Newtonian system, the autoregressive moving average models, continuous-time stochastic processes (the Wiener process, geometric Brownian motion (GBM), the Ornstein-Uhlenbeck process)
- The extended Kalman filter
- An example application of the extended Kalman filter: modelling credit spread
- Outlier detection in (extended) Kalman filtering
- Gaussian assumed density filtering
- Parameter estimation
- Relationship with Markov chain Monte Carlo methods
- Prediction
- Diagnostics
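The Kalman filter listed above reduces, in the simplest one-dimensional case, to a predict–update recursion of a few lines. The sketch below tracks a constant level through noisy observations (synthetic illustrative data; identity dynamics with zero process noise):

```python
import numpy as np

# A one-dimensional Kalman filter tracking a constant level
rng = np.random.default_rng(4)
true_level = 10.0
R = 1.0  # observation noise variance
observations = true_level + np.sqrt(R) * rng.normal(size=200)

x = 0.0    # state estimate
P = 100.0  # estimate variance: start very uncertain
Q = 0.0    # process noise variance (the state is static)

for z in observations:
    P = P + Q                # predict: variance grows by the process noise
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # update: correct the estimate with the innovation
    P = (1.0 - K) * P        # update: uncertainty shrinks
```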

## Datasets

The course examined several real-life datasets, including:

- S&P 500 stock data
- High-frequency commodity future price data
- Cryptocurrency order book data
- FINRA TRACE corporate bond data
- Car insurance claim data
- The National Institute of Diabetes and Digestive and Kidney Diseases data about the Pima group diabetes tendency

## Bibliography

The course was designed to be self-contained. However, those wishing to read up on Machine Learning / Artificial Intelligence before starting the course were recommended:

- Michael R. Berthold (ed.), David Hand (ed.). *Intelligent Data Analysis: An Introduction*, second edition. Springer, 2006.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, second edition. Springer, 2009.
- Ian Goodfellow, Yoshua Bengio, Aaron Courville. *Deep Learning*. MIT Press, 2017.

- Murray R. Spiegel, John Schiller, R. Alu Srinivasan. *Schaum’s Outlines: Probability and Statistics*, second edition. McGraw-Hill, 2000.
- John B. Fraleigh, Raymond A. Beauregard. *Linear Algebra*, third edition. Addison Wesley, 1995.
- Gerard Cornuejols, Reha Tütüncü. *Optimization Methods in Finance*. Cambridge University Press, 2007.
- Philip E. Gill, Walter Murray, Margaret H. Wright. *Practical Optimization*. Emerald Group Publishing Limited, 1982.

We also recommended the following video lectures:

- Gilbert Strang.
*Linear Algebra*, course 18.06 MIT, Fall of 1999: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/

## Instructors

## Paul Bilokon

**CEO and Founder of Thalesians Ltd**. Previously served as Director and Head of global credit and core e-trading quants at Deutsche Bank, the **teams that he helped set up** with Jason Batt and Martin Zinkin. Having also worked at Morgan Stanley, Lehman Brothers, and Nomura, Paul **pioneered electronic trading in credit** with Rob Smith and William Osborn at Citigroup.

Paul graduated from Christ Church, University of Oxford, with a **distinction and the Best Overall Performance prize**. He has also graduated twice from Imperial College London.

Paul lectures at Imperial College London on machine learning for MSc students in mathematics and finance, and **his courses consistently achieve top rankings among the students**.

Paul has made **contributions to mathematical logic, domain theory, and stochastic filtering theory**, and, with Abbas Edalat, has published a prestigious LICS paper. Paul’s books are being published by Wiley and Springer.

Dr Bilokon is a Member of the British Computer Society, Institution of Engineering and Technology, and European Complex Systems Society.

Paul is a **frequent speaker at premier conferences** such as Global Derivatives/QuantMinds, WBS QuanTech, AI, and Quantitative Finance conferences, alphascope, LICS, and Domains.

## Ivan Zhdankin

Quantitative researcher with experience in diverse areas of quantitative finance, including risk modelling, xVA, and electronic trading across asset classes. Ivan has consulted at many different banks in London, including JP Morgan, Citigroup, Jefferies, Nomura, HSBC, and BNP Paribas.

Ivan **has generated convincing results in electronic trading alpha with neural nets**. Ivan has also **developed a cryptocurrency trading platform for electronic market making**.

Ivan is an author of several machine learning articles and appears regularly in QuantNews. Ivan regularly delivers guest lectures on artificial intelligence and machine learning at Imperial College and at Thalesians’ seminars.

Ivan graduated from the New Economic School with a Master’s degree in economics. He has a solid mathematical background from Moscow State University, where he **studied under the celebrated Albert Shiryaev, one of the developers of modern probability theory**.

Ivan is an accomplished sportsman.

## Prof. Matthew Dixon

Assistant Professor in the Applied Math Department at the Illinois Institute of Technology. **His research in computational methods for finance is funded by Intel**.

Matthew began his career in structured credit trading at Lehman Brothers in London before pursuing academics and consulting for financial institutions in quantitative trading and risk modelling.

He holds a Ph.D. in Applied Mathematics from Imperial College (2007) and has held postdoctoral and visiting professor appointments at Stanford University and UC Davis respectively.

He has published **over 20 peer-reviewed publications on machine learning and financial modelling** and has been **cited in Bloomberg Markets and the Financial Times as an AI in fintech expert**.

## Platinum Sponsor

We’re living in the most interesting time in humanity’s history: we have learned to communicate worldwide using the Internet. Now we’re approaching the next evolutionary step: digitizing trade. The first two fully digital currencies currently have a market capitalization of over $100 billion and $20 billion, respectively. In terms of the global economy, this is small – in the region of 0.1%. So this is only the beginning. CBA Finance AG has set itself the goal of developing software, systems, and tools to participate in this budding new market, in order to help build the new, all-digital financial world.

## Venue

## The University of Oxford

Our training courses take place at one of the constituent colleges of the University of Oxford, the **oldest university in the English-speaking world** and the world’s second-oldest university in continuous operation. Teaching at Oxford goes back as far as 1096.

There are 38 constituent colleges at Oxford and a full range of academic departments organized into four divisions. Christ Church, or Ædes Christi in Latin, is a constituent college of the University of Oxford. It is colloquially known as The House. The college, and especially its dining hall, has been **featured in the Harry Potter movies**.

Sixty-nine Nobel Prize winners, four Fields Medalists, and six Turing Award winners have studied, worked, or held visiting fellowships at the University of Oxford.

For all participants, accommodation on Oxford’s campus was provided. They joined a **distinguished company of scholars** who lived in **these very rooms**: Lewis Carroll, Albert Einstein, William Ewart Gladstone, Robert Hooke, John Locke, Sir Robert Peel, and others.