The Second Thalesian

Intensive School in ML/AI

Christ Church, Oxford

Theory

Practice

Novelty

Education

The School

Dates: 27-31 March, 2019

Venue: Christ Church, Oxford

The delegates joined our unique School in AI and ML. Following an introduction to data science and a refresher on Python programming on Level39, Canary Wharf, they went up to Oxford to study the theory and practise machine learning on real financial examples.

The delegates networked with other data scientists, fintech industry leaders, and Oxford academics.

They reviewed, learned, and mastered:

probability and statistics
linear regression methods
dimensionality reduction
unsupervised machine learning
bias-variance tradeoff
model and feature selection
classification
neural nets
deep learning
recurrent neural networks, including LSTM
reinforcement learning
current frontiers in AI and ML

Schedule

Day 1

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Introduction to data science

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Probability theory

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Linear algebra

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Optimization theory

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Tour of Christ Church

19:30 – 21:00 Dinner at the Dining Hall

Day 2

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Statistical inference and estimation theory

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Linear regression

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Debugging linear regression

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Principal Components Analysis (PCA) and dimensionality reduction

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Tour of Oxford City Centre

19:30 – 21:00 Dinner at the Dining Hall

Day 3

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: From statistics to supervised Machine Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Model and feature selection

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Classification methods

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Unsupervised Machine Learning

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Break

19:30 – 21:00 Dinner at the Dining Hall

Day 4

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Deep Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Recurrent Neural Networks

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Applications of Neural Networks in finance

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Prediction from financial time series

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Visit to Oxford’s historic pub The Eagle & Child, home of The Inklings

19:30 – 21:00 Dinner at the Dining Hall

Day 5

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Reinforcement Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Inverse Reinforcement Learning

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Deep Reinforcement Learning

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Particle filtering

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Graduation and leaving drinks at the Buttery

The Course

Course Design

The course was designed for practitioners by practitioners with a mathematical and computational background. It was meant to build a solid theoretical foundation, which is vital for understanding data science and machine learning.

The emphasis, however, was not on theory but on getting results in practice. As George Pólya put it, mathematics is not a spectator sport!

For this very reason, we used active learning. Practical exercises were provided in unassessed tutorials and Jupyter-based laboratory sessions.

Overview

There were no formal prerequisites for the course, although some familiarity with linear algebra, probability, and optimization theory was a plus. We refreshed the delegates’ memory of these topics whenever necessary.

Those who wished to read up on AI/ML before starting the course were recommended Intelligent Data Analysis by Berthold and Hand, The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman, and Deep Learning by Goodfellow, Bengio, and Courville. Some of these books are freely available online.

The course began with a review of Python programming, visualization, and libraries.

We reviewed the probability and statistics needed for machine learning, examined linear regression methods as a basic example of a supervised machine learning technique, and considered dimensionality reduction, unsupervised machine learning, the bias-variance tradeoff, model and feature selection, and classification, before focussing on neural nets and deep learning (the emphasis of this course).

Deep learning architectures such as deep neural networks, deep belief networks, and recurrent neural networks have been applied to many fields, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, and board game programs, where they have produced results comparable to and, in some cases, superior to those of human experts.

Syllabus

Introduction to data science

Data, information, knowledge, understanding, wisdom
Analysis and synthesis
Data analysis and data science
The process of data science
Artificial Intelligence and Machine Learning
The language of Machine Learning
Machine Learning and statistics

Probability theory

Random experiment and the sample space
The classical interpretation of probability
The frequentist interpretation of probability
The Bayesian interpretation of probability
The axiomatic interpretation of probability
Kolmogorov’s axiomatization
Conditional probability
The law of total probability
Bayes’s theorem
Random variables
Expectations
Variances
Covariances and correlations

Linear algebra

Vectors and matrices
Matrix multiplication
Inverse matrices
Independence, basis, and dimension
The four fundamental spaces
Orthogonal vectors
Eigenvalues and eigenvectors

Optimization theory

The optimization problem
Optimization in one dimension
Optimization in multiple dimensions
Grid search
Gradient-based optimization
Vector calculus
Quasi-Newton methods
Gradient descent (stochastic, batch)
Evolutionary optimization
Optimization in practice

Statistical inference and estimation theory

Point estimation
Maximum likelihood estimation
Loss functions
Bias-variance tradeoff (dilemma)
Standard error
Fisher information
Cramér-Rao lower bound (CRLB)
Consistency
Hypothesis testing
p-values
Bayesian estimation

Linear regression

Linear regression model in matrix form
Disturbance versus residual
The Ordinary Least Squares (OLS) approach
Relationship with maximum likelihood estimation
The geometry of linear regression
The orthogonal projection with/without the intercept
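
As an illustration of the topics listed above, here is a minimal sketch of OLS for a model with an intercept and a single regressor, written in plain Python. It is not taken from the course materials: the function name `ols_fit` and the toy data are assumptions made for this example. It also checks the geometric fact behind the orthogonal projection: with an intercept, the residuals are orthogonal to both the constant column and the regressor.

```python
def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, residuals = ols_fit(x, y)

# Residuals should be orthogonal to the columns of the design matrix [1, x].
sum_resid = sum(residuals)
dot_resid_x = sum(r * xi for r, xi in zip(residuals, x))
```

Both `sum_resid` and `dot_resid_x` should be zero up to floating-point error. Without the intercept column, only the second orthogonality condition would be guaranteed.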

Debugging linear regression

Variance partitioning
Coefficient of determination (R²)
An estimation theory view of linear regression
How many data points do we need?
Chi-squared distribution
Cochran’s theorem
Degrees of freedom
Student’s t-distribution test
Adjusted coefficient of determination
The F-statistic
The p-value

Principal Components Analysis (PCA) and dimensionality reduction

Modelling data as a random variable
The covariance matrix
Key properties of covariance matrices
The sample covariance matrix
The sample covariance matrix is unbiased
The correlation matrix
The sample correlation matrix
Centred data
The application of PCA
The interpretation of PCA
The advantages of PCA

From statistics to supervised Machine Learning

The supervised machine learning problem
Train/test error and overfitting
Underfitting
The bias-variance tradeoff revisited
Multicollinearity
Polynomial regression

Model and feature selection

Model selection and averaging, drop-out
Cross-validation: leave-N-out, K-fold
Cross-validation for time series
Sliding window for time series
Bootstrap
Ridge regression, L2 regularization
Derivation and statistical properties of the ridge estimator
Examples of ridge regression
LASSO regression, L1 regularization
Examples of LASSO regression
Applications to market impact models
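
To illustrate why cross-validation for time series differs from ordinary K-fold, here is a minimal sketch of a sliding-window split in plain Python. It is an illustration only, not the course's own code: the function name `sliding_window_splits` and its parameters are assumptions. The key property is that every training index precedes every test index, so the model is never evaluated on data older than what it was trained on.

```python
def sliding_window_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that respect time order."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        # Slide the whole window forward by one test block.
        start += test_size


# Hypothetical example: 10 observations, 4 for training, 2 for testing.
splits = list(sliding_window_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Keeping `train_size` fixed gives a sliding window. Letting the training set grow from index 0 instead would give an expanding window, which is another common choice.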

Classification methods

The classification problem
Generalized Linear Models (GLM)
Logistic regression as an example of a GLM
Odds
Evaluation of a classification model: confusion matrix
Evaluation of a classification model: ROC chart
Decision tree models
Random forests

Unsupervised Machine Learning

K-means clustering
Enhancements to K-means, K-means++
Automatic specification of number of clusters
Partitioning Around Medoids (PAM)
Hierarchical clustering
Agglomerative hierarchical clustering
Divisive hierarchical clustering
Linkage methods

Deep Learning

The perceptron
Feed-forward Neural Networks
Other networks
Convergence results
Stochastic gradient descent
Variants of the stochastic gradient descent
Weight decay scheduling
Weight initialization
A geometric approach to model interpretability
A statistical approach to model interpretability

Recurrent Neural Networks

Autoregressive Neural Networks
Gated Recurrent Networks
Long Short-Term Memory Networks
Semi-parametric Neural Networks for panel data
Hybrid GARCH-ML models

Applications of Neural Networks in finance

Machine Learning in algorithmic finance
Momentum strategies
Predicting portfolio returns
Data preparation
Price impact models
Limit Order Book updates
Adverse selection
Predictive performance comparisons

Prediction from financial time series

Time series data
Classical time series analysis
Autoregressive and moving average processes
Stationarity
Parametric tests
In-sample diagnostics
Time series cross-validation
Predicting events
Entropy
Confusion matrix
ROC chart
Performance terminology
Practical prediction issues
Confusion matrices with oversampling
ROC curves with oversampling
Kernel regression
Kernel estimators

Reinforcement Learning

Inverse Reinforcement Learning
Reinforcement Learning and Inverse Reinforcement Learning: differences and similarities
Inverse Reinforcement Learning and Imitation Learning
Constraints-based IRL
Maximum entropy IRL
IRL for linear-quadratic-Gaussian regulation
Examples of IRL problems in finance

Deep Reinforcement Learning

Combination of Reinforcement Learning techniques with Supervised Learning approaches of Deep Neural Networks
Combination of Reinforcement Learning techniques with unsupervised learning approaches of Deep Neural Networks
Deep Reinforcement Learning techniques for partially observable Markov decision processes
State of the art in Deep Reinforcement Learning

Particle filtering

State-space models
Particle filtering methods
Applying the particle filter to stochastic volatility model with leverage and jumps
The Kalman filter
Some examples of linear-Gaussian state-space models: the Newtonian system, the autoregressive moving average models, continuous-time stochastic processes (the Wiener process, geometric Brownian motion (GBM), the Ornstein-Uhlenbeck process)
The extended Kalman filter
An example application of the extended Kalman filter: modelling credit spread
Outlier detection in (extended) Kalman filtering
Gaussian assumed density filtering
Parameter estimation
Relationship with Markov chain Monte Carlo methods
Prediction
Diagnostics
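
Of the topics above, the Kalman filter for a linear-Gaussian state-space model is the simplest to show in code. The following is a minimal sketch for a scalar random-walk state observed with noise, in plain Python. The function name `kalman_filter_1d`, the parameter names, and the toy observations are assumptions made for this example and do not come from the course materials.

```python
def kalman_filter_1d(observations, m0, p0, q, r):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q) is the process noise and v_t ~ N(0, r) the observation noise.
    Returns the filtered means and variances after each observation.
    """
    m, p = m0, p0
    means, variances = [], []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        m_pred = m
        p_pred = p + q
        # Update: blend prediction and observation using the Kalman gain.
        k = p_pred / (p_pred + r)
        m = m_pred + k * (y - m_pred)
        p = (1.0 - k) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


# Hypothetical toy input: three identical observations of a static state (q = 0).
means, variances = kalman_filter_1d([1.0, 1.0, 1.0], m0=0.0, p0=1.0, q=0.0, r=1.0)
```

With `q = 0` the state is static, and the filter reduces to sequential Bayesian updating of a Gaussian mean: the posterior variance shrinks with every observation. A positive `q` stops the variance from collapsing, so the filter keeps responding to new data.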

Datasets

The course examined several real-life datasets, including:

S&P 500 stock data
High-frequency commodity future price data
Cryptocurrency order book data
FINRA TRACE corporate bond data
Car insurance claim data
The National Institute of Diabetes and Digestive and Kidney Diseases data on the incidence of diabetes among the Pima people

Bibliography

The course was designed to be self-contained. However, those wishing to read up on Machine Learning / Artificial Intelligence before starting the course were recommended:

Michael R. Berthold (ed.), David Hand (ed.). Intelligent Data Analysis: An Introduction, second edition. Springer, 2006.
Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer, 2009.
Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press, 2016.

Those wishing to read up on the mathematical foundations of the course (covered on the first day) were recommended:

Murray R. Spiegel, John Schiller, R. Alu Srinivasan. Schaum’s Outlines: Probability and Statistics, second edition. McGraw-Hill, 2000.
John B. Fraleigh, Raymond A. Beauregard. Linear Algebra, third edition. Addison Wesley, 1995.
Gerard Cornuejols, Reha Tütüncü. Optimization Methods in Finance. Cambridge University Press, 2007.
Philip E. Gill, Walter Murray, Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.

We also recommended the following video lectures:

Gilbert Strang. Linear Algebra, course 18.06 MIT, Fall of 1999: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/

Instructors

Instructor

Paul Bilokon

CEO and Founder of Thalesians Ltd. Previously served as Director and Head of global credit and core e-trading quants at Deutsche Bank, the teams that he helped set up with Jason Batt and Martin Zinkin. Having also worked at Morgan Stanley, Lehman Brothers, and Nomura, Paul pioneered electronic trading in credit with Rob Smith and William Osborn at Citigroup.

Paul graduated from Christ Church, University of Oxford, with a distinction and the Best Overall Performance prize. He also graduated twice from Imperial College London.

Paul lectures at Imperial College London on machine learning for MSc students in mathematics and finance, and his courses consistently achieve top rankings among the students.

Paul has made contributions to mathematical logic, domain theory, and stochastic filtering theory, and, with Abbas Edalat, has published a prestigious LICS paper. Paul’s books are being published by Wiley and Springer.

Dr Bilokon is a Member of the British Computer Society, Institution of Engineering and Technology, and European Complex Systems Society.

Paul is a frequent speaker at premier conferences such as Global Derivatives/QuantMinds, WBS QuanTech, AI, and Quantitative Finance conferences, alphascope, LICS, and Domains.

Instructor

Ivan Zhdankin

Quantitative researcher with experience in diverse areas of quantitative finance, including risk modelling, xVA, and electronic trading across asset classes. Ivan has consulted at many different banks in London, including JP Morgan, Citigroup, Jefferies, Nomura, HSBC, and BNP Paribas.

Ivan has generated convincing results in electronic trading alpha with neural nets. He has also developed a cryptocurrency trading platform for electronic market making.

Ivan is an author of several machine learning articles and appears regularly in QuantNews. Ivan regularly delivers guest lectures on artificial intelligence and machine learning at Imperial College and at Thalesians’ seminars.

Ivan graduated from the New Economic School with a Master’s degree in economics. He has a solid mathematical background from Moscow State University, where he studied under the celebrated Albert Shiryaev, one of the developers of modern probability theory.

Ivan is an accomplished sportsman.

Instructor

Prof. Matthew Dixon

Assistant Professor in the Applied Math Department at the Illinois Institute of Technology. His research in computational methods for finance is funded by Intel.

Matthew began his career in structured credit trading at Lehman Brothers in London before pursuing academics and consulting for financial institutions in quantitative trading and risk modelling.

He holds a Ph.D. in Applied Mathematics from Imperial College (2007) and has held postdoctoral and visiting professor appointments at Stanford University and UC Davis respectively.

He has published over 20 peer-reviewed papers on machine learning and financial modelling, and has been cited in Bloomberg Markets and the Financial Times as an expert on AI in fintech.

Platinum Sponsor

We’re living in the most interesting time in human history: we have learned to communicate worldwide using the Internet. Now we’re approaching the next evolutionary step: digitizing trade. The first two fully digital currencies currently have market capitalizations of over $100 billion and $20 billion, respectively. In terms of the global economy, this is small, in the region of 0.1%. So this is only the beginning. CBA Finance AG has set itself the goal of developing software, systems, and tools to participate in this budding new market and to help build the new, all-digital financial world.

Venue

The University of Oxford

Our training courses take place at one of the constituent colleges of the University of Oxford, the oldest university in the English-speaking world and the world’s second-oldest university in continuous operation. Teaching at Oxford goes back as far as 1096.

There are 38 constituent colleges at Oxford and a full range of academic departments organized into four divisions. Christ Church, or Ædes Christi in Latin, is a constituent college of the University of Oxford. It is colloquially known as The House. The college, especially its dining hall, has been featured in the Harry Potter movies.

Sixty-nine Nobel Prize winners, four Fields Medalists, and six Turing Award winners have studied, worked, or held visiting fellowships at the University of Oxford.

For all participants, accommodation on Oxford’s campus was provided. They joined a distinguished company of scholars who lived in these very rooms: Lewis Carroll, Albert Einstein, William Ewart Gladstone, Robert Hooke, John Locke, Sir Robert Peel, and others.
