The Second Thalesian

Intensive School in ML/AI

Christ Church, Oxford





The School

Dates: 27-31 March, 2019

Venue: Christ Church, Oxford

The delegates joined our unique School in AI and ML. Following an introduction to data science and a refresher on Python programming on Level39, Canary Wharf, they went up to Oxford to study the theory and practise machine learning on real financial examples.

The delegates networked with other data scientists, fintech industry leaders, and Oxford academics.

They reviewed, learned, and mastered:

  • probability and statistics
  • linear regression methods
  • dimensionality reduction
  • unsupervised machine learning
  • bias-variance tradeoff
  • model and feature selection
  • classification
  • neural nets
  • deep learning
  • recurrent neural networks, including LSTM
  • reinforcement learning
  • current frontiers in AI and ML


Day 1

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Introduction to data science

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Probability theory

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Linear algebra

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Optimization theory

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Tour of Christ Church

19:30 – 21:00 Dinner at the Dining Hall

Day 2

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Statistical inference and estimation theory

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Linear regression

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Debugging linear regression

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Principal Components Analysis (PCA) and dimensionality reduction

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Tour of Oxford City Centre

19:30 – 21:00 Dinner at the Dining Hall

Day 3

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: From statistics to supervised Machine Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Model and feature selection

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Classification methods

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Unsupervised Machine Learning

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Break

19:30 – 21:00 Dinner at the Dining Hall

Day 4

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Deep Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Recurrent Neural Networks

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Applications of Neural Networks in finance

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Prediction from financial time series

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Lab

18:00 – 19:30 Visit to Oxford’s historic pub The Eagle & Child, home of The Inklings

19:30 – 21:00 Dinner at the Dining Hall

Day 5

08:30 – 09:00 Registration and welcome

09:00 – 10:00 Lecture 1: Reinforcement Learning

10:00 – 10:30 Tutorial 1

10:30 – 11:00 Coffee break

11:00 – 12:00 Lecture 2: Inverse Reinforcement Learning

12:00 – 12:30 Tutorial 2

12:30 – 13:30 Lunch

13:30 – 14:30 Lecture 3: Deep Reinforcement Learning

14:30 – 15:00 Tutorial 3

15:00 – 15:30 Coffee break

15:30 – 16:30 Lecture 4: Particle filtering

16:30 – 17:00 Tutorial 4

17:00 – 18:00 Graduation and leaving drinks at the Buttery

The Course


Course Design

The course was designed for practitioners by practitioners with a mathematical and computational background. It was meant to build a solid theoretical foundation, which is vital for understanding data science and machine learning.

The emphasis, however, was not on theory but on getting results in practice. As George Pólya put it, mathematics is not a spectator sport!

For this very reason, we used active learning. Practical exercises were provided in unassessed tutorials and Jupyter-based laboratory sessions.



There were no formal prerequisites for the course, although some familiarity with linear algebra, probability, and optimization theory was a plus. We refreshed the delegates’ memory of these topics whenever it was needed.

Those who wished to read up on AI/ML before starting the course were recommended Intelligent Data Analysis by Berthold and Hand, The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman, and Deep Learning by Goodfellow, Bengio, and Courville. Some of these books are freely available online.

The course began with a review of Python programming, visualization, and libraries.

We reviewed the probability and statistics needed for machine learning, examined linear regression methods as a basic example of a supervised machine learning technique, and considered dimensionality reduction, unsupervised machine learning, the bias-variance tradeoff, model and feature selection, and classification, before focussing on neural nets and deep learning (the emphasis of this course).

Deep learning architectures such as deep neural networks, deep belief networks, and recurrent neural networks have been applied to many fields, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, and board game programmes, where they have produced results comparable to and, in some cases, superior to those of human experts.


Introduction to data science

  • Data, information, knowledge, understanding, wisdom
  • Analysis and synthesis
  • Data analysis and data science
  • The process of data science
  • Artificial Intelligence and Machine Learning
  • The language of Machine Learning
  • Machine Learning and statistics


Probability theory

  • Random experiment and the sample space
  • The classical interpretation of probability
  • The frequentist interpretation of probability
  • The Bayesian interpretation of probability
  • The axiomatic interpretation of probability
  • Kolmogorov’s axiomatization
  • Conditional probability
  • The law of total probability
  • Bayes’s theorem
  • Random variables
  • Expectations
  • Variances
  • Covariances and correlations


Linear algebra

  • Vectors and matrices
  • Matrix multiplication
  • Inverse matrices
  • Independence, basis, and dimension
  • The four fundamental spaces
  • Orthogonal vectors
  • Eigenvalues and eigenvectors


Optimization theory

  • The optimization problem
  • Optimization in one dimension
  • Optimization in multiple dimensions
  • Grid search
  • Gradient-based optimization
  • Vector calculus
  • Quasi-Newton methods
  • Gradient descent (stochastic, batch)
  • Evolutionary optimization
  • Optimization in practice


Statistical inference and estimation theory

  • Point estimation
  • Maximum likelihood estimation
  • Loss functions
  • Bias-variance tradeoff (dilemma)
  • Standard error
  • Fisher information
  • Cramér-Rao lower bound (CRLB)
  • Consistency
  • Hypothesis testing
  • p-values
  • Bayesian estimation


Linear regression

  • Linear regression model in matrix form
  • Disturbance versus residual
  • The Ordinary Least Squares (OLS) approach
  • Relationship with maximum likelihood estimation
  • The geometry of linear regression
  • The orthogonal projection with/without the intercept


Debugging linear regression

  • Variance partitioning
  • Coefficient of determination (R²)
  • An estimation theory view of linear regression
  • How many data points do we need?
  • Chi-squared distribution
  • Cochran’s theorem
  • Degrees of freedom
  • Student’s t-distribution test
  • Adjusted coefficient of determination
  • The F-statistic
  • The p-value


Principal Components Analysis (PCA) and dimensionality reduction

  • Modelling data as a random variable
  • The covariance matrix
  • Key properties of covariance matrices
  • The sample covariance matrix
  • The sample covariance matrix is unbiased
  • The correlation matrix
  • The sample correlation matrix
  • Centred data
  • The application of PCA
  • The interpretation of PCA
  • The advantages of PCA


From statistics to supervised Machine Learning

  • The supervised machine learning problem
  • Train/test error and overfitting
  • Underfitting
  • The bias-variance tradeoff revisited
  • Multicollinearity
  • Polynomial regression


Model and feature selection

  • Model selection and averaging, drop-out
  • Cross-validation: leave-N-out, K-fold
  • Cross-validation for time series
  • Sliding window for time series
  • Bootstrap
  • Ridge regression, L2 regularization
  • Derivation and statistical properties of the ridge estimator
  • Examples of ridge regression
  • LASSO regression, L1 regularization
  • Examples of LASSO regression
  • Applications to market impact models


Classification methods

  • The classification problem
  • Generalized Linear Models (GLM)
  • Logistic regression as an example of a GLM
  • Odds
  • Evaluation of a classification model: confusion matrix
  • Evaluation of a classification model: ROC chart
  • Decision tree models
  • Random forests


Unsupervised Machine Learning

  • K-means clustering
  • Enhancements to K-means, K-means++
  • Automatic specification of number of clusters
  • Partitioning Around Medoids (PAM)
  • Hierarchical clustering
  • Agglomerative hierarchical clustering
  • Divisive hierarchical clustering
  • Linkage methods


Deep Learning

  • The perceptron
  • Feed-forward Neural Networks
  • Other networks
  • Convergence results
  • Stochastic gradient descent
  • Variants of the stochastic gradient descent
  • Weight decay scheduling
  • Weight initialization
  • A geometric approach to model interpretability
  • A statistical approach to model interpretability


Recurrent Neural Networks

  • Autoregressive Neural Networks
  • Gated Recurrent Networks
  • Long Short-Term Memory Networks
  • Semi-parametric Neural Networks for panel data
  • Hybrid GARCH-ML models


Applications of Neural Networks in finance

  • Machine Learning in algorithmic finance
  • Momentum strategies
  • Predicting portfolio returns
  • Data preparation
  • Price impact models
  • Limit Order Book updates
  • Adverse selection
  • Predictive performance comparisons


Prediction from financial time series

  • Time series data
  • Classical time series analysis
  • Autoregressive and moving average processes
  • Stationarity
  • Parametric tests
  • In-sample diagnostics
  • Time series cross-validation
  • Predicting events
  • Entropy
  • Confusion matrix
  • ROC chart
  • Performance terminology
  • Practical prediction issues
  • Confusion matrices with oversampling
  • ROC curves with oversampling
  • Kernel regression
  • Kernel estimators


Reinforcement Learning

  • Inverse Reinforcement Learning
  • Reinforcement Learning and Inverse Reinforcement Learning: differences and similarities
  • Inverse Reinforcement Learning and Imitation Learning
  • Constraints-based IRL
  • Maximum entropy IRL
  • IRL for linear-quadratic-Gaussian regulation
  • Examples of IRL problems in finance


Deep Reinforcement Learning

  • Combination of Reinforcement Learning techniques with Supervised Learning approaches of Deep Neural Networks
  • Combination of Reinforcement Learning techniques with unsupervised learning approaches of Deep Neural Networks
  • Deep Reinforcement Learning techniques for partially observable Markov decision processes
  • State of the art in Deep Reinforcement Learning


Particle filtering

  • State-space models
  • Particle filtering methods
  • Applying the particle filter to stochastic volatility model with leverage and jumps
  • The Kalman filter
  • Some examples of linear-Gaussian state-space models: the Newtonian system, the autoregressive moving average models, continuous-time stochastic processes (the Wiener process, geometric Brownian motion (GBM), the Ornstein-Uhlenbeck process)
  • The extended Kalman filter
  • An example application of the extended Kalman filter: modelling credit spread
  • Outlier detection in (extended) Kalman filtering
  • Gaussian assumed density filtering
  • Parameter estimation
  • Relationship with Markov chain Monte Carlo methods
  • Prediction
  • Diagnostics


The course examined several real-life datasets, including:

  • S&P 500 stock data
  • High-frequency commodity future price data
  • Cryptocurrency order book data
  • FINRA TRACE corporate bond data
  • Car insurance claim data
  • The National Institute of Diabetes and Digestive and Kidney Diseases data about the Pima group diabetes tendency


The course was designed to be self-contained. However, those wishing to read up on Machine Learning / Artificial Intelligence before starting the course were recommended:

  • Michael R. Berthold (ed.), David Hand (ed.). Intelligent Data Analysis: An Introduction, second edition. Springer, 2006.
  • Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer, 2009.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press, 2016.

Those wishing to read up on the mathematical foundations of the course (covered on the first day) were recommended:

  • Murray R. Spiegel, John Schiller, R. Alu Srinivasan. Schaum’s Outlines: Probability and Statistics, second edition. McGraw-Hill, 2000.
  • John B. Fraleigh, Raymond A. Beauregard. Linear Algebra, third edition. Addison Wesley, 1995.
  • Gerard Cornuejols, Reha Tütüncü. Optimization Methods in Finance. Cambridge University Press, 2007.
  • Philip E. Gill, Walter Murray, Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.





Paul Bilokon

CEO and Founder of Thalesians Ltd. Previously served as Director and Head of global credit and core e-trading quants at Deutsche Bank, the teams that he helped set up with Jason Batt and Martin Zinkin. Having also worked at Morgan Stanley, Lehman Brothers, and Nomura, Paul pioneered electronic trading in credit with Rob Smith and William Osborn at Citigroup.

Paul graduated from Christ Church, University of Oxford, with a distinction and the Best Overall Performance prize. He also graduated twice from Imperial College London.

Paul lectures at Imperial College London on machine learning for MSc students in mathematics and finance, and his courses consistently achieve top rankings among the students.

Paul has made contributions to mathematical logic, domain theory, and stochastic filtering theory, and, with Abbas Edalat, has published a prestigious LICS paper. Paul’s books are being published by Wiley and Springer.

Dr Bilokon is a Member of the British Computer Society, Institution of Engineering and Technology, and European Complex Systems Society.

Paul is a frequent speaker at premier conferences such as Global Derivatives/QuantMinds, WBS QuanTech, AI, and Quantitative Finance conferences, alphascope, LICS, and Domains.


Ivan Zhdankin

Quantitative researcher with experience in diverse areas of quantitative finance, including risk modelling, xVA, and electronic trading across asset classes. Ivan has consulted at many different banks in London, including JP Morgan, Citigroup, Jefferies, Nomura, HSBC, and BNP Paribas.

Ivan has generated convincing results in electronic trading alpha with neural nets. He has also developed a cryptocurrency trading platform for electronic market making.

Ivan is an author of several machine learning articles and appears regularly in QuantNews. Ivan regularly delivers guest lectures on artificial intelligence and machine learning at Imperial College and at Thalesians’ seminars.

Ivan graduated from the New Economic School with a Master’s degree in economics. He has a solid mathematical background from Moscow State University, where he studied under the celebrated Albert Shiryaev, one of the developers of modern probability theory.

Ivan is an accomplished sportsman.


Prof. Matthew Dixon

Assistant Professor in the Applied Math Department at the Illinois Institute of Technology. His research in computational methods for finance is funded by Intel.

Matthew began his career in structured credit trading at Lehman Brothers in London before pursuing an academic career and consulting for financial institutions in quantitative trading and risk modelling.

He holds a Ph.D. in Applied Mathematics from Imperial College (2007) and has held postdoctoral and visiting professor appointments at Stanford University and UC Davis, respectively.

He has published over 20 peer-reviewed papers on machine learning and financial modelling and has been cited in Bloomberg Markets and the Financial Times as an expert on AI in fintech.

Platinum Sponsor

We’re living in the most interesting time in the history of humanity: We learned to communicate world-wide using the Internet. Now we’re approaching the next evolutionary step: Digitizing trade. The first two fully digital currencies currently have market capitalizations of over $100 billion and $20 billion, respectively. In terms of the global economy, this is small – in the region of 0.1%. So this is only the beginning. CBA Finance AG has set itself the goal of developing software, systems, and tools to participate in this budding new market, in order to help build the new, all-digital financial world.



The University of Oxford

Our training courses take place at one of the constituent colleges of the University of Oxford, the oldest university in the English-speaking world and the world’s second-oldest university in continuous operation. Teaching at Oxford goes back as far as 1096.

There are 38 constituent colleges at Oxford and a full range of academic departments organized into four divisions. Christ Church, or Ædes Christi in Latin, is a constituent college of the University of Oxford. It is colloquially known as The House. The college, especially its dining hall, has been featured in the Harry Potter movies.

Sixty-nine Nobel Prize winners, four Fields Medalists, and six Turing Award winners have studied, worked, or held visiting fellowships at the University of Oxford.

For all participants, accommodation on Oxford’s campus was provided. They joined a distinguished company of scholars who lived in these very rooms: Lewis Carroll, Albert Einstein, William Ewart Gladstone, Robert Hooke, John Locke, Sir Robert Peel, and others.