
Probabilistic Programming in Quantitative Finance

Probabilistic Programming (aka Bayesian statistics) allows flexible construction of statistical models to gain insight from data. Estimation of best-fitting parameter values -- as well as the uncertainty in these estimates -- can be automated by sampling algorithms such as Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach have already led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science. In this blog post I will highlight what Probabilistic Programming can offer algorithmic traders.
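To make sampling-based estimation concrete, here is a minimal random-walk Metropolis sampler (one of the simplest MCMC algorithms) in plain NumPy. It is a sketch under stated assumptions, not how PyMC3 works internally: the model is a Normal distribution with unknown mean and standard deviation, the priors are flat, and the function name `metropolis_normal`, the step size and the simulated data are all hypothetical. The spread of the returned samples is the uncertainty estimate mentioned above.

```python
import numpy as np

def metropolis_normal(data, n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the mean and standard deviation of Normal data.

    Illustrative assumptions: flat priors on mu and on log(sigma).
    Returns an (n_samples, 2) array of draws of (mu, sigma).
    """
    rng = np.random.default_rng(seed)
    n = len(data)

    def log_post(mu, log_sigma):
        # Normal log-likelihood up to a constant; the flat priors add nothing
        return -n * log_sigma - 0.5 * np.sum((data - mu) ** 2) / np.exp(2 * log_sigma)

    mu, log_sigma = 0.0, 0.0
    current = log_post(mu, log_sigma)
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        # Propose a small Gaussian jump in both parameters
        mu_prop = mu + step * rng.normal()
        ls_prop = log_sigma + step * rng.normal()
        proposed = log_post(mu_prop, ls_prop)
        # Metropolis rule: accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < proposed - current:
            mu, log_sigma, current = mu_prop, ls_prop, proposed
        samples[i] = mu, np.exp(log_sigma)
    return samples

# Usage on hypothetical simulated data
rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=200)
samples = metropolis_normal(data)[1000:]  # discard burn-in
print("posterior means (mu, sigma):", samples.mean(axis=0))
print("posterior std devs:         ", samples.std(axis=0))
```

The posterior means should land near the sample mean and standard deviation of `data`, while `samples.std(axis=0)` shows how uncertain those estimates are. Sampling `log_sigma` rather than `sigma` keeps the standard deviation positive without needing a constrained proposal.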

I recently gave a presentation about Bayesian Data Analysis in PyMC3 at PyData NYC’13 with a special focus on financial applications, which should serve as a good primer. Check it out here:

Bayesian Data Analysis with PyMC3 - Thomas Wiecki from PyData on Vimeo.

Inferring latent processes in Quantitative Finance

Bayesian statistics has many benefits, some of which I discuss in my talk. Most relevant to Quantitative Finance, however, is that you can flexibly model latent, unobservable processes and how they relate to observable events. As an example, we could model investor fear. We can't measure it directly, but it certainly has a bearing on market behavior and stock prices.

Bayes' formula then allows us to infer backwards: given the observable data (e.g. the stock price), what is the probability of the latent construct (e.g. investor fear)?
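As a toy illustration of this backwards inference, suppose (purely hypothetically) that investor fear is a two-state latent variable: in a "calm" market daily returns have a standard deviation of 1%, in a "fearful" one 3%, and a priori the market is fearful 10% of the time. Bayes' formula turns a single observed return into a posterior probability of fear. The function name `posterior_fear` and all numbers are illustrative assumptions, not a real model of investor sentiment.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def posterior_fear(observed_return, prior_fear=0.1, sigma_calm=0.01, sigma_fear=0.03):
    """P(fearful | observed return) via Bayes' formula for a hypothetical two-state model."""
    like_fear = normal_pdf(observed_return, 0.0, sigma_fear)
    like_calm = normal_pdf(observed_return, 0.0, sigma_calm)
    # Evidence: total probability of the observation under both latent states
    evidence = like_fear * prior_fear + like_calm * (1 - prior_fear)
    return like_fear * prior_fear / evidence

print(round(posterior_fear(-0.04), 2))  # → 0.98 (a -4% day)
print(round(posterior_fear(0.0), 2))    # → 0.04 (a flat day)
```

A large move is far more likely under the high-volatility state, so it pushes the posterior towards "fearful" even though that state is rare a priori.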

Stochastic volatility model

One elegant example of how Probabilistic Programming can infer unobservable quantities of the stock market is the stochastic volatility model. Volatility is an important concept in quantitative finance, as it relates to risk. Unfortunately, as Tony Cooper reminds us:

"Volatility is a strange thing - it exists but you can't measure it."

-- Tony Cooper on the Quantopian forums.

If we can't measure something, the next best thing we can do is to try to model it. One way to do this in a probabilistic framework is stochastic volatility. If we assume that returns are normally distributed, volatility is captured by the standard deviation of that normal distribution. Letting this standard deviation itself vary randomly over time gives rise to stochastic volatility. Intuitively, we would expect the standard deviation to be high during times of market turmoil like the 2008 crash.
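In symbols, writing r_t for the daily return on day t and assuming (for simplicity) a mean of zero, the difference between constant and stochastic volatility is whether the standard deviation carries a time index; σ_t is itself an unobserved random process:

```latex
\text{constant volatility:} \quad r_t \sim \mathcal{N}(0,\, \sigma^2)
\qquad
\text{stochastic volatility:} \quad r_t \sim \mathcal{N}(0,\, \sigma_t^2)
```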

The naive approach would be to look at the rolling standard deviation of returns. But this is unsatisfying for several reasons:

  • it often lags behind changes in volatility,
  • it is a rather unstable measure,
  • it strongly depends on the window size, with no principled way of choosing it.

These properties are illustrated in the plot below: the larger the window size, the more lag; the smaller the window size, the more unstable the estimate.
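The trade-off between lag and instability can be reproduced on simulated data. The sketch below assumes a hypothetical regime change (daily standard deviation jumping from 1% to 3% at day 250) and compares a short and a long trailing window; `rolling_std` is a helper written for this example, not a library function.

```python
import numpy as np

def rolling_std(x, window):
    """Trailing-window sample standard deviation; NaN until the window is full."""
    out = np.full(len(x), np.nan)
    for t in range(window - 1, len(x)):
        out[t] = x[t - window + 1 : t + 1].std(ddof=1)
    return out

rng = np.random.default_rng(0)
# Hypothetical regime change: daily std jumps from 1% to 3% at day 250
true_sigma = np.concatenate([np.full(250, 0.01), np.full(250, 0.03)])
returns = rng.normal(0.0, true_sigma)

lags, noise = {}, {}
for window in (10, 60):
    est = rolling_std(returns, window)
    # Lag: days after the jump until the estimate first exceeds the midpoint (2%)
    lags[window] = int(np.argmax(est[250:] > 0.02))
    # Instability: spread of the estimate while the true std is flat at 1%
    noise[window] = float(np.nanstd(est[window:250]))
    print(f"window={window:2d}  lag={lags[window]:2d} days  noise={noise[window]:.4f}")
```

The long window reacts to the jump much later, while the short window reacts quickly but fluctuates more while true volatility is flat; neither choice wins on both counts.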


As we will see, the stochastic volatility model does better on all three counts. But before we look at that, we need to establish one more insight into the nature of volatility: it tends to cluster. This is nicely demonstrated by the returns of the S&P 500 above. As you can see, during the 2008 financial crisis there is a lot of volatility in the stock market (huge positive and negative daily returns) that gradually decreases over time.
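Clustering shows up as autocorrelation in the magnitude of returns even when the returns themselves are nearly uncorrelated. The sketch below simulates returns whose log standard deviation follows a Gaussian random walk (the step size of 0.05 and the starting level of 1% are hypothetical) and compares the lag-1 autocorrelation of raw and absolute returns.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of a 1-D array at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(1)
n = 2000
# Hypothetical parameters: log volatility starts at log(1%) and takes Gaussian random-walk steps
log_vol = np.log(0.01) + np.cumsum(rng.normal(0.0, 0.05, n))
returns = rng.normal(0.0, np.exp(log_vol))

ac_raw = autocorr(returns)
ac_abs = autocorr(np.abs(returns))
print(f"lag-1 autocorrelation, raw returns:      {ac_raw:+.3f}")
print(f"lag-1 autocorrelation, absolute returns: {ac_abs:+.3f}")
```

Large moves tend to follow large moves, so the absolute returns show a clearly positive autocorrelation while the raw returns hover near zero. The same comparison can be run on real index returns as a quick check for clustering.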

So how do we model this clustering property? The stochastic volatility model assumes that the standard deviation of the returns follows a random walk. See the Wikipedia article on random walks for details, but essentially this process allows for slow, gradual changes over time.

What is interesting is that we can model the standard deviation itself as following a random walk. Intuitively, we allow the standard deviation to change over time, but only ever so slightly at each time point. For the mathematical and implementation details of the model, see this IPython notebook, which uses PyMC3 (a new, flexible probabilistic programming framework for Python).
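One common way to write such a model down is sketched below under assumptions; it is not copied from the notebook, whose exact priors and likelihood may differ. Let s_t be the log of the volatility on day t (working on the log scale keeps the standard deviation positive), let it follow a Gaussian random walk with step size τ, and give τ a prior; the exponential prior with rate λ is a hypothetical choice, and s_1 needs its own prior (omitted here):

```latex
\begin{aligned}
\tau &\sim \operatorname{Exponential}(\lambda) \\
s_t \mid s_{t-1} &\sim \mathcal{N}\!\left(s_{t-1},\, \tau^2\right), \qquad t = 2, \dots, T \\
r_t \mid s_t &\sim \mathcal{N}\!\left(0,\, e^{2 s_t}\right), \qquad t = 1, \dots, T
\end{aligned}
```

Here σ_t = e^{s_t}, so a small τ means the volatility can only drift slowly from one day to the next.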

The plot below shows the latent volatility (in orange) inferred by the model from the market data (in mint). The orange lines represent the standard deviation of the Normal distribution we assume for the daily returns.


As you can see, the estimated volatility is much more robust. Moreover, all parameters are estimated from the data, whereas with the rolling standard deviation we would have to choose a reasonable window length ourselves. Finally, instead of the single estimate the rolling standard deviation gives us, we get many plausible solutions. This provides a measure of the uncertainty in our estimates, represented by the width of the orange line above.
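To see how a distribution over latent volatility arises, the sketch below uses a bootstrap particle filter, a sequential Monte Carlo method swapped in here because it fits in a few lines. It is not the MCMC approach used in the notebook: it only filters (each estimate uses past data only) and it treats the random-walk step size τ as known instead of estimating it. The model is a log-volatility random walk with Normal returns; the function name `particle_filter_sv`, τ = 0.05 and the simulated data are hypothetical.

```python
import numpy as np

def particle_filter_sv(returns, tau=0.05, n_particles=2000, s0=np.log(0.01), seed=0):
    """Bootstrap particle filter for a random-walk log-volatility model.

    Assumed model (illustrative): s_t = s_{t-1} + N(0, tau^2), r_t ~ N(0, exp(s_t)^2).
    Returns a (T, 3) array with the 5%, 50% and 95% quantiles of exp(s_t) given r_1..r_t.
    """
    rng = np.random.default_rng(seed)
    s = s0 + rng.normal(0.0, 1.0, n_particles)  # diffuse initial cloud around s0
    quantiles = np.empty((len(returns), 3))
    for t, r in enumerate(returns):
        s = s + rng.normal(0.0, tau, n_particles)  # propagate each particle one random-walk step
        sigma = np.exp(s)
        # Weight particles by the Normal likelihood of today's return (up to a constant)
        log_w = -np.log(sigma) - 0.5 * (r / sigma) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        s = s[rng.choice(n_particles, size=n_particles, p=w)]  # multinomial resampling
        quantiles[t] = np.quantile(np.exp(s), [0.05, 0.5, 0.95])
    return quantiles

# Usage on data simulated from the same (hypothetical) model
rng = np.random.default_rng(1)
true_s = np.log(0.01) + np.cumsum(rng.normal(0.0, 0.05, 500))
returns = rng.normal(0.0, np.exp(true_s))
q = particle_filter_sv(returns)
lower, median, upper = q.T
print("correlation of filtered median with true volatility:",
      np.corrcoef(median, np.exp(true_s))[0, 1])
```

The gap between `upper` and `lower` plays the role of the width of the orange line: it reflects how much the data can pin down the volatility on each day. A full Bayesian fit would also sample τ and condition on the whole series rather than just the past.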

In summary, Bayesian statistics and Probabilistic Programming form an extremely powerful framework for Quantitative Finance, as they provide:

  • a principled framework for modeling latent, unobservable causes, as demonstrated by the stochastic volatility model;
  • a measure of uncertainty in our estimates;
  • powerful sampling methods that allow automatic estimation of highly complex models.

We are currently working on making PyMC3 available on Quantopian to allow use of these types of models. In the meantime, sign up and start getting familiar with our platform, discuss on the forums, and follow @Quantopian and me on Twitter. If you are interested in Probabilistic Programming, I also blog about it occasionally here.

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian.

In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

ilya gikhman

In mathematical statistics there are standard formulas for estimating the volatility of a random variable. It would be interesting to see the formulas one uses to estimate the volatility of a stochastic process. The statistics used to estimate a stock's volatility can be good if volatility over the period is close to constant. Otherwise, model risk would be underestimated, which can lead to visible losses. Unfortunately, financial experts do not pay attention to this fact. Stochastic volatility is a more complex object to calculate. The standard deviation of the stock and the stock itself are both driven by Wiener processes. If these Wiener processes are the same, it is not clear how to estimate the return and the volatility of the stock's diffusion term. If the Wiener processes are independent or correlated, it might be possible, but a practical implementation does not look reliable. At least it is interesting to look at. I would be grateful for directions to the stochastic volatility formula.
