Authors: Justin Lent (Quantopian), Thomas Wiecki (Quantopian), Scott Clark (SigOpt)

Parameter optimization of trading algorithms is quite different from most other optimization problems. Specifically, the optimization problem is non-convex, non-linear, stochastic, and can include a mix of integers, floats and enums as parameters. Moreover, most optimizers assume that the objective function is quick to evaluate, which is definitely not the case for a trading algorithm run over multiple years of financial data. That immediately disqualifies 95% of optimizers, including those offered by scipy or cvxopt. At Quantopian we have long been, and continue to be, interested in robust methods for parameter optimization of trading algorithms.

Bayesian optimization is a rather novel approach to the selection of parameters that is very well suited to optimize trading algorithms. This blog post will first provide a short introduction to Bayesian optimization with a focus on why it is well suited for quantitative finance. We will then show how you can use SigOpt to perform Bayesian optimization on your own trading algorithms running with zipline.

This blog post originally resulted from a collaboration with Alex Wiltschko where we used Whetlab for Bayesian optimization. Whetlab, however, has since been acquired by Twitter and the Whetlab service was discontinued. Scott Clark from SigOpt helped in porting the code to their service which is comparable in functionality and API. Scott also co-wrote the blog post.

Introduction to Bayesian optimization

Bayesian Optimization is a powerful tool that is particularly useful when optimizing anything that is both time consuming and expensive to evaluate (like trading algorithms). At the core, Bayesian Optimization attempts to leverage historical observations to make optimal suggestions on the best variation of parameters to sample (maximizing some objective like expected returns). This field has been actively studied in academia for decades, from the seminal paper "Efficient Global Optimization of Expensive Black-Box Functions" by Jones et al. in 1998 to more recent contributions like "Practical Bayesian Optimization of Machine Learning Algorithms" by Snoek et al. in 2012.

Many of these approaches take a similar route to solving the problem: they map the observed evaluations onto a Gaussian Process (GP), fit the best possible GP model, perform optimization on this model, then return a new set of suggestions for the user to evaluate. At the core, these methods balance the tradeoff between exploration (learning more about the parameters, the length scales over which they vary, and how they combine to influence the overall objective) and exploitation (using the knowledge already gained to return the best possible expected result). By efficiently and automatically making these tradeoffs, Bayesian Optimization techniques can quickly find the global optima of these difficult-to-optimize problems, often much faster than traditional methods like brute force or localized search.

At every Bayesian Optimization iteration, the point of highest Expected Improvement is returned; this is the set of parameters that, in expectation, will most improve upon the best objective value observed thus far. SigOpt wraps these powerful optimization techniques behind a simple API so that any expert in any field can optimize their models without resorting to expensive trial and error. More information about how SigOpt works, as well as other examples of using Bayesian Optimization to perform parameter optimization, can be found on our research page.
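
To make the Expected Improvement criterion concrete, here is a minimal sketch of its textbook closed form for a maximization problem, assuming the model's prediction at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`. It is an illustration only, not SigOpt's implementation, and the function and argument names are made up for this example.

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Expected amount by which a Gaussian prediction beats the best value seen."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is known exactly.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best_so_far) * cdf + sigma * pdf

# Two candidates with the same predicted mean: the more uncertain one scores
# higher, which is how the criterion rewards exploration.
print(expected_improvement(1.0, 0.1, best_so_far=1.0))
print(expected_improvement(1.0, 1.0, best_so_far=1.0))
```

The first term rewards points whose predicted mean already beats the best observation (exploitation), while the second rewards points where the model is still uncertain (exploration).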

A Generalized Framework for Evaluating Trading Signals

Historically, quant traders have used many price based signals to define an investment strategy. Many of these signals have been implemented in the popular TA-lib library, with an available Python library here. Typically, a price based signal takes in historical prices as input to compute the signal's value. For example, RSI (Relative Strength Index) is commonly used for mean-reversion and momentum trading. To compute RSI one must first choose the number of pricing days over which to compute the signal. Then, the trader chooses a range of valid values to trigger trade entry. RSI values range from 0 to 100. If a stock has undergone a sharp selloff recently, RSI values will trend towards 0, and after a strong rally, towards 100. A common trading strategy is to go long a stock when RSI is below 20, betting on mean-reversion, and similarly to go short when RSI is above 80. However, other groups of traders have found success betting on persistent momentum in the stock (rather than mean-reversion) when RSI reaches an extreme reading. In that case, the trader will go long when RSI is above 80, for example, betting that the stock will continue going higher.

This poses several questions:

  • So which is right? Should we use RSI as a mean-reversion indicator or a momentum indicator?
    • What range of RSI values shall we choose to determine if the momentum or mean-reversion condition is met?
    • How many lookback days of prices should we use to compute the stock's RSI?
  • Besides RSI, can I integrate a second indicator into my investment strategy?
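
To show where the lookback parameter enters, here is a minimal sketch of an RSI calculation that uses plain averages of gains and losses over the lookback window. It is an illustrative simplification: TA-lib's implementation smooths the averages differently, so its values will generally differ. The function name and the choice of 50 for a perfectly flat window are assumptions of this example.

```python
import numpy as np

def simple_rsi(prices, lookback):
    """RSI in [0, 100] from the last `lookback` price changes (simple averages)."""
    prices = np.asarray(prices, dtype=float)
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 prices")
    changes = np.diff(prices[-(lookback + 1):])
    avg_gain = np.clip(changes, 0, None).mean()   # average of the up moves
    avg_loss = np.clip(-changes, 0, None).mean()  # average of the down moves
    if avg_loss == 0:
        # No losses in the window: 100 after a pure rally, 50 if prices were flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

print(simple_rsi([10, 9, 8, 7, 6], lookback=4))  # pure selloff
print(simple_rsi([1, 2, 3, 4, 5], lookback=4))   # pure rally
```

A pure selloff drives the value to 0 and a pure rally to 100, matching the behavior described above; both the lookback and the entry thresholds remain free choices.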

Each decision made regarding specifying our trading signal (e.g. RSI) or how to interpret the signal's value for making a trading decision is effectively a free parameter in our system, and depending upon the range of reasonable values each free parameter can take, it can quickly explode the total combinations of possible parameter settings. Each signal added to the strategy can quickly increase the total combinations into the many millions and even billions; we'll see this firsthand in the example below that incorporates only 2 signals.

A discussion of strategy model overfitting, and evaluating how overfit a trading backtest may be, will not be addressed here, and will be the topic of a future blog post. An excellent introduction for how to address trading strategy overfitting in your own algorithms can be found here and here.

Our strategy would need to be tested over a parameter space of millions, or billions, of combinations in order to determine the subspace where the global maximum is likely located. This is why Bayesian Optimization can be so effective at quickly evaluating potentially profitable trading strategies. Brute force grid-search over a billion-combination parameter space is often intractable, even if each combination only takes 30 seconds to complete. Bayesian optimization reduces the number of model evaluations needed over the global search space by an order of magnitude, as described in the previous section.

The trading algorithm we implement below creates a simple structure for passing free parameters into any simple price based trading signal (simplified to work more easily with ta-lib functions). Each signal is then evaluated each trading day, and when all the conditions are true, a trade is entered and held until the next signal evaluation period (where the evaluation period is yet another free parameter).

For the purposes of this optimization, our objective function will be the Sharpe Ratio of the strategy, a broadly accepted industry metric for evaluating trading strategy performance. However, the framework implemented below makes it easy to swap in any objective function desired by the analyst.
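
As a rough illustration of this objective, the sketch below computes an annualized Sharpe Ratio from a series of daily returns. It assumes a zero risk-free rate and 252 trading days per year; the function name is illustrative and this is not necessarily the exact calculation used in the notebook.

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized mean return divided by annualized volatility (risk-free rate = 0)."""
    r = np.asarray(daily_returns, dtype=float)
    volatility = r.std(ddof=1)
    if volatility == 0:
        return float("nan")  # undefined when returns never vary
    return np.sqrt(periods_per_year) * r.mean() / volatility

print(sharpe_ratio([0.01, -0.01, 0.02, 0.0]))
```

Any other scalar metric computed from the backtest returns could be swapped in, as long as the optimizer is told whether to maximize or minimize it.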

Example Zipline algorithm

To illustrate how you can use Bayesian optimization on your zipline trading algorithms and how it compares to other naive approaches (i.e. grid search) we will use a rather simple algorithm comprised of a trading trigger based on two commonly used signals from the ta-lib library, RSI (Relative Strength Index) and ROC (Rate of Change).

The trading algorithm will implement what might be considered a sector rotation strategy, and search for trades across these Select Sector SPDR ETF's:


By running the trading logic across all of these ETF's we will be implementing a simple sector-rotation strategy.

Strategy Specification

Buy the ETF only if it meets both the RSI signal and ROC signal criteria. When an ETF is bought long, then an equivalent dollar amount of SPY will be shorted, creating a hedged, dollar-neutral portfolio.


Hedging all of our trades serves to "tease apart" the actual usefulness of these signals (RSI, ROC), since it removes upward movement in the stock that occurs simply because the rest of the stock market is going up. As a result, the profit achieved by this hedged strategy can be perceived as more "pure alpha," rather than being highly correlated with the direction of the broad stock market.

The 7 free parameters of our trading strategy are as follows:

  • RSI
    • (1) Lookback window for # of prices used in the RSI calculation
    • (2) Lower_bound value defining the trade entry condition
    • (3) Range_width, which will be added to the Lower-bound
      • Lower_bound + Range_width is the range of values over which our RSI signal will be considered True
  • ROC
    • (4) Lookback window for # of prices used in the ROC calculation
    • (5) Lower_bound value defining the trade entry condition
    • (6) Range_width, which will be added to the Lower-bound
      • Lower_bound + Range_width is the range of values over which our ROC signal will be considered True
  • Signal evaluation frequency
    • (7) Number of days between evaluations of our signals
      • Do we evaluate them every day, every week, every month, etc.
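
The entry rule implied by the parameters above can be sketched as follows. This is a hypothetical helper, not the notebook's code: the dictionary keys are invented for this example, and treating both ends of each range as inclusive is an assumption.

```python
def signal_is_true(value, lower_bound, range_width):
    """True when value lies in [lower_bound, lower_bound + range_width]."""
    return lower_bound <= value <= lower_bound + range_width

def should_enter(rsi_value, roc_value, params):
    """Buy only when both the RSI and the ROC conditions hold."""
    return (
        signal_is_true(rsi_value, params["rsi_lower_bound"], params["rsi_range_width"])
        and signal_is_true(roc_value, params["roc_lower_bound"], params["roc_range_width"])
    )

# Example setting: RSI must fall in [60, 80] and ROC in [5, 25].
params = {
    "rsi_lower_bound": 60, "rsi_range_width": 20,
    "roc_lower_bound": 5, "roc_range_width": 25 - 5,
}
print(should_enter(rsi_value=72.0, roc_value=11.5, params=params))
print(should_enter(rsi_value=45.0, roc_value=11.5, params=params))
```

Expressing each condition as a lower bound plus a width lets a single pair of numbers describe either a mean-reversion band (low RSI values) or a momentum band (high RSI values), so the optimizer can choose between them.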

It's worth noting that even with just 2 price based signals, we have 7 free parameters in this system!

Reasonable Ranges for each of the 7 free parameters above (assuming each is an integer, with integer steps):

  1. 115 values: 5 to 120
  2. 90 values: 0 to 90
  3. 20 values: 10 to 30
  4. 61 values: 2 to 63
  5. 30 values: 0 to 30
  6. 195 values: 5 to 200
  7. 18 values: 3 to 21

Multiplying the valid ranges of each yields a total combination count of:

  • 1,329,623,100,000 theoretical combinations
    • 115 x 90 x 20 x 61 x 30 x 195 x 18
    • Imagine how many combinations are possible if 3, 4, 5... 10 signals are added to a strategy
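
The arithmetic above can be checked with a few lines of Python. The counts in the list correspond to half-open integer ranges (start included, end excluded), which is how `range` behaves; the parameter names are labels invented for this sketch, and the 30-second run time is the figure quoted earlier for a single backtest.

```python
import math

# One half-open integer range per free parameter, in the order listed above.
param_ranges = {
    "rsi_lookback": range(5, 120),
    "rsi_lower_bound": range(0, 90),
    "rsi_range_width": range(10, 30),
    "roc_lookback": range(2, 63),
    "roc_lower_bound": range(0, 30),
    "roc_range_width": range(5, 200),
    "eval_frequency": range(3, 21),
}

counts = [len(r) for r in param_ranges.values()]
total = math.prod(counts)
print(counts)
print(f"{total:,} combinations")

# Time for an exhaustive grid search at 30 seconds per backtest.
years = total * 30 / (60 * 60 * 24 * 365)
print(f"about {years:,.0f} years of sequential compute")
```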

Obviously grid-searching through all those combinations is unreasonable, though a skilled practitioner can prune the search space significantly by grid-searching across each parameter using wider steps, based on their intuition of the model they are building. But even if the skilled practitioner can reduce the grid-search to 10,000 combinations, that number may still be unwieldy if the objective function (e.g. a trading strategy) takes 1 minute to evaluate, which is quite frequently the case with trading strategies. This is where having access to a Bayesian optimizer becomes extremely helpful.

Below is the result of an analysis carried out in an IPython notebook. It compares the results of SigOpt's Bayesian optimizer from 3 independent experiments, of only 300 trials each chosen intelligently by the optimizer, to a "smart" grid-search of approximately 3500 combinations chosen intuitively from a reasonable interpretation of sensible RSI and ROC values. Only 3500 trials were chosen for the grid-search approach because even those few combinations took 48 hours to evaluate. 300 trials were chosen for the SigOpt approach because in practice the SigOpt optimizer is able to find good optima within 10 to 20 times the dimensionality of the parameter space being optimized. This linear number of evaluations with respect to the number of parameters allows for optimization of algorithms that would otherwise be intractable using standard methods like grid search, whose cost grows exponentially with the number of parameters.


The SigOpt Bayesian Optimizer discovered a better global maximum when testing only 300 combinations than was discovered by the ~3500 combination grid search.

(This was seen in 4 out of 5 runs with SigOpt, and the 1 that returned a worse objective value was only a minor shortfall)

In practice SigOpt is able to find good optima in a linear number of evaluations with respect to the number of parameters being tuned (usually between 10 and 20 times the dimensionality of the parameter space). Grid search, even an expertly tuned grid search, grows exponentially with the number of parameters. As the model being tuned becomes more complex, tuning it with these traditional methods can quickly become completely intractable.

An example of how quickly SigOpt discovered a parameter combination near our expected global maximum is shown below, as well as a comparison versus the extremely coarse (and slow) grid-search:

Deeper Dive

Next we will inspect aspects of the optimization further. (If you wish to view the entirety of this blog post within the context of the IPython notebook in which it was created, you can view it here on Quantopian's public research repo.)

First, a look at the distribution of the objective values returned from each iteration by each optimizer. We notice that SigOpt's Bayesian approach tests more points in the parameter space that are nearer to the expected global maximum, as the mean of all trials is much closer to the maximum value discovered. In addition, the mean is firmly above zero, so its suggested parameter combinations are targeting a space in the region of the desired outcome.

Next, let's inspect the distribution of values of each parameter attempted by each method, to get a sense of how the Bayesian approach is able to home in on the specific region more precisely. The coarse grid search simply sets a min/max range for each parameter along with a discrete step to traverse the grid, which is fairly common practice for industry practitioners when running parameter optimizations. A more complex search could be implemented via random sampling, or perhaps particle swarm optimization, but that complexity is commonly out of reach for non-programmers.

Grid-Search Parameter Combination Distributions.

SigOpt Parameter Combination Distributions.

Here we can see the 1:1 relationship between each of the SigOpt suggested parameter combinations, and which regions of each parameter intersection were determined to be best at resulting in optimal objective function values.  Along the diagonal is simply the distribution of each individual parameter (as a kde fit line plot, similar to the histograms shown directly above).

Now let's apply the optimal parameters chosen by each method to our out-of-sample heldout data.  (We trained our model using market data from 2004-2009, and the heldout data is from 2010-2015).

Grid-Search Results

In-sample, Grid-Search Optimal Strategy
Out-of-Sample, Grid-Search Optimal Parameters.

SigOpt Results

In-Sample, SigOpt Optimal Strategy.
Out-of-Sample, SigOpt Optimal Parameters.

Takeaway:   Recognizing how poorly the strategy performs out-of-sample shows how important it is to perform additional analysis (cross-validation, out-of-sample testing, etc.) after using parameter optimization to discover a global maximum.


On a positive note, however, because the Bayesian approach ran so much faster than grid-search, we were able to assess our out-of-sample performance much more rapidly: the grid search took over 2 days to finish! The Bayesian optimization via SigOpt allowed us to continue our research process 10x faster, in a matter of hours rather than days.

For completeness, and to put the entire analysis together across backtest and out-of-sample, below is a pyfolio tearsheet allowing visual inspection of the strategy as it transitions from in-sample to out-of-sample.




If you wish to work on this analysis, or view the code used to accomplish the above, feel free to clone our research repo on GitHub.






Algorithmic trading used to be a very difficult and expensive process. The time and cost of system setup, maintenance, and commission fees made programmatic trading almost impossible for the average investor. That’s all changing now.

We’re excited to announce that Quantopian has integrated with Robinhood, a zero commission brokerage. This partnership has made the process of algorithmic trading, from start-to-finish, completely free.

From initial brainstorming with research, to testing and optimizing with backtesting, and finally, commission-free execution with Robinhood, algorithmic trading has never been easier.

Here’s What Users Get

  • Data - Data is the lifeblood of algorithmic trading. But most data is costly and dirty. Quantopian solves that for you with clean, integrated data sources. Some data sources are entirely free (traded price and volume, corporate fundamentals), and some data sources are freemium (news sentiment, earnings estimates, and more).
  • Platform - Quantopian provides you with a platform to do your free-form research, to write and test your algorithm, to paper trade, and even trade real money. You don't have to set up, maintain, or monitor - we do it all for you.
  • Execution - Robinhood provides order execution, holding your funds and filling your orders.
  • Capital - You can trade your own money, or you can seek an allocation and trade with our money. One way to get allocation is to win our contest, trade $100,000, and keep 100% of the profits. Other algorithms get larger discretionary allocations through our fund.

How To Get Started

If you have an existing Robinhood account, you can begin trading today. If you’d like to open an account, you can sign up directly at Robinhood - the process takes less than five minutes to complete. For more information and video tutorials, our community post has you covered.

P.S. Attached is a sample algorithm that's geared and ready for live trading. It's based off Mebane Faber’s Tactical Asset Allocation. The allocation Faber proposes is designed to be "a simple quantitative method that improves the risk-adjusted returns across various asset classes." You can read the original academic paper from Meb Faber, or the previous discussion of the strategy on Quantopian.


MCMC Sampling for Dummies

November 12th, 2015   Posted by: Thomas Wiecki

When I give talks about probabilistic programming and Bayesian statistics, I usually gloss over the details of how inference is actually performed, treating it as a black box essentially. The beauty of probabilistic programming is that you actually don't have to understand how the inference works in order to build models, but it certainly helps.

When I presented a new Bayesian model to Quantopian's CEO, Fawce, who wasn't trained in Bayesian stats but is eager to understand it, he started to ask about the part I usually gloss over: "Thomas, how does the inference actually work? How do we get these magical samples from the posterior?".

Now I could have said: "Well that's easy, MCMC generates samples from the posterior distribution by constructing a reversible Markov-chain that has as its equilibrium distribution the target posterior distribution. Questions?".

That statement is correct, but is it useful? My pet peeve with how math and stats are taught is that no one ever tells you about the intuition behind the concepts (which is usually quite simple) but only hands you some scary math. This is certainly the way I was taught and I had to spend countless hours banging my head against the wall until that eureka moment came about. Usually things weren't as scary or seemingly complex once I deciphered what it meant.


Metropolis Hastings -- for more information, read on below.


This blog post is an attempt at trying to explain the intuition behind MCMC sampling (specifically, the Metropolis algorithm). Critically, we'll be using code examples rather than formulas or math-speak. Eventually you'll need that but I personally think it's better to start with an example and build the intuition before you move on to the math.
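
To give a flavor of what such a code example looks like, here is a minimal sketch of a random-walk Metropolis sampler. It is an illustration written for this excerpt rather than the code from the full post; the target (a standard normal), the step size, and all names are assumptions of the example.

```python
import numpy as np

def metropolis(log_target, n_samples, start=0.0, step_size=1.0, seed=0):
    """Draw samples whose long-run distribution follows exp(log_target)."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = start
    current_logp = log_target(current)
    for i in range(n_samples):
        # Propose a jump from the current position (symmetric random walk).
        proposal = current + rng.normal(0.0, step_size)
        proposal_logp = log_target(proposal)
        # Always accept uphill moves; accept downhill moves with
        # probability target(proposal) / target(current).
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        samples[i] = current
    return samples

def log_standard_normal(x):
    # Log density up to an additive constant; the constant cancels in the ratio.
    return -0.5 * x * x

samples = metropolis(log_standard_normal, n_samples=20000)
print(samples[2000:].mean(), samples[2000:].std())
```

Only the ratio of target densities is needed, which is why the sampler works even when the normalizing constant of the posterior is unknown.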



Meet Andreas, Winner of Contest 9 (October Prize)

October 12th, 2015   Posted by: Alisa Deychman

A new month brings a new contest winner. Meet Andreas, the winner of Contest 9 (also known as the October Prize)!

Andreas originally hails from Sweden, then moved to the United Kingdom for his university studies. In the UK, he studied mathematics and then stayed to pursue a career in the finance industry, before embarking on a graduate degree in mathematical physics. He is currently a PhD student in Spain continuing his journey in mathematics. Andreas stumbled across Quantopian while traversing the web, and was immediately hooked. With no previous background in Python, he started learning how to create trading algorithms. He shares, "I started coding up some basic algorithms and was impressed by how easy it was to get going. There was also a great community forum and tutorials that had answers to most questions." His Python skills improved and Andreas began coding a variety of algorithms and trying different strategies.

Andreas was focused on the data. "For me, quant research is all about the data. Analysing and understanding the data always comes first (and backtesting last!). Quantopian has a number of interesting data feeds (that I hope it will continue growing!). My algo uses some of these data feeds to select baskets of stocks to trade". Quantopian provides 13 years of pricing data and fundamental data, along with 22 (and growing) datasets in the store.

Andreas continues to improve his current ideas and test new strategies using the research environment and backtester. "I know how much work it goes into creating a proper backtesting and research environment, and that Quantopian makes one available to you for free is quite amazing!" He is currently in the first phase of his prize, undergoing a quant consultation session. Afterward, he will enter the second phase and begin trading a $100,000 brokerage account for 6 months, and keep all the profits. We'll write him a check monthly for his earnings!

We've already paid out over $2300 in contest earnings. Are you the next contest winner? If so, submit your algorithm by the next deadline on Nov. 2 at 9:30AM ET to start your 6 months of paper trading.


Every student in every school should have the opportunity to learn computer science.

The organization is a non-profit dedicated to expanding access to computer science, and increasing participation by women and underrepresented students of color in this field. They believe computer science should be part of the core curriculum, alongside other courses such as biology, chemistry, or algebra.

We at Quantopian believe in this vision to bring computer science to every student. To help achieve this goal, we have decided to donate all revenue generated by our live stream ticket sales for QuantCon 2016 to them.

QuantCon 2016 will feature a stellar lineup including: Dr. Emanuel Derman, Dr. Marcos López de Prado, Dr. Ernie Chan, and more. It will be a full day of expert speakers and in-depth tutorials. Talks will focus on innovative trading strategies, unique data sets, and new programming tips and tools. The goal? To give you all the support you need to craft and trade outperforming strategies.

A live stream purchase will also include first-access to all QuantCon recordings and presentation decks. For tickets or more information, please visit


Quantopian Talks at Strata

September 21st, 2015   Posted by: wbadmin

Strata, the conference where cutting-edge science and new business fundamentals intersect, will take place September 29th to October 1st in New York City.

The conference is a deep-immersion event where data scientists, analysts, and executives explore the latest in emerging techniques and technologies.

Quantopian Talks & Tutorials

Our team will be presenting several talks and tutorials at Strata. The topics range from how global-sourcing is flattening finance, to a Blaze tutorial, to a review on pyfolio and how it can improve your portfolio and risk analytics, to an out-performing investment algorithm on women-led companies in the Fortune 1000. 

To see our entire lineup, please click here

Join Us!

If you would like to attend the conference, RSVP here and enter discount code QUANT for a 20% discount on any pass.

We hope to see you there!



Predicting future returns of trading algorithms: Bayesian cone

August 31st, 2015   Posted by: Thomas Wiecki

Authors: Sepideh Sadeghi and Thomas Wiecki

Foreword by Thomas

This blog post is the result of a very successful research project by Sepideh Sadeghi, a PhD student at Tufts who did an internship at Quantopian over the summer of 2015. Follow her on Twitter here.

All of the models discussed herein are available through our newly released library for finance performance and risk analysis called pyfolio. For an overview of how to use it see the Bayesian tutorial. Pyfolio is also available on the Quantopian research platform to run on your own backtests.


When evaluating trading algorithms we generally have access to backtest results over a couple of years and a limited amount of paper or real money traded data. The biggest problem with evaluating a strategy based on the backtest is that it might be overfit to look good only on past data and will fail on unseen data. In this blog, we will take a stab at addressing this problem using Bayesian estimation and prediction of the possible future returns we expect to see based on the backtest results. At Quantopian we are building a crowd-sourced hedge fund and face this problem on a daily basis.

Here, we will briefly introduce two Bayesian models that can be used for predicting future daily returns. These models take the time series of past daily returns of an algorithm as input and simulate possible future daily returns as output. We will talk about the variations of models that can be used for prediction and how they compare to each other in another blog, but here we will mostly talk about how to use the predictions of such models to extract useful information about the algorithms.

How do we get the model inputs?

At Quantopian we have built a world-class backtester that allows everyone with basic Python skills to write a trading algorithm and test it on historical data. The resulting daily returns generated by the backtest will be used to train the model predicting the future daily returns.

What can be learned from the predictive models?

Let's not forget that computational modeling always comes with some risks, such as estimation uncertainty, model misspecification, and implementation limitations and errors. Because of such risk factors, model predictions are not always perfect and 100% reliable. However, model predictions can still be used to extract useful information about algorithms, even if the predictions are not perfect.

For example, comparing the actual performance of a trading algorithm on unseen market data with the predictions generated by our model can inform us whether the algorithm is behaving as expected based on its backtest or whether it is overfit to only work well on past data. Such algorithms may have the best backtest results but they may not necessarily have the best performance in live trading. An example of such an algorithm can be seen in the picture below. As you can see, the live trading results of the algorithm are completely out of our prediction area, and the algorithm is performing worse than our predictions. These predictions are generated by fitting a line through the cumulative backtest returns. We then assume that this linear trend continues going forward. Because we have more uncertainty about events further in the future, the linear cone widens over time, assuming returns are normally distributed with a variance estimated from the backtest data. This is certainly not the best way to generate predictions, as it makes a couple of strong assumptions, such as normality of returns and that we can accurately estimate the variance from limited backtest data. Below we show that we can improve these cone-shaped predictions using Bayesian models to predict the future returns.
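
The linear cone described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not pyfolio's implementation: cumulative returns are taken as a simple running sum of daily returns, and the band widens as `z * sigma * sqrt(t)`, which is what independent, normally distributed daily returns with standard deviation `sigma` imply. The function name and the default `z` are choices made for this example.

```python
import numpy as np

def linear_cone(backtest_returns, n_forward, z=1.96):
    """Extrapolate the backtest's linear trend and widen a band around it."""
    backtest_returns = np.asarray(backtest_returns, dtype=float)
    cumulative = np.cumsum(backtest_returns)
    days = np.arange(len(cumulative))
    # Fit a straight line through the cumulative backtest returns.
    slope, _intercept = np.polyfit(days, cumulative, 1)
    # Daily volatility estimated from the backtest.
    sigma = backtest_returns.std(ddof=1)
    steps = np.arange(1, n_forward + 1)
    center = cumulative[-1] + slope * steps
    half_width = z * sigma * np.sqrt(steps)  # uncertainty grows with the horizon
    return center - half_width, center, center + half_width

# Synthetic backtest: 500 days of noisy, slightly positive returns.
rng = np.random.default_rng(0)
lower, center, upper = linear_cone(rng.normal(0.001, 0.01, 500), n_forward=120)
print(lower[-1], center[-1], upper[-1])
```

Live cumulative returns that drift below `lower` would correspond to the overfit case discussed above.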

On the other hand, there are algorithms that perform equally well on data from the past and on live trading data.  An example of that can be seen in the picture below.

And finally, we can find differences between the algorithm behavior in the past and in live trading period that are due to changes in the market and not due to the characteristics of the algorithm itself. For example the picture below illustrates an algorithm, which is doing pretty well until sometime in 2008, but all of a sudden it crashes as the market crashes.


Why Bayesian models?

In the Bayesian approach we do not get a single estimate for our model parameters as we would with maximum likelihood estimation. Instead, we get a complete posterior distribution for each model parameter, which quantifies how likely different values are for that model parameter. For example, with few data points our estimation uncertainty will be high, reflected by a wide posterior distribution. As we gather more data, our uncertainty about the model parameters will decrease and we will get an increasingly narrow posterior distribution. There are many more benefits to the Bayesian approach, such as the ability to incorporate prior knowledge, that are outside the scope of this blog post.
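
The narrowing of the posterior can be seen in a toy case that has a closed-form answer. The sketch below assumes daily returns are normal with a known standard deviation and places a normal prior on the mean, which is simpler than the models discussed in this post; the function name and default values are illustrative.

```python
import numpy as np

def posterior_for_mean(data, prior_mean=0.0, prior_sd=0.01, known_sd=0.01):
    """Normal prior + normal likelihood with known sd gives a normal posterior."""
    data = np.asarray(data, dtype=float)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / known_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision + data.sum() / known_sd ** 2) / post_precision
    return post_mean, np.sqrt(1.0 / post_precision)

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.01, 1000)
for n in (10, 100, 1000):
    mean, sd = posterior_for_mean(returns[:n])
    print(n, mean, sd)  # the posterior sd shrinks as n grows
```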

Now that we have explained why we want to predict future returns and why we use Bayesian models for this purpose, let's briefly look at two Bayesian models that can be used for prediction. These models make different assumptions about how daily returns are distributed.

Normal model

We call the first model the normal model. This model assumes that daily returns are sampled from a normal distribution whose mean and standard deviation are sampled from a normal distribution and a half-Cauchy distribution, respectively. The statistical description of the normal model and its implementation in PyMC3 are illustrated below.

This is the statistical model:

mu ~ Normal(0, 0.01)
sigma ~ HalfCauchy(1)
returns ~ Normal(mu, sigma)

And this is the code used to implement this model in PyMC3:

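import pymc3 as pm

# `data` holds the backtest's daily returns; `samples` is the number of draws.
with pm.Model():
    # Priors on the mean and the volatility of daily returns
    mu = pm.Normal('mean returns', mu=0, sd=.01, testval=data.mean())
    sigma = pm.HalfCauchy('volatility', beta=1, testval=data.std())
    # Likelihood of the observed daily returns
    returns = pm.Normal('returns', mu=mu, sd=sigma, observed=data)

    # Fit the model
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(samples, step, start=start)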


T-model

We call the second model the T-model. This model is very similar to the first model except that it assumes that daily returns are sampled from a Student-T distribution. The T distribution is much like a normal distribution but has heavier tails, which makes it better at capturing data points that are far from the center of the data distribution. It is well known that daily returns are in fact not normally distributed, as they have heavy tails.

This is the statistical description of the model:

mu ~ Normal(0, 0.01)
sigma ~ HalfCauchy(1)
nu ~ Exp(0.1)
returns ~ T(nu+2, mu, sigma)

And this is the code used to implement this model in PyMC3:

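import pymc3 as pm
import scipy as sp
import scipy.optimize  # makes sp.optimize available

# `data` and `samples` are as in the normal model above.
with pm.Model():
    mu = pm.Normal('mean returns', mu=0, sd=.01)
    sigma = pm.HalfCauchy('volatility', beta=1)
    # Exponential prior on nu - 2, so the degrees of freedom stay above 2
    nu = pm.Exponential('nu_minus_two', 1. / 10.)

    returns = pm.T('returns', nu=nu + 2, mu=mu, sd=sigma, observed=data)

    # Fit model to data
    start = pm.find_MAP(fmin=sp.optimize.fmin_powell)
    step = pm.NUTS(scaling=start)
    trace = pm.sample(samples, step, start=start)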


Prediction Cone: Visualization of predictions for live trading period

Here, we describe the steps for creating predictions from our Bayesian model. These predictions can be visualized as a cone-shaped area of cumulative returns that we expect to see from the model. Assume that we are working with the normal model fit to past daily returns of a trading algorithm. The result of fitting this model in PyMC3 is the posterior distributions for the model parameters mu (mean) and sigma (standard deviation), shown in fig a.

Now we take one sample from the mu posterior distribution and one sample from the sigma posterior distribution, with which we can build a normal distribution. This gives us one possible normal distribution that has a reasonable fit to the daily returns data (fig b).

To generate predicted returns, we take random samples from that normal distribution (the inferred underlying distribution) as can be seen in fig c.

Having the predicted daily returns, we can compute the predicted time series of cumulative returns, which is shown in fig d. Note that we have only one predicted path of possible future live trading results because we only had one prediction for each day. We can get more predicted paths by building more than one inferred distribution from the data and repeating the same steps for each inferred distribution. So we take n samples from the mu posterior and n samples from the sigma posterior. From these n pairs of posterior samples we can build n inferred distributions. From each inferred distribution we can again generate future returns and a possible cumulative returns path (fig e).

We can summarize the cumulative returns paths we generated by computing the 5th, 25th, 75th and 95th percentiles for each day and plotting those instead. This leaves us with 4 lines marking the 5th, 25th, 75th and 95th percentiles. We highlight the interval between the 5th and 95th percentiles in light blue and the interval between the 25th and 75th percentiles in dark blue to represent the narrower credible interval. This gives us the cone illustrated in fig f.

Intuitively, if we observe cumulative returns from an algorithm that are very different from the backtest, we would expect them to walk outside of our credible region. In general, this procedure of generating data from the posterior and comparing it with observed data is called a posterior predictive check.
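The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `mu_post` and `sigma_post` are hypothetical stand-ins for the posterior samples that would normally come from the fitted model's trace, and the number of samples and days are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_days = 1000, 60

# Stand-ins for posterior samples of mu and sigma (hypothetical values;
# in practice these would come from the fitted model's trace).
mu_post = rng.normal(0.0005, 0.0003, size=n_samples)
sigma_post = np.abs(rng.normal(0.01, 0.001, size=n_samples))

# One row per posterior sample: draw n_days of returns from Normal(mu_i, sigma_i)
pred_returns = rng.normal(mu_post[:, None], sigma_post[:, None],
                          size=(n_samples, n_days))

# Cumulative return paths, one per posterior sample
cum_paths = np.cumprod(1.0 + pred_returns, axis=1) - 1.0

# Percentile lines for each day: the boundaries of the cone
p5, p25, p75, p95 = np.percentile(cum_paths, [5, 25, 75, 95], axis=0)
print(f"day {n_days}: 5%={p5[-1]:.3f}, 95%={p95[-1]:.3f}")
```

Plotting `p5` and `p95` as the outer band and `p25` and `p75` as the inner band against the day index gives the cone shape; the band widens over time because uncertainty in cumulative returns accumulates.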



Overfitting and Bayesian consistency score

Now that we have talked about the Bayesian cone and how it is generated, you may ask what these Bayesian cones can be used for. To demonstrate what can be learned from Bayesian cones, look at the cones illustrated below. The cone on the right shows an algorithm whose live trading results are largely within our prediction area and, more precisely, within the high-confidence interval of that area. This means that the algorithm is performing in line with our predictions. On the other hand, the cone on the left shows an algorithm whose live trading results are largely outside of our prediction area, which would prompt us to take a closer look at why the algorithm is not behaving according to specifications and potentially to turn it off if it is used for real-money live trading. This underperformance in live trading might be due to the algorithm being overfit to past market data, or to other reasons that should be investigated by the person who deploys the algorithm or decides whether to invest using it.

[Figure: algo1_bayescone, algo2_bayescone]

Let's take a look at the prediction cones generated using the simple linear model we described at the beginning of the blog post. It is interesting to see that the linear cone shows nothing worrisome about the algorithm that we know to be overfit. The fact that the Bayesian cone picks up on the overfitting while the linear cone does not is reassuring.

[Figure: algo1_prevcone, algo2_prevcone]

One of the ways in which the Bayesian cone can be useful is in detecting overfit algorithms with good backtest results. To measure numerically how much a strategy is overfit, we have developed a Bayesian consistency score. This score reports the level of consistency between the model predictions and the actual live trading results.

For this, we compute the average percentile score of the paper-trading returns relative to the predictions and normalize it to yield a value between 100 (perfect fit) and 0 (completely outside of the cone). See below for an example where we get a high consistency score for an algorithm (the right cone) that stays in the high-confidence interval of the Bayesian prediction area (between the 5th and 95th percentiles) and a low value for an algorithm (the left cone) that is mostly outside the predicted area.
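A score of this kind could be computed roughly as follows. This is a sketch of one plausible normalization, not necessarily the exact formula used in production: the function name `consistency_score`, the rescaling around the median, and the simulated prediction paths are assumptions made for illustration.

```python
import numpy as np

def consistency_score(live_cum_returns, predicted_cum_paths):
    """Average percentile of the live path within the predictions,
    rescaled so that 50 (the median) maps to 100 and 0 or 100 map to 0."""
    live = np.asarray(live_cum_returns)
    preds = np.asarray(predicted_cum_paths)   # shape: (n_paths, n_days)
    # Percentile of each day's live value among that day's predictions
    pct = 100.0 * np.mean(preds <= live[None, :], axis=0)
    return 100.0 - np.abs(50.0 - pct.mean()) / 0.5

rng = np.random.default_rng(2)
# Hypothetical predicted paths: 2000 paths over 40 days
pred_returns = rng.normal(0.0005, 0.01, size=(2000, 40))
paths = np.cumprod(1.0 + pred_returns, axis=1) - 1.0

in_line = np.median(paths, axis=0)           # live results tracking the median
far_below = paths.min(axis=0) - 0.05         # live results below every prediction

print(consistency_score(in_line, paths))
print(consistency_score(far_below, paths))
```

In this toy setup the path that tracks the median of the predictions scores close to 100, while the path that sits below every predicted path scores 0.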

Accounting for estimation uncertainty

Estimation uncertainty is a risk factor that becomes relevant with modeling, and it is reflected in the width of the prediction cone. The more uncertain our predictions, the wider the cone will be. There are two reasons why we may get uncertain predictions from our model: 1) little data, 2) high volatility in the daily returns. First, let's look at how the linear cone deals with uncertainty due to limited amounts of data. For this, we create two cones from cumulative returns of the same trading algorithm. The first is fit with only the 10 most recent in-sample days of trading data, while the second is fit with the full 300 days of in-sample trading data.


Note how the cone is actually wider in the case where we have more data. That's because the linear cone does not take estimation uncertainty into account. Now let's look at what the Bayesian cone looks like:


As you can see, the top plot has a much wider cone reflecting the fact that we can't really predict what will happen based on the very limited amount of data we have.

Not accounting for uncertainty is only one of the downsides of the linear cone; the others are the normality and linearity assumptions it makes. There is no good reason to believe that the slope of the regression line for the live trading results should be the same as the slope of the regression line for the backtest results, and assuming normality around such a line can be problematic when we have big jumps or high volatility in our data.


Having reliable predictive models that provide us not only with predictions but also with the uncertainty in those predictions allows us to better evaluate the different risk factors associated with deploying trading algorithms. Note the word “reliable” in the previous sentence: it refers to the risk of “estimation uncertainty”, a risk factor that becomes relevant with modeling and that we would ideally like to minimize. There are other systematic and unsystematic risk factors, as illustrated in the figure below. Our Bayesian models can account for volatility risk, tail risk as well as estimation uncertainty.


Furthermore, we can use the predicted cumulative returns to derive a Bayesian Value at Risk (VaR) measure. For example, the figure below shows the distribution of predicted cumulative returns over the next five days (taking uncertainty and tail risk into account). The line indicates that there is a 5% chance of losing 10% or more of our assets over the next five days.
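A VaR figure of this kind can be read directly off the simulated cumulative returns. The sketch below assumes the T-model and uses hypothetical stand-ins for the posterior samples (`mu_post`, `sigma_post`, `nu_post`); with a real fit these would come from the model's trace, and the resulting number would differ from the one in the figure.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, horizon = 5000, 5

# Hypothetical posterior samples for the T-model parameters
mu_post = rng.normal(0.0005, 0.0003, size=n_samples)
sigma_post = np.abs(rng.normal(0.01, 0.001, size=n_samples))
nu_post = 2.0 + rng.exponential(10.0, size=n_samples)

# Predicted daily returns: mu + sigma * standard Student-T draw
t_draws = rng.standard_t(nu_post[:, None], size=(n_samples, horizon))
pred_returns = mu_post[:, None] + sigma_post[:, None] * t_draws

# Cumulative return over the horizon for each simulated path
cum_5d = np.prod(1.0 + pred_returns, axis=1) - 1.0

# 5% VaR: the loss exceeded in only 5% of simulated paths
var_5 = np.percentile(cum_5d, 5)
print(f"5-day 5% VaR: {var_5:.2%}")
```

Because each path uses its own draw of mu, sigma and nu, the resulting percentile reflects estimation uncertainty and tail risk as well as day-to-day volatility.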


To use these models on your data, check out pyfolio, which is also available on the Quantopian research platform to run on your backtests.


Pyfolio -- a new Python library for performance and risk analysis

August 24th, 2015   Posted by: Thomas Wiecki

Today, we are happy to announce pyfolio, our open source library for performance and risk analysis. We originally created this as an internal tool to help us vet algorithms for consideration in the Quantopian hedge fund. Pyfolio allows you to easily generate plots and information about a stock, portfolio, or algorithm.

Tear sheets, or groups of plots and charts, are the heart of pyfolio. Some predefined tear sheets are included, such as sheets that allow for analysis of returns, transactional analysis, and Bayesian analysis. Each tear sheet produces a set of plots about its respective subject.

Pyfolio is available now on Quantopian's research environment. It can be used standalone or in conjunction with zipline.

Here is part of a tear sheet analyzing the returns of Facebook's (FB) stock:



pyfolio is available now on the Quantopian Research Platform. See our forum post for further information.

Finally, see the pyfolio website or pyfolio's GitHub page, where you can see the full source code or contribute to the project.


Notebooks, backtests, and video lectures. All in one spot:

We have launched a free quantitative finance education curriculum for our community. We have held a series of lectures via meetups and webinars that cover the concepts of the curriculum and demonstrate how to use our platform and tools to write a good algorithm. We have also released cloneable notebooks and algorithms for each topic covered. As promised, we have developed a home to house this curriculum. Please check out our new Lectures page and learn more about:

  • Linear Regression
  • Spearman Rank Correlation
  • Pairs Trading
  • Beta Hedging
  • Kalman Filters

This curriculum is being developed in concert with our academic program, in which we're working with professors from MIT Sloan, Stanford, and Harvard. The education material we're generating is being vetted by professors as they use our platform to teach their classes, so you can expect to get the same materials that are in use at top schools.


Secret Sauce vs Open Source

August 12th, 2015   Posted by: fawce

For too long, finance has been a net consumer of open source. Major financial institutions and startups alike have only used major open source projects, but virtually none have launched new projects of any significance. That's simply not good enough. To lead in the technology world, financial firms need to found, maintain, and advance major projects of their own.

So, I was really excited to read today's Wall Street Journal report titled:

Goldman Sachs to Give Out ‘Secret Sauce’ on Trading
New open source platform is an attempt by Goldman to bolster its technology bona fides

Sadly, sharing the Secret Sauce is not the same as Open Source. The article leads me to believe that Goldman will unbundle some of their services into smaller applications. That is a great strategic move, but it isn't open source. It isn't even API access. It is just repackaging – very shrewd repackaging that will tend to lock customers in.

Open source software has become central in the technology industry and to the culture of our profession. Open source projects are where developer communities grow (Python, R, docker), new businesses emerge (github), and technologies mature (hadoop).

Financial services companies need to do more than simply use open source; they need to lead open source projects; they need to contribute code. Otherwise, the financial community is left isolated and outside the mainstream of the software community. Many of the best and most innovative minds stay out of finance because they can't maintain the long term relationship with their work that only open source allows. You can't expect to be competitive in technology without connecting with the software community, and contributing to the culture. Without launching and leading open source, financial technologists risk irrelevance.

Goldman's CTO, R. Martin Chavez, seems to get it. He's built software, teams, and companies. The way he talks about the technology world, I think he really understands the underlying dynamics of the ecosystem. I believe he has the right vision; I'm rooting for him. But, I don't think today's announcement goes far enough.

The article describes SecDB, a legendary system that helps Goldman manage risk. Maybe someday SecDB will be a cloud service with pure API access, and the software packages developed to handle the volume of pricing changes will be opensourced. That would be progress. That would be the visionary leadership our industry needs.
