Parameter Optimization in Algorithmic Finance Part I: Walk forward optimization

Posted by: Thomas Wiecki

Automated trading systems make decisions on how to invest in the stock market. These decisions often depend on parameters that must be optimized to maximize returns and stability while minimizing risk. Quantopian’s mission is to provide you with the necessary tools to implement and test your own trading algorithms. As a simple example, you might have an algorithm that starts buying once a particular stock has gone up x times in a row and starts selling once the stock has gone down y times in a row. However, how do you determine good values for x and y? As a general guiding principle, you might choose values that worked well in the past, and ideally you would want those values to be determined automatically. Our goal with parameter optimization is to allow the user to specify that the backtesting engine should find good values for x and y. In your algorithm you would then simply use these parameters as you would any other variable – except that their values will be chosen for you!
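
To make this concrete, below is a minimal sketch of such a streak rule in plain Python (not Quantopian's actual API); the function name and the list of daily closing prices are purely illustrative:

    def streak_signal(prices, x, y):
        """Return +1 (buy) after x consecutive up days, -1 (sell) after y
        consecutive down days, and 0 otherwise."""
        ups = downs = 0
        for prev, curr in zip(prices, prices[1:]):
            if curr > prev:
                ups, downs = ups + 1, 0      # up day extends the up streak
            elif curr < prev:
                downs, ups = downs + 1, 0    # down day extends the down streak
            else:
                ups = downs = 0              # unchanged price resets both streaks
        if ups >= x:
            return 1
        if downs >= y:
            return -1
        return 0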

Fortunately for us, a large branch of applied math and computer science has been working on this very problem of parameter optimization for quite some time. In general terms, we have some input parameter(s) (x and y in our case) and an objective function we want to maximize (e.g. the cumulative returns). A multitude of diverse optimization algorithms exist, ranging from the more formal gradient descent to more heuristic techniques like particle swarm optimization. And while this is a fascinating and clearly relevant topic, this blog post focuses on how to apply the optimization algorithm rather than on which algorithm to apply. I will discuss the pros and cons of the different algorithms in a future blog post.

Batch optimization

I personally applied a fair share of machine learning and optimization techniques during my ongoing academic endeavors into computational cognitive neuroscience. However, I have mostly dealt with batch data, never with real-time data. With this background, my first solution was to apply an offline optimization algorithm (e.g. gradient descent) and evaluate each parameter value by running the trading algorithm over the complete data set. Specifically, we set an initial parameter value, run the trading algorithm on the whole historical financial data set while keeping this parameter value fixed, and evaluate how well we did (e.g. via the cumulative returns). The optimization algorithm then suggests a new parameter value, which we again test over the whole historical data set. Rinse and repeat until we find the parameter value that provides us with the highest returns.
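
As a rough sketch of this batch procedure, assume a hypothetical backtest(prices, x, y) helper that runs the strategy over the full history and returns its cumulative return; a simple grid search stands in here for the actual optimizer:

    import itertools

    def batch_optimize(prices, x_grid, y_grid, backtest):
        """Test every (x, y) pair on the full price history and keep the
        pair with the highest cumulative return."""
        best_params, best_return = None, float("-inf")
        for x, y in itertools.product(x_grid, y_grid):
            cumulative_return = backtest(prices, x, y)  # one full-history run per candidate
            if cumulative_return > best_return:
                best_params, best_return = (x, y), cumulative_return
        return best_params, best_return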

While this approach did work with a carefully designed example trading algorithm, we discovered a few interesting problems and limitations to this solution:

  1. Unrealistic: It is hard to imagine how we would use this method in a real-world trading system. We probably wouldn’t want to use a parameter that worked well from 1997 to 2007 for all our day-to-day trading. Market demands change all the time and we want to incorporate the new data we observed! Moreover, we probably care less about which parameter values worked well in 1997 and want more recent data to have a bigger influence on our choice of parameters.
  2. Over-fitting: We might find that certain parameter values work extremely well for the period we used in the optimization, however, once we test it on unseen data we notice that it does quite poorly. In the machine learning literature this is a very well known problem called over-fitting. It results from fitting models to noise artifacts in the data.
  3. Boring to watch: It would be pretty boring to just hit the optimize button and be notified a couple of hours (or days) later of the optimal parameter values found, without any real insight into what was going on and without any intermediate feedback.

Walk forward optimization

Our current approach improves on all three of these points. Instead of batch optimizing the parameters over the whole historical data set, we run an optimization for each day individually, using only recent data (e.g. the last week). This is also known as walk forward optimization. For example, say we wanted to optimize our parameters iteratively using a time window of two days. For day t, we would use the parameter values obtained by optimizing over days t-2 and t-1. We then move on to the next day, t+1, for which we use the parameters optimized over days t-1 and t, and continue to move forward in this fashion. The image below should illustrate this more clearly.
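
In code, the walk-forward loop might look roughly like the sketch below, reusing the hypothetical backtest and batch_optimize helpers from the previous sketch:

    def walk_forward(prices, x_grid, y_grid, backtest, window=2):
        """For each day t, optimize (x, y) on the trailing `window` days and
        record the parameters day t would trade with (out of sample)."""
        daily_params = {}
        for t in range(window, len(prices)):
            train = prices[t - window:t]        # e.g. days t-2 and t-1
            params, _ = batch_optimize(train, x_grid, y_grid, backtest)
            daily_params[t] = params            # trade day t with these values
        return daily_params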

This solves the problems outlined above in the following ways:

  1. Realistic: We could directly apply this method in a real-time trading system that continues to learn as new data becomes available. Moreover, because we only use the most recent data (e.g. last week) in each step, we quickly adapt to changes in the market.
  2. Out-of-sample testing: Note that we always test our chosen parameters on data not used for the optimization (day t uses the optimal values from days t-2 and t-1, not those of day t itself). This is known as out-of-sample testing.
  3. Continuous progress updates: We get immediate, real-time feedback on the performance of the trading algorithm and the current optimal parameter values while the simulation is ongoing. Not only is this exciting to watch, it allows us to spot potential problems with the trading algorithm or the parameter optimization early on.

On top of these benefits I think there are some other potentially interesting things to explore with this approach. For example, one might want to focus on how the optimal parameter values change over time as this might reflect how our algorithm responds to changing market demands. Coming back to our original example, lower values of x might be better in times of financial turmoil while during a boom, high values of y are beneficial.
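
As a rough illustration of that idea, one could simply plot the per-day parameter values returned by the hypothetical walk_forward sketch above:

    import matplotlib.pyplot as plt

    def plot_parameter_path(daily_params):
        """Plot the optimal x and y values chosen for each day."""
        days = sorted(daily_params)
        plt.plot(days, [daily_params[t][0] for t in days], label="optimal x")
        plt.plot(days, [daily_params[t][1] for t in days], label="optimal y")
        plt.xlabel("day")
        plt.ylabel("parameter value")
        plt.legend()
        plt.show()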

What other cool things would you do with this? Leave a comment!

Follow Quantopian and me on Twitter.


11 Responses to “Parameter Optimization in Algorithmic Finance Part I: Walk forward optimization”

  1. experquisite says:

    Will quantopian support the various open python optimization packages (openopt, cvxopt, mosek etc)? And for that matter, custom cython? When can we start to play with the system?

  2. twiecki says:

    That’s a great question — optimization algorithms will actually be the topic of a future blog post but I’ll give you an insight into our current plans.

    We are currently evaluating multiple optimization packages. Which of those is used will probably not be exposed in the user interface. However, you will be able to choose which optimization algorithm you want to run (e.g. simplex, Newton, genetic algorithms).

    What I see as a problem with the optimization algorithms that come with e.g. cvxopt (such as convex optimization) is that they require strict assumptions your algorithm has to meet (i.e. differentiability and a convex relationship between your parameters and your objective function). In algorithmic finance these assumptions can be violated quite easily (think if-statements), and we make no assumptions about what trading algorithms users might come up with. That’s why I think meta-heuristic algorithms like genetic algorithms or particle swarm optimization could be interesting. There is no formal proof that they will actually find the optimal solution, but they are quite versatile.

  3. experquisite says:

    I got schooled in the difficulties of applying convex/conic optimization while building some portfolio optimization code in Python last month. For the problems that fit the mold, though, they are SO much faster; any brute-force optimizer of, say, the Sharpe ratio on a long-only portfolio of 2000 assets is (in my experience) too slow.

    I am not sure if you are targeting Quantopian at strictly automated short-term trading algorithms, or whether you are trying to build most of a hosted Python-for-finance SaaS, but I think it would be unwise to simplify away the details of the mechanics of systematic finance. The value proposition of Quantopian, in my mind, is not in trying to replicate TradeStation EasyLanguage in Python for people with Schwab accounts, but to once and for all do away with the 80% of the tedious work involved in cleaning input data, OMS, execution, reporting, and slippage analysis.

  4. Ben Ford says:

    Have you had a look at http://en.wikipedia.org/wiki/Hierarchical_temporal_memory? I’ve just read Jeff Hawkins’ book and I’m halfway through the paper. It looks interesting for picking patterns and predictions out of real-time streams of data.

  5. twiecki says:

    Hi Ben,

    Thanks for the pointer! I saw Jeff Hawkins give a talk on this once and it seems very interesting. There are quite a few companies now working on products that are apparently learning just like the brain. The only problem is that we have no idea how the brain actually learns (for a blog post on this point, see http://blogs.discovermagazine.com/crux/2011/11/16/later-terminator-we%E2%80%99re-nowhere-near-artificial-brains/).

    Having said that, I think HTM looks very promising from a technical perspective. It seems to be quite similar to Geoff Hinton’s work on deep learning (e.g. Deep Belief Networks), which often does very well on benchmark datasets.

    Of course, it is certainly possible that we will make HTM and related algorithms available in Quantopian so that you can explore how well they can do on historical market data!

    Thomas

  6. Grant Kiehne says:

    Hello Thomas,

    The OLMAR algorithm might be an interesting test case for walk-forward optimization. The author of the algorithm provides guidance on how to effectively eliminate the length of the trailing window used in the moving price average computation as a free parameter (the “BAH(OLMAR)” recipe). However, that still leaves the epsilon parameter (and the list of securities, while we’re counting).

    My first thought is to re-optimize epsilon on a daily basis within the algorithm. I’d have to have access to a trailing window of performance and risk data–could one be constructed? Also, what optimization algorithm do you think would be best?

    Grant

    Ref.:

    https://www.quantopian.com/posts/olmar-implementation-fixed-bug

  7. Hi Grant,

    I think OLMAR is an excellent example of how critical optimization is. This is a tough nut to crack, however. Having said that, I did make some progress recently in figuring out how this can be done. There is one particularly nice paper which uses Gaussian Processes to optimize an objective function (they focus on hyperparameter optimization but it’s more general than that): http://hips.seas.harvard.edu/files/snoek-bayesopt-nips-2012.pdf

    Other than that, you are correct that we also need to enable access to risk metrics while the algorithm is running. This could be done as part of a larger refactoring of our current risk metric implementation that I have in mind, but that is also only in the planning phase (it could, however, be achieved much quicker).

    Thomas

  8. Grant Kiehne says:

    Thanks Thomas,

    I suppose that one could compute the performance and risk values versus time by writing a custom function within the algorithm (rather than getting access to the rolling results computed by the backtester). They could then be stored and accessed for optimization.

    Eventually, I’ll see if I can get the “BAH(OLMAR)” implementation working on Quantopian, which should reduce the volatility. Then I’ll think about the epsilon parameter.

    Grant

  9. Nick says:

    Great article. One question though:
    At the end of the walk forward optimisation you end up with many results. You have one solution for each step.
    So how do you choose the best one from all of them?
    You can’t simply select the best of them, as that result could be down to chance.

    Thanks in advance

  10. Hi,

    I don’t think you have to choose the best one overall. The underlying assumption is that there is no best one for all times but rather that the optimal value changes over time. Walk-forward optimization will allow you to do that.

    Having said that, you might want to take a careful look at those values. If they jump around a lot maybe your strategy is not very stable or you are overfitting on the window.

    Finally, if they are stable over time it seems the mean value might be reasonable to take if you wanted to come up with a single number for all settings.

    Thomas

  11. [...] had a blog post a while ago about parameter optimization when building a trading algorithm.  What is parameter [...]