Congratulations to Michael Van Kleeck, our fifth winner of the Quantopian Open! As of right now, Michael is live trading $100,000 of Quantopian’s capital and will be receiving all the profits from his algorithm.
Michael got his start in the market in graduate school, when he began analyzing futures in his spare time. Later, he started trading equities and options while working as a software developer. He soon joined a startup hedge fund, researching strategies and implementing an automated futures trading system. Today he trades his own account and continues to work on his algos.
Two years ago, Michael joined Quantopian after a former hedge fund colleague told him about it. Michael says he harnessed the power of "plentiful data, easy backtesting, and friendly community" to build his strategies. He became active in the community, giving a talk at the San Diego Python Users Group. He spent a few months away from Quantopian while pursuing options algorithms, but started actively developing on the platform again once the Quantopian Open launched.
Michael calls himself a “voracious” reader of any literature dealing with trading, and is always on the lookout for simple, easily testable hypotheses exploring the fundamental principles of “economies, markets, or human psychology.”
Michael started live trading his winning algorithm a week ago with our $100,000 brokerage account. He will keep all the profits generated after 6 months. You can track him and the other winners on our Quantopian Open winners' page.
In addition, you can check out the leaderboard to see how everyone is doing for July’s contest. And we are currently accepting algos for August, so don’t forget to submit by August 3 at 9:30AM EDT - you could be the next winner!
By Delaney Granizo-Mackenzie
We are experimenting with a free quantitative finance education curriculum for our community. We have started holding a series of lectures via meetups and webinars that cover the concepts of the curriculum and demonstrate how to leverage our platform and tools to write a good algorithm. We are also releasing corresponding cloneable notebooks and algorithms for each topic covered.
This curriculum is being developed in concert with our academic outreach initiative, in which we're working with professors at schools including MIT Sloan, Stanford, and Harvard. The education material we're generating is being vetted by professors as they use our platform to teach their classes, so you can expect to get the same materials that are in use at top schools.
Webinars & Cloneable Notebooks and Algorithms
Our first lecture, "The Art of Not Following the Market" covered some approaches for reducing correlation to a benchmark and discussed why returns aren’t everything. If you missed the talk, you can view the webinar and then clone the corresponding notebooks and algorithms from the community forum post.
Our second talk, "The Good, The Bad, and The Correlated" focused on how you can leverage correlation to improve your trading algorithms. To learn more, view the webinar and corresponding notebooks from this lecture.
Upcoming Schedule of Lectures
Roughly every two weeks, we will hold a meetup in our Boston office; the same material will be covered in a live webinar on the Thursday of the following week. Until a permanent home is built on our site, we will share the recording of each webinar with our community via the community forum. We will also link back to the notebooks and algorithms for each.
Here are the upcoming lectures scheduled:
We will be publishing the upcoming schedule, webinars, and cloneable algorithms and notebooks in the community forum. We'll also be setting up a permanent home and distribution center for our curriculum notebooks and algorithms – and will share this in the coming months. Hope you can join us!
When told he would be competing with his peers in his Alternative Investments class, Olivier Lapierre knew he would have to do something to set himself apart from his classmates.
He, along with his teammates Pierre-Luc Nadeau, Marc-Antoine Larochelle Rodrigue, Pierre-Gabriel Gagnon, and Jean-Simon Caron, decided to build their team's hedge fund through Quantopian. The assignment was to create a fund entirely from scratch, including figuring out the strategies, fees, legal partners, and location of the fund. When it came time to backtest strategies, every other team was manually backtesting in Excel, but Olivier was able to work faster and better by creating quantitative algos in Quantopian. The team won first place.
Olivier has been interested in quantitative investing since he was 17 years old - in his mind, “if you're not doing something automated and quantitative, you’re not being competitive in the market.” He and a friend had wanted to build a hedge fund even before their Alternative Investments class, and wanted it to be completely algorithmic.
Even though Lapierre was taking mathematics and programming classes, he still had a lot of trouble connecting his algorithms to a data feed and broker to build his idea. Then he discovered Quantopian. He stumbled upon it through Reddit (r/finance), and realized it would let them build their project much more fully, telling his friend, “this is going to change your life.”
Olivier found that the Quantopian community forums are a great place for examples and inspiration. He found a strategy in the community, and tweaked it until he got just what he needed. “The tutorials were well made and it was easy to jump in,” says Lapierre, and the team was quickly able to find a percentile channel momentum algorithm, make a few changes and add a volatility filter, and that was the winner. Even though he ended up using pricing data instead of fundamental data (due to a fast approaching deadline for the project), he and his team still ended up blowing their competition out of the water.
Olivier and his team are currently paper trading with their algo to get a good track record and working on moving it to live trading, in anticipation of the upcoming CFA Exam.
Trying to get a leg up on your competition like Olivier? Start paper trading your algorithms now and build the track record you need to succeed!
Do you have an example of how Quantopian has changed your life? Let us know! We’re excited to hear about your experience.
“$AAPL is seriously going to kill it today.”
What do you think this sentence is saying? My guess is, you’re probably not struggling to understand what it means. You knew from first glance that “$AAPL is seriously going to kill it today” was another way of saying, “$AAPL’s stock price is going to increase today”.
The same can’t be said for a computer. If you fed that same sentence into a Python interpreter, all it'd know is that there’s a String, 42 characters long, somewhere in memory. That's where the problem lies. Millions of messages just like these are lost every day because that's all they are, messages. And while the author spent the morning buying up millions of $AAPL stock, you're pressing "Buy" three hours too late. So what if you could change that? What if you could feed that sentence into an algorithm and know exactly what people are feeling about the market?
That’s exactly what the folks over at PsychSignal set out to do. PsychSignal is a provider of online “Trader Mood Indices” to the financial industry. Their proprietary technology enables any data consumer to gain critical insight into the trader-specific moods behind over 10,000 publicly-traded securities. They built a highly specialized natural language processing engine that’s able to parse through online market conversations to not only analyze and detect the trader moods of bullishness & bearishness, but also score the mood intensity. With data going back to 2009 and sources that include StockTwits, Twitter, T3Live Chat, and other private stock market-specific chat rooms, they have a plethora of datasets; this backtest looks specifically at StockTwits-derived messages.
In collaboration with the PsychSignal team, we created a simple long-only trading strategy that uses their sentiment scores to determine which stocks in the NASDAQ 100 to go long, as an intro-level demonstration of this new type of data.
Simply by trading on the stocks with the highest bull sentiment, I was able to create an enhanced NASDAQ 100 portfolio. However, while the strategy significantly outperforms the market, Dr. Checkley from PsychSignal comments that:
We can guess that those who tweet and who tweet frequently are highly-attuned to market conditions. They are “bellwethers” for general market mood. If so, then using sentiments derived from tweets is likely to distill the more extreme emotions and lead to highly sensitive indicators. Trading on those indicators gives a high beta.
There are no short positions, which goes hand-in-hand with the beta problem. But clearly, it would be no great challenge to, say, “invert” the analysis above and seek out strong bear signals for short trades. And there is unlimited scope for changing filters, the buy and sell rules, the universe of stocks, etc. As you get comfortable with using these sentiment metrics, we recommend blending the bull and bear measures with other indicators, such as long and short-term price trend or trade volume. You can also use a more sophisticated model, such as a Neural Network, to create your predictions and trading signals. With hundreds of heavily-tweeted stocks and assets to choose from, and your full arsenal of analytical tools and algorithm refinements, you can improve on our results in short order.
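The ranking and blending idea above can be sketched with a toy example; the field names and scores below are hypothetical stand-ins, not PsychSignal's actual schema:

```python
# Hypothetical sentiment records: one (bull, bear) intensity pair per symbol.
# Field names and values are illustrative, not PsychSignal's actual schema.
sentiment = {
    "AAPL": {"bull": 2.1, "bear": 0.4},
    "MSFT": {"bull": 1.2, "bear": 0.9},
    "INTC": {"bull": 0.3, "bear": 1.8},
    "AMZN": {"bull": 1.7, "bear": 0.2},
    "CSCO": {"bull": 0.8, "bear": 0.7},
}

def blended_score(record):
    # Net mood: positive means the crowd skews bullish.
    return record["bull"] - record["bear"]

def top_bullish(sentiment, n):
    # Rank the universe by net bull score and keep the n most bullish names.
    ranked = sorted(sentiment, key=lambda sym: blended_score(sentiment[sym]),
                    reverse=True)
    return ranked[:n]

longs = top_bullish(sentiment, 2)
weights = {sym: 1.0 / len(longs) for sym in longs}  # equal-weight the long book
```

In a real strategy this score would be one input among several (price trend, volume, etc.), as recommended above.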
It's not an exact science and this strategy only skims the surface of what's possible. However, as technology improves, more and more models & investment strategies will begin incorporating this kind of data.
Another month, another winner! A big congratulations to Szilvia Hegyi for taking first place in the May round of the Quantopian Open. Szilvia is now managing $100,000 of Quantopian’s money and will keep all the profits her algorithm generates.
Szilvia started out as an economist graduating from Budapest Business School, specializing in financial institutions. Shortly after graduating, she worked for the Austrian bank Raiffeisen in the corporate finance field. She then began working as an analyst and consultant, taking a look at the processes of global companies from both a strategic and operational point of view. She has gained experience with insurance, automotive, retail, and many other industries. She often encountered large datasets in her work, and that led her to study more in-depth statistical research methods and data science.
In 2012, she co-founded BrinicleLab, a financial research company, where she is mainly responsible for portfolio development and backtest evaluation. Szilvia worked with a team of two other coworkers to come up with her winning algo.
She came across Quantopian when BrinicleLab was looking for platforms to evaluate their own research and algorithms. When asked about her thoughts on Quantopian, Szilvia said she could really get behind Quantopian’s philosophy and framework, and that she believes it could “revolutionize the finance industry.”
Szilvia and her team used a rigorous process to develop the winning algorithm. The team spent about 6 months collecting and reviewing 10 years' worth of research papers on algorithmic investing from the world’s top universities. They took the 30 best papers they found to use as the basis for their algorithm. Then came about 3 months' worth of coding, followed by hundreds of backtests through many different asset classes in order to determine the final parameters. Then came the portfolio selection and validation and lastly, on June 1, they began the real-money trading of $100,000.
Szilvia will be able to keep whatever profits she earns after six months. You can follow her (and the other winners') progress on the Quantopian Open winners page.
You can also take a look at the leaderboard to see where everyone stands on the current contest. And although June entries are now closed, we are accepting entries for July’s competition, so make sure to submit! Who knows, pretty soon we might end up writing about you.
Data Science is about learning from data, often using Machine Learning and statistics. To do so, we can build statistical models that provide answers to our questions or make predictions based on data we have collected. Ideally, we build the model that most accurately describes our data, makes the best predictions, and provides the answers of interest. Once we have our dream model we just have to figure out how to fit it to data (i.e. do inference). Graphically, this is how I think the process should look:
Unfortunately, as anyone who has done such a thing can attest, it can be extremely difficult to fit your dream model, and doing so requires you to take many short-cuts for mathematical convenience. For example, everyone knows that financial returns are not normally distributed, but explicitly or implicitly this assumption is still made a lot (e.g. in the Sharpe ratio, as I show in my talk, but also every time you use a linear regression, such as when estimating financial alpha and beta). Why? Because it's so convenient to work with! Thus, statistical modeling more often looks like this in reality:
So a lot of times we don't build the models we think best capture our data but rather the models we can make inference on.
I have blogged before about Probabilistic Programming and besides posting the video of a recent talk I gave with accompanying code (see below), I would like to highlight how Probabilistic Programming gets us much closer to the ideal I visualized above. In short, Probabilistic Programming systems allow you to specify statistical models in code. Once specified in such a way, fitting this model to data (i.e. inference) is completely automatic (if things go well).
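To make the idea concrete, here is a deliberately tiny, hand-rolled sketch of the kind of generic inference engine (a Metropolis sampler) that a probabilistic programming system runs for you automatically once you have specified the model; in a real system you would just declare the model and call the sampler. The data and model below are illustrative.

```python
import math
import random

random.seed(0)

# Toy model: data ~ Normal(mu, 1), with a flat prior on mu.
data = [1.2, 0.8, 1.1, 0.9, 1.4, 1.0]

def log_likelihood(mu):
    # Log of the product of Normal(x | mu, 1) densities, dropping constants.
    return -0.5 * sum((x - mu) ** 2 for x in data)

# Metropolis sampler: propose a random step, accept it with probability
# proportional to the likelihood ratio. This generic recipe is what a
# probabilistic programming system automates for arbitrary models.
mu, samples = 0.0, []
for _ in range(20000):
    proposal = mu + random.gauss(0, 0.5)
    if math.log(random.random()) < log_likelihood(proposal) - log_likelihood(mu):
        mu = proposal
    samples.append(mu)

# Discard burn-in, then summarize the posterior.
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

With a flat prior, the posterior mean should land near the sample mean of the data; the point is that the same sampler works unchanged for far richer models, including ones with non-normal likelihoods.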
Think about that for a second: you're not tied to a pre-specified statistical model, like a frequentist t-test, that some statistician worked out how to do inference on. Even more importantly, these pre-specified models have their own assumptions baked in (like the normality assumption) that are all too easily violated if you're not careful.
What about Machine Learning? Comprehensive libraries provide the data scientist with many turn-key algorithms that have very weak assumptions on the actual distribution of the data being modeled. While this blackbox property makes machine learning algorithms applicable to a wide range of problems it also limits the amount of insight that can be gained by applying them.
Here is a recent talk I gave on this topic while on my trip to Singapore. I've posted both the slides and the research NB that implements the models (note that there are two data files in the repo you'll also need).
IEX is currently one of about 40 active ATSs, most of which would be considered dark pools. Collectively, ATSs trade roughly 20% of overall U.S. equity volume. Most of these dark pools operate as continuous order books, much like the lit exchanges (NYSE, Nasdaq, etc.), and most of them are operated by brokers. From a trading perspective, the major differences between continuous dark pools and exchanges are as follows:
Why Trade in a Dark Pool?
There are many reasons why a broker might send an order to a dark pool (keep in mind that most of the major institutional brokers operate their own dark pools - think of all the potential synergies), but I'm going to focus on the reasons why an institutional investor (mutual fund, pension fund, hedge fund, etc.) might want to trade in a dark pool:
In practice, the second reason (price improvement) is probably the only one that's relevant to smaller firms and retail investors.
The standard order type used by institutional investors in dark pools is midpoint peg, an order type managed by the dark pool operator that simply floats with the market at the midpoint of the NBBO. Institutional brokers and investors face two main challenges when trading with midpoint peg orders in dark pools:
In the above chart, the blue line represents the National Best Bid across the exchanges, and the red line represents the National Best Offer. The large midpoint peg buy order floats in the dark pool, executing in pieces in line with the rest of the market throughout the day (presumably smaller-sized sell orders are entering the dark pool at various points, and the large buyer's order has not been fully satisfied until the end of the chart). This is the normal, desirable dark pool experience.
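The reference price such an order floats at is simple to state; a minimal sketch, using illustrative quotes rather than real market data:

```python
def midpoint(nbb, nbo):
    # A midpoint peg order's reference price is the mean of the
    # National Best Bid and National Best Offer.
    return (nbb + nbo) / 2.0

# As the NBBO moves through the day, the peg's reference price moves with it;
# the dark pool operator is responsible for keeping it up to date.
nbbo_tape = [(10.00, 10.02), (10.01, 10.03), (9.99, 10.01)]
peg_prices = [midpoint(bid, ask) for bid, ask in nbbo_tape]
```

The second challenge discussed below arises precisely when the operator cannot update these reference prices as fast as the market moves.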
In the second chart, the large buy order only executes in pieces at the local highs immediately before the market moves lower, and as a result its average fill price is substantially worse.
How does this happen? Two main ways:
Stale Quote Arbitrage
We call this latter case a type of structural arbitrage; it only exists due to a technical limitation of the dark pool (the dark pool can't update its prices fast enough). Whereas a broker can build anti-gaming logic into their trading algorithms to try to prevent orders from getting "sniffed out" (e.g. by piecing the order in smaller pieces or with extra-conservative limit prices), when it comes to structural arbitrage on a dark pool or exchange, there's really not much the broker can do other than simply not trading in that market. And if every single dark pool has a similar exposure to this situation, well then, there's not much recourse.
Ok. So everything up until this point has been theoretical. Prior to IEX, my job was to design and program equity execution algorithms at RBC, a large Canadian bank. The first strategy I ever built was a dark pool aggregator, which would take an institutional investor's order (our client) and split it up among several dark pools, shuffling around seeking out the other side of the trade. The resting midpoint orders we sent to dark pools were certainly adversely selected regularly, and we believed the primary cause to be stale quote arbitrage, but we couldn't definitively confirm this belief-- there was no way for us to synchronize the true state of the market with the dark pool's view at the time of the execution. Still, even though we couldn't be 100% certain of the dynamics at play, we figured our team would be able to solve this and other challenges faced by institutional brokers and investors, and so we left to start IEX.
Preventing Structural Arbitrage
At IEX, one of our top priorities right out of the gate was to prevent structural arbitrage on our market; we did not want our technical limitations to expose any of our customers to sub-optimal trades. To specifically address stale quote arbitrage-- the situation where our system is unable to update the prices of resting pegged orders fast enough to prevent them from getting picked off-- we came up with the idea to impose a tiny delay on all inbound orders. We measured that all-in, it takes us a little over 300 microseconds to recognize a price move on the exchanges and update our pegged orders accordingly, so we introduced a delay on inbound orders of 350 microseconds, so that even if a trader could instantaneously recognize a price change in the market and send us an exploratory order at the old price, their order would be delayed for just long enough to ensure that we would know the new price before their order could execute.
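The timing argument can be sketched as a toy simulation; the microsecond figures below come from the text, but the event logic is a drastic simplification of the real system:

```python
# Toy timeline in microseconds. Figures from the text; logic is illustrative.
PRICE_MOVE_AT = 0    # the exchanges' quotes start moving at t=0
IEX_REPEG_AT = 300   # IEX has re-pegged its resting orders a bit over 300us later
SPEED_BUMP_US = 350  # all inbound orders are delayed by 350us

def executes_at_stale_price(order_sent_at, speed_bump=SPEED_BUMP_US):
    # An exploratory order sent the instant the move starts reaches the
    # matching engine only after the delay; it trades at the stale price
    # only if it arrives before IEX has updated its pegged orders.
    arrival = order_sent_at + speed_bump
    return arrival < IEX_REPEG_AT

# Without a bump, even an instantaneous trader beats the re-peg;
# with the 350us bump, they cannot.
picked_off_without_bump = executes_at_stale_price(PRICE_MOVE_AT, speed_bump=0)
picked_off_with_bump = executes_at_stale_price(PRICE_MOVE_AT)
```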
This is one of the most common misconceptions about IEX's "speed bump"-- we're not trying to pick winners and losers or equalize everyone's technology. The trader that gets to our entry ports first will get to trade first, so for the majority of trading strategies, we're simply moving the race from our matching engine to another data center. The purpose of the speed bump is simply to ensure that we don't allow trades at stale prices after the market has already moved. A bookie probably shouldn't allow bets to be placed on a race that's already over, even if they have unknowing customers willing to take the losing side of that bet. Similarly, we impose this delay to ensure that no customer can make a trading decision based on more up-to-date market data than we have ourselves, so that we can better enforce the spirit of Reg NMS.
We went live in late October 2013, and for the first several months there was little-to-no adverse selection on IEX. Of course, there will always be some baseline level of short-term adverse selection from market fluctuations and orders getting run over, but we seemed to have solved the stale quote arbitrage problem.
But then suddenly, and quite dramatically, the incidence of adverse selection on IEX rose in mid-2014:
The above chart shows the % of shares added at the midpoint such that the midpoint was at a better price 10 milliseconds later. From March to July of 2014, the incidence of adverse selection of midpoint orders on IEX rose from about 3-4% to about 9-11%. Given our architecture, stale quote arbitrage should have been impossible, so what was happening?
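The metric in that chart can be sketched roughly as follows, using made-up fills; "adverse" here means the midpoint moved shortly after the fill to a price at which the resting order would rather have traded:

```python
# Each fill: (side, midpoint at fill, midpoint 10ms later). Illustrative data.
fills = [
    ("buy", 10.00, 10.00),
    ("buy", 10.00, 9.98),    # market fell right after: adverse for the buyer
    ("sell", 10.00, 10.00),
    ("sell", 10.00, 10.03),  # market rose right after: adverse for the seller
    ("buy", 10.00, 10.01),
]

def adversely_selected(side, fill_mid, later_mid):
    # A resting buy is adversely selected if the midpoint dropped afterwards
    # (it could have bought cheaper); symmetrically for a resting sell.
    return later_mid < fill_mid if side == "buy" else later_mid > fill_mid

adverse_rate = sum(adversely_selected(*f) for f in fills) / len(fills)
```

A real measurement would weight by shares rather than fill count, but the shape of the calculation is the same.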
We realized our speed bump was effective at preventing after-the-fact stale quote arbitrage-- that is, no trader can observe a price move in the market and then effect a trade on IEX at the old midpoint. These new incidences of adverse selection, however, were occurring as much as 1 to 2 milliseconds prior to an NBBO update, far enough in advance that our speed bump wasn't relevant.
This was a bit of an "aha" moment for us. We realized that NBBO changes are not always instantaneous events; rather, they are a coordinated series of events that happen across all of the exchanges in rapid succession. In other words, when a liquid stock moves from $9.99 bid/$10.00 offered to $9.98/$9.99, all of the individual buy orders at $9.99 across all of the exchanges are either filled or canceled, and one by one each exchange's best bid drops, and then the offers fill in. The whole process can take a few milliseconds, but while it's in motion, a trader can see the dominoes falling and have a fairly high degree of confidence that the market is moving.
We briefly considered lengthening our speed bump delay to prevent this new type of pseudo-arbitrage, but we felt the order-of-magnitude increase that would be necessary might be too disruptive to normal trading activity. Then we figured we could introduce a new kind of midpoint peg order that monitors for when the market is in transition and, in those moments, becomes less aggressive, and this is exactly what we did.
The New Order Type: Discretionary Peg
We called the new order type discretionary peg, and it went live in November 2014. Discretionary peg is an order type that is willing to trade as aggressively as the midpoint of the NBBO the vast majority of the time, but in moments when IEX observes the market moving in the order's favor, it is only willing to trade on the passive side of the bid/offer. Just like a fast trader conducting this pseudo-arbitrage, we constantly observe the market for a signal that the price of a stock is in transition, and we use this signal to prevent a trade from occurring at the soon-to-be-stale price.
As an aside, I've heard the question several times going all the way back to our RBC days: if we are able to identify profitable signals like this one, why don't we start a proprietary trading shop for ourselves and just print money? There are many reasons why we started IEX instead of going down this path, but from a practical perspective, it's simply not our competitive advantage. Many arbitrage or pseudo-arbitrage strategies are extraordinarily simple from a trading logic perspective-- this stuff isn't rocket science-- and this case is no different. The challenge is that only one trader can successfully take advantage of each arbitrage opportunity, and to win the race consistently requires extremely fast technology and frequent upgrades. I have no reason to believe our team would be nearly as effective in that space.
Protecting our Resting Orders
So given all this, how could we possibly identify that the market is in transition early enough to help our resting order avoid getting picked off? First, keep in mind that we do still have the speed bump working for us, so we don't need to be the single fastest at picking up the signal-- as long as we can identify that the market is transitioning within 350 microseconds of the very fastest trader, we can protect our resting discretionary peg orders. It turns out that 350 microseconds is an enormous head start, and it makes our job a lot easier. Secondly, the downside of a false positive for IEX is smaller than the downside of a false positive for an arbitrage trader: if a trader has a false positive, and they execute a trade anticipating a market move that doesn't actually happen, they now have a position that they will most likely wind up closing at a loss. If IEX, on the other hand, thinks the market is moving, but it doesn't, we just return the order to its normal behavior. There is a small chance that the discretionary peg order may miss a desirable midpoint trade in this tiny window, so false positives are still not a good thing-- they're just not as bad, so we can afford to be a bit more aggressive with our signal than an arbitrage trader.
Ultimately, IEX doesn't need to win the prediction arms race. Of course we will continue to strive to make our signal as precise as possible, but even if the signal is a little bit crude and noisy, as long as we take away the really obvious profitable scenarios, it should make the entire practice much less desirable to conduct.
Here are the results so far:
Whereas midpoint pegged orders have been seeing adverse selection in the 9-11% range on average since last July, discretionary peg orders are down in that 3-4% range that we saw in the early days of IEX. It's important to note that there is a trade-off between using the two order types: midpoint peg orders earn higher priority than discretionary peg inside the spread, and of course, discretionary peg orders naturally face a lower fill rate by avoiding a subset of trades (1-2% lower in practice so far). All-in-all, however, discretionary peg does seem to be a compelling order type for an institutional broker/investor concerned with short-term adverse selection, and we are very happy with the results to date.
In closing, it is the broker's job to navigate the market effectively on behalf of their customers, but if an exchange or a dark pool has a blind spot that allows for structural arbitrage, there isn't much the broker can do. You can't blame a trader for trying to profit off an inefficiency, but we believe that exchanges and ATSs have the responsibility to ensure that they don't have any blind spots.
Dan is a co-founder and quantitative developer at IEX. He is responsible for building and evolving core functionality for the IEX trading venue, namely its matching engine, smart order router and its newest order type: Discretionary Peg. Forbes recognized Dan as one of their 2015 "30 Under 30".
The last two months have been busy as ever at Quantopian. We're in full swing running the Quantopian Open, a paper trading competition where each monthly winner gets to manage a $100,000 brokerage account. The research platform has accepted over 600 people in the beta phase, ramping up the usability and features. More tools and data were added to the backtester and the community has a new look-and-feel. Take a look below at the details of our latest news and releases:
Tools and Features
“I’ve never seen a bad backtest” -- Dimitris Melas, head of research at MSCI.
A backtest is a simulation of a trading strategy used to evaluate how effective the strategy might have been if it were traded historically. Backtesting is used by hedge funds and other researchers to test strategies before real capital is applied. Backtests are valuable because they enable quants to quickly test and reject trading strategy ideas.
All too often strategies look great in simulation but fail to live up to their promise in live trading. There are a number of reasons for these failures, some of which are beyond the control of a quant developer. But other failures are caused by common, insidious mistakes.
An over optimistic backtest can cause a lot of pain. I’d like to help you avoid that pain by sharing 9 of the most common pitfalls in trading strategy development and testing that can result in overly optimistic backtests:
1. In-sample backtesting
Many strategies require refinement, or model training of some sort. As one example, a regression-based model that seeks to predict future prices might use recent data to build the model. It is perfectly fine to build a model in that manner, but it is not OK to test the model over that same time period. Such models are doomed to succeed.
Don’t trust them.
Solution: Best practices are to build procedures to prevent testing over the same data you train over. As a simple example you might use data from 2007 to train your model, but test over 2008-forward.
By the way, even though it could be called “out-of-sample” testing it is not a good practice to train over later data, say 2014, then test over earlier data, say 2008-2013. This may permit various forms of lookahead bias.
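A minimal sketch of such a chronological train/test split, using illustrative data keyed by date:

```python
# Daily observations keyed by ISO date string; the return values are illustrative.
history = {
    "2007-03-01": 0.004, "2007-06-15": -0.002, "2007-11-30": 0.001,
    "2008-02-11": -0.015, "2009-05-04": 0.007, "2010-08-20": 0.003,
}

def split_by_date(history, cutoff):
    # Train strictly before the cutoff, test on the cutoff and after --
    # never the other way around, to avoid lookahead bias.
    train = {d: r for d, r in history.items() if d < cutoff}
    test = {d: r for d, r in history.items() if d >= cutoff}
    return train, test

train, test = split_by_date(history, "2008-01-01")
```

ISO-formatted date strings sort lexicographically in chronological order, which is what makes the simple string comparison above safe.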
2. Using survivor-biased data
Suppose I told you I have created a fantastic new blood pressure medicine, and that I had tested it using the following protocol:
a. Randomly select 500 subjects
b. Administer my drug to them every day for 5 years
c. Measure their blood pressure each day
At the beginning of the study the average blood pressure of the participants was 160/110; at the end of the study the average BP was 120/80 (significantly lower and better).
Those look like great results, no? What if I told you that 58 of the subjects died during the study? Maybe it was the ones with the high blood pressure that died! This is clearly not an accurate study because it focused on the statistics of survivors at the end of the study.
This same sort of bias is present in backtests that use later lists of stocks (perhaps members of the S&P 500) as the basis for historical evaluations over earlier periods. A common example is to use the current S&P 500 as the universe of stocks for testing a strategy.
Why is this bad? See the two figures below for illustrative examples.
Figure: The green lines show historical performance of stocks that were members of the S&P 500 in 2012. Note that all of these stocks came out of the 2008/2009 downturn very nicely.
Figure: What really happened: If, instead we use the members of the S&P 500 starting in 2008, we find that more than 10% of the listed companies failed.
In our work at Lucena Research, we see an annual 3% to 5% performance “improvement” with strategies using survivor-biased data.
Solution: Find datasets that include historical members of indices, then use those lists to sample from for your strategies.
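One way to sketch this solution, using hypothetical tickers and point-in-time membership snapshots:

```python
# Point-in-time index membership (hypothetical tickers). Testing against the
# membership list as of each backtest year avoids evaluating only survivors.
membership = {
    2008: ["AAA", "BBB", "CCC", "DDD"],
    2012: ["AAA", "CCC", "EEE", "FFF"],  # BBB and DDD dropped out of the index
}

def universe_for(year, membership):
    # Use the most recent membership snapshot at or before the backtest year,
    # never a later one.
    snapshots = [y for y in membership if y <= year]
    return membership[max(snapshots)]

universe_2009 = universe_for(2009, membership)
universe_2013 = universe_for(2013, membership)
```

A backtest over 2009 would thus include the names that later failed, rather than silently excluding them.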
3. Observing the close & other forms of lookahead bias
In this failure mode, the quant assumes he can observe market closing prices in order to compute an indicator, and then also trade at the close. As an example, one might use closing price/volume to calculate a technical factor used in the strategy, then trade based on that information.
This is a specific example of lookahead bias in which the strategy is allowed to peek a little bit into the future. In my work I have seen time and again that even a slight lookahead bias can provide fantastic (and false) returns.
Other examples of lookahead bias have to do with incorrect registration of data such as earnings reports or news: for instance, assuming one can trade on the same day earnings are announced, even though earnings are usually announced after the close.
Solution: Don’t trade until the open of the next day after information becomes available.
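A minimal sketch of that discipline: compute the signal from the close, but only act at the following day's open (the prices are illustrative):

```python
# Daily bars. opens[i + 1] is the open that follows closes[i].
closes = [100.0, 101.0, 99.5, 102.0]
opens = [99.8, 100.4, 101.2, 99.7, 102.1]

def signals_from_close(closes):
    # Toy momentum signal computed at the close: 1 = buy if the close rose.
    return [0] + [1 if closes[i] > closes[i - 1] else 0
                  for i in range(1, len(closes))]

sig = signals_from_close(closes)

# Correct: execute at the NEXT day's open, not at the close the signal
# was computed from -- the close isn't observable until the day is over.
trades = [(opens[i + 1], s) for i, s in enumerate(sig) if s == 1]
```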
4. Ignoring market impact
The very act of trading affects price. Historical pricing data does not include your trades and is therefore not an accurate representation of the price you would get if you were trading.
Consider the chart below, which describes the performance of a real strategy I helped develop. Look at region A, the first part of the upwardly sloping orange line. This region shows the performance of our backtest. The strategy had a Sharpe Ratio over 7.0! Based on the information we had up until that point (the end of A), it looked great, so we started trading it.
When we began live trading, we saw the real performance illustrated with the green “live” line in region B: essentially flat. The strategy was not working, so we halted trading it after a few weeks. After we stopped trading it, the strategy started performing well again in paper trading (region C, argh!).
How can this be? We thought perhaps that the error was in our predictive model, so we backtested again over the “live” area and the backtest showed that same flat area. The only difference between the nice 7.0 Sharpe Ratio sections and the flat section was that we were engaged in the market in the flat region.
What was going on? The answer, very simply, is that by participating in the market we were changing the prices to our disadvantage. We were not modeling market impact in our market simulation. Once we added that feature more accurately, our backtest appropriately showed a flat, no-return result for region A. If we had had that in the first place we probably would never have traded the strategy.
Solution: Be sure to anticipate that price will move against you at every trade. For trades that are a small part of overall volume, a rule of thumb is about 5 bps for S&P 500 stocks and up to 50 bps for more thinly traded stocks. It depends, of course, on how much of the market your strategy is seeking to trade.
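A rough illustration of that rule of thumb follows, with a hypothetical `fill_price` helper that scales slippage between the 5 bps and 50 bps figures mentioned above. The linear scaling by participation is an assumption for the sketch, not a standard impact model.

```python
def fill_price(mid, side, adv_fraction, base_bps=5.0):
    """Estimate an execution price with market impact.

    base_bps=5.0 follows the ~5 bps rule of thumb for liquid names;
    impact grows with adv_fraction (the order's share of daily volume)
    and is capped at 50 bps, per the figures quoted in the text.
    The 10x scaling factor is an invented assumption.
    """
    impact_bps = min(base_bps * (1.0 + 10.0 * adv_fraction), 50.0)
    slip = mid * impact_bps / 10_000.0
    return mid + slip if side == "buy" else mid - slip

# A small buy in a liquid stock pays slightly more than mid;
# a huge order hits the 50 bps cap.
print(fill_price(100.0, "buy", adv_fraction=0.001))
print(fill_price(100.0, "buy", adv_fraction=1.0))
```

Whatever the exact curve, the essential discipline is the same: every simulated fill should be worse than the quoted price, never better.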
5. Buy $10M of a $1M company
Naïve backtesters will allow a strategy to buy or sell as much of an asset as it likes. This may provide a misleadingly optimistic backtest because large allocations to small companies are allowed.
There often is real alpha in thinly traded stocks, and data mining approaches are likely to find it. Consider for a moment why it seems there is alpha there. The reason is that the big hedge funds aren’t playing there because they can’t execute their strategy with illiquid assets. There are perhaps scraps of alpha to be collected by the little guy, but check to be sure you’re not assuming you can buy $10M of a $1M company.
Solution: Have your backtester limit the strategy’s trading to a percentage of the daily dollar volume of the equity. Another alternative is to filter potential assets to a minimum daily dollar volume.
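A minimal sketch of the dollar-volume cap, using a hypothetical `max_shares` helper with an assumed 2.5% participation limit:

```python
def max_shares(target_shares, daily_volume, price, max_participation=0.025):
    """Cap an order at a fraction of the day's dollar volume.
    The 2.5% participation default is a hypothetical choice."""
    dollar_cap = max_participation * daily_volume * price
    cap_shares = int(dollar_cap // price)
    return min(target_shares, cap_shares)

# Trying to buy 1,000,000 shares of a stock that trades only
# 100,000 shares ($1M at $10) per day: the order gets capped
# at 2,500 shares (2.5% of the day's volume).
print(max_shares(target_shares=1_000_000, daily_volume=100_000, price=10.0))
```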
6. Overfit the model
An overfit model is one that models in-sample data very well. It predicts the data so well that it is likely modeling noise rather than the underlying principle or relationship in the data that you are hoping it will discover.
Here’s a more formal definition of overfitting: As the degrees of freedom of the model increase, overfitting occurs when in-sample prediction error decreases and out-of-sample prediction error increases.
What do we mean by “degrees of freedom?” Degrees of freedom can take many forms, depending on the type of model being created: the number of factors used, the number of parameters in a parameterized model, and so on.
Solution: Don’t repeatedly “tweak” and “refine” your model using in-sample data. And always compare in-sample error versus out-of-sample error.
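The in-sample versus out-of-sample comparison can be demonstrated outside of finance with a toy polynomial fit, where the polynomial degree stands in for degrees of freedom. The data and the split below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # noisy signal

# Alternate points: even indices in-sample, odd indices out-of-sample.
x_in, y_in = x[::2], y[::2]
x_out, y_out = x[1::2], y[1::2]

def fit_errors(degree):
    """In-sample and out-of-sample mean squared error of a polynomial
    fit; `degree` plays the role of degrees of freedom."""
    coeffs = np.polyfit(x_in, y_in, degree)
    err_in = np.mean((np.polyval(coeffs, x_in) - y_in) ** 2)
    err_out = np.mean((np.polyval(coeffs, x_out) - y_out) ** 2)
    return err_in, err_out

for degree in (1, 3, 9):
    err_in, err_out = fit_errors(degree)
    print(f"degree {degree}: in-sample {err_in:.3f}, out-of-sample {err_out:.3f}")
```

In-sample error always falls as degrees of freedom rise; the warning sign of overfitting is out-of-sample error rising while in-sample error keeps falling.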
7. Trust complex models
Complex models are often overfit models. Simple approaches that arise from a basic idea that makes intuitive sense lead to the best models. A strategy built from a handful of factors combined with simple rules is more likely to be robust and less sensitive to overfitting than a complex model with lots of factors.
Solution: Limit the number of factors considered by a model, and use simple logic to combine them.
8. Trusting stateful strategy luck
A stateful strategy is one whose holdings over time depend on which day in history it was started. As an example, if the strategy rapidly accrues assets, it may quickly become fully invested and therefore miss later buying opportunities. If the strategy had started one day later, its holdings might be completely different.
The success of such strategies can vary widely if they are started on a different day. I’ve seen, for instance, a difference of 50% in return for the same strategy started on two different days in the same week.
Solution: If your strategy is stateful, be sure to test it starting on many different days. Evaluate the variance of the results across those days. If it is large, you should be concerned.
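A toy demonstration of that test, using a deliberately simple (and entirely hypothetical) stateful buy-the-dip strategy started on several different days:

```python
import statistics

def backtest(prices, start_index):
    """Toy stateful strategy (invented for illustration): buy one share
    on the start day and on any later down day, while cash lasts;
    value the book at the final price."""
    cash, shares = 100.0, 0
    for i in range(start_index, len(prices)):
        drop = (i == start_index) or prices[i] < prices[i - 1]
        if drop and cash >= prices[i]:
            cash -= prices[i]
            shares += 1
    return cash + shares * prices[-1]

prices = [50, 48, 52, 47, 55, 53, 60]
results = [backtest(prices, day) for day in range(3)]  # start day 0, 1, or 2
print(results, "stdev:", statistics.stdev(results))
```

Even on this tiny series, each start day ends with different holdings and a different final value; the spread of `results` is the statistic to watch.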
9. Data mining fallacy
Even if you avoid all of the pitfalls listed above, if you generate and test enough strategies you’ll eventually find one that works very well in a backtest. However, such a strategy cannot be distinguished from the picks of a lucky random stock picker.
How can this pitfall be avoided? It can’t be avoided. However, you can and should forward test before committing significant capital.
Solution: Forward test (paper trade) a strategy before committing capital.
It is best to view backtesting as a method for rejecting strategies rather than as a method for validating them. One thing is for sure: If it doesn’t work in a backtest, it won’t work in real life. The converse is not true: Just because it works in a backtest does not mean you can expect it to work in live trading.
However, if you avoid the pitfalls listed above, your backtests stand a better chance of more accurately representing real life performance.
Live Webinar: Dr. Balch will present a webinar on this topic on April 24, 2015 at 11AM. You can register to watch the webinar live by following this link.
About the author
Tucker Balch, Ph.D. is a professor of Interactive Computing at Georgia Tech. He is also CTO of Lucena Research, Inc., a financial decision support technology company. You can read more essays by Tucker at http://augmentedtrader.com.
This post is based on the talk of the same title I gave at Quantopian's QuantCon 2015 which commenced at 3.14.15 9:26:54. Do these numbers remind you of something?
A correct backtest of a trading strategy requires accurate historical data. This isn't controversial. Historical data that is full of errors will generate fictitious profits for mean-reverting strategies, since noise in prices is mean-reverting. However, what is lesser known is how perfectly accurate capture of historical prices, if done in a sub-optimal way, can still lead to dangerously inflated backtest results. I will illustrate this with three simple strategies.
CEF Premium Reversion
Patro et al. published a paper on trading the mean reversion of closed-end funds’ (CEF) premium. Based on rational analysis, the market value of a CEF should be the same as the net asset value (NAV) of its holdings. So the strategy to exploit any differences is both reasonable and simple: rank all the CEFs by the percentage difference ("premium") between market value and NAV, short the quintile with the highest premium, and buy the quintile with the lowest (maybe negative) premium. Hold them for a month, and repeat. (You can try this on a daily basis too, since Bloomberg provides daily NAV data.) The Sharpe ratio of this strategy from 1998-2011 is 1.5. Transaction costs are ignored, but shouldn't be significant for a monthly rebalance strategy.
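The ranking step might be sketched as follows, with an invented month-end snapshot of prices and NAVs standing in for the CRSP/Bloomberg data (with only ten funds, a quintile is just two names):

```python
import pandas as pd

# Invented month-end snapshot: market price and NAV for ten CEFs.
snap = pd.DataFrame({
    "fund":  ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "price": [10.5, 9.0, 20.0, 15.5, 8.0, 30.0, 12.0, 25.0, 18.0, 11.0],
    "nav":   [10.0, 10.0, 19.0, 16.0, 8.0, 28.0, 13.0, 26.0, 18.5, 10.0],
})
snap["premium"] = snap["price"] / snap["nav"] - 1.0

# Rank by premium: buy the lowest (deepest discount) quintile,
# short the highest-premium quintile; hold a month and repeat.
snap = snap.sort_values("premium")
n = len(snap) // 5
longs = snap.head(n)["fund"].tolist()
shorts = snap.tail(n)["fund"].tolist()
print("long:", longs, "short:", shorts)
```

The subtlety the rest of this section explores is not in this ranking logic but in which closing price is assumed for the fills.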
The authors are irreproachable for their use of high-quality price data provided by CRSP and monthly fund NAV data from Bloomberg for their backtest. So I was quite confident that I could reproduce their results with the same price data from CRSP, and with historical NAV data from Compustat instead. Indeed, here is the cumulative returns chart from my own backtest (click to enlarge):
However, I also know that there is one detail that many traders and academic researchers neglect when they backtest daily strategies for stocks, ETFs, or CEFs: they often use the "consolidated" closing price as the execution price, instead of the "official" (also called "auction" or "primary") closing price. To understand the difference, one has to remember that the US stock market is a network of over 60 "market centers" (see the teaching notes of Prof. Joel Hasbrouck for an excellent review of the US stock market structure). The exact price at which one's order will be executed is highly dependent on the exact market center to which it has been routed.
A natural way to execute this CEF strategy is to send a market-on-close (MOC) or limit-on-close (LOC) order near the close, since this is the way we can participate in the closing auction and avoid paying the bid-ask spread. Such orders will be routed to the primary exchange for each stock, ETF, or CEF, and the price at which they are filled will be the official/auction/primary price at that exchange. On the other hand, the price that most free data services (such as Yahoo Finance) provide is the consolidated price, which is merely that of the last transaction received by the Securities Information Processor (SIP) from any one of these market centers on or before 4pm ET. There is no reason to believe that one's order would have been routed to that particular market center, nor that it would have been executed at that price. Unfortunately, the CEF strategy was tested on this consolidated price, so I decided to backtest it again with the official closing price.
Where can we find historical official closing prices? Bloomberg provides them, but it is an expensive subscription. CRSP data conveniently includes the last bid and ask, which can be used to compute the mid price at 4pm, a good estimate of the official closing price. This mid price is what I used for a revised backtest. But the CRSP data also doesn't come cheap; I only used it because my academic affiliation allowed me free access. There is, however, an unexpected source that does provide the official closing price at a reasonable rate: QuantGo.com will rent us tick data that has a Cross flag for the closing auction trade. How ironic: the cheapest way to properly backtest a strategy that trades only once a month requires tick data time-stamped at 1 millisecond, with special tags for each trade.
So what are the cumulative returns using the mid price for our backtest?
Opening Gap Reversion
Readers of my book will be familiar with this strategy (Example 4.1): start with the SPX universe, buy the 10 stocks that gapped down most at the open, and short the 10 that gapped up most. Liquidate everything at the close. We can apply various technical or fundamental filters to make this strategy more robust, but the essential driver of the returns is mean-reversion of the overnight gap (i.e. reversion of the return from the previous close to today's open).
We have backtested this strategy using the closing mid price as I recommended above, and including a further 5 bps transaction cost each for the entry and exit trade. The backtest looked wonderful, so we traded it live. Here is the comparison of the backtest vs live cumulative P&L:
Yes, it is still mildly profitable, but nowhere near the profitability of the backtest, or more precisely, the walk-forward test. What went wrong? Two things:
• Just like the closing price, we should have used the official/auction/primary open price. Unfortunately CRSP does not provide the opening bid-ask, so we couldn't have estimated the open price from the mid price. QuantGo, though, does provide a Cross flag for the opening auction trade as well.
• To generate the limit on open (LOO) or market on open (MOO) orders suitable for executing this strategy, we need to submit the order using the pre-market quotes before 9:28am ET, based on Nasdaq's rules.
Once again, a strategy that is seemingly low frequency, with just an entry at the open and an exit at the close, actually requires TAQ (ticks and quotes) data to backtest properly.
Lest you think that this requirement for TAQ data for backtesting only applies to mean reversion strategies, we can consider the following futures momentum strategy that can be applied to the gasoline (RB), gold (GC), or various other contracts trading on the NYMEX.
At the end of a trading session (defined as the previous day's open outcry close to today's open outcry close), rank all the trades or quotes in that session. We buy a contract in the next session if the last price is above the 95th percentile, sell it if it drops below the 60th (this serves as a stop loss). Similarly, we short a contract if the last price is below the 5th percentile, and buy cover if it goes above the 40th.
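The entry/exit rule can be sketched as a percentile-rank function. The thresholds below follow the text, but the `signal` helper and the toy session prices are invented for illustration:

```python
def signal(session_prices, last_price, position):
    """Percentile-rank rule from the text: enter long above the 95th
    percentile, exit longs below the 60th; enter short below the 5th,
    cover above the 40th. Returns the new position (+1, 0, or -1)."""
    below = sum(p <= last_price for p in session_prices)
    pct = 100.0 * below / len(session_prices)
    if position == 0:
        return 1 if pct > 95 else (-1 if pct < 5 else 0)
    if position == 1:
        return 1 if pct >= 60 else 0   # stop out below the 60th
    return -1 if pct <= 40 else 0      # buy to cover above the 40th

session = list(range(1, 101))  # toy session: prices 1..100
print(signal(session, 99, 0))  # near the session high: enter long
print(signal(session, 55, 1))  # below the 60th percentile: stop out
```

In a real implementation, `session_prices` would be every trade or quote in the session, which is exactly why the tick data discussed below becomes necessary.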
Despite being an intraday strategy, it typically trades only 1 roundtrip a day - a low frequency strategy. We backtested it two ways: with 1-min trade bars (prices are from back-adjusted continuous contracts provided by eSignal), and with best bid-offer (BBO) quotes with 1 ms time stamps (from QuantGo's actual contract prices, not backadjusted).
For all the contracts that we have tested, the 1-ms data produced much worse returns than the 1-min data. The reason is interesting: the 1-ms data shows that the strategy suffers from high-frequency flip-flops. These are sudden changes in the order book (in particular, the BBO quotes) that quickly revert. Some observers have called these flip-flops "mini flash crashes", and they happen as frequently in the futures markets as in the stock market, and occasionally in the spot Forex market as well. Some people have blamed them on high-frequency traders. But I think "flip-flop" describes the situation better than "flash crash", since a flash crash implies the sudden disappearance of quotes or liquidity from the order book, while in a flip-flopping situation, new quotes/liquidity above the BBO can suddenly appear and disappear within a few milliseconds, simultaneously with the disappearance and re-appearance of quotes on the opposite side of the order book. Since ours is a momentum strategy, such reversals of course create losses. These losses are very real, and we experienced them in live trading. But they are also undetectable if we backtest using 1-min bar data.
Some readers may object: if the 1-min bar backtest shows good profits, why not just trade this live with 1-min bar data and preserve its profit? Let's consider why this doesn't actually allow us to avoid using TAQ data. Note that we were able to avoid the flip-flops using 1-min data only because we were lucky in our backtest - it wasn't because we had some trading rule that prevented our entering or exiting a position when the flip-flops occurred. How then are we to ensure that our luck will continue with live market data? At the very least, we have to test this strategy with many sets of 1-min bar data, and choose the set that shows the worst returns as part of our stress testing. For example, one set may be [9:00:00, 9:01:00, 9:02:00, ...,] and the second set may be [9:00:00.001, 9:01:00.001, 9:02:00.001, ...], etc. This backtest, however, still requires TAQ data, since no historical data vendor I know of provides such multiple sets of time-shifted bars!
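Building such time-shifted bar sets from tick data is straightforward with the pandas resampler's `offset` argument. The millisecond-stamped tick series below is invented; the two bar sets differ only because one tick lands exactly on a minute boundary, which is precisely the kind of sensitivity the stress test is meant to expose.

```python
import pandas as pd

# Invented millisecond-stamped ticks; the third tick sits exactly on a
# minute boundary, so the two bar sets below will bin it differently.
ticks = pd.Series(
    [100.0, 100.5, 99.8, 101.0, 100.2, 100.9],
    index=pd.to_datetime([
        "2015-04-01 09:30:00.250", "2015-04-01 09:30:30.500",
        "2015-04-01 09:31:00.000", "2015-04-01 09:31:45.900",
        "2015-04-01 09:32:10.300", "2015-04-01 09:32:59.700",
    ]),
)

def shifted_bars(ticks, offset_ms):
    """1-minute close bars whose boundaries are shifted by offset_ms
    milliseconds: one member of the family of bar sets described above."""
    return ticks.resample(
        "1min", offset=pd.Timedelta(milliseconds=offset_ms)
    ).last()

bars_a = shifted_bars(ticks, 0)  # bar boundaries at :00.000
bars_b = shifted_bars(ticks, 1)  # bar boundaries at :00.001
print(bars_a.tolist(), bars_b.tolist())
```

Shifting the boundaries by a single millisecond changes which bar the boundary tick falls into, so the two "1-min" backtests see different closing prices for the same minute.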
As I mentioned above, this type of flip-flop is omnipresent in the stock market as well. This shouldn't be surprising, considering that 50% of stock transaction volume is due to high-frequency trading. It is particularly damaging when we are trading spreads, such as the ETF pair EWA vs EWC. A small change in the BBO of one leg may represent a big percentage change in the spread, which itself may be just a few ticks wide. So such flip-flops can frequently trigger orders which are filled at much worse prices than expected.
The three example strategies above illustrate that even when a strategy trades at a low frequency, perhaps as low as once a month, we often still require high-frequency TAQ data to backtest it properly, or even economically. If the strategy trades intraday, even if just once a day, then this requirement becomes all the more important due to the flip-flopping of the order book in the millisecond time frame.
Ernie is the managing member of QTS Capital Management, LLC., a commodity pool operator and trading advisor. Find out more about him at epchan.com.