Data Science is about learning from data, often using Machine Learning and statistics. To do so, we can build statistical models that provide answers to our questions or make predictions based on data we have collected. Ideally, we build the model that most accurately describes our data, makes the best predictions, and provides the answers of interest. Once we have our dream model we just have to figure out how to fit it to data (i.e. do inference). Graphically, this is how I think the process should look:
Unfortunately, as anyone who has done such a thing can attest, fitting your dream model can be extremely difficult and requires you to take many short-cuts for mathematical convenience. For example, everyone knows that financial returns are not normally distributed, but explicitly or implicitly this assumption is still made a lot (e.g. in the Sharpe ratio, as I show in my talk, but also every time you use a linear regression, like when estimating financial alpha and beta). Why? Because it's so convenient to work with! Thus, statistical modeling more often looks like this in reality:
So a lot of times we don't build the models we think best capture our data but rather the models we can make inference on.
I have blogged before about Probabilistic Programming and besides posting the video of a recent talk I gave with accompanying code (see below), I would like to highlight how Probabilistic Programming gets us much closer to the ideal I visualized above. In short, Probabilistic Programming systems allow you to specify statistical models in code. Once specified in such a way, fitting the model to data (i.e. inference) is completely automatic (if things go well).
Think about that for a second: you're not tied to a pre-specified statistical model, like a frequentist t-test, that some statistician worked out how to do inference on. Even more important, those pre-specified models have their own assumptions baked in (like the normality assumption) that are all too easily violated if you're not careful.
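To make this concrete, here is a minimal sketch in PyMC3 (using synthetic returns, and not the exact model from the talk): we describe a heavy-tailed Student-T model for returns directly in code and let the sampler handle the inference.

```python
import numpy as np
import pymc3 as pm

# synthetic heavy-tailed "returns" standing in for real data
returns = np.random.standard_t(df=3, size=500) * 0.01

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0.0, sd=0.1)        # prior on the mean return
    sigma = pm.HalfNormal('sigma', sd=0.1)      # prior on volatility
    nu = pm.Exponential('nu', 1.0 / 10.0)       # prior on tail heaviness
    pm.StudentT('obs', nu=nu, mu=mu, sd=sigma, observed=returns)
    trace = pm.sample(2000)                     # inference is automatic

print(pm.summary(trace))
```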
What about Machine Learning? Comprehensive libraries provide the data scientist with many turn-key algorithms that have very weak assumptions on the actual distribution of the data being modeled. While this blackbox property makes machine learning algorithms applicable to a wide range of problems it also limits the amount of insight that can be gained by applying them.
Here is a recent talk I gave on this topic while on my trip to Singapore. I've posted both the slides and the research notebook that implements the models (note that there are two data files in the repo you'll also need).
IEX is currently one of about 40 active ATSs, most of which would be considered dark pools. Collectively, ATSs trade roughly 20% of overall U.S. equity volume. Most of these dark pools operate as continuous order books, much like the lit exchanges (NYSE, Nasdaq, etc.), and most of them are operated by brokers. From a trading perspective, the major differences between continuous dark pools and exchanges are as follows:
Why Trade in a Dark Pool?
There are many reasons why a broker might send an order to a dark pool (keep in mind that most of the major institutional brokers operate their own dark pools - think of all the potential synergies), but I'm going to focus on the reasons why an institutional investor (mutual fund, pension fund, hedge fund, etc.) might want to trade in a dark pool:
In practice, the second reason (price improvement) is probably the only one that's relevant to smaller firms and retail investors.
The standard order type used by institutional investors in dark pools is midpoint peg, an order type managed by the dark pool operator that simply floats with the market at the midpoint of the NBBO. Institutional brokers and investors face two main challenges when trading with midpoint peg orders in dark pools:
In the above chart, the blue line represents the National Best Bid across the exchanges, and the red line represents the National Best Offer. The large midpoint peg buy order floats in the dark pool, executing in pieces in line with the rest of the market throughout the day (presumably smaller-sized sell orders are entering the dark pool at various points, and the large buyer's order has not been fully satisfied until the end of the chart). This is the normal, desirable dark pool experience.
In the second chart, the large buy order only executes in pieces at the local highs immediately before the market moves lower, and as a result its average fill price is substantially worse.
How does this happen? Two main ways:
Stale Quote Arbitrage
We call this latter case a type of structural arbitrage; it only exists due to a technical limitation of the dark pool (the dark pool can't update its prices fast enough). Whereas a broker can build anti-gaming logic into their trading algorithms to try to prevent orders from getting "sniffed out" (e.g. by splitting the order into smaller pieces or using extra-conservative limit prices), when it comes to structural arbitrage on a dark pool or exchange, there's really not much the broker can do other than simply not trading in that market. And if every single dark pool has a similar exposure to this situation, well then, there's not much recourse.
Ok. So everything up until this point has been theoretical. Prior to IEX, my job was to design and program equity execution algorithms at RBC, a large Canadian bank. The first strategy I ever built was a dark pool aggregator, which would take an institutional investor's order (our client) and split it up among several dark pools, shuffling around seeking out the other side of the trade. The resting midpoint orders we sent to dark pools were certainly adversely selected regularly, and we believed the primary cause to be stale quote arbitrage, but we couldn't definitively confirm this belief-- there was no way for us to synchronize the true state of the market with the dark pool's view at the time of the execution. Still, even though we couldn't be 100% certain of the dynamics at play, we figured our team would be able to solve this and other challenges faced by institutional brokers and investors, and so we left to start IEX.
Preventing Structural Arbitrage
At IEX, one of our top priorities right out of the gate was to prevent structural arbitrage on our market; we did not want our technical limitations to expose any of our customers to sub-optimal trades. To specifically address stale quote arbitrage-- the situation where our system is unable to update the prices of resting pegged orders fast enough to prevent them from getting picked off-- we came up with the idea to impose a tiny delay on all inbound orders. We measured that, all-in, it takes us a little over 300 microseconds to recognize a price move on the exchanges and update our pegged orders accordingly, so we introduced a 350-microsecond delay on inbound orders: even if a trader could instantaneously recognize a price change in the market and send us an exploratory order at the old price, their order would be delayed just long enough to ensure that we would know the new price before their order could execute.
This is one of the most common misconceptions about IEX's "speed bump"-- we're not trying to pick winners and losers or equalize everyone's technology. The trader that gets to our entry ports first will get to trade first, so for the majority of trading strategies, we're simply moving the race from our matching engine to another data center. The purpose of the speed bump is simply to ensure that we don't allow trades at stale prices after the market has already moved. A bookie probably shouldn't allow bets to be placed on a race that's already over, even if they have unknowing customers willing to take the losing side of that bet. Similarly, we impose this delay to ensure that no customer can make a trading decision based on more up-to-date market data than we have ourselves, so that we can better enforce the spirit of Reg NMS.
We went live in late October 2013, and for the first several months there was little-to-no adverse selection on IEX. Of course, there will always be some baseline level of short-term adverse selection from market fluctuations and orders getting run over, but we seemed to have solved the stale quote arbitrage problem.
But then suddenly, and quite dramatically, the incidence of adverse selection on IEX rose in mid-2014:
The above chart shows the % of shares added at the midpoint such that the midpoint was at a better price 10 milliseconds later. From March to July of 2014, the incidence of adverse selection of midpoint orders on IEX rose from about 3-4% to about 9-11%. Given our architecture, stale quote arbitrage should have been impossible, so what was happening?
We realized our speed bump was effective at preventing after-the-fact stale quote arbitrage-- that is, no trader can observe a price move in the market and then effect a trade on IEX at the old midpoint. These new incidences of adverse selection, however, were occurring as much as 1 to 2 milliseconds prior to an NBBO update, far enough in advance that our speed bump wasn't relevant.
This was a bit of an "aha" moment for us. We realized that NBBO changes are not always instantaneous events; rather, they are a coordinated series of events that happen across all of the exchanges in rapid succession. In other words, when a liquid stock moves from $9.99 bid/$10.00 offered to $9.98/$9.99, all of the individual buy orders at $9.99 across all of the exchanges are either filled or canceled, and one by one each exchange's best bid drops, and then the offers fill in. The whole process can take a few milliseconds, but while it's in motion, a trader can see the dominoes falling and have a fairly high degree of confidence that the market is moving.
We briefly considered lengthening our speed bump delay to prevent this new type of pseudo-arbitrage, but we felt the order-of-magnitude increase that would be necessary might be too disruptive to normal trading activity. Then we figured we could introduce a new kind of midpoint peg order that monitors for when the market is in transition and, in those moments, becomes less aggressive, and this is exactly what we did.
The New Order Type: Discretionary Peg
We called the new order type discretionary peg, and it went live in November 2014. Discretionary peg is an order type that is willing to trade as aggressively as the midpoint of the NBBO the vast majority of the time, but in moments when IEX observes the market moving in the order's favor, it is only willing to trade on the passive side of the bid/offer. Just like a fast trader conducting this pseudo-arbitrage, we constantly observe the market for a signal that the price of a stock is in transition, and we use this signal to prevent a trade from occurring at the soon-to-be-stale price.
As an aside, I've heard the question several times going all the way back to our RBC days: if we are able to identify profitable signals like this one, why don't we start a proprietary trading shop for ourselves and just print money? There are many reasons why we started IEX instead of going down this path, but from a practical perspective, it's simply not our competitive advantage. Many arbitrage or pseudo-arbitrage strategies are extraordinarily simple from a trading logic perspective-- this stuff isn't rocket science-- and this case is no different. The challenge is that only one trader can successfully take advantage of each arbitrage opportunity, and to win the race consistently requires extremely fast technology and frequent upgrades. I have no reason to believe our team would be nearly as effective in that space.
Protecting our Resting Orders
So given all this, how could we possibly identify that the market is in transition early enough to help our resting order avoid getting picked off? First, keep in mind that we do still have the speed bump working for us, so we don't need to be the single fastest at picking up the signal-- as long as we can identify that the market is transitioning within 350 microseconds of the very fastest trader, we can protect our resting discretionary peg orders. It turns out that 350 microseconds is an enormous head start, and it makes our job a lot easier. Secondly, the downside of a false positive for IEX is smaller than the downside of a false positive for an arbitrage trader: if a trader has a false positive, and they execute a trade anticipating a market move that doesn't actually happen, they now have a position that they will most likely wind up closing at a loss. If IEX, on the other hand, thinks the market is moving, but it doesn't, we just return the order to its normal behavior. There is a small chance that the discretionary peg order may miss a desirable midpoint trade in this tiny window, so false positives are still not a good thing-- they're just not as bad, so we can afford to be a bit more aggressive with our signal than an arbitrage trader.
Ultimately, IEX doesn't need to win the prediction arms race. Of course we will continue to strive to make our signal as precise as possible, but even if the signal is a little bit crude and noisy, as long as we take away the really obvious profitable scenarios, it should make the entire practice much less desirable to conduct.
Here are the results so far:
Whereas midpoint pegged orders have been seeing adverse selection in the 9-11% range on average since last July, discretionary peg orders are down in that 3-4% range that we saw in the early days of IEX. It's important to note that there is a trade-off between using the two order types: midpoint peg orders earn higher priority than discretionary peg inside the spread, and of course, discretionary peg orders naturally face a lower fill rate by avoiding a subset of trades (1-2% lower in practice so far). All in all, however, discretionary peg does seem to be a compelling order type for an institutional broker/investor concerned with short-term adverse selection, and we are very happy with the results to date.
In closing, it is the broker's job to navigate the market effectively on behalf of their customers, but if an exchange or a dark pool has a blind spot that allows for structural arbitrage, there isn't much the broker can do. You can't blame a trader for trying to profit off an inefficiency, but we believe that exchanges and ATSs have the responsibility to ensure that they don't have any blind spots.
Dan is a co-founder and quantitative developer at IEX. He is responsible for building and evolving core functionality for the IEX trading venue, namely its matching engine, smart order router and its newest order type: Discretionary Peg. Forbes recognized Dan as one of their 2015 "30 Under 30".
The last two months have been busy as ever at Quantopian. We're in full swing running the Quantopian Open, a paper trading competition where each monthly winner gets to manage a $100,000 brokerage account. We've accepted over 600 people into the research platform beta and are ramping up its usability and features. More tools and data were added to the backtester, and the community has a new look-and-feel. Take a look below at the details of our latest news and releases:
Tools and Features
"I’ve never seen a bad backtest” -- Dimitris Melas, head of research at MSCI.
A backtest is a simulation of a trading strategy used to evaluate how effective the strategy might have been if it were traded historically. Backtesting is used by hedge funds and other researchers to test strategies before real capital is applied. Backtests are valuable because they enable quants to quickly test and reject trading strategy ideas.
All too often strategies look great in simulation but fail to live up to their promise in live trading. There are a number of reasons for these failures, some of which are beyond the control of a quant developer. But other failures are caused by common, insidious mistakes.
An overly optimistic backtest can cause a lot of pain. I’d like to help you avoid that pain by sharing 9 of the most common pitfalls in trading strategy development and testing that can result in overly optimistic backtests:
1. In-sample backtesting
Many strategies require refinement, or model training of some sort. As one example, a regression-based model that seeks to predict future prices might use recent data to build the model. It is perfectly fine to build a model in that manner, but it is not OK to test the model over that same time period. Such models are doomed to succeed.
Don’t trust them.
Solution: Best practices are to build procedures to prevent testing over the same data you train over. As a simple example you might use data from 2007 to train your model, but test over 2008-forward.
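A minimal sketch of that split with pandas (the price series here is synthetic; the cutoff date mirrors the example above):

```python
import pandas as pd
import numpy as np

# toy daily price series standing in for real data
idx = pd.bdate_range("2006-01-02", "2010-12-31")
prices = pd.Series(100 + np.random.randn(len(idx)).cumsum(), index=idx)

cutoff = pd.Timestamp("2008-01-01")
train = prices[prices.index < cutoff]    # e.g. 2006-2007: fit/train the model here only
test = prices[prices.index >= cutoff]    # 2008-forward: evaluate out of sample here only
```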
By the way, even though it could be called “out-of-sample” testing it is not a good practice to train over later data, say 2014, then test over earlier data, say 2008-2013. This may permit various forms of lookahead bias.
2. Using survivor-biased data
Suppose I told you I have created a fantastic new blood pressure medicine, and that I had tested it using the following protocol:
a. Randomly select 500 subjects
b. Administer my drug to them every day for 5 years
c. Measure their blood pressure each day
At the beginning of the study the average blood pressure of the participants was 160/110; at the end of the study the average BP was 120/80 (significantly lower and better).
Those look like great results, no? What if I told you that 58 of the subjects died during the study? Maybe it was the ones with the high blood pressure that died! This is clearly not an accurate study because it focused on the statistics of survivors at the end of the study.
This same sort of bias is present in backtests that use later lists of stocks (perhaps members of the S&P 500) as the basis for historical evaluations over earlier periods. A common example is to use the current S&P 500 as the universe of stocks for testing a strategy.
Why is this bad? See the two figures below for illustrative examples.
Figure: The green lines show historical performance of stocks that were members of the S&P 500 in 2012. Note that all of these stocks came out of the 2008/2009 downturn very nicely.
Figure: What really happened: If, instead we use the members of the S&P 500 starting in 2008, we find that more than 10% of the listed companies failed.
In our work at Lucena Research, we see an annual 3% to 5% performance “improvement” with strategies using survivor-biased data.
Solution: Find datasets that include historical members of indices, then use those lists to sample from for your strategies.
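A sketch of what that lookup might look like in code (the membership table and its column layout are assumptions, not any specific vendor's format):

```python
import pandas as pd

# toy point-in-time membership table (the real thing would cover the full index history)
membership = pd.DataFrame({
    "ticker":  ["AAA", "BBB", "CCC"],
    "added":   pd.to_datetime(["1999-03-01", "2005-06-15", "2010-01-04"]),
    "removed": pd.to_datetime(["2009-09-30", None, None]),
})

def universe_on(date, membership):
    """Return the tickers that were index members on `date`, not today's list."""
    date = pd.Timestamp(date)
    active = (membership["added"] <= date) & (
        membership["removed"].isna() | (membership["removed"] > date))
    return membership.loc[active, "ticker"].tolist()

print(universe_on("2008-06-30", membership))   # ['AAA', 'BBB'] -- CCC wasn't a member yet
```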
3. Observing the close & other forms of lookahead bias
In this failure mode, the quant assumes he can observe market closing prices in order to compute an indicator, and then also trade at the close. As an example, one might use closing price/volume to calculate a technical factor used in the strategy, then trade based on that information.
This is a specific example of lookahead bias in which the strategy is allowed to peek a little bit into the future. In my work I have seen time and again that even a slight lookahead bias can provide fantastic (and false) returns.
Other examples of lookahead bias have to do with incorrect registration of data such as earnings reports or news-- for instance, assuming that one can trade on the same day earnings are announced, even though earnings are usually announced after the close.
Solution: Don’t trade until the open of the next day after information becomes available.
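A small illustration of the fix on synthetic data (the 20-day indicator is just a placeholder): compute the signal from day t's close, but only allow it to trade from day t+1's open.

```python
import pandas as pd
import numpy as np

# toy daily open/close series standing in for real prices
idx = pd.bdate_range("2014-01-01", periods=250)
close = pd.Series(100 + np.random.randn(250).cumsum(), index=idx)
open_ = close.shift(1) * (1 + 0.001 * np.random.randn(250))

signal = close.pct_change().rolling(20).mean()   # built from information through day t's close
tradable = signal.shift(1)                       # only available to trade on day t+1
daily_pnl = tradable * (close / open_ - 1.0)     # enter at the next open, exit at that day's close
```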
4. Ignoring market impact
The very act of trading affects price. Historical pricing data does not include your trades and is therefore not an accurate representation of the price you would get if you were trading.
Consider the chart below that describes the performance of a real strategy I helped develop. Consider the region A, the first part of the upwardly sloping orange line. This region was the performance of our backtest. The strategy had a Sharpe Ratio over 7.0! Based on the information we had up until that time (the end of A), it looked great so we started trading it.
When we began live trading we saw the real performance illustrated with the green “live” line in region B -- essentially flat. The strategy was not working, so we halted trading it after a few weeks. After we stopped trading it, the strategy started performing well again in paper trading (Region C, Arg!).
How can this be? We thought perhaps that the error was in our predictive model, so we backtested again over the “live” area and the backtest showed that same flat area. The only difference between the nice 7.0 Sharpe Ratio sections and the flat section was that we were engaged in the market in the flat region.
What was going on? The answer, very simply, is that by participating in the market we were changing the prices to our disadvantage. We were not modeling market impact in our market simulation. Once we added that feature more accurately, our backtest appropriately showed a flat, no-return result for region A. If we had had that in the first place we probably would never have traded the strategy.
Solution: Be sure to anticipate that price will move against you at every trade. For trades that are a small part of overall volume, a rule of thumb is about 5 bps for S&P 500 stocks and up to 50 bps for more thinly traded stocks. It depends of course on how much of the market your strategy is seeking to trade.
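One simple way to bake that assumption into a backtest is a flat haircut on every fill, using the rule-of-thumb numbers above (a sketch, not a substitute for a real market-impact model):

```python
def impacted_fill_price(price, side, impact_bps=5.0):
    """Move the fill against us: pay up when buying, receive less when selling."""
    haircut = impact_bps / 10000.0
    return price * (1 + haircut) if side == "buy" else price * (1 - haircut)

# e.g. a $50.00 buy in a liquid S&P 500 name fills at ~$50.025;
# a thinly traded stock might warrant impact_bps closer to 50
print(impacted_fill_price(50.00, "buy", impact_bps=5.0))
```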
5. Buy $10M of a $1M company
Naïve backtesters will allow a strategy to buy or sell as much of an asset as it likes. This may provide a misleadingly optimistic backtest because large allocations to small companies are allowed.
There often is real alpha in thinly traded stocks, and data mining approaches are likely to find it. Consider for a moment why it seems there is alpha there. The reason is that the big hedge funds aren’t playing there because they can’t execute their strategy with illiquid assets. There are perhaps scraps of alpha to be collected by the little guy, but check to be sure you’re not assuming you can buy $10M of a $1M company.
Solution: Have your backtester limit the strategy’s trading to a percentage of the daily dollar volume of the equity. Another alternative is to filter potential assets to a minimum daily dollar volume.
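A sketch of the first alternative (the 2.5% volume limit is an illustrative assumption):

```python
def max_shares(price, avg_daily_dollar_volume, volume_limit=0.025):
    """Cap an order at a fraction of the stock's average daily dollar volume."""
    return int(avg_daily_dollar_volume * volume_limit / price)

# a $1M/day stock at $20: at most 1,250 shares per day, no matter what the signal says
print(max_shares(price=20.0, avg_daily_dollar_volume=1000000))
```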
6. Overfit the model
An overfit model is one that models in-sample data very well. It predicts the data so well that it is likely modeling noise rather than the underlying principle or relationship in the data that you are hoping it will discover.
Here’s a more formal definition of overfitting: As the degrees of freedom of the model increase, overfitting occurs when in-sample prediction error decreases and out-of-sample prediction error increases.
What do we mean by “degrees of freedom?” Degrees of freedom can take many forms, depending on the type of model being created: Number of factors used, number of parameters in a parameterized model and so on.
Solution: Don’t repeatedly “tweak” and “refine” your model using in-sample data. And always compare in-sample error versus out-of-sample error.
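A small scikit-learn illustration of that comparison on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(500, 20)                      # 20 candidate factors, mostly noise
y = X[:, 0] * 0.1 + rng.randn(500)          # only one factor carries any signal
X_train, X_test = X[:250], X[250:]
y_train, y_test = y[:250], y[250:]

model = LinearRegression().fit(X_train, y_train)
print("in-sample MSE:    ", mean_squared_error(y_train, model.predict(X_train)))
print("out-of-sample MSE:", mean_squared_error(y_test, model.predict(X_test)))
# a widening gap between the two as you add factors/parameters is the overfitting signature
```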
7. Trust complex models
Complex models are often overfit models. Simple approaches that arise from a basic idea that makes intuitive sense lead to the best models. A strategy built from a handful of factors combined with simple rules is more likely to be robust and less sensitive to overfitting than a complex model with lots of factors.
Solution: Limit the number of factors considered by a model, use simple logic in combining them.
8. Trusting stateful strategy luck
A stateful strategy is one whose holdings over time depend on which day in history it was started. As an example, if the strategy rapidly accrues assets, it may be quickly fully invested and therefore miss later buying opportunities. If the strategy had started one day later, its holdings might be completely different.
Sometimes such a strategy’s success varies widely depending on which day it is started. I’ve seen, for instance, a difference of 50% in return for the same strategy started on two different days in the same week.
Solution: If your strategy is stateful, be sure to test it starting on many different days. Evaluate the variance of the results across those days. If it is large, you should be concerned.
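A sketch of that test (run_backtest is a hypothetical wrapper around your own backtester that returns the strategy's final return for a given start date):

```python
import numpy as np
import pandas as pd

start_days = pd.bdate_range("2010-01-04", "2010-03-31")
# run_backtest(start) is assumed, not a real library call
final_returns = np.array([run_backtest(start) for start in start_days])
print(final_returns.mean(), final_returns.std())   # a std comparable to the mean is a red flag
```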
9. Data mining fallacy
Even if you avoid all of the pitfalls listed above, if you generate and test enough strategies you’ll eventually find one that works very well in a backtest. However, its quality cannot be distinguished from that of a lucky random stock picker.
How can this pitfall be avoided? It can’t be avoided. However, you can and should forward test before committing significant capital.
Solution: Forward test (paper trade) a strategy before committing capital.
It is best to view backtesting as a method for rejecting strategies rather than as a method for validating strategies. One thing is for sure: If it doesn’t work in a backtest, it won’t work in real life. The converse is not true: Just because it works in a backtest does not mean you can expect it to work in live trading.
However, if you avoid the pitfalls listed above, your backtests stand a better chance of more accurately representing real life performance.
Live Webinar: Dr. Balch will present a webinar on this topic on April 24, 2015 at 11AM. You can register to watch the webinar live by following this link.
About the author
Tucker Balch, Ph.D. is a professor of Interactive Computing at Georgia Tech. He is also CTO of Lucena Research, Inc., a financial decision support technology company. You can read more essays by Tucker at http://augmentedtrader.com.
This post is based on the talk of the same title I gave at Quantopian's QuantCon 2015 which commenced at 3.14.15 9:26:54. Do these numbers remind you of something?
A correct backtest of a trading strategy requires accurate historical data. This isn't controversial. Historical data that is full of errors will generate fictitious profits for mean-reverting strategies, since noise in prices is mean-reverting. However, what is less well known is how a perfectly accurate capture of historical prices, if used in a sub-optimal way, can still lead to dangerously inflated backtest results. I will illustrate this with three simple strategies.
CEF Premium Reversion
Patro et al. published a paper on trading the mean reversion of closed-end funds’ (CEF) premium. Based on rational analysis, the market value of a CEF should be the same as the net asset value (NAV) of its holdings. So the strategy to exploit any differences is both reasonable and simple: rank all the CEFs by their % difference ("premium") between market value and NAV, short the quintile with the highest premium, and buy the quintile with the lowest (maybe negative) premium. Hold them for a month, and repeat. (You can try this on a daily basis too, since Bloomberg provides daily NAV data.) The Sharpe ratio of this strategy from 1998-2011 is 1.5. Transaction costs are ignored, but shouldn't be significant for a monthly rebalance strategy.
The authors are irreproachable for their use of high quality price data provided by CRSP and monthly fund NAV data from Bloomberg for their backtest. So I was quite confident that I could reproduce their results with the same price data from CRSP, and with historical NAV data from Compustat instead. Indeed, here is the cumulative returns chart from my own backtest (click to enlarge):
However, I also know that there is one detail that many traders and academic researchers neglect when they backtest daily strategies for stocks, ETFs, or CEFs. They often use the "consolidated" closing price as the execution price, instead of the "official" (also called "auction" or "primary") closing price. To understand the difference, one has to remember that the US stock market is a network of over 60 "market centers" (see the teaching notes of Prof. Joel Hasbrouck for an excellent review of the US stock market structure). The exact price at which one's order will be executed is highly dependent on the exact market center to which it has been routed. A natural way to execute this CEF strategy is to send a market-on-close (MOC) or limit-on-close (LOC) order near the close, since this is the way we can participate in the closing auction and avoid paying the bid-ask spread. Such orders will be routed to the primary exchange for each stock, ETF, or CEF, and the price they are filled at will be the official/auction/primary price at that exchange. On the other hand, the price that most free data services (such as Yahoo Finance) provide is the consolidated price, which is merely that of the last transaction received by the Securities Information Processor (SIP) from any one of these market centers on or before 4pm ET. There is no reason to believe that one's order would have been routed to that particular market center and executed at that price at all. Unfortunately, the CEF strategy was tested on this consolidated price. So I decided to backtest it again with the official closing price.
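A minimal sketch of the monthly ranking step on a toy snapshot (not the authors' code; the fund names and numbers are made up):

```python
import pandas as pd

# toy month-end snapshot standing in for the real CEF universe
cef = pd.DataFrame({
    "market_price": [10.5, 9.2, 15.1, 7.8, 12.0, 20.4, 8.8, 11.3, 14.0, 9.9],
    "nav":          [10.0, 9.8, 14.0, 8.5, 12.0, 19.0, 9.5, 11.0, 15.0, 10.0],
}, index=["CEF%d" % i for i in range(10)])

cef["premium"] = cef["market_price"] / cef["nav"] - 1.0
quintile = pd.qcut(cef["premium"], 5, labels=False)
shorts = cef.index[quintile == 4]   # highest premium: short
longs = cef.index[quintile == 0]    # lowest (possibly negative) premium: buy
# hold for one month, then re-rank and rebalance
```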
Where can we find historical official closing price? Bloomberg provides that, but it is an expensive subscription. CRSP data has conveniently included the last bid and ask that can be used to compute the mid price at 4pm which is a good estimate of the official closing price. This mid price is what I used for a revised backtest. But the CRSP data also doesn't come cheap - I only used it because my academic affiliation allowed me free access. There is, however, an unexpected source that does provide the official closing price at a reasonable rate: QuantGo.com will rent us tick data that has a Cross flag for the closing auction trade. How ironic: the cheapest way to properly backtest a strategy that trades only once a month requires tick data time-stamped at 1 millisecond, with special tags for each trade.
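So the revised backtest estimates the official close as the 4pm mid price computed from the last bid and ask (as with the CRSP fields mentioned above); a trivial sketch:

```python
def closing_mid(bid, ask):
    """Estimate the official closing price as the midpoint of the 4pm bid/ask."""
    return (bid + ask) / 2.0

print(closing_mid(bid=14.98, ask=15.02))   # 15.00, our stand-in for the auction price
```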
So what is the cumulative returns using the mid price for our backtest?
Opening Gap Reversion
Readers of my book will be familiar with this strategy (Example 4.1): start with the SPX universe, buy the 10 stocks that gapped down most at the open, and short the 10 that gapped up most. Liquidate everything at the close. We can apply various technical or fundamental filters to make this strategy more robust, but the essential driver of the returns is mean-reversion of the overnight gap (i.e. reversion of the return from the previous close to today's open).
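A sketch of the daily selection step on toy data (filters and execution details omitted):

```python
import pandas as pd
import numpy as np

# toy morning snapshot standing in for the SPX universe
rng = np.random.RandomState(1)
px = pd.DataFrame({"prev_close": 50 + 10 * rng.rand(500)},
                  index=["STK%d" % i for i in range(500)])
px["open"] = px["prev_close"] * (1 + 0.01 * rng.randn(500))

gap = px["open"] / px["prev_close"] - 1.0
buys = gap.nsmallest(10).index    # the 10 largest gaps down: buy at the open
shorts = gap.nlargest(10).index   # the 10 largest gaps up: short at the open
# liquidate everything at the close
```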
We have backtested this strategy using the closing mid price as I recommended above, and including a further 5 bps transaction cost each for the entry and exit trade. The backtest looked wonderful, so we traded it live. Here is the comparison of the backtest vs live cumulative P&L:
Yes, it is still mildly profitable, but nowhere near the profitability of the backtest, or more precisely, walk-forward test. What went wrong? Two things:
• Just like the closing price, we should have used the official/auction/primary open price. Unfortunately, CRSP does not provide the opening bid-ask, so we couldn't estimate the official open price from a mid price. QuantGo, though, does provide a Cross flag for the opening auction trade as well.
• To generate the limit on open (LOO) or market on open (MOO) orders suitable for executing this strategy, we need to submit the order using the pre-market quotes before 9:28am ET, based on Nasdaq's rules.
Once again, a strategy that is seemingly low frequency, with just an entry at the open and an exit at the close, actually requires TAQ (trades and quotes) data to backtest properly.
Lest you think that this requirement for TAQ data for backtesting only applies to mean reversion strategies, we can consider the following futures momentum strategy that can be applied to the gasoline (RB), gold (GC), or various other contracts trading on the NYMEX.
At the end of a trading session (defined as the previous day's open outcry close to today's open outcry close), rank all the trades or quotes in that session. We buy a contract in the next session if the last price is above the 95th percentile, sell it if it drops below the 60th (this serves as a stop loss). Similarly, we short a contract if the last price is below the 5th percentile, and buy cover if it goes above the 40th.
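A sketch of the entry logic for a single session on synthetic prices (the exit percentiles are tracked the same way while a position is held):

```python
import numpy as np

rng = np.random.RandomState(2)
session_prices = 2.50 + 0.0002 * rng.randn(5000).cumsum()   # stand-in for one session's RB prices
last = session_prices[-1]

hi, lo = np.percentile(session_prices, [95, 5])
exit_long, exit_short = np.percentile(session_prices, [60, 40])

position = 0
if last > hi:
    position = 1       # go long in the next session; stop out if price drops below exit_long
elif last < lo:
    position = -1      # go short in the next session; cover if price rises above exit_short
```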
Despite being an intraday strategy, it typically trades only 1 roundtrip a day - a low frequency strategy. We backtested it two ways: with 1-min trade bars (prices are from back-adjusted continuous contracts provided by eSignal), and with best bid-offer (BBO) quotes with 1 ms time stamps (from QuantGo's actual contract prices, not backadjusted).
For all the contracts that we have tested, the 1-ms data produced much worse returns than the 1-min data. The reason is interesting: the 1-ms data shows that the strategy exhibits high frequency flip-flops. These are sudden changes in the order book (in particular, the BBO quotes) that quickly revert. Some observers have called these flip-flops "mini flash crashes", and they happen as frequently in the futures markets as in the stock market, and occasionally in the spot Forex market as well. Some people have blamed them on high frequency traders. But I think flip-flop describes the situation better than flash crash, since a flash crash implies the sudden disappearance of quotes or liquidity from the order book, while in a flip-flopping situation, new quotes/liquidity above the BBO can suddenly appear and disappear in a few milliseconds, simultaneous with the disappearance and re-appearance of quotes on the opposite side of the order book. Since ours is a momentum strategy, such reversals of course create losses. These losses are very real, and we experienced them in live trading. But they are also undetectable if we backtest using 1-min bar data.
Some readers may object: if the 1-min bar backtest shows good profits, why not just trade this live with 1-min bar data and preserve its profit? Let's consider why this doesn't actually allow us to avoid using TAQ data. Note that we were able to avoid the flip-flops using 1-min data only because we were lucky in our backtest - it wasn't because we had some trading rule that prevented our entering or exiting a position when the flip-flops occurred. How then are we to ensure that our luck will continue with live market data? At the very least, we have to test this strategy with many sets of 1-min bar data, and choose the set that shows the worst returns as part of our stress testing. For example, one set may be [9:00:00, 9:01:00, 9:02:00, ...,] and the second set may be [9:00:00.001, 9:01:00.001, 9:02:00.001, ...], etc. This backtest, however, still requires TAQ data, since no historical data vendor I know of provides such multiple sets of time-shifted bars!
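A sketch of how such time-shifted bar sets could be built from tick data with pandas (synthetic ticks; the offsets are illustrative):

```python
import pandas as pd
import numpy as np

# toy tick series standing in for real TAQ data
rng = np.random.RandomState(3)
idx = pd.date_range("2014-06-02 09:00:00", periods=100000, freq="10ms")
ticks = pd.DataFrame({"price": 100 + 0.01 * rng.randn(100000).cumsum()}, index=idx)

bar_sets = {}
for offset_ms in (0, 1, 250, 500):
    shifted = ticks.copy()
    shifted.index = shifted.index - pd.Timedelta(milliseconds=offset_ms)
    bar_sets[offset_ms] = shifted["price"].resample("1min").ohlc()
# backtest against every bar set and treat the worst result as the stress case
```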
As I mentioned above, these flip-flops are omnipresent in the stock market as well. This shouldn't be surprising considering that 50% of stock transaction volume is due to high frequency trading. They are particularly damaging when we are trading spreads, such as the ETF pair EWA vs EWC. A small change in the BBO of a leg may represent a big percentage change in the spread, which itself may be just a few ticks wide. So such flip-flops can frequently trigger orders which are filled at much worse prices than expected.
The three example strategies above illustrate that even when a strategy trades at a low frequency, perhaps as low as once a month, we often still require high-frequency TAQ data to backtest it properly, or even economically. If the strategy trades intraday, even if just once a day, then this requirement becomes all the more important due to the flip-flopping of the order book on the millisecond time frame.
Ernie is the managing member of QTS Capital Management, LLC., a commodity pool operator and trading advisor. Find out more about him at epchan.com.
At Quantopian, we were having a flip conversation over instant messaging about some difficult to understand mathematical models that some quants use. One of the engineers on the team joked "fancier math must mean more alpha".
It got some laughs and we moved on with our day. But it got me to thinking about this project I've been working on.
Since we launched access to Morningstar fundamental data in our development environment, we've been working on some sample algorithms that use the fundamental data.
More easily than anywhere else, a Quantopian user can create a trading algorithm that embraces a value investing approach. The fundamental data provides 600+ metrics on companies like EBIT, capex, market cap, P/E ratio and many more.
We wanted to try out some of these value investing concepts ourselves.
In beginning to research value investing, we quickly learned of both Joel Greenblatt's Magic Formula and the Acquirer's Multiple, advocated by Tobias Carlisle. Carlisle has written two books about value investing (with quant twists) and even made an appearance in our forums once.
I emailed Tobias for advice on how to get started with value investing inside Quantopian. He suggested we try duplicating some of his findings, specifically that the Magic Formula, simple as it might be, could be improved upon by making it even simpler. His position is that a single ratio, the Acquirer's Multiple, can outperform Greenblatt's approach.
Now in the grand scheme of things, the Magic Formula is pretty simple itself. It is composed of two ratios, and those two ratios are used to screen stocks for purchase. But this seemed like a good way to understand what our customers were starting to do with Quantopian and fundamentals. So we took Tobias up on his idea.
What would we learn if we tried to implement these definitively "non-fancy" math algorithms? Today, we published our progress over in the Quantopian forum. Check it out.
Soon, you can start trading your algorithms with your E*TRADE brokerage account (1).
What does this mean for you?
For those who are curious on seeing it in action, I presented a quick demo of an algorithm trading with E*TRADE here:
Do you want a chance to trade real money with E*TRADE in our pilot program?
How much does this cost?
Quantopian does not charge for live trading integration, though we may need to charge a monthly fee at some point in the future. Your brokerage agreement with E*TRADE, including fees and commissions, will be subject to the same terms and rules as your existing E*TRADE brokerage account. (1)
(1) E*TRADE Financial Corporation is not affiliated with and does not endorse or recommend Quantopian, Inc. E*TRADE provides execution and clearing services to customers who integrate their E*TRADE account with their Quantopian account. For more information regarding E*TRADE Financial Corporation, please visit www.etrade.com.
P.S. Attached is a sample algorithm that's geared and ready for live trading. Tweak it to use a different list of stocks or create your own algorithm from scratch.
tl;dr: In this blog post I run several simulations using Zipline to show what happens when we add more capital to an algorithm, and keep adding capital until we see capacity limitations.
Assume that you have developed an algorithm that looks very promising in a backtest. You might then want to test this algorithm in the real market, but since you are not yet confident enough in your algorithm, you probably start out investing a small amount of money at first. If the algorithm actually performs as well as you hoped, you will probably want to ramp it up and put more money behind it. If you're really certain, you might even convince others to also invest using your strategy for a fee. The problem with your plan to ramp up is capacity. What if your newly-increased investment clobbers all of the market liquidity, and your edge disappears? Unfortunately, we just can't assume that our orders are always going to be filled at a certain price, particularly if we are trading in low-cap stocks.
It is very easy to fool ourselves with a good backtest performance; that backtest might not hold up to the ultimate test against reality. Accurate backtesting can be very tricky as various levels of complexity are involved (see http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2308659 for a recent paper on backtest overfitting). At the most basic level, failing to account for order delays, commissions, and the influence of our own orders on the market (i.e. slippage) will make our strategy look much better than it would be in reality. Zipline -- our open-source backtester that's also used on Quantopian -- by default accounts for all these complexities.
To shed some light on what happens when we trade larger and larger quantities I ran some simulations using Zipline.
The algorithm we will be using is the OLMAR algorithm. At the core, the algorithm assumes mean-reversion and rebalances its portfolio to add more weight to stocks that under-performed and less weight to stocks that over-performed recently. The details of the strategy are not critical here but note that the algorithm always aims to be fully invested and is long-only.
I next ran the algorithm using different amounts of starting capital and either a portfolio of large-cap stocks (including $SBUX and $CERN, with an average daily trading volume of $127,316,155) or a portfolio of small-cap stocks (including $HMSY and $BWLD, with an average daily trading volume of $11,952,436).
You can view the notebook with the full code to run the simulations and generate the plots here.
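The sweep itself is simple; here is a sketch (run_olmar is a hypothetical wrapper around the Zipline run in the notebook that returns Zipline's performance DataFrame, and the capital levels are illustrative):

```python
import pandas as pd

capital_levels = [100000, 1000000, 5000000, 25000000]
results = {}
for capital in capital_levels:
    # run_olmar(capital_base=...) is assumed, not a real library call
    perf = run_olmar(capital_base=capital)
    results[capital] = perf["portfolio_value"] / capital - 1.0   # cumulative return series

cum_returns = pd.DataFrame(results)   # one column per starting capital, ready to plot
```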
The plot below shows the cumulative returns over time of the two portfolios under different starting capital levels.
If there were no capacity limitations whatsoever, we'd expect all lines to be on top of each other. But note how with more and more starting capital the lines start to diverge more and more despite running the identical strategy in every simulation. Why is that the case?
Specifically, there seem to be two patterns emerging. First, with low amounts of starting capital, we see performance decrease in a step-wise fashion: the return pattern is the same, but shifted downward. Secondly, beyond a certain amount of capital (starting at $5 million for the small-cap stocks) the algorithm seems to stop working altogether and simply burns money.
The two mechanisms at play here are:
1. Slippage

The first mechanism, slippage, describes how our own orders influence the price. For example, buying a large quantity will drive the price up. This can be seen by taking a closer look at the fill price of a single transaction under different starting capitals:
As you can see, as our order sizes increase, the fill price also increases. This is a property of the VolumeShareSlippage model used by Zipline. Note that the actual price impact depends on how we parameterize the slippage model.
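For reference, this is roughly how the slippage model is configured inside a Zipline algorithm (the parameter values here are illustrative, not the exact settings used for the plots):

```python
from zipline.api import set_slippage
from zipline.finance import slippage

def initialize(context):
    # fill at most 25% of a bar's volume, with price impact growing with our share of it
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.25, price_impact=0.1))
```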
2. Capacity limitations
The second one is that there is not enough liquidity in the market to fill our orders in a reasonable time. You can already see above that at most we can buy 48900 shares on that day.
As you can see in the plot below, the orders of the algorithm get filled to a smaller and smaller percentage because the order sizes get so large that we start to run into capacity limitations.
Moreover, this is dependent on the volume of the specific stock; see e.g. $IRBT, where orders cannot get filled at a capital of more than $1,000,000. Note also that this is where our algorithm's performance starts to completely change, as can be seen in figure 1 above. In summary, running into capacity limitations for the smallest-volume stock is enough to cause our algorithm to behave unexpectedly.
Being able to estimate the capacity of a trading algorithm is very useful when trying to gauge how much capital to put behind a certain strategy. Large hedge funds certainly have this problem, but small-scale investors can also run into it when investing in small-cap stocks, as demonstrated above.
Running various backtests at different capital levels allows us to estimate the capacity of a given algorithm and see how it is affected (subject to how good our slippage model is), but we could also look at the volume of the individual stocks being traded.
One solution to scaling up a strategy is to slow it down so that it has more time to spread out its orders over time and decrease its own market impact. However, by slowing down a strategy you are also diluting your prediction. If you predict that a stock will rise a few hours after a certain event (e.g. earning announcements) you need to act as fast as possible. This sounds very much like an optimization problem where one is trying to find the sweet spot between maximizing speed and minimizing market impact.
What are your experiences with slippage? Leave a comment below!
You can find and clone the IPython notebook that was used to generate these simulations here.
And, of course, the third contest starts on April 1st! (Get your entry in before 9:30AM EDT to qualify).
We've learned quite a bit since we kicked this off. This webinar covers what we've learned about scoring, what tools our community needs, cheating, and more.
According to Credit Suisse’s Gender 3000 report, at the end of 2013, women accounted for 12.9% of top management in 3000 companies across 40 countries. Additionally, since 2009, companies with women as 25-50% of their management team returned 22-29%.
If companies with women in management outperform, what would happen if you invested in women-led companies?
At QuantCon 2015, I explained my analysis and dug into some of the questions I've been asked about in the past weeks. You can watch my talk here:
As I explain at the end of the talk, there are many things I could do to explore this further. The truth is, I'm a product manager with a tool to build. My goal is to get this algo live trading, so I can understand that experience from the perspective of my users.
It's likely I'll explore some of the interesting alternatives to this study, but unlikely I'll do them all. If you would like to take it further, my notebook can be viewed here (and my process in detail here) and all of the data I used can be downloaded here. All I ask is that you share your findings back.