This is a short overview of common types of quantitative finance algorithms that are traded today. Of course, this is only an overview, and not comprehensive! Let me know if you think there are other algo types I should cover.
It’s been about a month now since Quantopian opened live trading up to a public/private beta – we figured it’s about time to share an update on how things are going.
First, I’d like to say thank you to all the Quantopian community members who have joined the pilot and taken the time to provide invaluable feedback, insight and encouragement. The community is, without a doubt, the most amazing part of Quantopian – we couldn’t do this without you, and it is a privilege to work with such talented quants, hackers, scientists and analysts on a daily basis.
I’d like to share two of the key metrics we are starting to track for live trading:
80 – Number of Quantopians who have connected their brokerage account.
20 – Number of live (IB-backed) algorithms running today.
I’m delighted to report the first real dollars directed by Quantopian-hosted algorithms. Making the leap from simulation to real trading is a huge deal. I’m incredibly grateful that our beta group has trusted us with their capital. I’d like to congratulate each pilot trader who has launched a real-money algo and claimed their free lifetime subscription so far — the rest of you have until March 31st to join that club.
We have also been listening closely to conversations in the larger community about live trading. It is great to see spirited and open discussions like this one unfold and evolve on the forums. My takeaway is that the key questions about live trading on Quantopian can be divided into three thematic buckets: Trust, Technology and Trading capital.
Trust: Do I trust my research and my code enough to trade? Do I trust Quantopian to protect my IP and place trades as I expect?
Technology: Are all the nuts and bolts required to implement my algorithm the way I need it in place?
Trading capital: Do I have the trading capital and the risk tolerance to live trade my algorithm?
The commonalities across these three themes are that first, the answers are personal, rather than universal and second, the answers are going to keep changing (hopefully from ‘Not yet.’ to ‘Yes!’) as we build the platform out together.
Finally, I wanted to share a screenshot of my own live money algo performance (we shared the code for this a few weeks ago here). This is a conservative, market-tracking strategy that holds an equal-weighted portfolio of 9 sector ETFs and rebalances as often as daily. I’ve been running this algo live since Jan 23rd, when I started with a capital base of just under $30,000. I grabbed this update after the market closed yesterday.
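For readers who want to see what an equal-weight rebalance boils down to, here is a minimal sketch in plain Python. It is not the algorithm that was shared on the forums and it does not use the Quantopian API. The function name, the placeholder tickers (`ETF1` to `ETF9`), the $50 prices and the $27,000 starting cash are all assumptions made for illustration.

```python
def equal_weight_orders(holdings, prices, cash):
    """Return the whole-share trades that move a portfolio to equal weights.

    holdings: dict ticker -> shares currently held
    prices:   dict ticker -> latest price (this dict defines the target universe)
    cash:     uninvested cash
    """
    total_value = cash + sum(holdings.get(t, 0) * p for t, p in prices.items())
    target_value = total_value / len(prices)  # same dollar amount per ticker
    orders = {}
    for ticker, price in prices.items():
        current_value = holdings.get(ticker, 0) * price
        # Truncate toward zero so we never order fractional shares.
        shares = int((target_value - current_value) / price)
        if shares != 0:
            orders[ticker] = shares
    return orders


# Hypothetical universe: nine placeholder tickers, all priced at $50.
prices = {"ETF%d" % i: 50.0 for i in range(1, 10)}
orders = equal_weight_orders(holdings={}, prices=prices, cash=27000.0)
```

Calling the function again once those orders are filled returns an empty dict until prices drift, which is why a daily rebalance of a portfolio like this usually trades only small amounts.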
Last week I gave a quant finance meet-up talk at the Hacker Dojo in Mountain View, CA. The format was inspired by some analysis I did on the types of algorithms shared and cloned in the Quantopian community – initially I wanted to ask: What are the most popular strategies coded up on Quantopian? To answer this question I ranked all public forum posts three ways, first on number of replies, second on number of views, and third on number of times cloned. I averaged these scores and re-ranked the list to come up with the top 25 ‘Most popular posts of all time’. (NB: I did not do any correction for the date of the original post, so the amount of time the thread has been alive has not been normalized.)
| Combo Rank | Post | Reply Count | View Count | Clone Count |
|---|---|---|---|---|
| 1 | Google Search Terms predict market movements | 64 | 32121 | 821 |
| 2 | OLMAR implementation – fixed bug | 64 | 26216 | 701 |
| 3 | Easy Volatility Investing by Tony Cooper @ Double-Digit Numerics | 57 | 15211 | 846 |
| 4 | discuss the sample algorithm | 16 | 18701 | 2930 |
| 5 | Global Minimum Variance Portfolio | 28 | 10230 | 702 |
| 6 | ML – Stochastic Gradient Descent Method Using Hinge Loss Function | 10 | 20421 | 973 |
| 7 | Mebane Faber Relative Strength Strategy with MA Rule | 22 | 11199 | 622 |
| 8 | OLMAR w/ NASDAQ 100 & dollar-volume | 31 | 7766 | 701 |
| 9 | Using the CNN Fear & Greed Index as a trading signal | 22 | 9914 | 367 |
| 10 | New Sample Algorithm | 33 | 8336 | 328 |
| 11 | Bollinger Bands With Trading | 18 | 8390 | 566 |
| 12 | Brent/WTI Spread Fetcher Example | 17 | 10892 | 327 |
| 13 | Ernie Chan’s Gold Pairs Trade | 15 | 10420 | 329 |
| 14 | Ranking and Trading on “Days to Cover” | 4 | 24976 | 384 |
| 15 | Determining price direction using exponential and log-normal distributions | 9 | 9781 | 624 |
| 16 | Time to “sell in may and go away”? | 27 | 8231 | 263 |
| 18 | Simple Mean Reversion Strategy | 6 | 11861 | 275 |
| 19 | Neural Network that tests for mean-reversion or momentum trending | 4 | 10101 | 407 |
| 21 | Using weather as a trading signal | 6 | 11958 | 199 |
| 22 | Global market rotation strategy | 53 | 7629 | 95 |
| 23 | trading earnings surprises with Estimize data | 34 | 7506 | 130 |
| 24 | Trading Strategy: Mean-reversion | 13 | 8252 | 216 |
| 25 | Turtle Trading Strategy | 11 | 8012 | 318 |
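The combined ranking described above (rank on replies, views and clones separately, average the three ranks, then re-rank) can be sketched roughly as follows. This is a reconstruction under assumptions, not the script used for the talk: the function names are made up, ties are given their average rank, and posts with equal combined scores keep their input order. Only the first five rows of the table are used, so ranks within this subset need not match ranks computed over every forum post.

```python
def rank_desc(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def combo_rank(posts):
    """posts: list of (title, replies, views, clones). Returns titles, best first."""
    reply_ranks = rank_desc([p[1] for p in posts])
    view_ranks = rank_desc([p[2] for p in posts])
    clone_ranks = rank_desc([p[3] for p in posts])
    scores = [(r + v + c) / 3 for r, v, c in zip(reply_ranks, view_ranks, clone_ranks)]
    order = sorted(range(len(posts)), key=lambda i: scores[i])  # stable sort
    return [posts[i][0] for i in order]


posts = [
    ("Google Search Terms predict market movements", 64, 32121, 821),
    ("OLMAR implementation - fixed bug", 64, 26216, 701),
    ("Easy Volatility Investing", 57, 15211, 846),
    ("discuss the sample algorithm", 16, 18701, 2930),
    ("Global Minimum Variance Portfolio", 28, 10230, 702),
]
ranked_titles = combo_rank(posts)
```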
Starting from this list, I worked backwards and used examples from the Quantopian community to introduce 5 basic quant strategy types: Mean Reversion, Momentum, Value, Sentiment and Seasonality. While this list is not technically ‘mutually exclusive and collectively exhaustive’, it covers a large fraction of intraday to lower frequency quant strategies and provides a good overview of the way equity focused quants think about predicting market prices. I went back to my Top 25 list and categorized each algo into one of these five buckets and then created this pie chart based on the aggregated number of views for each strategy type.
There are a number of interesting conclusions to be drawn from this initial overview of community activity. Perhaps the most obvious and predictable of these is that price based strategies are currently in the lead by a large margin – due, I expect, to the easy access to minute-level equity pricing and the accessibility of the logic for momentum and mean-reversion. Indeed there were no value-based strategies that made their way into the Top 25 – which in my view represents a key opportunity space right now.
More subtle and, from my admittedly biased point of view, more compelling is the diversity and quality of content and collaboration in the public sphere. Having joined the Quantopian team from a large corporate setting working with a small group of institutional clients, seeing that the Top 25 algos have been cloned over 13,000 times, an average of over 500 clones per strategy is… well it’s pretty damn cool.
Below you can find the slide deck from my presentation:
If you have a brokerage account with Interactive Brokers (IB), you can integrate Quantopian with your IB account and start paper trading today.
(If you don’t have an IB account, that’s OK. You can still paper trade with Quantopian! And, if you’d also like to have an IB account, you can create an account on their website.)
Do You Want to Trade With Real Money, Not Paper?
Our pilot program for real money trading is still in private beta. If you haven’t requested a spot in our real money pilot program, you can do that by filling out this short form.
How Much Does This Cost?
Paper trading and live trading through IB are both free while we run this beta program. In the future we plan to charge a flat monthly fee per live algorithm.
We also have a special offer. If you get your algorithm up and running with real money by 3/31/14, we’ll give you a lifetime free subscription*.
Live Trading Pilot Program
Our pilot live trading program has been growing rapidly in the last few weeks, and we’re excited to grow the program even more by opening up paper trading to everyone.
We think that live trading on Quantopian is going to change the way you trade forever.
Quantopian helps you focus on the big picture and create new ideas, instead of operating a manual, tedious system.
* Free, lifetime subscription offer is for one algorithm, up to $100,000 in account balance. Additional algorithms and larger account balances are not covered by this offer.
P.S. Attached is a sample algorithm that’s geared for live trading. It takes a list of stocks and rebalances them every day. You can paper trade it today:
Recently I was asked to comment on a piece in the Economist about the rise and supposed fall of quant investing in the last ten years.
The thesis of the article, which equates ‘quant’ with trend following or technical analysis, seems to be: See, those pointy-headed geeks aren’t as smart as they think they are! Sure, they racked up outsized returns in the mid-2000s and the market downturn of ’08. But now it’s different, the market has changed, their tricks are broken, and since it’s impossible to understand what quants are really doing anyway, there’s also no way to predict whether their techniques will ever work again.
The logic of this argument is dubious at best; it is at once overly simplistic – relegating the entirety of data-driven systematic investment to the trend-following label – while at the same time providing no convincing counterpoint or real basis for comparison of the relative performance of systematic versus discretionary investing, or ‘stock picking’ over some meaningful historical time frame (you be the judge, the full list of comparable trailing performance for all HFRI funds is available here and a more in depth analysis on HF underperformance in general here). In short, the piece suffers from a clear case of stumbling into a “complexity valley”. Former GETCO trader Nancy Hua gave a nice explanation of this phenomenon as it relates to quant finance in a recent Quora post:
“[Quant] suffers from a complexity valley: If people don’t know anything about it, when they hear about it in passing, they don’t understand it and they don’t think much of it; if people think about it a medium amount and read news articles about it, when they are armchair philosophizing they hate it; when people actively participate in algorithmic trading and electronic markets, they get obsessed with it and want to integrate it with every aspect of their existing trading.”
Does the Economist really intend to take the position that we’ve given the whole ‘computers in investing’ thing a shot and it just didn’t really work out? This data-driven analysis stuff is just not as satisfying as a good ‘ole fashioned hunch! The Economist is one of my favorite magazines, full of bold analysis and expertise. This particular article isn’t to The Economist’s usual standard.
Far from being a passing fad – I’d argue that the technological stars are aligning for systematic investing (done right*) to bring a massive wave of disintermediation to the financial industry. As individual investors reach the other side of this particular complexity valley and recognize the accessibility and economy of automated investing – they are going to wonder why 2 and 20 on an ONGOING BASIS ever seemed even remotely reasonable for a job that, thought through once can absolutely be – indeed already IS — run day to day by computers.
A natural outgrowth of what we do at Quantopian is that our users tend to be technically sophisticated and love data. Therefore, when we have a significant event, such as the recent security breach, we are as open and transparent as possible about what transpired. In that spirit, we present this detailed analysis of the security breach and what we’ve since done to strengthen our site security.
In presenting such an analysis, there is a fine line between being sufficiently open and transparent with our users and with the maintainers of other web sites, who might benefit from our lessons learned, and providing information that could aid future attacks. While we don’t believe in relying on security by obscurity, we also don’t think it’s a good idea to tell the bad guys exactly where to focus their efforts. Therefore, some details have been omitted.
Essential to our mission as an advanced algorithmic trading platform is the fact that our servers execute arbitrary Python code written by our users. Therefore, while we share with every other web application the requirement to code our application securely, we have an additional, challenging requirement most sites don’t: preventing our users’ code from compromising our security. This makes security more challenging and also makes our site a more attractive target.
We run users’ algorithms in a sandbox which is isolated from the rest of our application in numerous ways, including:
The algorithm sandbox obviously cannot be entirely isolated from the rest of our application, because we need to send pricing and universe data to the algorithm and receive in return the logging data and results that it generates.
Malicious users attempt on a regular basis to break out of our algorithm sandbox. We actively monitor them, evaluate them on a case-by-case basis, and take additional steps when necessary.
On the afternoon of Thursday, November 14, our monitoring systems alerted us to the fact that an attacker was attempting to escape from the sandbox using a technique we knew about and had already blocked. Many other users had tried the same technique unsuccessfully, so we weren’t particularly worried.
However, two things about this particular attacker surprised us. First of all, he was more persistent than most; he tried pretty much every conceivable variety of this particular attack, when most attackers give up after it fails a few times. Second, because of his persistence, he actually found a minor chink in our armor: he was able to get a peek at some internals of our sandbox due to a typographical error in one of our source files.
The information he was able to retrieve was relatively inconsequential, but we were obviously alarmed by the unintended exposure of that data. We immediately blocked the attacker’s access to the site, tracked down the root cause of the vulnerability, fixed it, and released the fix.
Shortly after we blocked the attacker, he circumvented the block, returned to the site, and kept working on the attack, so we blocked him again. After that, he tried unsuccessfully for a while to get back into the site, and then went away. We thought that was the end of it, but unfortunately we were wrong.
On Friday morning, the attacker returned. He tried several techniques unsuccessfully, but then he found one that worked, a method for partially circumventing the limitations on Python modules accessible from user algorithm code. We had known about this particular vulnerability before his attack, and it was on our roadmap to fix soon, but we hadn’t yet done so because we thought that it was limited in scope and it was unlikely that anyone would find it. With 20/20 hindsight, we were clearly wrong.
After detecting the attacker’s return, we began to analyze what he had been able to accomplish, and it quickly became clear that the breach was significant. We shut down the application, notified our users on our blog, and set to completing our analysis to determine the precise magnitude of the compromise and what needed to be done to eliminate the vulnerabilities that had enabled it.
Within an hour of detecting the compromise, we were able to identify the root cause and a remediation plan which we immediately began implementing. Within a few hours of detecting the compromise, we were able to confirm that the attacker had not accessed any user data.
We left the application shut down all day Friday and Saturday and most of the day Sunday while we implemented and tested the enhancements needed to prevent this kind of attack in the future. After developing and testing the fixes, we brought the site back up at around 8:00pm on Sunday night.
The attacker made several more unsuccessful attempts to compromise the site on Sunday night.
On Monday morning, we had a smooth market open for our live traders and paper traders.
We continue to aggressively monitor the application.
Before this breach, access to Python modules and methods within user algorithm code was limited by three different mechanisms:
The flaw in this logic, which the attacker discovered, is that if a whitelisted module imports a module that is not whitelisted, then that module is accessible as an attribute of the whitelisted module’s object. For example:
>>> import pytz
>>> import sys
>>> print sys
<module 'sys' (built-in)>
>>> print pytz.sys
<module 'sys' (built-in)>
>>>
We couldn’t add every single module name to the keyword blacklist, since that would have prevented users from using too many variable names. And even if we had been able to do that, a module could also import a blacklisted module under a different name, e.g., “import sys as sysmodule”, and we would have had to fully audit all whitelisted modules for such references, as well as re-auditing them each time we upgraded our module versions, which wasn’t practical.
The attacker accessed the sys module through another, whitelisted module and used it to gain access to other Python modules containing sensitive data. Fortunately, however, he did not access any user data. He also didn’t get our data encryption keys, which we guard extremely carefully for obvious reasons, so even if he had accessed user data he would not have been able to decrypt it.
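To make the attribute leak concrete, here is a small sketch of the kind of audit described above: scanning a module’s namespace for module objects that are not on a whitelist. It is an illustration only and not the fix we deployed. The `leaked_modules` helper and the `fake_whitelisted` module are hypothetical names invented for this example.

```python
import sys
import types


def leaked_modules(module, whitelist):
    """Map attribute name -> module name for every non-whitelisted module
    reachable as a direct attribute of `module`."""
    return {
        name: value.__name__
        for name, value in vars(module).items()
        if isinstance(value, types.ModuleType) and value.__name__ not in whitelist
    }


# Hypothetical stand-in for a whitelisted third-party module.
fake = types.ModuleType("fake_whitelisted")
fake.sysmodule = sys        # mimics "import sys as sysmodule" inside the module
fake.helper = lambda: None  # an ordinary attribute, not a module

leaks = leaked_modules(fake, whitelist={"fake_whitelisted"})
```

Note that the renamed import is caught because the check looks at the object rather than the attribute name. A scan like this only covers direct attributes, though. Modules reachable through functions, classes or nested attributes would need a deeper walk, and the scan would have to be repeated on every upgrade, which is part of why auditing alone was not practical.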
We’ve made a number of changes to our application to eliminate this particular vulnerability and mitigate the risk of other vulnerabilities as well. These include:
In closing, we first and foremost want to reiterate that no user data were compromised during this incident.
We take very seriously our responsibility to safeguard our members’ intellectual property. Security is an ongoing process, not a one-time thing, and we will continue to evolve our practices to stay current with the state of the art. We believe the best way to earn your trust is to be open and transparent, and we will continue to do that even when we are sharing unpleasant news.
If you have any questions or concerns, please let us know. We always reply to email received at firstname.lastname@example.org. We monitor email@example.com for emails concerning our security. You are always welcome to reach me personally at firstname.lastname@example.org.
Vice President of Operations
For those of you who weren’t able to join us in NYC last month here are some highlight clips from our event.
We kicked off with an intro from Fawce. First here’s the general overview including a look at the live trading dashboard:
Fawce on the two most common questions people ask him about Quantopian:
Gary Chan gave our keynote talk – in this clip he explains his motivation for sharing his work in a public meetup forum:
I really liked the examples Gary used for what his definitions of Easy vs. Hard are. Spoiler alert: Getting into Harvard is hard. Opening a successful restaurant is hard. Algorithmic trading is (by comparison) easy.
Gary walks through his trading infrastructure and toolkit:
Finally Gary shares the results of his actual backtests:
You can view the full slide deck from Gary’s presentation here:
Probabilistic Programming (aka Bayesian statistics) allows flexible construction of statistical models to gain insight from data. Estimation of the best-fitting parameter values, as well as the uncertainty in these estimates, can be automated by sampling algorithms such as Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach has already led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science. In this blog post I will highlight what Probabilistic Programming can offer algorithmic traders.
I recently gave a presentation about Bayesian Data Analysis in PyMC3 at PyData NYC’13 with a special focus on financial applications which should give a good primer — check it out here:
Bayesian statistics has many benefits, some of which I discuss in my talk. Most relevant to Quantitative Finance, however, is the fact that you can very flexibly model latent, unobservable processes and how they relate to observable events. As an example, we could model investor fear. We can’t measure it directly but it certainly has a bearing on market behavior and the stock price.
Bayes’ formula allows us to then infer backwards: given the observable data (e.g. the stock price), what is the probability of the latent construct (e.g. investor fear)?
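Written out for this example, with "fear" and "price" used as informal stand-ins for the latent construct and the observed data, Bayes’ formula reads:

```latex
P(\text{fear} \mid \text{price}) = \frac{P(\text{price} \mid \text{fear}) \, P(\text{fear})}{P(\text{price})}
```

The model specifies the likelihood P(price | fear) and the prior P(fear), and the posterior on the left-hand side is what the sampling algorithm approximates.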
One elegant example of how Probabilistic Programming can infer unobservable quantities of the stock market is the stochastic volatility model. Volatility is an important concept in quantitative finance as it relates to risk. Unfortunately, as Tony Cooper reminds us:
“Volatility is a strange thing – it exists but you can’t measure it.”
If we can’t measure something, the next best thing we can do is to try and model it. One way to do this in a probabilistic framework is the concept of stochastic volatility: if we assume that returns are normally distributed, the volatility is captured by the standard deviation of that normal distribution. Letting that standard deviation itself vary randomly over time is what makes the volatility stochastic. Intuitively we would assume that the standard deviation is high during times of market turmoil like the 2008 crash.
So the trivial thing to do would be to look at the rolling standard deviation of returns. But this is unsatisfying for multiple reasons:
These properties are outlined in the plot below. The larger the window size, the more lag, the smaller the window size, the more unstable the estimate gets.
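The lag versus stability trade-off is easy to reproduce with a toy series. The sketch below uses only the standard library. The `rolling_std` helper, the window lengths (5 and 20) and the made-up return series (20 calm days followed by 20 turbulent ones) are all assumptions for illustration and are not the data behind the plot.

```python
import statistics


def rolling_std(returns, window):
    """Sample standard deviation over each trailing window of `window` returns.

    Element k of the result covers returns[k : k + window].
    """
    return [
        statistics.stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Toy data: alternating-sign returns, small at first and then much larger.
calm = [0.001 * (-1) ** i for i in range(20)]
turbulent = [0.03 * (-1) ** i for i in range(20)]
returns = calm + turbulent

short_window, long_window = 5, 20
short = rolling_std(returns, short_window)
long_ = rolling_std(returns, long_window)

# Compare both estimates five days into the turbulent regime (index 24).
day = 24
short_estimate = short[day - short_window + 1]
long_estimate = long_[day - long_window + 1]
```

Five days after the regime change the short window already reflects the new volatility level, while the long window is still dominated by calm days. On real, noisy data the short window pays for that responsiveness with a much jumpier estimate.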
As we will see, the stochastic volatility model does a better job at all of them. But before we look at that we need to establish one more insight into the nature of volatility: it tends to cluster. This is nicely demonstrated by looking at the returns of the S&P 500 above. As you can see, during the 2008 financial crisis there is a lot of volatility in the stock market (huge positive and negative daily returns) that gradually decreases over time.
So how do we model this clustering property? The Stochastic Volatility model assumes that the standard deviation of the returns follows a random-walk process. You can read the Wikipedia article I linked but essentially this process allows for slow, gradual changes over time.
What is interesting is that we can model the standard deviation itself to follow a random walk. Intuitively, we allow standard deviation to change over time but only ever so slightly at each time-point. For the mathematical and implementational details of the model applied see this IPython notebook which uses PyMC3 (a new, flexible probabilistic programming framework for Python).
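To build intuition for what the model assumes, the generative process can be simulated in a few lines of standard-library Python. This is a sketch of the idea only, not the PyMC3 model from the notebook. Putting the random walk on the log of the standard deviation (so it stays positive), the step size of 0.05, the starting volatility of 1% and the function name are all assumptions made here.

```python
import math
import random


def simulate_stochastic_volatility(n_days, step_sd=0.05, initial_vol=0.01, seed=0):
    """Simulate daily returns whose standard deviation follows a slow random walk.

    Returns (vols, returns), two lists of length n_days.
    """
    rng = random.Random(seed)
    log_vol = math.log(initial_vol)
    vols, returns = [], []
    for _ in range(n_days):
        log_vol += rng.gauss(0.0, step_sd)   # small random step in log-volatility
        vol = math.exp(log_vol)              # always positive
        vols.append(vol)
        returns.append(rng.gauss(0.0, vol))  # return ~ Normal(0, vol)
    return vols, returns


vols, returns = simulate_stochastic_volatility(n_days=500)
```

Because each day’s volatility is only a small step away from the previous day’s, large returns tend to arrive in clusters in the simulated series. Inference runs this logic in reverse: given the observed returns, find the volatility paths that plausibly generated them.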
The plot below shows the latent volatility (in orange) inferred from the model based on the market data (in mint). The orange lines represent the standard deviation of the Normal distribution we assume for the daily returns.
As you can see, the estimated volatility is much more robust. Moreover, all parameters are estimated from the data while we would have to find a reasonable window length for the rolling standard deviation ourselves. Finally, we do not just get a single estimate as with the rolling standard deviation but rather many solutions that are likely. This provides us with a measure of the uncertainty in our estimates and is represented by the width of the orange line above.
In summary, Bayesian statistics and Probabilistic Programming are an extremely powerful framework for Quantitative Finance as they provide:
We are currently working on making PyMC3 available on Quantopian to allow usage of this type of model, so sign up and start getting familiar with our platform, discuss on the forums, and follow @Quantopian and me on Twitter. If you are interested in Probabilistic Programming I also sometimes blog about it here.
Last Thursday Quantopian hosted over a hundred quants for the NYC Algorithmic Trading Meetup on Manhattan’s Lower East Side. Our founder and CEO John Fawcett joined us to share an update on Quantopian’s progress and to demo live trading.
Our keynote speaker was meetup community member Gary Chan. At our last meetup in NYC, Gary pitched us on the idea of a how-to pairs trading session aimed at beginner (and aspiring) algo traders (like himself), and we loved the idea. Gary walked us through the basics of pairs trading and even built a trade on-the-fly (which took guts!). We’ve got his presentation (below) and the completed spreadsheet with historical prices and intermediate values needed for the pairs trade (beta, spread, standard deviation). If you’d like to try his strategy out yourself you can also check out the version we coded up on Quantopian here.
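For readers who have not seen those intermediate values before, here is a rough sketch of how beta, spread and standard deviation typically fit together in a pairs trade. It is not Gary’s spreadsheet or the Quantopian version of his strategy. Estimating beta as an ordinary least-squares slope, the function names and the five made-up price points are all assumptions for illustration.

```python
from statistics import mean, stdev


def hedge_ratio(x, y):
    """Least-squares slope (beta) of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


def spread_zscore(x, y):
    """Return (beta, spread series, z-score of the latest spread)."""
    beta = hedge_ratio(x, y)
    spread = [b - beta * a for a, b in zip(x, y)]
    z = (spread[-1] - mean(spread)) / stdev(spread)
    return beta, spread, z


# Made-up prices for two hypothetical, closely related securities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.0, 8.1, 9.9]
beta, spread, z = spread_zscore(x, y)
```

A common rule of thumb is to open a position when the z-score moves beyond some threshold and close it as the spread returns toward its mean. The threshold is a design choice to validate in a backtest.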
On Friday (Nov-15-2013) morning, an intruder was able to gain enough access on our website to run some unauthorized code. The intruder did not access our databases; no user information, algorithms, backtest results, etc. were compromised. The intruder was able to see information about our system and infrastructure. This information could have been used to gain further access to our systems and data, but we have modified our security systems to prevent that.
When we identified the attack, we took down the website and shut down all access. We analyzed our logs of the incident, including logs maintained by separate vendors that we use. We are confident that we understand the extent of the attack and the attack vectors that were used. We will put up a more detailed, technical postmortem in a few days.
People regularly attempt to gain access to our servers. Some of the attacks are minor – people rattling the door to see if it’s locked. Other attackers are more serious and try increasingly sophisticated methods to get access. We log them all, and we respond with both automation and human intervention, and follow up as appropriate. This particular attack was noteworthy because the attacker actually got somewhere that he shouldn’t have been able to. Even though the attacker didn’t get any information of value, we view the breach as very serious and we have addressed the vulnerability the attacker exploited. We are working very hard to prevent any future incidents.
We view security as an ongoing concern – the work is never done. We anticipated that as we became more visible, we’d become a more tempting target to attackers. We mapped out plans to increase our security measures as our threat profile increased. In light of the attacks of the last few weeks, we have accelerated the implementation of additional security measures. That work is continuing at a high priority.
We’ve said this before, but it is very important and bears repeating: The protection of our members’ intellectual property is one of our core promises, and we take it very seriously. We want our members to trust us with their intellectual property. We believe that the best way to earn your trust is by being transparent with you about Quantopian. We hope that we can continue to earn your trust, even when we are sharing unpleasant news.
If you have any questions or concerns, please let us know. We always reply to email received at email@example.com. We monitor firstname.lastname@example.org for emails concerning our security. You are always welcome to reach me personally at email@example.com.
CEO and Co-Founder