Like many other sites on the internet, Quantopian is taking steps to protect itself from the “Heartbleed Bug”, which was disclosed yesterday. Although we have no reason to believe that our site or any of our members’ accounts or data have been compromised, we are taking a number of precautions to safeguard the security of our members’ accounts. We will document the steps we are taking here.
[DONE] We are generating a new SSL certificate to protect our site, using a newly generated encryption key; deploying the new key and certificate to our servers; and asking our SSL certificate authority to revoke our old certificate.
[DONE] We are adding a prominent banner within our application notifying all members to change their passwords. The banner will go away automatically when the user’s password is changed.
[DONE] We are requiring all members who have brokerage accounts configured within Quantopian to change their passwords.
[DONE] We are modifying our application so that members are not able to configure a brokerage account within Quantopian until they have changed their password.
[DONE] We are rotating the passwords and encryption keys used by the components of our application when they are communicating with each other. This requires application downtime on the evening of April 8, 2014, starting at 5:00pm US/Eastern.
[IN PROGRESS] We are generating new encryption keys used to protect data in our databases and re-encrypting all data using the new keys.
Quantopian founder and CEO John “Fawce” Fawcett was invited by Reddit’s investing subreddit to host an AMA (Ask Me Anything). The turnout was overwhelming! With over 90 comments in the thread, we’ve pulled out the highlights. Take a look below at the conversation summary:
Q: How much computer coding ability must one have to use this well?
Fawce: You don’t need to be a coding expert to work on an algorithm in Quantopian. Some algorithms can be coded up with some fairly basic logic, stuff that you’d learn in early stages of a programming class. The more intricate your logic, or the more advanced your mathematical operations are, the more you’re going to need advanced skills.
We think that algorithm collaboration (upcoming new feature) is going to help with this quite a bit – people with great ideas will be able to pair with people with more coding skills, and they can collaborate on a result. We also think there is a future in renting algorithms.
If you’re already pretty decent at Python, I highly recommend Wes McKinney’s book on pandas. He’s one of our advisors, and he’s done a ton of work to make complex mathematical operations easy in Python.
Q: Do you guys have any plans to integrate company financial data for the fundamental investors?
Fawce: Yes, we definitely do. We get this request all the time, and we are chomping at the bit to deliver it. Company financial data appeals to more than just fundamental investors, too. We think that algorithmic trading has three steps – data exploration, coding/backtesting, and live trading. We built the backtester first, then we did live trading, and data exploration is next.
Q: Written any good algorithms yourself?
Fawce: I worked with a few other people to write the algo that is trading Quantopian’s money right now, which we shared with the world: https://www.quantopian.com/posts/paper-trading-with-interactive-brokers-open-beta-launch
I love algo ideas that model relationships between securities, the simplest of which is pairing. I also like Ernie Chan’s explanations of these types of trades, so I wrote a cointegration algo on GLD/GDX. More recently, another quant posted Ernie Chan’s EWA/EWC pair trade. His implementation is cool because it also uses a Kalman filter: https://www.quantopian.com/posts/ernie-chans-ewa-slash-ewc-pair-trade-with-kalman-filter?c=1
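The Kalman-filter idea can be illustrated with a minimal scalar sketch. It assumes that the hedge ratio beta between two price series follows a random walk and that there is no intercept term. The function name and the noise parameters `q` and `r` are illustrative. The implementation in the linked post, and Ernie Chan's own write-ups, may differ; for example, they may also estimate an intercept.

```python
def kalman_hedge_ratio(x_prices, y_prices, q=1e-5, r=1e-3):
    """Track a time-varying hedge ratio beta in y_t = beta_t * x_t + noise.

    q: variance of beta's random-walk step (how fast the ratio may drift)
    r: variance of the observation noise
    Returns the beta estimate after each observation and the one-step
    forecast errors (the "spread" a pairs trade would act on).
    """
    beta, p = 0.0, 1.0                 # initial guess for beta and its variance
    betas, spreads = [], []
    for x, y in zip(x_prices, y_prices):
        p += q                         # predict: beta may have drifted a little
        spread = y - beta * x          # forecast error using the previous estimate
        k = p * x / (x * x * p + r)    # Kalman gain
        beta += k * spread             # move the estimate toward today's observation
        p *= 1.0 - k * x               # uncertainty shrinks after the update
        betas.append(beta)
        spreads.append(spread)
    return betas, spreads
```

On synthetic data where y is exactly 1.5 times x, the estimated ratio quickly settles near 1.5. In a pairs trade, positions would typically be opened when the spread is unusually large relative to its recent variability.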
Q: What does your pricing model look like?
Fawce: Right now, everything is free. We’re focused on bringing the community together, and we’re talking to our current live traders about pricing. The basic structure will be:
– backtesting and forward testing are free
– trading through your broker will be $TBD/month/algorithm.
Q: any chance you’ll offer I/B/E/S data, maybe for an extra cost?
Fawce: Yes, we’re talking to a number of data vendors about estimate data. I see fundamental corporate data as split between historical data from filings and forward estimates. I’m a huge fan of Estimize, and I think they are part of a major trend to use crowdsourcing to improve data origination and data quality.
As our community continues to grow, one way to draw more data to the platform is to become a sales channel for the data providers. I think that core data like historical filings needs to be free for research and testing, but that freshly delivered data used in live trading can be fee-based.
Q: How much AUM is currently being managed by Quantopian-hosted live-trading algorithms?
Fawce: This is really the question about our platform today. I have to be a little bit vague, because we haven’t spoken about total assets under management (AUM) publicly yet, and it is one of the best hooks we have for news coverage. I’ve actually started referring to the total as our “Algorithmically Directed Assets” or ADA since we aren’t a broker or fund manager and don’t hold your account.
We’re trading through 30 IB accounts today in private beta. I feel very comfortable that we are getting both sufficient technical burn-in and that we are directing a meaningful amount of capital. At the moment, the bottleneck for ADA growth seems to be algorithm development. But we know people are digging in, because we’ve seen a significant jump in the amount of coding/backtesting/paper trading over the past few weeks following our announcement of live trading.
Q: I love the things you are doing! How many developers did you initially start out with? Did you start doing the coding (yourself?) before venture capital started rolling in? Or how did the launch come along? … Do you feel that your hosting is latency sensitive (yet?)?.. I am just wondering how you go from idea to working with something so awesome
Fawce: My wife was able to support our family for the first phase of Quantopian’s existence, so I was able to invest the proceeds of selling my last company into Quantopian. To keep costs low I set up for work in our shed. I wrote the original prototype myself, and the first website drew a good initial audience after a few blogs mentioned it.
I started in June of 2011, and around November it started to get really cold in the shed, so I figured I should try to get some funding so I could get a real office :). I met Spark Capital around then, and they led our seed round in January of 2012. Then I hired my CTO, and we worked together for about 6 months rewriting the prototype to be “real”. But we didn’t get our office until June of 2012, when we wanted to ramp up our team. Instead, I worked from the public library in my town through the winter :).
Everything is hosted on Amazon. We’re specifically avoiding HFT, so our target latency is 1-2s. In practice, we see much less than that.
If you have passion for an idea, it will gain momentum when you talk to other people about it. I also relied on the opinions of others – my wife, friends with experience starting companies, friends from finance, and a few mentors.
Working alone from the shed and public spaces was a humble start, but it still felt awesome in the beginning. Starting up is a thrill, and each milestone feels epic – loading a month of historical trade data seemed incredible when I did it the first time. And when the first really avid user started emailing me feature requests, I was dizzy with excitement. Same with hiring people. You have to really enjoy each step so that you can weather the setbacks. We had to make it through losing to big companies in recruiting, realizing we couldn’t afford all the data purchases we wanted to make originally, and live trading being way harder than we expected. Thanks for asking!
To see the full conversation thread, click here.
This is a short overview of common types of quantitative finance algorithms that are traded today. Of course, this is only an overview, and not comprehensive! Let me know if you think there are other algo types I should cover.
It’s been about a month now since Quantopian opened live trading up to a public/private beta – we figured it’s about time to share an update on how things are going.
First, I’d like to say thank you to all the Quantopian community members who have joined the pilot and taken the time to provide invaluable feedback, insight and encouragement. The community is, without a doubt, the most amazing part of Quantopian – we couldn’t do this without you, and it is a privilege to work with such talented quants, hackers, scientists and analysts on a daily basis.
I’d like to share two of the key metrics we are starting to track for live trading:
80 – Number of Quantopians who have connected their brokerage account.
20 – Number of live (IB-backed) algorithms running today.
I’m delighted to report the first real dollars directed by Quantopian-hosted algorithms. Making the leap from simulation to real trading is a huge deal. I’m incredibly grateful that our beta group has trusted us with their capital. I’d like to congratulate each pilot trader who has launched a real-money algo and claimed their free lifetime subscription so far — the rest of you have until March 31st to join that club.
We have also been listening closely to conversations in the larger community about live trading; it is great to see spirited and open discussions like this one unfold and evolve on the forums. My takeaway is that the key questions about live trading on Quantopian can be divided into three thematic buckets: Trust, Technology and Trading capital.
Trust: Do I trust my research and my code enough to trade? Do I trust Quantopian to protect my IP and place trades as I expect?
Technology: Are all the nuts and bolts required to implement my algorithm the way I need it in place?
Trading capital: Do I have the trading capital and the risk tolerance to live trade my algorithm?
What these three themes have in common is, first, that the answers are personal rather than universal and, second, that the answers are going to keep changing (hopefully from ‘Not yet.’ to ‘Yes!’) as we build the platform out together.
Finally, I wanted to share a screenshot of my own live-money algo performance (we shared the code for this a few weeks ago here). This is a conservative, market-tracking strategy that holds an equal-weighted portfolio of 9 sector ETFs and rebalances as often as daily. I’ve been running this algo live since Jan 23rd, when I started with a capital base of just under $30,000; I grabbed this update after the market closed yesterday.
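The equal-weight rebalancing step can be sketched in plain Python. This is not Quantopian's API and not the shared algorithm's code. The function name, the rounding down to whole shares, and the placeholder tickers and prices are assumptions for illustration. The function takes the total portfolio value, current prices, and current holdings. It returns how many shares of each symbol to buy (positive) or sell (negative) so that each position is worth roughly 1/N of the portfolio.

```python
def equal_weight_orders(portfolio_value, prices, current_shares):
    """Share deltas that bring each symbol to ~1/N of portfolio_value.

    portfolio_value: cash plus market value of all holdings
    prices: {symbol: latest price}
    current_shares: {symbol: shares held}; missing symbols count as 0
    """
    target_value = portfolio_value / len(prices)    # equal dollar weight per symbol
    orders = {}
    for symbol, price in prices.items():
        target_shares = int(target_value // price)  # round down to whole shares
        delta = target_shares - current_shares.get(symbol, 0)
        if delta != 0:                              # skip symbols already on target
            orders[symbol] = delta
    return orders


# Hypothetical three-ETF example with made-up prices.
prices = {"ETF_A": 100.0, "ETF_B": 50.0, "ETF_C": 300.0}
print(equal_weight_orders(30000.0, prices, {"ETF_A": 120}))
# {'ETF_A': -20, 'ETF_B': 200, 'ETF_C': 33}
```

Running this once per trading day gives the "rebalances as often as daily" behavior. The leftover cash from rounding down stays uninvested until the next rebalance.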
Last week I gave a quant finance meet-up talk at the Hacker Dojo in Mountain View, CA. The format was inspired by some analysis I did on the types of algorithms shared and cloned in the Quantopian community – initially I wanted to ask: What are the most popular strategies coded up on Quantopian? To answer this question I ranked all public forum posts three ways, first on number of replies, second on number of views, and third on number of times cloned. I averaged these scores and re-ranked the list to come up with the top 25 ‘Most popular posts of all time’. (NB: I did not do any correction for the date of the original post, so the amount of time the thread has been alive has not been normalized.)
| Combo Rank | Post | Reply Count | View Count | Clone Count |
|---|---|---|---|---|
| 1 | Google Search Terms predict market movements | 64 | 32121 | 821 |
| 2 | OLMAR implementation – fixed bug | 64 | 26216 | 701 |
| 3 | Easy Volatility Investing by Tony Cooper @ Double-Digit Numerics | 57 | 15211 | 846 |
| 4 | discuss the sample algorithm | 16 | 18701 | 2930 |
| 5 | Global Minimum Variance Portfolio | 28 | 10230 | 702 |
| 6 | ML – Stochastic Gradient Descent Method Using Hinge Loss Function | 10 | 20421 | 973 |
| 7 | Mebane Faber Relative Strength Strategy with MA Rule | 22 | 11199 | 622 |
| 8 | OLMAR w/ NASDAQ 100 & dollar-volume | 31 | 7766 | 701 |
| 9 | Using the CNN Fear & Greed Index as a trading signal | 22 | 9914 | 367 |
| 10 | New Sample Algorithm | 33 | 8336 | 328 |
| 11 | Bollinger Bands With Trading | 18 | 8390 | 566 |
| 12 | Brent/WTI Spread Fetcher Example | 17 | 10892 | 327 |
| 13 | Ernie Chan’s Gold Pairs Trade | 15 | 10420 | 329 |
| 14 | Ranking and Trading on “Days to Cover” | 4 | 24976 | 384 |
| 15 | Determining price direction using exponential and log-normal distributions | 9 | 9781 | 624 |
| 16 | Time to “sell in may and go away”? | 27 | 8231 | 263 |
| 18 | Simple Mean Reversion Strategy | 6 | 11861 | 275 |
| 19 | Neural Network that tests for mean-reversion or momentum trending | 4 | 10101 | 407 |
| 21 | Using weather as a trading signal | 6 | 11958 | 199 |
| 22 | Global market rotation strategy | 53 | 7629 | 95 |
| 23 | trading earnings surprises with Estimize data | 34 | 7506 | 130 |
| 24 | Trading Strategy: Mean-reversion | 13 | 8252 | 216 |
| 25 | Turtle Trading Strategy | 11 | 8012 | 318 |
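The combined ranking method described above can be sketched as follows. Each post is ranked separately on replies, views, and clones, with 1 as the highest. The three ranks are then averaged, and the posts are re-sorted by that average. The post names and counts below are made up. The real ranking was computed over all public forum posts, so running this on only the rows in the table would not reproduce its Combo Rank column. Tied values share the best rank, which is one of several reasonable conventions.

```python
def rank_descending(values):
    """Rank values so the largest gets 1; tied values share the best rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def combo_rank(posts):
    """posts: {name: (replies, views, clones)} -> (names sorted by average rank, averages)."""
    names = list(posts)
    columns = list(zip(*posts.values()))          # one tuple per metric
    ranks = [rank_descending(list(col)) for col in columns]
    average = {name: sum(r[i] for r in ranks) / len(ranks) for i, name in enumerate(names)}
    return sorted(names, key=average.get), average


posts = {  # hypothetical counts: (replies, views, clones)
    "post_a": (10, 500, 60),
    "post_b": (2, 900, 50),
    "post_c": (5, 100, 5),
}
ordered, average = combo_rank(posts)
print(ordered)  # ['post_a', 'post_b', 'post_c']
```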
Starting from this list, I worked backwards and used examples from the Quantopian community to introduce 5 basic quant strategy types: Mean Reversion, Momentum, Value, Sentiment and Seasonality. While this list is not technically ‘mutually exclusive and collectively exhaustive’, it covers a large fraction of intraday to lower frequency quant strategies and provides a good overview of the way equity focused quants think about predicting market prices. I went back to my Top 25 list and categorized each algo into one of these five buckets and then created this pie chart based on the aggregated number of views for each strategy type.
There are a number of interesting conclusions to be drawn from this initial overview of community activity. Perhaps the most obvious and predictable is that price-based strategies currently lead by a large margin, due, I expect, to easy access to minute-level equity pricing and the accessibility of the logic behind momentum and mean reversion. Indeed, no value-based strategies made their way into the Top 25, which in my view represents a key opportunity space right now.
More subtle and, from my admittedly biased point of view, more compelling is the diversity and quality of content and collaboration in the public sphere. Having joined the Quantopian team from a large corporate setting working with a small group of institutional clients, seeing that the Top 25 algos have been cloned over 13,000 times, an average of over 500 clones per strategy is… well it’s pretty damn cool.
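For readers new to the two price-based families that dominate the list, here is a minimal sketch of each as a signal function. These are textbook-style formulations chosen for illustration, not the code of any post in the table. The lookback, window, and z-score threshold are arbitrary parameters. Each function returns +1 (go long), -1 (go short), or 0 (stay flat).

```python
import statistics


def momentum_signal(prices, lookback):
    """Momentum: follow the trend. Long if price rose over the lookback, short if it fell."""
    change = prices[-1] - prices[-1 - lookback]
    return (change > 0) - (change < 0)


def mean_reversion_signal(prices, window, threshold=1.0):
    """Mean reversion: bet against large deviations from the trailing mean (as a z-score)."""
    recent = prices[-window:]
    mean = statistics.mean(recent)
    sd = statistics.stdev(recent)
    if sd == 0:
        return 0                       # no variation, no signal
    z = (prices[-1] - mean) / sd
    if z > threshold:
        return -1                      # unusually high: expect a fall
    if z < -threshold:
        return 1                       # unusually low: expect a rise
    return 0
```

The two functions encode opposite beliefs about the same price history. Momentum expects recent moves to continue, while mean reversion expects extreme moves to fade.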
Below you can find the slide deck from my presentation:
If you have a brokerage account with Interactive Brokers* (IB), you can integrate Quantopian with your IB account and start paper trading today.
(If you don’t have an IB account, that’s OK. You can still paper trade with Quantopian! And, if you’d also like to have an IB account, you can create an account on their website.)
Do You Want to Trade With Real Money, Not Paper?
Our pilot program for real money trading is still in private beta. If you haven’t requested a spot in our real money pilot program, you can do that by filling out this short form.
How Much Does This Cost?
Paper trading and live trading through IB* are both free while we run this beta program. In the future we plan to charge a flat monthly fee per live algorithm.
We also have a special offer. If you get your algorithm up and running with real money by 3/31/14, we’ll give you a lifetime free subscription**.
Live Trading Pilot Program
Our pilot live trading program has been growing rapidly in the last few weeks, and we’re excited to grow the program even more by opening up paper trading to everyone.
We think that live trading on Quantopian is going to change the way you trade forever.
Quantopian helps you focus on the big picture and create new ideas, instead of operating a manual, tedious system.
* Interactive Brokers LLC is not affiliated with and does not endorse or recommend Quantopian, Inc. Interactive Brokers provides execution and clearing services to customers who integrate their Interactive Brokers account with their Quantopian account. For more information regarding Interactive Brokers LLC, please visit www.interactivebrokers.com.
** The free lifetime subscription offer is for one algorithm, up to $100,000 in account balance. Additional algorithms and larger account balances are not covered by this offer.
P.S. Attached is a sample algorithm that’s geared for live trading. It takes a list of stocks and rebalances them every day. You can paper trade it today:
Recently I was asked to comment on a piece in the Economist about the rise and supposed fall of quant investing in the last ten years.
The thesis of the article, which equates ‘quant’ with trend following or technical analysis, seems to be: See, those pointy-headed geeks aren’t as smart as they think they are! Sure, they racked up outsized returns in the mid-2000s and the market downturn of ’08. But now it’s different, the market has changed, their tricks are broken, and since it’s impossible to understand what quants are really doing anyway, there’s also no way to predict whether their techniques will ever work again.
The logic of this argument is dubious at best. It is at once overly simplistic, relegating the entirety of data-driven systematic investment to the trend-following label, while at the same time providing no convincing counterpoint or real basis for comparing the relative performance of systematic versus discretionary investing, or ‘stock picking’, over some meaningful historical time frame. (You be the judge: the full list of comparable trailing performance for all HFRI funds is available here, and a more in-depth analysis of hedge fund underperformance in general is here.) In short, the piece suffers from a clear case of stumbling into a “complexity valley”. Former GETCO trader Nancy Hua gave a nice explanation of this phenomenon as it relates to quant finance in a recent Quora post:
“[Quant] suffers from a complexity valley: If people don’t know anything about it, when they hear about it in passing, they don’t understand it and they don’t think much of it; if people think about it a medium amount and read news articles about it, when they are armchair philosophizing they hate it; when people actively participate in algorithmic trading and electronic markets, they get obsessed with it and want to integrate it with every aspect of their existing trading.”
Does The Economist really intend to take the position that we’ve given the whole ‘computers in investing’ thing a shot and it just didn’t really work out? This data-driven analysis stuff is just not as satisfying as a good ol’-fashioned hunch! The Economist is one of my favorite magazines, full of bold analysis and expertise. This particular article isn’t up to The Economist’s usual standard.
I’d argue that, far from being a passing fad, systematic investing (done right*) is poised, as the technological stars align, to bring a massive wave of disintermediation to the financial industry. As individual investors reach the other side of this particular complexity valley and recognize the accessibility and economy of automated investing, they are going to wonder why 2 and 20 on an ONGOING BASIS ever seemed even remotely reasonable for a job that, once thought through, can absolutely be (indeed already IS) run day to day by computers.
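To make the “2 and 20” arithmetic concrete, the phrase refers to a 2% annual management fee on assets plus a 20% performance fee on profits. The sketch below applies one simplified version of that schedule to a single year. Real fund terms vary: hurdle rates, high-water marks, and fees charged on average rather than starting assets are all common. Treat the conventions in the docstring as assumptions.

```python
def two_and_twenty(start_value, gross_return, mgmt_rate=0.02, perf_rate=0.20):
    """One year of a simplified '2 and 20' fee schedule.

    Assumptions: the management fee is charged on starting assets; the
    performance fee is charged on gains left after the management fee;
    no hurdle rate and no high-water mark.
    Returns (management_fee, performance_fee, investor_end_value).
    """
    gross_end = start_value * (1 + gross_return)
    mgmt_fee = start_value * mgmt_rate
    gain_after_mgmt = gross_end - mgmt_fee - start_value
    perf_fee = max(gain_after_mgmt, 0.0) * perf_rate   # no performance fee on losses
    net_end = gross_end - mgmt_fee - perf_fee
    return mgmt_fee, perf_fee, net_end


print(two_and_twenty(1_000_000, 0.10))
```

With $1,000,000 and a 10% gross year, the fees come to $20,000 plus $16,000. That leaves the investor with a 6.4% net return, so more than a third of the gross gain goes to fees, and the fees recur every year.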
A natural outgrowth of what we do at Quantopian is that our users tend to be technically sophisticated and love data. Therefore, when we have a significant event, such as the recent security breach, we are as open and transparent as possible about what transpired. In that spirit, we present this detailed analysis of the security breach and what we’ve since done to strengthen our site security.
In presenting such an analysis, there is a fine line between being sufficiently open and transparent with our users, and with the maintainers of other web sites who might benefit from our lessons learned, and providing information that could aid future attacks. While we don’t believe in relying on security by obscurity, we also don’t think it’s a good idea to tell the bad guys exactly where to focus their efforts. Therefore, some details have been omitted.
Essential to our mission as an advanced algorithmic trading platform is the fact that our servers execute arbitrary Python code written by our users. Therefore, while we share with every other web application the requirement to code our application securely, we have an additional, challenging requirement most sites don’t: preventing our users’ code from compromising our security. This makes security more challenging and also makes our site a more attractive target.
We run users’ algorithms in a sandbox which is isolated from the rest of our application in numerous ways, including:
The algorithm sandbox obviously cannot be entirely isolated from the rest of our application, because we need to send pricing and universe data to the algorithm and receive in return the logging data and results that it generates.
Malicious users regularly attempt to break out of our algorithm sandbox. We actively monitor these attempts, evaluate them case by case, and take additional steps when necessary.
On the afternoon of Thursday, November 14, our monitoring systems alerted us to the fact that an attacker was attempting to escape from the sandbox using a technique we knew about and had already blocked. Many other users had tried the same technique unsuccessfully, so we weren’t particularly worried.
However, two things about this particular attacker surprised us. First of all, he was more persistent than most; he tried pretty much every conceivable variety of this particular attack, when most attackers give up after it fails a few times. Second, because of his persistence, he actually found a minor chink in our armor: he was able to get a peek at some internals of our sandbox due to a typographical error in one of our source files.
The information he was able to retrieve was relatively inconsequential, but we were obviously alarmed by the unintended exposure of that data. We immediately blocked the attacker’s access to the site, tracked down the root cause of the vulnerability, fixed it, and released the fix.
Shortly after we blocked the attacker, he circumvented the block, returned to the site, and kept working on the attack, so we blocked him again. After that, he tried unsuccessfully for a while to get back into the site, and then went away. We thought that was the end of it, but unfortunately we were wrong.
On Friday morning, the attacker returned. He tried several techniques unsuccessfully, but then he found one that worked, a method for partially circumventing the limitations on Python modules accessible from user algorithm code. We had known about this particular vulnerability before his attack, and it was on our roadmap to fix soon, but we hadn’t yet done so because we thought that it was limited in scope and it was unlikely that anyone would find it. With 20/20 hindsight, we were clearly wrong.
After detecting the attacker’s return, we began to analyze what he had been able to accomplish, and it quickly became clear that the breach was significant. We shut down the application, notified our users on our blog, and set about completing our analysis to determine the precise magnitude of the compromise and what needed to be done to eliminate the vulnerabilities that had enabled it.
Within an hour of detecting the compromise, we were able to identify the root cause and a remediation plan which we immediately began implementing. Within a few hours of detecting the compromise, we were able to confirm that the attacker had not accessed any user data.
We left the application shut down all day Friday and Saturday and most of the day Sunday while we implemented and tested the enhancements needed to prevent this kind of attack in the future. After developing and testing the fixes, we brought the site back up at around 8:00pm on Sunday night.
The attacker made several more unsuccessful attempts to compromise the site on Sunday night.
On Monday morning, we had a smooth market open for our live traders and paper traders.
We continue to aggressively monitor the application.
Before this breach, access to Python modules and methods within user algorithm code was limited by three different mechanisms:
The flaw in this logic, which the attacker discovered, is that if a whitelisted module imports a module that is not whitelisted, then that module is accessible as an attribute of the whitelisted module’s object. For example:
>>> import pytz
>>> import sys
>>> print(sys)
<module 'sys' (built-in)>
>>> print(pytz.sys)
<module 'sys' (built-in)>
We couldn’t add every single module name to the keyword blacklist, since that would have prevented users from using too many variable names. And even if we had been able to do that, a module could also import a blacklisted module under a different name, e.g., “import sys as sysmodule”, and we would have had to fully audit all whitelisted modules for such references, as well as re-auditing them each time we upgraded our module versions, which wasn’t practical.
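The attribute-access loophole, and one narrow countermeasure, can be illustrated in a few lines. This is a hypothetical example, not a description of Quantopian's sandbox or of the fixes it deployed. `ALLOWED_MODULES`, `guarded_getattr`, and the stand-in `toolbox` module are invented for illustration. Instead of blacklisting names, the check inspects the value being returned and refuses any module object that is not itself whitelisted. As a result, the alias used in an import no longer matters.

```python
import sys
import types

# Hypothetical whitelist of modules user code may touch.
ALLOWED_MODULES = {"toolbox", "math"}


def guarded_getattr(obj, name):
    """Return getattr(obj, name), refusing any non-whitelisted module object.

    Checking the returned value (rather than the attribute name) also catches
    aliases such as `import sys as sysmodule` inside a whitelisted module.
    """
    value = getattr(obj, name)
    if isinstance(value, types.ModuleType) and value.__name__ not in ALLOWED_MODULES:
        raise AttributeError(f"access to module {value.__name__!r} is not allowed")
    return value


# A stand-in for a whitelisted module that happens to import sys internally.
toolbox = types.ModuleType("toolbox")
toolbox.sysmodule = sys                 # like `import sys as sysmodule`
toolbox.helper = lambda: 42             # an ordinary, harmless attribute

print(guarded_getattr(toolbox, "helper")())   # 42
try:
    guarded_getattr(toolbox, "sysmodule")
except AttributeError as exc:
    print(exc)                                # access to module 'sys' is not allowed
```

A check like this covers only one escape route. Pure-Python sandboxes have many others, such as walking object internals through `__subclasses__()`. For that reason, isolation at the process or operating-system level is usually layered underneath.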
The attacker accessed the sys module through another, whitelisted module and used it to gain access to other Python modules containing sensitive data. Fortunately, however, he did not access any user data. He also didn’t get our data encryption keys, which we guard extremely carefully for obvious reasons, so even if he had accessed user data he would not have been able to decrypt it.
We’ve made a number of changes to our application to eliminate this particular vulnerability and mitigate the risk of other vulnerabilities as well. These include:
In closing, we first and foremost want to reiterate that no user data were compromised during this incident.
We take very seriously our responsibility to safeguard our members’ intellectual property. Security is an ongoing process, not a one-time thing, and we will continue to evolve our practices to stay current with the state of the art. We believe the best way to earn your trust is to be open and transparent, and we will continue to do that even when we are sharing unpleasant news.
If you have any questions or concerns, please let us know. We always reply to email received at [email protected]. We monitor [email protected] for emails concerning our security. You are always welcome to reach me personally at [email protected].
Vice President of Operations
For those of you who weren’t able to join us in NYC last month, here are some highlight clips from our event.
We kicked off with an intro from Fawce. First here’s the general overview including a look at the live trading dashboard:
Fawce on the two most common questions people ask him about Quantopian:
Gary Chan gave our keynote talk – in this clip he explains his motivation for sharing his work in a public meetup forum:
I really liked the examples Gary used for what his definitions of Easy vs. Hard are. Spoiler alert: Getting into Harvard is hard. Opening a successful restaurant is hard. Algorithmic trading is (by comparison) easy.
Gary walks through his trading infrastructure and toolkit:
Finally Gary shares the results of his actual backtests:
You can view the full slide deck from Gary’s presentation here:
Probabilistic Programming (aka Bayesian statistics) allows flexible construction of statistical models to gain insight from data. Estimation of best-fitting parameter values, as well as of the uncertainty in these estimates, can be automated by sampling algorithms such as Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach have already led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science. In this blog post I will highlight what Probabilistic Programming can offer algorithmic traders.
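To give a feel for what such a sampler does, here is a toy random-walk Metropolis sampler, one of the simplest MCMC algorithms, estimating the mean of normally distributed data. It is an illustration only; PyMC3 automates this step with far more efficient samplers. The known noise level (sigma = 1), the Normal(0, 10) prior, the step size, and the function names are assumptions chosen to keep the example short.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior for the mean `mu` of Normal(mu, sigma) data,
    with a Normal(0, prior_sd) prior on mu (constants dropped)."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_likelihood = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=5000, step=0.2, seed=0):
    """Random-walk Metropolis sampler for mu; returns the full chain."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    chain = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)        # symmetric random-walk proposal
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); the ratio is
        # formed in log space and only the capped difference is exponentiated.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        chain.append(mu)
    return chain


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(100)]   # synthetic observations
chain = metropolis(data)
kept = chain[1000:]                                 # discard burn-in
print(sum(kept) / len(kept))                        # close to the sample mean of data
```

After the early samples are discarded as burn-in, the average of the remaining draws approximates the posterior mean. Their spread approximates the posterior uncertainty.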
I recently gave a presentation about Bayesian Data Analysis in PyMC3 at PyData NYC’13 with a special focus on financial applications which should give a good primer — check it out here:
Bayesian statistics has many benefits, some of which I discuss in my talk. Most relevant to Quantitative Finance, however, is the fact that you can very flexibly model latent, unobservable processes and how they relate to observable events. As an example, we could model investor fear. We can’t measure it directly, but it certainly has a bearing on market behavior and the stock price.
Bayes’ formula allows us to then infer backwards: given the observable data (e.g. the stock price), what is the probability of the latent construct (e.g. investor fear)?
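Written out for the example above, with “fear” standing in for the latent construct and “price” for the observed data, Bayes' formula reads:

```latex
P(\text{fear} \mid \text{price}) = \frac{P(\text{price} \mid \text{fear})\, P(\text{fear})}{P(\text{price})}
```

The denominator, P(price), is usually hard to compute directly. This is one reason sampling methods such as MCMC are used, since they only need the numerator up to a constant.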
One elegant example of how Probabilistic Programming can infer unobservable quantities of the stock market is the stochastic volatility model. Volatility is an important concept in quantitative finance because it relates to risk. Unfortunately, as Tony Cooper reminds us:
“Volatility is a strange thing – it exists but you can’t measure it.”
If we can’t measure something, the next best thing we can do is try to model it. One way to do this in a probabilistic framework is stochastic volatility. If we assume that returns are normally distributed, volatility is captured by the standard deviation of that normal distribution. Letting this standard deviation itself vary randomly over time is what makes the volatility stochastic. Intuitively, we would expect the standard deviation to be high during times of market turmoil like the 2008 crash.
So the trivial thing to do would be to look at the rolling standard deviation of returns. But this is unsatisfying for multiple reasons:
These properties are outlined in the plot below. The larger the window size, the more lag, the smaller the window size, the more unstable the estimate gets.
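The window-size trade-off can be reproduced with a small rolling standard deviation and a synthetic return series that switches from calm to turbulent. The series, the window lengths (10 and 50 days), and the function name are made up for illustration.

```python
import statistics


def rolling_std(returns, window):
    """Sample standard deviation over a trailing window (None until the window fills)."""
    out = [None] * (window - 1)
    for end in range(window, len(returns) + 1):
        out.append(statistics.stdev(returns[end - window:end]))
    return out


# Synthetic daily returns: 100 calm days, then 50 turbulent days.
calm = [0.001, -0.001] * 50
turbulent = [0.03, -0.03] * 25
series = calm + turbulent

short = rolling_std(series, 10)
long_ = rolling_std(series, 50)
print(round(short[105], 4), round(long_[105], 4))  # 0.0245 0.0105
```

Six days after the switch, the 10-day estimate has already jumped, while the 50-day estimate is still less than half as large; the longer window lags. The flip side is that on real data, the short window's estimate swings much more from day to day.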
As we will see, the stochastic volatility model does a better job on all of these counts. But before we look at it, we need to establish one more insight into the nature of volatility: it tends to cluster. This is nicely demonstrated by the returns of the S&P 500 above. As you can see, during the 2008 financial crisis there is a lot of volatility in the stock market (huge positive and negative daily returns) that gradually decreases over time.
So how do we model this clustering property? The Stochastic Volatility model assumes that the standard deviation of the returns follows a random-walk process. You can read the Wikipedia article I linked, but essentially this process allows for slow, gradual changes over time.
What is interesting is that we can model the standard deviation itself as following a random walk. Intuitively, we allow the standard deviation to change over time, but only ever so slightly at each time point. For the mathematical and implementation details of the model, see this IPython notebook, which uses PyMC3 (a new, flexible probabilistic programming framework for Python).
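The generative story behind the model can be sketched by simulating it forward. Volatility drifts by a small random step each day, and each day's return is drawn with that day's standard deviation. This is a simplified illustration, not the model in the linked notebook. Here the random walk is placed on the log of the standard deviation, an assumption that keeps volatility positive. Returns are assumed normal, and `step_sd`, the starting level, and the function name are invented. Inference, that is, recovering the hidden volatility path from observed returns, is the part PyMC3 handles by MCMC and is not shown here.

```python
import math
import random


def simulate_stochastic_volatility(n_days, step_sd=0.1, init_vol=0.01, seed=42):
    """Forward-simulate a toy stochastic volatility process.

    log(vol) follows a Gaussian random walk with step size `step_sd`;
    each day's return is Normal(0, vol) with that day's vol.
    Returns (vols, returns).
    """
    rng = random.Random(seed)
    log_vol = math.log(init_vol)
    vols, returns = [], []
    for _ in range(n_days):
        log_vol += rng.gauss(0.0, step_sd)    # small random change each day
        vol = math.exp(log_vol)               # exp keeps the std deviation positive
        vols.append(vol)
        returns.append(rng.gauss(0.0, vol))   # today's return uses today's vol
    return vols, returns


vols, returns = simulate_stochastic_volatility(250)
print(min(vols), max(vols))
```

Because each day's volatility starts from the previous day's, high-volatility days tend to be followed by more high-volatility days. This reproduces the clustering described above.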
The plot below shows the latent volatility (in orange) inferred from the model based on the market data (in mint). The orange lines represent the standard deviation of the Normal distribution we assume for the daily returns.
As you can see, the estimated volatility is much more robust. Moreover, all parameters are estimated from the data while we would have to find a reasonable window length for the rolling standard deviation ourselves. Finally, we do not just get a single estimate as with the rolling standard deviation but rather many solutions that are likely. This provides us with a measure of the uncertainty in our estimates and is represented by the width of the orange line above.
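Turning a set of posterior samples into a point estimate plus an uncertainty band takes only a few lines. The sketch below uses a simple central percentile interval. The function name and the 95% level are arbitrary choices. Libraries such as PyMC3 offer their own summaries, such as highest-posterior-density intervals, which can differ from this one for skewed posteriors.

```python
def summarize_samples(samples, level=0.95):
    """Posterior mean and a central percentile interval from MCMC draws."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return sum(ordered) / n, lower, upper


print(summarize_samples(list(range(101))))
```

Applied day by day to the sampled volatility values, the lower and upper bounds trace out a band that corresponds to the width of the estimated volatility line.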
In summary, Bayesian statistics and Probabilistic Programming form an extremely powerful framework for Quantitative Finance, as they provide:
We are currently working on making PyMC3 available on Quantopian to allow the use of these types of models, so sign up and start getting familiar with our platform, discuss on the forums, and follow @Quantopian and me on Twitter. If you are interested in Probabilistic Programming, I also sometimes blog about it here.