Soon, you will be able to start trading your algorithms with your E*TRADE brokerage account (1).
What does this mean for you?
For those curious to see it in action, I presented a quick demo of an algorithm trading with E*TRADE here:
Do you want a chance to trade real money with E*TRADE in our pilot program?
How much does this cost?
Quantopian does not charge for live trading integration, though we may need to charge a monthly fee at some point in the future. Your brokerage agreement with E*TRADE, including fees and commissions, will be subject to the same terms and rules as your existing E*TRADE brokerage account. (1)
(1) E*TRADE Financial Corporation is not affiliated with and does not endorse or recommend Quantopian, Inc. E*TRADE provides execution and clearing services to customers who integrate their E*TRADE account with their Quantopian account. For more information regarding E*TRADE Financial Corporation, please visit www.etrade.com.
P.S. Attached is a sample algorithm that's geared and ready for live trading. Tweak it to use a different list of stocks or create your own algorithm from scratch.
tl;dr: In this blog post I run several simulations using Zipline to show what happens when we add more capital to an algorithm, and keep adding capital until we see capacity limitations.
Assume that you have developed an algorithm that looks very promising in a backtest. You might then want to test this algorithm in the real market, but since you are not yet confident enough in your algorithm, you probably start out investing a small amount of money at first. If the algorithm actually performs as well as you hoped for you will probably want to ramp it up and put more money behind it. If you're really certain you might even convince others to also invest using your strategy for a fee. The problem with your plan to ramp up is capacity. What if your newly-increased investment clobbers all of the market liquidity, and your edge disappears? Unfortunately, we just can't assume that our orders are always going to be filled at a certain price, particularly if we are trading in low-cap stocks.
It is very easy to fool ourselves with good backtest performance; a backtest might not hold up in the ultimate test against reality. Accurate backtesting can be very tricky as various levels of complexity are involved (see http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2308659 for a recent paper on backtest overfitting). At the most basic level, failing to account for order delays, commissions, and the influence of our own orders on the market (i.e. slippage) will make our strategy look much better than it would be in reality. Zipline -- our open-source backtester that's also used on Quantopian -- accounts for all these complexities by default.
To shed some light on what happens when we trade larger and larger quantities I ran some simulations using Zipline.
The algorithm we will be using is the OLMAR algorithm. At the core, the algorithm assumes mean-reversion and rebalances its portfolio to add more weight to stocks that under-performed and less weight to stocks that over-performed recently. The details of the strategy are not critical here but note that the algorithm always aims to be fully invested and is long-only.
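For intuition about the rebalancing logic described above, here is a simplified single-step sketch of an OLMAR-style weight update. This is my own condensed version with illustrative parameters (window length, epsilon threshold), not the exact algorithm used in the simulations: it predicts price relatives from a moving average and shifts weight toward assets expected to revert upward, while staying long-only and fully invested.

```python
import numpy as np

def olmar_weights(prices, b=None, window=5, eps=10.0):
    """One OLMAR-style rebalancing step (simplified sketch).

    prices: (n_days, n_assets) array of prices, newest row last.
    b: current weights (defaults to equal-weighted).
    """
    n = prices.shape[1]
    if b is None:
        b = np.ones(n) / n
    # Predicted price relatives: moving average / latest price.
    # Under mean reversion, recent underperformers score above 1.
    x_tilde = prices[-window:].mean(axis=0) / prices[-1]
    x_bar = x_tilde.mean()
    # Step size: move toward the prediction until the expected
    # portfolio price relative reaches the threshold eps.
    denom = np.linalg.norm(x_tilde - x_bar) ** 2
    lam = max(0.0, (eps - b @ x_tilde) / denom) if denom > 0 else 0.0
    b = b + lam * (x_tilde - x_bar)
    # Stay long-only and fully invested (crude simplex projection).
    b = np.clip(b, 0.0, None)
    return b / b.sum()
```

With a two-stock portfolio where one stock just dropped and the other just rose, the update shifts weight toward the laggard, matching the mean-reversion intuition.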
I next ran the algorithm using different amounts of starting capital on either a portfolio of large-cap stocks (including $SBUX and $CERN, with an average daily trading volume of $127,316,155) or a portfolio of small-cap stocks (including $HMSY and $BWLD, with an average daily trading volume of $11,952,436).
You can view the notebook with the full code to run the simulations and generate the plots here.
The plot below shows the cumulative returns over time of the two portfolios under different starting capital levels.
If there were no capacity limitations whatsoever, we'd expect all lines to be on top of each other. But note how with more and more starting capital the lines start to diverge more and more despite running the identical strategy in every simulation. Why is that the case?
Specifically, two patterns seem to emerge. With low amounts of starting capital, performance decreases in a step-wise fashion: the return pattern stays the same but is shifted downward. Secondly, past a certain amount of capital (starting at $5 million for the small-cap stocks), the algorithm seems to stop working altogether and simply burns money.
The two mechanisms at play here are:

1. Slippage

The first one, slippage, describes how our own orders influence the price. For example, buying a large quantity will drive the price up. This can be seen by taking a closer look at the fill price of a single transaction under different amounts of starting capital:
As you can see, as our order sizes increase, the fill price also increases. This is a property of the VolumeShareSlippage model used by Zipline. Note that the actual price-impact numbers depend on how we parameterize the slippage model.
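To make the mechanism concrete, here is a simplified sketch in the spirit of a volume-share slippage model. The 25% volume cap and 0.1 impact constant match Zipline's documented defaults, but the function itself is my own simplification, not Zipline's implementation:

```python
def volume_share_fill(order_shares, bar_volume, price,
                      volume_limit=0.25, price_impact=0.1):
    """Sketch of a volume-share slippage model: the fill is capped at a
    fraction of the bar's volume, and the fill price moves against us
    quadratically in our share of that volume."""
    # Never fill more than a fixed fraction of the bar's volume.
    filled = min(order_shares, int(volume_limit * bar_volume))
    if filled == 0:
        return 0, price
    # Price impact grows quadratically with our share of the volume.
    volume_share = filled / bar_volume
    fill_price = price * (1 + price_impact * volume_share ** 2)
    return filled, fill_price
```

A small order in a liquid stock fills near the quoted price, while an order for twice the day's volume gets capped at a partial fill and pays a visibly worse price.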
2. Capacity limitations
The second mechanism is that there is not enough liquidity in the market to fill our orders in a reasonable time. You can already see above that, at most, we can buy 48,900 shares on that day.
As you can see in the plot below, the orders of the algorithm get filled to a smaller and smaller percentage because the order sizes get so large that we start to run into capacity limitations.
Moreover, this is dependent on the volume of the specific stock; see e.g. $IRBT, where orders cannot get filled at a capital of more than $1,000,000. Note also that this is where our algorithm's performance starts to completely change, as can be seen in figure 1 above. In summary, running into capacity limitations for the smallest-volume stock is enough to cause our algorithm to behave unexpectedly.
Being able to estimate the capacity of a trading algorithm is very useful when trying to gauge how much capital to put behind a certain strategy. Large hedge funds certainly have this problem, but small-scale investors can run into it too when investing in small-cap stocks, as demonstrated above.
Running various backtests at different capacities allows us to estimate the capacity of a given algorithm and see how it is affected (subject to how good our slippage model is) but we could also look at the volume of the individual stocks being traded.
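As a complement to brute-force backtests at different capital levels, a first-pass volume check could look like the sketch below. The tickers, weights, dollar volumes, and the 2.5% participation threshold are all illustrative assumptions, not Quantopian's numbers:

```python
def capacity_estimate(target_weights, dollar_volumes, capital,
                      max_participation=0.025):
    """Flag positions whose order flow would exceed a chosen share of
    average daily dollar volume. A rough heuristic, not Quantopian's
    method; the 2.5% participation cap is an illustrative choice."""
    flags = {}
    for ticker, weight in target_weights.items():
        order_notional = weight * capital
        participation = order_notional / dollar_volumes[ticker]
        flags[ticker] = participation > max_participation
    return flags

# Hypothetical two-stock book: a liquid large cap and a thin small cap.
flags = capacity_estimate(
    target_weights={'BIGCO': 0.5, 'SMALLCO': 0.5},
    dollar_volumes={'BIGCO': 120e6, 'SMALLCO': 12e6},
    capital=5e6,
)
# SMALLCO gets flagged: $2.5M of orders against $12M of daily volume.
```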
One solution to scaling up a strategy is to slow it down so that it has more time to spread out its orders over time and decrease its own market impact. However, by slowing down a strategy you are also diluting your prediction. If you predict that a stock will rise a few hours after a certain event (e.g. earning announcements) you need to act as fast as possible. This sounds very much like an optimization problem where one is trying to find the sweet spot between maximizing speed and minimizing market impact.
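To make that optimization problem concrete, here is a toy cost model; every coefficient is an illustrative guess, not a calibrated estimate. Stretching the execution horizon lowers market impact but forfeits more of the decaying edge, so total cost has an interior minimum:

```python
def total_cost(horizon_days, order_value, adv,
               impact_coef=0.1, alpha_decay=0.0005):
    """Toy execution-cost model (all coefficients are illustrative):
    impact cost falls with the square of daily participation, while
    alpha-decay cost grows linearly with the horizon."""
    daily_participation = order_value / (horizon_days * adv)
    impact_cost = impact_coef * daily_participation ** 2 * order_value
    decay_cost = alpha_decay * horizon_days * order_value
    return impact_cost + decay_cost

# Scan horizons for a hypothetical $5M order in a $12M-ADV stock.
costs = {T: total_cost(T, order_value=5e6, adv=12e6)
         for T in range(1, 21)}
best = min(costs, key=costs.get)
```

Under these made-up numbers the sweet spot is a few days: fast enough to keep most of the edge, slow enough to avoid the worst of the impact.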
What are your experiences with slippage? Leave a comment below!
You can find and clone the IPython notebook that was used to generate these simulations here.
And, of course, the third contest starts on April 1st! (Get your entry in before 9:30AM EDT to qualify).
We've learned quite a bit since we kicked this off. This webinar covers what we've learned about scoring, what tools our community needs, cheating, and more.
According to Credit Suisse's Gender 3000 report, at the end of 2013 women accounted for 12.9% of top management across 3,000 companies in 40 countries. Additionally, since 2009, companies where women made up 25-50% of the management team returned 22-29%.
If companies with women in management outperform, what would happen if you invested in women-led companies?
At QuantCon 2015, I explained my analysis and dug into some of the questions I've been asked over the past few weeks. You can watch my talk here:
As I explain at the end of the talk, there are many things I could do to explore this further. The truth is, I'm a product manager with a tool to build. My goal is to get this algo live trading, so I can understand that experience from the perspective of my users.
It's likely I'll explore some of the interesting alternatives to this study, but unlikely I'll do them all. If you would like to take it further, my notebook can be viewed here (and my process in detail here) and all of the data I used can be downloaded here. All I ask is that you share your findings back.
In this blog post you will learn about the basic idea behind Markowitz portfolio optimization as well as how to do it in Python. We will then show how you can create a simple backtest that rebalances its portfolio in a Markowitz-optimal way. We hope you enjoy it and get a little more enlightened in the process.
We will start by using random data and only later use actual stock data. This will hopefully help you get a sense of how to use modelling and simulation to improve your understanding of the theoretical concepts. Don't forget that the skill of an algo trader is to put mathematical models into code, and this example is great practice.
Let's start by importing a few modules that we will need later and producing a series of normally distributed returns.
cvxopt is a convex solver that you can easily install with sudo pip install cvxopt.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import cvxopt as opt
from cvxopt import blas, solvers
import pandas as pd

np.random.seed(123)

# Turn off progress printing
solvers.options['show_progress'] = False
Assume that we have 4 assets, each with a return series of length 1000. We can use numpy.random.randn to sample returns from a normal distribution.
## NUMBER OF ASSETS
n_assets = 4

## NUMBER OF OBSERVATIONS
n_obs = 1000

return_vec = np.random.randn(n_assets, n_obs)
plt.plot(return_vec.T, alpha=.4)
plt.xlabel('time')
plt.ylabel('returns')
These return series can be used to create a wide range of portfolios, which all have different returns and risks (standard deviations). We can produce a wide range of random weight vectors and plot those portfolios. As we want all our capital to be invested, each weight vector will have to sum to one.
def rand_weights(n):
    ''' Produces n random weights that sum to 1 '''
    k = np.random.rand(n)
    return k / sum(k)

print rand_weights(n_assets)
print rand_weights(n_assets)
[ 0.54066805 0.2360283 0.11660484 0.1066988 ] [ 0.27638339 0.03006307 0.47850085 0.21505269]
Next, let's evaluate how these random portfolios would perform. To this end, we calculate the mean returns as well as the volatility (here we use the standard deviation). You can also see that there is a filter that only allows plotting portfolios with a standard deviation of < 2, for better illustration.
def random_portfolio(returns):
    '''
    Returns the mean and standard deviation of returns for a random portfolio
    '''
    p = np.asmatrix(np.mean(returns, axis=1))
    w = np.asmatrix(rand_weights(returns.shape[0]))
    C = np.asmatrix(np.cov(returns))

    mu = w * p.T
    sigma = np.sqrt(w * C * w.T)

    # This recursion reduces outliers to keep plots pretty
    if sigma > 2:
        return random_portfolio(returns)
    return mu, sigma
In the code you will notice that the return is calculated as

$R = p^T w$

where $R$ is the expected return, $p^T$ is the transpose of the vector of mean returns for each time series, and $w$ is the weight vector of the portfolio. $p$ is an Nx1 column vector, so $p^T$ turns into a 1xN row vector, which can be multiplied with the Nx1 weight (column) vector $w$ to give a scalar result. This is equivalent to the dot product used in the code. Keep in mind that Python has a reversed definition of rows and columns, so the accurate NumPy version of the previous equation would be R = w * p.T.
Next, we calculate the standard deviation with

$\sigma = \sqrt{w^T C w}$

where $C$ is the NxN covariance matrix of the returns (in the NumPy code, with $w$ as a row vector, this reads sigma = np.sqrt(w * C * w.T)). Please note that if we instead calculated a simple standard deviation with the appropriate weighting, using std(array(ret_vec).T*w), we would get a slightly different 'bullet'. This is because the simple standard deviation calculation does not take covariances into account. In the covariance matrix, the diagonal entries are the variances of the individual assets, while the off-diagonal entries are the covariances between assets. By using the ordinary std() we effectively regard only the diagonal and miss the rest. A small but significant difference.
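To see the size of that difference numerically, here is a quick standalone check (not part of the original derivation): for two positively correlated assets, the diagonal-only calculation understates the true portfolio risk.

```python
import numpy as np

np.random.seed(0)
# Two positively correlated return series and a 50/50 portfolio.
x = np.random.randn(1000)
y = 0.8 * x + 0.2 * np.random.randn(1000)
returns = np.vstack([x, y])
w = np.array([0.5, 0.5])

C = np.cov(returns)
full_std = np.sqrt(w @ C @ w)                   # uses covariances
diag_std = np.sqrt(np.sum(w**2 * np.diag(C)))   # diagonal only

# full_std comes out well above diag_std: ignoring the positive
# covariance makes the portfolio look safer than it is.
```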
Let's generate the mean returns and volatilities for 500 random portfolios:
n_portfolios = 500
means, stds = np.column_stack([
    random_portfolio(return_vec)
    for _ in xrange(n_portfolios)
])
Upon plotting those you will observe that they form a characteristic parabolic shape called the 'Markowitz bullet', with its boundary called the 'efficient frontier', where we have the lowest variance for a given expected return.
plt.plot(stds, means, 'o', markersize=5)
plt.xlabel('std')
plt.ylabel('mean')
plt.title('Mean and standard deviation of returns of randomly generated portfolios')
Once we have a good representation of our portfolios, as the blue dots show, we can calculate the efficient frontier Markowitz-style. This is done by minimising the portfolio variance

$w^T C w$

for a fixed expected portfolio return $\mu = p^T w$ whilst keeping the sum of all the weights equal to 1:

$\sum_i w_i = 1$

Here we parametrically run through a range of target returns $\mu$ and find the minimum variance for each $\mu$. This could be done with scipy.optimise.minimize, but we would have to define quite a complex problem with bounds, constraints and a Lagrange multiplier. Conveniently, the cvxopt package, a convex solver, does all of that for us. We used one of their examples with some modifications, as shown below. You will notice that there are some conditioning expressions in the code; they are simply needed to set up the problem. For more information, please have a look at the cvxopt documentation. The mus vector produces a series of expected return values spaced in a non-linear and more appropriate way. We will see later that we don't need to calculate many of these, as they perfectly fit a parabola, which can safely be extrapolated for higher values.
def optimal_portfolio(returns):
    n = len(returns)
    returns = np.asmatrix(returns)

    N = 100
    mus = [10**(5.0 * t/N - 1.0) for t in range(N)]

    # Convert to cvxopt matrices
    S = opt.matrix(np.cov(returns))
    pbar = opt.matrix(np.mean(returns, axis=1))

    # Create constraint matrices
    G = -opt.matrix(np.eye(n))   # negative n x n identity matrix
    h = opt.matrix(0.0, (n, 1))
    A = opt.matrix(1.0, (1, n))
    b = opt.matrix(1.0)

    # Calculate efficient frontier weights using quadratic programming
    portfolios = [solvers.qp(mu*S, -pbar, G, h, A, b)['x']
                  for mu in mus]

    ## CALCULATE RISKS AND RETURNS FOR FRONTIER
    returns = [blas.dot(pbar, x) for x in portfolios]
    risks = [np.sqrt(blas.dot(x, S*x)) for x in portfolios]

    ## CALCULATE THE 2ND DEGREE POLYNOMIAL OF THE FRONTIER CURVE
    m1 = np.polyfit(returns, risks, 2)
    x1 = np.sqrt(m1[2] / m1[0])

    # CALCULATE THE OPTIMAL PORTFOLIO
    wt = solvers.qp(opt.matrix(x1 * S), -pbar, G, h, A, b)['x']
    return np.asarray(wt), returns, risks

weights, returns, risks = optimal_portfolio(return_vec)

plt.plot(stds, means, 'o')
plt.ylabel('mean')
plt.xlabel('std')
plt.plot(risks, returns, 'y-o')
In yellow you can see the optimal portfolios for each of the desired returns (i.e. the mus). In addition, we get the one optimal portfolio returned:
[[ 2.77880107e-09] [ 3.20322848e-06] [ 1.54301198e-06] [ 9.99995251e-01]]
This is all very interesting but not very applied. We next demonstrate how you can create a simple algorithm in
zipline -- the open-source backtester that powers Quantopian -- to test this optimization on actual historical stock data.
from zipline.utils.factory import load_bars_from_yahoo

end = pd.Timestamp.utcnow()
start = end - 2500 * pd.tseries.offsets.BDay()

data = load_bars_from_yahoo(stocks=['IBM', 'GLD', 'XOM', 'AAPL',
                                    'MSFT', 'TLT', 'SHY'],
                            start=start, end=end)
data.loc[:, :, 'price'].plot(figsize=(8, 5))
plt.ylabel('price in $')
Next, we'll create a zipline algorithm by defining two functions: initialize(), which is called once before the simulation starts, and handle_data(), which is called for every trading bar. We then instantiate the algorithm object.

If you are confused about the syntax of zipline, check out the tutorial.
import zipline
from zipline.api import (add_history, history, set_slippage,
                         slippage, set_commission, commission,
                         order_target_percent)
from zipline import TradingAlgorithm


def initialize(context):
    '''
    Called once at the very beginning of a backtest (and live trading).
    Use this method to set up any bookkeeping variables.

    The context object is passed to all the other methods in your algorithm.

    Parameters
    context: An initialized and empty Python dictionary that has been
             augmented so that properties can be accessed using dot
             notation as well as the traditional bracket notation.

    Returns None
    '''
    # Register history container to keep a window of the last 100 prices.
    add_history(100, '1d', 'price')
    # Turn off the slippage model
    set_slippage(slippage.FixedSlippage(spread=0.0))
    # Set the commission model (Interactive Brokers Commission)
    set_commission(commission.PerShare(cost=0.01, min_trade_cost=1.0))
    context.tick = 0


def handle_data(context, data):
    '''
    Called when a market event occurs for any of the algorithm's securities.

    Parameters
    data: A dictionary keyed by security id containing the current state
          of the securities in the algo's universe.
    context: The same context object from the initialize function.
             Stores the up-to-date portfolio as well as any state
             variables defined.

    Returns None
    '''
    # Allow history to accumulate 100 days of prices before trading
    # and rebalance every day thereafter.
    context.tick += 1
    if context.tick < 100:
        return
    # Get rolling window of past prices and compute returns
    prices = history(100, '1d', 'price').dropna()
    returns = prices.pct_change().dropna()
    try:
        # Perform Markowitz-style portfolio optimization
        weights, _, _ = optimal_portfolio(returns.T)
        # Rebalance portfolio accordingly
        for stock, weight in zip(prices.columns, weights):
            order_target_percent(stock, weight)
    except ValueError as e:
        # Sometimes this error is thrown:
        # ValueError: Rank(A) < p or Rank([P; A; G]) < n
        pass

# Instantiate algorithm
algo = TradingAlgorithm(initialize=initialize,
                        handle_data=handle_data)
# Run algorithm
results = algo.run(data)
results.portfolio_value.plot()
[2015-01-28 14:35:58.352355] INFO: Performance: Simulated 2411 trading days out of 2411. [2015-01-28 14:35:58.352976] INFO: Performance: first open: 2005-06-29 13:31:00+00:00 [2015-01-28 14:35:58.353412] INFO: Performance: last close: 2015-01-27 21:00:00+00:00
As you can see, the performance here is pretty good, even through the 2008 financial crisis. This is most likely due to our universe selection and shouldn't always be expected. Increasing the number of stocks in the universe might reduce the volatility as well. Please let us know in the comments section if you had any success with this strategy and how many stocks you used.
In this blog, co-written by Quantopian friend Dr. Thomas Starke, we wanted to provide an intuitive and gentle introduction to Markowitz portfolio optimization, which still remains relevant today. By simulating various random portfolios we have seen that certain portfolios perform better than others. Convex optimization using cvxopt then allowed us to numerically determine the portfolios that live on the efficient frontier. The zipline backtest serves as an example, but also shows compelling performance. We are working on bringing cvxopt to the Quantopian backtester -- stay tuned!
Today Quantopian is publicly releasing the results of our internal performance test framework. You can see the latest results at http://www.quantopian.com/performance, updated with each commit to our code repo.
I'm excited that Quantopian is rallying around system performance as a team and that we are releasing results that show how much work we need to do. If you’re passionate about system performance, we’re hiring. We’d also welcome PRs to Zipline, our open-source backtesting engine.
I enjoy working on improving system performance because it defines system quality. Stability, the user experience, scalability, and developer productivity all stem from performance, and performance improvements bubble up through all layers of the system. At Quantopian, faster backtests directly mean that our users can run more tests to ensure their algorithms are functioning properly.
While building our live trading capabilities, I worked on a prototype that switched our equity minute data source from using documents stored in Mongo to bcolz, a file-based data source. I tested and measured the prototype using the existing tools we had developed to get profiling results of our backtesting engine (like Scott Sanderson's great https://github.com/ssanderson/pstats-view). Jean Bredeche, our CTO, loved the prototype results, so we set up a project to convert Quantopian’s production and development infrastructure to use bcolz.
Jean asked me to present this project to the team. Preparing that presentation forced us to think of a clear way to measure and present Quantopian’s backtesting performance. Solving that communication problem proved as challenging as coding and releasing the bcolz improvement. After all, Zipline and Quantopian are designed to run user code. The incredible flexibility we provide for our users makes evaluating system performance difficult, especially so because we never look at user algorithm source code without explicit permission.
Together with Jess Stauth, our lead quant, I designed a set of test algorithms and ran simulations with our different data stores, which let me plot comparisons. These plots really captured people’s attention, and our developers started to ask me to run the simulations on their code branches to check for performance regressions, or to prove ideas for speed improvements.
We started to talk about our development culture and the disconnect between the value we place on performance and the investment we make in improving it. Performance is just as important as correctness. For correctness, we run continuous integration, and maintain a huge suite of tests. Any platform code changes are automatically run in parallel to the prior release, and we check that simulation results match.
We make a huge and ongoing investment in the correctness of our system. We reasoned that performance needed the same continuous measurement and the same feedback loop for our developers. We developed a suite of synthetic algorithms that let us stress test different parts of our backtester, such as universe size, buying frequency, and history window length. We will be continuously adding to our suite of test algorithms, and welcome suggestions for new ones.
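As a sketch of what such continuous measurement can look like (this is an illustration only, not Quantopian's actual framework; the function names and workloads are invented), one can time synthetic workloads parameterized along the axes being stress-tested:

```python
import time

def benchmark(fn, args, repeats=3):
    """Return the best wall-clock time over a few repeats.
    Illustrative only -- not Quantopian's actual framework."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# A synthetic workload parameterized along the same axes the suite
# stress-tests: universe size and history window length (both invented).
def synthetic_algo(universe_size, window):
    prices = [[1.0] * universe_size for _ in range(window)]
    return [sum(col) / window for col in zip(*prices)]

# Track how runtime scales with universe size across code changes.
timings = {n: benchmark(synthetic_algo, (n, 100)) for n in (10, 100, 500)}
```

Rerunning the same suite on every branch turns performance regressions into something as visible as a failing test.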
I'm proud to say that the performance test framework is now part of our development process, and as a result, performance improvement is part of every code push.
About 4 months ago, we announced that we were hard at work building a hosted research platform for analyzing our curated datasets, your Quantopian algorithms, and your backtest results. We've been making great progress: we currently have over 40 alpha users on the new environment, and they are helping us improve it every day. We aim to have the platform available for everyone within the next few months. In the meantime, we wanted to give a sneak peek into its capabilities and show you how it can help you create and explore your ideas. We have shared an example notebook in the community.
In this example, we walk through the process of exploring and understanding an external dataset. We then use that understanding to optimize a trading strategy. The external dataset is provided by EventVestor, a financial data and intelligence platform for investors. EventVestor aggregates event-driven data and provides a multitude of analytics services on it. In this notebook, we analyze whether share buybacks are an indicator of drift and optimize a strategy for investing based on share buybacks.
In the community, you'll be able to view the notebook in its entirety and see the related backtests. Once the research platform is ready for prime time, we'll add the ability to clone the notebook (like you can do today with an algo) so you can experiment on your own.
If you haven't signed up to be a Research Beta user yet, now is a great time. We expect to be adding more users in the next few weeks.
We ship a lot of code here at Quantopian, with a lot of new features. Some of those features get top billing and lots of space on the website: launch of the Quantopian Open, addition of fundamental data from Morningstar, set up of the IDE quick chat, development of the Managers Program, and others. But, other improvements we make don't get the same level of attention. Some of these improvements are quite nifty - making using Quantopian easier, faster, and more reliable. We plan on sharing them periodically on the blog. Here is a list of our latest news, features, and tools:
Tools and Features:
Moving forward, we promise to post a monthly summary of all our updates here for easy consumption. If you would like up-to-the-minute updates, subscribe to our RSS feed and follow us on Facebook, Twitter, and LinkedIn.
Last month, Quantopian introduced a powerful new feature: programmatic access to fundamental data from Morningstar in the backtester. It is yet another piece of the Quantopian platform that is leveling the algorithmic investing playing field.
Since the announcement, the response from the Quantopian community has been phenomenal, with thousands of backtests already run using the data. Whole new classes of investment strategy, like quantitative value investing, are now more easily executed on Quantopian.
In tandem with the announcement, we made a special offer. The community members who post the best algorithms that use fundamentals to our forums by January 1 would win an additional 12 months of free access to the fundamental data.
With the deadline past, we've got seven community members who have earned the additional 12-month prize. Here are the winning posts, algorithms, and authors:
If you'd like to learn more about fundamentals, check out the winning algorithms or simply read the original forum post announcing the availability of the data and clone the algorithm there. Or, sign up to attend our upcoming webinar which will teach the basics of using fundamentals inside Quantopian.
Congratulations to all our winners!
Starting today, Quantopian community members can programmatically access Morningstar's corporate fundamental data.
Quantopian's comprehensive historical fundamental data API is unprecedented in the industry. For the first time ever, individual investors can build fundamentally driven investment algorithms.
With access to the fundamentals data within your algorithm, you can use it to define your investable stock universe. Want your algorithm to focus only on stocks with a market cap over $1B? You can do it now. Filter stocks by PE ratio? By dividends? By EPS? All possible with Quantopian. The Quantopian IDE makes it easy to search for the right fundamental metrics for your algorithm.
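As a toy illustration of the kind of screen this enables (using plain pandas rather than the Quantopian API, with invented tickers and numbers), a market-cap-and-PE filter boils down to a few lines:

```python
import pandas as pd

# Toy stand-in for a fundamentals table (all figures invented).
fundamentals = pd.DataFrame({
    'market_cap': [1.2e9, 4.0e8, 9.5e9, 2.1e9],
    'pe_ratio':   [14.0,  32.0,  9.5,   55.0],
    'eps':        [3.1,   0.4,   6.2,   0.9],
}, index=['AAA', 'BBB', 'CCC', 'DDD'])

# Screen: market cap over $1B and PE ratio under 20.
universe = fundamentals[
    (fundamentals['market_cap'] > 1e9) & (fundamentals['pe_ratio'] < 20)
]
print(list(universe.index))  # ['AAA', 'CCC'] pass the screen
```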
We’ve taken care of the heavy lifting for you: Morningstar's company identifiers are mapped to Quantopian's security identifiers and the API includes 'knowledge date' indexing to avoid look-ahead bias. I can’t imagine an easier way to programmatically work with fundamentals.
Check out our simple example and documentation of the new methods for accessing and incorporating fundamentals into your code.
In tandem with this news, I'm delighted to announce two special offers:
1. As of January 1st, every registered user of Quantopian will be guaranteed 6 months of complimentary access to the fundamental data in the Quantopian backtester. So if you've been lying in wait, now is the time to register for free. And if you have friends who are fundamental investors, please pass along the news!
2. Share your coolest fundamentals-based algorithm to the community forum before January 1st for a chance to win an additional 12 months of free access. Justin Lent, Quantopian's new director of fund development, will review submissions and select the best sample algos to be highlighted on our blog in January. Submit your entry by January 1st and tag it with #Fundamentals in the post title to be considered.
We can't wait to see the burst of creativity in the community that this is surely going to unleash.
Get started on Quantopian today.