Predicting future returns of trading algorithms: Bayesian cone

August 31st, 2015   Posted by: Thomas Wiecki

Authors: Sepideh Sadeghi and Thomas Wiecki

Foreword by Thomas

This blog post is the result of a very successful research project by Sepideh Sadeghi, a PhD student at Tufts who did an internship at Quantopian over the summer of 2015. Follow her on Twitter here.

All of the models discussed herein are available through pyfolio, our newly released library for financial performance and risk analysis. For an overview of how to use it, see the Bayesian tutorial. Pyfolio is also available on the Quantopian research platform to run on your own backtests.

Introduction

When evaluating trading algorithms we generally have access to backtest results over a couple of years and a limited amount of paper or real-money trading data. The biggest problem with evaluating a strategy based on its backtest is that the strategy might be overfit: it looks good on past data but will fail on unseen data. In this blog post, we will take a stab at addressing this problem by using Bayesian estimation to predict the range of future returns we expect to see based on the backtest results. At Quantopian we are building a crowd-sourced hedge fund and face this problem on a daily basis.

Here, we will briefly introduce two Bayesian models that can be used to predict future daily returns. These models take the time series of an algorithm's past daily returns as input and simulate possible future daily returns as output. We will cover the model variations that can be used for prediction and how they compare to each other in another blog post; here we will mostly talk about how to use the predictions of such models to extract useful information about the algorithms.

How do we get the model inputs?

At Quantopian we have built a world-class backtester that allows anyone with basic Python skills to write a trading algorithm and test it on historical data. The daily returns generated by the backtest are used to train the model that predicts future daily returns.

What can be learned from the predictive models?

Let's not forget that computational modeling always comes with risks, such as estimation uncertainty, model misspecification, and implementation limitations and errors. Because of these risk factors, model predictions are never perfect or 100% reliable. Still, even imperfect predictions can be used to extract useful information about algorithms.

For example, comparing the actual performance of a trading algorithm on unseen market data with the predictions generated by our model can tell us whether the algorithm is behaving as expected based on its backtest, or whether it is overfit to only work well on past data. Such algorithms may have the best backtest results but will not necessarily have the best performance in live trading. An example of such an algorithm can be seen in the picture below. As you can see, the live trading results of the algorithm fall completely outside of our prediction area, and the algorithm is performing worse than predicted. These predictions were generated by fitting a straight line through the cumulative backtest returns and assuming that this linear trend continues going forward. Because we have more uncertainty about events further in the future, the cone widens over time, assuming returns are normally distributed with a variance estimated from the backtest data. This is certainly not the best way to generate predictions, as it makes a couple of strong assumptions: that returns are normal, and that we can accurately estimate their variance from limited backtest data. Below we show how we can improve on these cone-shaped predictions by using Bayesian models to predict future returns.

bad_out_sample
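For concreteness, here is a minimal sketch of how such a linear cone can be constructed (an illustration of the idea, not pyfolio's exact implementation; the function name and the two-standard-deviation band are our own choices):

import numpy as np

def linear_cone(cum_returns, n_forward, z=2.0):
    # Fit a straight line through the cumulative backtest returns
    t = np.arange(len(cum_returns))
    slope, intercept = np.polyfit(t, cum_returns, 1)
    # Extrapolate the linear trend over the next n_forward days
    t_future = np.arange(len(cum_returns), len(cum_returns) + n_forward)
    center = intercept + slope * t_future
    # Normality assumption: the band widens with the square root of time,
    # scaled by the daily volatility estimated from the backtest
    daily_vol = np.std(np.diff(cum_returns))
    spread = z * daily_vol * np.sqrt(np.arange(1, n_forward + 1))
    return center - spread, center + spread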
On the other hand, there are algorithms that perform equally well on past data and on live trading data. An example of that can be seen in the picture below.

good_out_sample
And finally, we can find differences between an algorithm's behavior in the backtest and in the live trading period that are due to changes in the market rather than to characteristics of the algorithm itself. For example, the picture below illustrates an algorithm that is doing pretty well until sometime in 2008, when it suddenly crashes along with the market.

stress_testing

Why Bayesian models?

In the Bayesian approach we do not get a single point estimate for our model parameters as we would with maximum likelihood estimation. Instead, we get a complete posterior distribution for each model parameter, which quantifies how likely different values are for that parameter. For example, with few data points our estimation uncertainty will be high, reflected by a wide posterior distribution. As we gather more data, our uncertainty about the model parameters decreases and the posterior distributions become increasingly narrow. There are many more benefits to the Bayesian approach, such as the ability to incorporate prior knowledge, that are outside the scope of this blog post.

Now that we have covered why we want to predict future returns and why Bayesian models are a good fit for this purpose, let's briefly look at two Bayesian models that can be used for prediction. These models make different assumptions about how daily returns are distributed.

Normal model

We call the first model the normal model. It assumes that daily returns are sampled from a normal distribution whose mean and standard deviation are themselves drawn from a normal and a half-Cauchy distribution, respectively. The statistical description of the normal model and its implementation in PyMC3 are shown below.

This is the statistical model:

mu ~ Normal(0, 0.01)
sigma ~ HalfCauchy(1)
returns ~ Normal(mu, sigma)

And this is the code used to implement this model in PyMC3:

import pymc3 as pm

# 'data' is a pandas Series of past daily returns; 'samples' is the
# number of posterior samples to draw.
with pm.Model():
    mu = pm.Normal('mean returns', mu=0, sd=.01, testval=data.mean())
    sigma = pm.HalfCauchy('volatility', beta=1, testval=data.std())
    returns = pm.Normal('returns', mu=mu, sd=sigma, observed=data)

    # Fit the model: initialize at the MAP estimate, then sample with NUTS
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(samples, step, start=start)

T-model

We call the second model the T-model. It is very similar to the first model, except that it assumes that daily returns are sampled from a Student-t distribution. The t-distribution looks much like a normal distribution but has heavier tails, which makes it better at capturing data points far from the center of the distribution. It is well known that daily returns are in fact not normally distributed, as they have heavy tails.

This is the statistical description of the model:

mu ~ Normal(0, 0.01)
sigma ~ HalfCauchy(1)
nu ~ Exp(0.1)
returns ~ T(nu+2, mu, sigma)

And this is the code used to implement this model in PyMC3:

import pymc3 as pm
import scipy as sp
import scipy.optimize  # makes sp.optimize available below

# As before, 'data' is a pandas Series of past daily returns and
# 'samples' is the number of posterior samples to draw.
with pm.Model():
    mu = pm.Normal('mean returns', mu=0, sd=.01)
    sigma = pm.HalfCauchy('volatility', beta=1)
    nu = pm.Exponential('nu_minus_two', 1. / 10.)

    # pm.T is the Student-t distribution (pm.StudentT in later PyMC3
    # releases); nu + 2 keeps the degrees of freedom above two so that
    # the variance is defined.
    returns = pm.T('returns', nu=nu + 2, mu=mu, sd=sigma,
                   observed=data)

    # Fit model to data: Powell optimization for the MAP estimate,
    # then NUTS sampling
    start = pm.find_MAP(fmin=sp.optimize.fmin_powell)
    step = pm.NUTS(scaling=start)
    trace = pm.sample(samples, step, start=start)

Prediction Cone: Visualization of predictions for the live trading period

Here, we describe the steps of creating predictions from our Bayesian model. These predictions can be visualized as a cone-shaped area of cumulative returns that we expect to see going forward. Assume that we are working with the normal model fit to the past daily returns of a trading algorithm. The result of fitting this model in PyMC3 is the posterior distributions of the model parameters mu (mean) and sigma (standard deviation), shown in fig. a.

Now we take one sample from the mu posterior distribution and one sample from the sigma posterior distribution, with which we can build a normal distribution. This gives us one possible normal distribution that has a reasonable fit to the daily returns data (fig. b).

To generate predicted returns, we take random samples from that normal distribution (the inferred underlying distribution), as can be seen in fig. c.

Having predicted daily returns, we can compute the predicted time series of cumulative returns, shown in fig. d. Note that this gives us only one predicted path of possible future live trading results, because we had only one prediction for each day. We can generate more prediction paths by building more than one inferred distribution on top of the actual data and repeating the same steps for each one. Concretely, we take n samples from the mu posterior and n samples from the sigma posterior, which gives us n inferred distributions, one per (mu, sigma) pair. From each inferred distribution we again generate future returns and a possible cumulative returns path (fig. e). We can summarize the resulting paths by computing the 5th, 25th, 75th and 95th percentile scores of cumulative returns for each day and plotting those instead. This leaves us with four lines marking those percentiles. We highlight the interval between the 5th and 95th percentiles in light blue and the interval between the 25th and 75th percentiles in dark blue to represent our credible intervals. This gives us the cone illustrated in fig. f. Intuitively, if an algorithm's out-of-sample cumulative returns are very different from its backtest, we would expect them to walk outside of our credible region. In general, this procedure of generating data from the posterior is called a posterior predictive check.

bayescone_a
(a)
bayescone_b
(b)
bayescone_c
(c)
bayescone_d
(d)
bayescone_e
(e)
bayescone_f
(f)
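The whole procedure condenses into a few lines of NumPy. Here is a minimal sketch under the assumptions above (the function name and the number of paths are our own; trace is the PyMC3 trace from fitting the normal model earlier):

import numpy as np

def simulate_cone(trace, n_days, n_paths=500):
    # One (mu, sigma) pair per posterior sample (fig. b)
    mus = trace['mean returns'][-n_paths:]
    sigmas = trace['volatility'][-n_paths:]
    # One simulated daily-returns path per posterior pair (figs. c-e)
    paths = np.random.normal(mus, sigmas, size=(n_days, n_paths))
    cum_returns = np.cumprod(1 + paths, axis=0) - 1
    # Daily percentile scores trace out the cone boundaries (fig. f)
    return np.percentile(cum_returns, [5, 25, 75, 95], axis=1)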

Overfitting and Bayesian consistency score

Now that we have talked about the Bayesian cone and how it is generated, you may ask what these cones can be used for. As a demonstration of what can be learned from them, look at the cones illustrated below. The cone on the right shows an algorithm whose live trading results stay well within our prediction area, even within its high-confidence interval. This means the algorithm is performing in line with our predictions. The cone on the left, on the other hand, shows an algorithm whose live trading results fall mostly outside of our prediction area, which would prompt us to take a closer look at why the algorithm is not behaving according to specifications, and potentially to turn it off if it is trading real money. This underperformance in live trading might be due to the algorithm being overfit to past market data, or to other causes that should be investigated by whoever is deploying the algorithm or deciding whether to invest in it.

algo1_bayescone algo2_bayescone

Let's take a look at the prediction cones generated using the simple linear model we described at the beginning of this post. It is interesting to see that there is nothing worrisome about the algorithm on the left, while we know that the algorithm illustrated on the right is overfit; the fact that the Bayesian cone picks up on this but the linear cone does not is reassuring.

algo1_prevcone algo2_prevcone

One of the ways in which the Bayesian cone can be useful is in detecting overfit algorithms with good backtest results. In order to numerically measure by how much a strategy is overfit, we have developed the Bayesian consistency score. This score is a numerical measure of the consistency between the model predictions and the actual live trading results.

For this, we compute the average percentile score of the paper trading returns relative to the predictions and normalize it to yield a value between 100 (perfect fit) and 0 (completely outside the cone). See below for an example where we get a high consistency score for an algorithm (the right cone) that stays within the high-confidence interval of the Bayesian prediction area (between the 5th and 95th percentiles), and a low value for an algorithm (the left cone) that is mostly outside of the predicted area.
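One way to compute such a score, sketched under the assumptions above (illustrative; not necessarily pyfolio's exact implementation):

import numpy as np
from scipy import stats

def consistency_score(predicted_cum, observed_cum):
    # predicted_cum is (n_paths, n_days) of simulated cumulative returns;
    # observed_cum is the out-of-sample cumulative returns (length n_days).
    # Percentile of each observed day within that day's predictions:
    q = [stats.percentileofscore(predicted_cum[:, day], obs)
         for day, obs in enumerate(observed_cum)]
    # 100 when the observed track follows the predicted median,
    # 0 when it lies entirely outside the cone
    return 100 - np.abs(50 - np.mean(q)) / .5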

Accounting for estimation uncertainty

Estimation uncertainty is a risk factor that comes with any kind of modeling, and it is reflected in the width of the prediction cone: the more uncertain our predictions, the wider the cone. There are two ways in which we may get uncertain predictions from our model: 1) little data, and 2) high volatility in the daily returns. First, let's look at how the linear cone deals with uncertainty due to a limited amount of data. For this, we create two cones from the cumulative returns of the same trading algorithm: the first is fit with only the 10 most recent in-sample days of trading data, while the second is fit with the full 300 days of in-sample trading data.

linear_cone_uncertainty

Note how the cone is actually wider in the case where we have more data. That's because the linear cone does not take estimation uncertainty into account. Now let's see what the Bayesian cone looks like:

bayes_cone_uncertainty

As you can see, the top plot has a much wider cone, reflecting the fact that we can't really predict what will happen based on the very limited amount of data we have.

Not accounting for uncertainty is only one of the downsides of the linear cone; the others are the normality and linearity assumptions it makes. There is no good reason to believe that the slope of the regression line corresponding to the live trading results should be the same as the slope of the regression line corresponding to the backtest results, and assuming normality around such a line can be problematic when we have big jumps or high volatility in our data.

Summary

Having reliable predictive models that provide not only predictions but also the model's uncertainty in those predictions allows us to better evaluate the different risk factors associated with deploying trading algorithms. Note the word “reliable” in the previous sentence; it refers to the risk of estimation uncertainty, a risk factor that comes with any modeling and that we would ideally like to minimize. There are other systematic and unsystematic risk factors, as illustrated in the figure below. Our Bayesian models can account for volatility risk and tail risk as well as estimation uncertainty.

riskfactors

Furthermore, we can use the predicted cumulative returns to derive a Bayesian Value at Risk (VaR) measure. For example, the figure below shows the distribution of predicted cumulative returns over the next five days (taking uncertainty and tail risk into account). The line indicates that there is a 5% chance of losing 10% or more of our assets over the next 5 days.

bayesian_VaR
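Given a matrix of simulated daily returns from the posterior (the paths variable in the cone sketch above), this VaR measure takes only a couple of lines. A minimal sketch, with our own function name and defaults:

import numpy as np

def bayesian_var(simulated_paths, horizon=5, alpha=0.05):
    # Cumulative return of each simulated path over the horizon;
    # simulated_paths is (n_days, n_paths) of daily returns
    horizon_rets = np.prod(1 + simulated_paths[:horizon], axis=0) - 1
    # The alpha-quantile: an alpha chance of losing at least this much
    return np.percentile(horizon_rets, alpha * 100)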

To use these models on your data, check out pyfolio, which is also available on the Quantopian research platform to run on your backtests.
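In pyfolio, the models and plots described in this post are bundled into the Bayesian tear sheet. A typical call looks roughly like this, assuming returns is a pandas Series of daily returns and live trading started on the given date:

import pyfolio as pf

# Fits the Bayesian models to the backtest portion of 'returns' and
# compares the live trading portion against the resulting cone
pf.create_bayesian_tear_sheet(returns, live_start_date='2015-6-1')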


Pyfolio -- a new Python library for performance and risk analysis

August 24th, 2015   Posted by: Thomas Wiecki

Today, we are happy to announce pyfolio, our open source library for performance and risk analysis. We originally created this as an internal tool to help us vet algorithms for consideration in the Quantopian hedge fund. Pyfolio allows you to easily generate plots and information about a stock, portfolio, or algorithm.

Tear sheets, or groups of plots and charts, are the heart of pyfolio. Some predefined tear sheets are included, such as sheets that allow for analysis of returns, transactional analysis, and Bayesian analysis. Each tear sheet produces a set of plots about their respective subject.
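For example, a returns tear sheet for a single stock takes only a couple of lines. A minimal sketch (assuming pyfolio's bundled utility for fetching pricing data has a working data source):

import pyfolio as pf

# Fetch daily returns for Facebook and build the returns tear sheet
stock_rets = pf.utils.get_symbol_rets('FB')
pf.create_returns_tear_sheet(stock_rets)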

Pyfolio is available now on Quantopian's research environment. It can be used standalone or in conjunction with zipline.

Here is part of a tear sheet analyzing the returns of Facebook's (FB) stock:

FB_returns_tear_sheet

pyfolio is available now on the Quantopian Research Platform. See our forum post for further information.

Finally, see pyfolio website or pyfolio's GitHub page, where you can see the full source code or contribute to the project.


Notebooks, backtests, and video lectures. All in one spot: https://www.quantopian.com/lectures.

We have launched a free quantitative finance education curriculum for our community. We have held a series of lectures via meetups and webinars that cover the concepts of the curriculum and demonstrate how to use our platform and tools to write a good algorithm. We have also released cloneable notebooks and algorithms for each topic covered. As promised, we have developed a home for this curriculum. Please check out our new Lectures page and learn more about:

  • Linear Regression
  • Spearman Rank Correlation
  • Pairs Trading
  • Beta Hedging
  • Kalman Filters

This curriculum is being developed in concert with our academic program, in which we're working with professors from MIT Sloan, Stanford, and Harvard. The education material we're generating is being vetted by professors as they use our platform to teach their classes, so you can expect to get the same materials that are in use at top schools.

https://www.quantopian.com/lectures


Secret Sauce vs Open Source

August 12th, 2015   Posted by: fawce

For too long, finance has been a net consumer of open source. Major financial institutions and startups alike have only used major open source projects, but virtually none have launched new projects of any significance. That's simply not good enough. To lead in the technology world, financial firms need to found, maintain, and advance major projects of their own.

So, I was really excited to read today's Wall Street Journal report titled:

Goldman Sachs to Give Out ‘Secret Sauce’ on Trading
New open source platform is an attempt by Goldman to bolster its technology bona fides

Sadly, sharing the Secret Sauce is not the same as Open Source. The article leads me to believe that Goldman will unbundle some of their services into smaller applications. That is a great strategic move, but it isn't open source. It isn't even API access. It is just repackaging – very shrewd repackaging that will tend to lock customers in.

Open source software has become central to the technology industry and to the culture of our profession. Open source projects are where developer communities grow (Python, R, Docker), new businesses emerge (GitHub), and technologies mature (Hadoop).

Financial services companies need to do more than simply use open source; they need to lead open source projects; they need to contribute code. Otherwise, the financial community is left isolated and outside the mainstream of the software community. Many of the best and most innovative minds stay out of finance because they can't maintain the long term relationship with their work that only open source allows. You can't expect to be competitive in technology without connecting with the software community, and contributing to the culture. Without launching and leading open source, financial technologists risk irrelevance.

Goldman's CTO, R. Martin Chavez, seems to get it. He's built software, teams, and companies. The way he talks about the technology world, I think he really understands the underlying dynamics of the ecosystem. I believe he has the right vision; I'm rooting for him. But, I don't think today's announcement goes far enough.

The article describes SecDB, a legendary system that helps Goldman manage risk. Maybe someday SecDB will be a cloud service with pure API access, and the software packages developed to handle the volume of pricing changes will be open-sourced. That would be progress. That would be the visionary leadership our industry needs.


New Features This Summer on Quantopian

August 3rd, 2015   Posted by: Dan Dunn

This is the latest edition of the Quantopian product update.  It's not that we like tooting our own horn, but sometimes people like to go see new features that they missed.

The biggest feature is Quantopian Research.  It's now available to everyone!

In addition to the product work, the Quantopian Open has awarded its 5th prize of $100,000 to manage.  We've held meetups in Boston and New York City. We have started integrating Quantopian into courses at various universities.  And, of course, the hedge fund development continues behind the scenes.  You can see some of that progress on the fund page.

Tools and Features

  • Released research access to all members
  • Added fundamental data to the research environment
  • Expanded the tradeable stock universe size from a maximum of 200 to 500 securities
  • Futures trading is available in Zipline
  • Made progress on a number of areas of backtest performance
  • Improved the data quality and performance in our fundamentals and pricing databases
  • Created a new sample algorithm explaining mean reversion
  • Added security enhancements
  • Spaces are converted to tabs automatically when pasting code into the IDE
  • Fixed bug if a ticker symbol is reused over time
  • Improved mobile interface when viewing live algorithms
  • Fixed a bug with the legend on custom variables displayed in backtest charts on forum posts
  • Improved documentation and sample algorithms

Quantopian Open

  • Added some free consulting to the prize package
  • Added a beta filter to the contest criteria to align the selection process with the fund
  • Added badges on the leaderboard for entries that passed the filters
  • Announced three new contest winners
  • Faster page loading
  • Improved the mobile experience for accessing contest data
  • Improved communications if entry was disqualified in the contest
  • Contest rules updated in July edition to add the hedging requirement and remove the backtest score
  • Contest rules updated in August edition to add 6 month contests, remove the consistency factor, and swap the Sortino ratio in for the Calmar ratio

Research Environment

  • You can now clone shared notebooks
  • Included a guided tour to research
  • Added more tutorial notebooks to research

Community

  • Added notebook icon to forum posts with attached notebooks
  • Improved styling of buttons and text throughout the site

Outreach

  • Unveiled Quantopian's integrations with academic institutions
  • Began the Summer Lecture Series, detailing different ways to improve your algos
  • Hosted six meetups in Boston and New York

For up to the minute updates, subscribe to our RSS feed and follow us on Facebook, Twitter, and Linkedin.


Michael Van Kleeck is June's Quantopian Open Winner

July 20th, 2015   Posted by: Dan Dunn

Congratulations to Michael Van Kleeck, our fifth winner of the Quantopian Open! As of right now, Michael is live trading $100,000 of Quantopian’s capital and will be receiving all the profits from his algorithm.

MVKedit

Michael got his start in the market in graduate school, when he began analyzing futures in his spare time. Later, he started trading equities and options while he was working as a software developer. He soon joined a startup hedge fund, researching strategies and implementing an automated futures trading system. As of now he trades his own account and is continuing to work on his algos.

Two years ago, Michael joined Quantopian after a former hedge fund colleague told him about it. Michael says he harnessed the power of "plentiful data, easy backtesting, and friendly community" to build his strategies. He became active in the community, giving a talk at the San Diego Python Users Group. He spent a few months away from Quantopian while pursuing options algorithms, but started actively developing on the platform again once the Quantopian Open launched.

Michael calls himself a “voracious” reader of any literature dealing with trading, and is always on the lookout for simple, easily testable hypotheses exploring the fundamental principles of “economies, markets, or human psychology.”

Michael started live trading his winning algorithm a week ago with our $100,000 brokerage account. He will keep all the profits generated after 6 months. You can track him and the other winners on our Quantopian Open winner’s page.

In addition, you can check out the leaderboard to see how everyone is doing for July’s contest. And we are currently accepting algos for August, so don’t forget to submit by August 3 at 9:30AM EDT - you could be the next winner!


The Quantopian Summer Lecture Series has Arrived

July 15th, 2015   Posted by: wbadmin

By Delaney Granizo-Mackenzie

We are experimenting with a free quantitative finance education curriculum for our community. We have started holding a series of lectures via meetups and webinars that cover the concepts of the curriculum and demonstrate how to leverage our platform and tools to write a good algorithm. We are also releasing corresponding cloneable notebooks and algorithms for each topic covered.

This curriculum is being developed in concert with our academic outreach initiative, in which we're working with professors at schools including MIT Sloan, Stanford, and Harvard. The education material we're generating is being vetted by professors as they use our platform to teach their classes, so you can expect to get the same materials that are in use at top schools.

Webinars & Cloneable Notebooks and Algorithms

Our first lecture, "The Art of Not Following the Market" covered some approaches for reducing correlation to a benchmark and discussed why returns aren’t everything. You can view the webinar and then clone the corresponding notebooks and algos here.

"The Good, The Bad, and The Correlated" focused on how to diversify your portfolio by minimizing correlation between your assets’ return streams. To learn more, view the webinar and corresponding notebooks from this lecture.

"You Don't Know How Wrong You Are" reviewed problems with how estimates are often taken and discuss some ways to quantify this uncertainty. This lecture is broken into 3 parts. Learn more, view the corresponding webinars and notebooks below.

 

Screen Shot 2015-07-15 at 1.38.25 PM

Upcoming Schedule of Lectures 

Roughly every two weeks, we will hold a meetup in our Boston office; the same material will be covered in a live webinar on the Thursday of the following week. Until a permanent home is built on our site, we will share the recordings of the webinars with our community via our forum. We will also link back to the notebooks and algorithms for each lecture.

Here are the upcoming lectures scheduled:

  • On August 5th, we will hold the "You Don't Know How Wrong You Are Part 2: This Time You're More Wrong" meetup in Boston. RSVP here for the event. The corresponding webinar will be held on August 6th and you can RSVP here.
  • On August 26th in Boston: "What's In Your Returns?" will cover how factor models are useful for researching strategies, reducing dependencies on external factors, and understanding your returns distribution. The corresponding webinar will be held on August 27th.

Stay Tuned

We will be publishing the upcoming schedule, webinars, and cloneable algorithms and notebooks in the community forum. We'll also be setting up a permanent home and distribution center for our curriculum notebooks and algorithms – and will share this in the coming months. Hope you can join us!


Best In Class, Using Quantopian

June 30th, 2015   Posted by: Dan Dunn

When told he would be competing with his peers in his Alternative Investments class, Olivier Lapierre knew he would have to do something to set himself apart from his classmates.


He, along with his teammates Pierre-Luc Nadeau, Marc-Antoine Larochelle Rodrigue, Pierre-Gabriel Gagnon, and Jean-Simon Caron, decided to build their team's hedge fund through Quantopian. The assignment was to create a fund entirely from scratch, including figuring out the strategies, fees, legal partners, and location of the fund. When it came time to backtest strategies, every other team was manually backtesting in Excel, but Olivier was able to work faster and better by creating quantitative algos in Quantopian. The team won first place.

Olivier has been interested in quantitative investing since he was 17 years old - in his mind, “if you're not doing something automated and quantitative, you’re not being competitive in the market.” He and a friend had wanted to build a hedge fund even before their Alternative Investments class, and wanted it to be completely algorithmic.

Even though Lapierre was taking mathematics and programming classes, he still had a lot of trouble connecting his algorithms to a data feed and a broker to build his idea. Then he discovered Quantopian. He stumbled upon it through Reddit (r/finance), and realized it would make their project much more fully developed, telling his friend, “this is going to change your life.”

Olivier found that the Quantopian community forums are a great place for examples and inspiration. He found a strategy in the community and tweaked it until he got just what he needed. “The tutorials were well made and it was easy to jump in,” says Lapierre, and the team was quickly able to find a percentile-channel momentum algorithm, make a few changes, and add a volatility filter, and that was the winner. Even though he ended up using pricing data instead of fundamental data (due to a fast-approaching project deadline), he and his team still ended up blowing their competition out of the water.

Olivier and his team are currently paper trading with their algo to get a good track record and working on moving it to live trading, in anticipation of the upcoming CFA Exam.

Trying to get a leg up on your competition like Olivier?  Start paper trading your algorithms now and build the track record you need to succeed!

Do you have an example of how Quantopian has changed your life? Let us know! We’re excited to hear about your experience.


“$AAPL is seriously going to kill it today.”

What do you think this sentence is saying? My guess is you’re probably not struggling to understand what it means. You knew at first glance that “$AAPL is seriously going to kill it today” was another way of saying, “$AAPL’s stock price is going to increase today”.

The same can’t be said for a computer. If you fed that same sentence into a Python interpreter, all it'd know is that there’s a string, 42 characters long, somewhere in memory. That's where the problem lies. Millions of messages just like these are lost every day because that's all they are: messages. And while the author spent the morning buying up millions in $AAPL stock, you're pressing "Buy" three hours too late. So what if you could change that? What if you could feed that sentence into an algorithm and know exactly what people are feeling about the market?

That’s exactly what the folks over at PsychSignal set out to do. PsychSignal is a provider of online “Trader Mood Indices” to the financial industry. Their proprietary technology enables any data consumer to gain critical insight into the trader-specific moods behind over 10,000 publicly traded securities. They built a highly specialized natural language processing engine that parses online market conversations to not only analyze and detect trader moods (bullishness and bearishness) but also score the mood intensity. With data going back to 2009 and sources that include StockTwits, Twitter, T3Live chat, and other private stock-market-specific chat rooms, they have a plethora of datasets; the backtest here looks specifically at messages derived from StockTwits.

In collaboration with the PsychSignal team, we created a simple long-only trading strategy that uses their sentiment scores to determine which stocks in the NASDAQ 100 to long; it is an intro-level strategy meant to demonstrate this new type of data.


Data Basics:

  • BULLISH_INTENSITY – The PsychSignal intensity score from their sentiment engine
  • BULL_SCORED_MESSAGES – The number of bullish-scored messages from that day
  • TOTAL_SCANNED_MESSAGES – The total number of StockTwits messages for that day
  • BULL_MINUS_BEAR – The bullish intensity score minus the bearish intensity score

Algorithm Rules (a code sketch follows the list):

  • Total number of scanned messages has to be greater than 50 for the past 30 days
  • Average Bull minus Bear intensity score has to be greater than 0 for the past 30 days
  • Each security that meets the filter receives a score where the score is the bullish intensity level multiplied by the number of bull messages.
  • Long the top 5 securities with the highest score
  • Use the NASDAQ 100 as the securities universe
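Here is a minimal pandas-style sketch of these rules (illustrative only; the column names follow the PsychSignal fields listed above, and the helper names are our own):

def passes_filters(window):
    # 'window' is a DataFrame holding the last 30 days of PsychSignal
    # fields for one security
    return (window['TOTAL_SCANNED_MESSAGES'].sum() > 50 and
            window['BULL_MINUS_BEAR'].mean() > 0)

def sentiment_score(window):
    # Bullish intensity weighted by the number of bullish messages
    return (window['BULLISH_INTENSITY'] * window['BULL_SCORED_MESSAGES']).sum()

# Score every NASDAQ 100 security that passes the filters, then long
# the 5 securities with the highest scores, e.g.:
#   longs = scores.nlargest(5).index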

Simply by trading the stocks with the highest bull sentiment, I was able to create an enhanced NASDAQ 100 portfolio. However, while the strategy significantly outperforms the market, Dr. Checkley from PsychSignal comments:

We can guess that those who tweet and who tweet frequently are highly attuned to market conditions. They are “bellwethers” for general market mood. If so, then using sentiments derived from tweets is likely to distil the more extreme emotions and lead to highly sensitive indicators. Trading on those indicators gives a high beta.

There are no short positions, which goes hand-in-hand with the beta problem. But clearly, it would be no great challenge to, say, “invert” the analysis above and seek out strong bear signals for short trades. And there is unlimited scope for changing the filters, the buy and sell rules, the universe of stocks, etc. As you get comfortable with using these sentiment metrics, we recommend blending the bull and bear measures with other indicators, such as long and short-term price trend or trade volume. You can also use a more sophisticated model, such as a neural network, to create your predictions and trading signals. With hundreds of heavily-tweeted stocks and assets to choose from, and your full arsenal of analytical tools and algorithm refinements, you can improve on our results in short order.

It's not an exact science and this strategy only skims the surface of what's possible. However, as technology improves, more and more models & investment strategies will begin incorporating this kind of data.

Links:


May's Quantopian Open Winner, Szilvia Hegyi

June 22nd, 2015   Posted by: Dan Dunn

Another month, another winner! A big congratulations to Szilvia Hegyi for taking first place in the May round of the Quantopian Open. Szilvia is now managing $100,000 of Quantopian’s money and will keep all the profits her algorithm generates.

Szilvia

Szilvia started out as an economist graduating from Budapest Business School, specializing in financial institutions. Shortly after graduating, she worked for the Austrian bank Raiffeisen in the corporate finance field. She then began working as an analyst and consultant, taking a look at the processes of global companies from both a strategic and operational point of view. She has gained experience with insurance, automotive, retail, and many other industries.  She often encountered large datasets in her work, and that led her to study more in-depth statistical research methods and data science.

In 2012, she co-founded BrinicleLab, a financial research company, where she is mainly responsible for portfolio development and backtest evaluation. Szilvia worked with a team of two other coworkers to come up with her winning algo.

She came across Quantopian when BrinicleLab was looking for platforms to evaluate their own research and algorithms. When asked about her thoughts on Quantopian, Szilvia said she could really get behind Quantopian’s philosophy and framework, and that she believes it could “revolutionize the finance industry.”

Szilvia and her team used a rigorous process to develop the winning algorithm. The team spent about 6 months collecting and reviewing 10 years' worth of research papers on algorithmic investing from the world’s top universities. They took the 30 best papers they found as the basis for their algorithm. Then came about 3 months' worth of coding, followed by hundreds of backtests across many different asset classes to determine the final parameters. Then came portfolio selection and validation, and lastly, on June 1, they began real-money trading of $100,000.

Szilvia will be able to keep whatever profits she earns after six months. You can follow her (and the other winners') progress on the Quantopian Open winners page.

You can also take a look at the leaderboard to see where everyone stands in the current contest. And although June entries are now closed, we are accepting entries for July’s competition, so make sure to submit! Who knows, pretty soon we might end up writing about you.
