Heartbleed Bug

April 8th, 2014   Posted by: jik

Along with many other sites on the internet, Quantopian is taking steps to protect ourselves from the “Heartbleed Bug”, which was disclosed yesterday. Although we have no reason to believe that our site or any of our members’ accounts or data have been compromised, we are taking a number of precautions to safeguard the security of our members’ accounts. We will be documenting here the steps we are taking.

[DONE] We are generating a new SSL certificate to protect our site, using a newly generated encryption key; deploying the new key and certificate to our servers; and asking our SSL certificate authority to revoke our old certificate.

[DONE] We are adding a prominent banner within our application notifying all members to change their passwords. The banner will go away automatically when the user’s password is changed.

[DONE] We are requiring all members who have brokerage accounts configured within Quantopian to change their passwords.

[DONE] We are modifying our application so that members are not able to configure a brokerage account within Quantopian until they have changed their password.

[DONE] We are rotating the passwords and encryption keys used by the components of our application when they are communicating with each other. This requires application down-time the evening of April 8, 2014, starting at 5:00pm US/Eastern.

[IN PROGRESS] We are generating new encryption keys used to protect data in our databases and re-encrypting all data using the new keys.

 

Highlights of Fawce’s Reddit AMA (Ask Me Anything)

March 27th, 2014   Posted by: fawce

Quantopian founder and CEO John “Fawce” Fawcett was invited by Reddit’s investing subreddit to host an AMA (Ask Me Anything). The turnout was overwhelming! With over 90 comments in the thread, we’ve pulled out the highlights. Take a look below at the conversation summary:

Q: How much computer coding ability must one have to use this well?

Fawce: You don’t need to be a coding expert to work on an algorithm in Quantopian. Some algorithms can be coded up with some fairly basic logic, stuff that you’d learn in early stages of a programming class. The more intricate your logic, or the more advanced your mathematical operations are, the more you’re going to need advanced skills.

We think that algorithm collaboration (upcoming new feature) is going to help with this quite a bit – people with great ideas will be able to pair with people with more coding skills, and they can collaborate on a result. We also think there is a future in renting of algorithms.

If you’re already pretty decent at Python, I highly recommend Wes McKinney’s book on pandas. He’s one of our advisors, and he’s done a ton of work to make complex mathematical operations easy in Python.

Q: Do you guys have any plans to integrate company financial data for the fundamental investors?

Fawce: Yes, we definitely do. We get this request all the time, and we are chomping at the bit to deliver it. Company financial data appeals to more than just fundamental investors, too. We think that algorithmic trading has three steps – data exploration, coding/backtesting, and live trading. We built the backtester first, then we did live trading, and data exploration is next.

Q: Written any good algorithms yourself?

Fawce: I worked with a few other people to write the algo that is trading Quantopian’s money right now, which we shared with the world: https://www.quantopian.com/posts/paper-trading-with-interactive-brokers-open-beta-launch

I love algo ideas that model relationships between securities, the simplest of which is pairing. I also like Ernie Chan’s explanations of these types of trades, so I wrote a co-integration algo on gld/gdx. More recently, another quant posted Ernie Chan’s EWA/EWC pair trade. His implementation is cool because it also uses a Kalman filter: https://www.quantopian.com/posts/ernie-chans-ewa-slash-ewc-pair-trade-with-kalman-filter?c=1

Q: What does your pricing model look like?

Fawce: Right now, everything is free. We’re focused on bringing the community together, and we’re talking to our current live traders about pricing. The basic structure will be: backtesting and forward testing are free; trading through your broker will be $TBD/month/algorithm.

Q: any chance you’ll offer I/B/E/S data, maybe for an extra cost?

Fawce: Yes, we’re talking to a number of data vendors about estimate data. I see fundamental corporate data as breaking down into historical data from filings and forward estimates. I’m a huge fan of Estimize, and I think they are part of a major trend to use crowdsourcing to improve data origination and data quality.

As our community continues to grow, one way to draw more data to the platform is to become a sales channel for the data providers. I think that core data like historical filings needs to be free for research and testing, but that freshly delivered data used in live trading can be fee-based.

Q: How much AUM is currently being managed by Quantopian-hosted live-trading algorithms?

Fawce: This is really the question about our platform today. I have to be a little bit vague, because we haven’t spoken about total assets under management (AUM) publicly yet, and it is one of the best hooks we have for news coverage. I’ve actually started referring to the total as our “Algorithmically Directed Assets” or ADA since we aren’t a broker or fund manager and don’t hold your account.

We’re trading through 30 IB accounts today in private beta. I feel very comfortable that we are getting both sufficient technical burn-in and that we are directing a meaningful amount of capital. At the moment, the bottleneck for ADA growth seems to be algorithm development. But we know people are digging in, because we’ve seen a significant jump in the amount of coding/backtesting/paper trading over the past few weeks following our announcement of live trading.

Q: I love the things you are doing! How many developers did you initially start out with? Did you start doing the coding (yourself?) before venture capital started rolling in? Or how did the launch come along? … Do you feel that your hosting is latency sensitive (yet?)?.. I am just wondering how you go from idea to working with something so awesome :)

Fawce: My wife was able to support our family for the first phase of Quantopian’s existence, so I was able to invest the proceeds of selling my last company into Quantopian. To keep costs low I set up for work in our shed. I wrote the original prototype myself, and the first website drew a good initial audience after a few blogs mentioned it.

I started in June of 2011, and around November it started to get really cold in the shed, so I figured I should try to get some funding so I could get a real office :). I met Spark Capital around then, and they led our seed round in January of 2012. Then I hired my CTO, and we worked together for about 6 months re-writing the prototype to be “real”. But we didn’t get our office until June of 2012, when we wanted to ramp up our team. Instead, I worked from the public library in my town through the winter :).

Everything is hosted on Amazon. We’re specifically avoiding HFT, so latency of 1-2s is our target. We see much less than that.

If you have passion for an idea, it will gain momentum when you talk to other people about it. I also relied on the opinions of others – my wife, friends with experience starting companies, friends from finance, and a few mentors.

Working alone from the shed and public spaces was a humble start, but it still felt awesome in the beginning. Starting up is a thrill, and each milestone feels epic – loading a month of historical trade data seemed incredible when I did it the first time. And when the first really avid user started emailing me feature requests, I was dizzy with excitement. Same with hiring people. You have to really enjoy each step so that you can weather the setbacks. We had to make it through losing to big companies in recruiting, realizing we couldn’t afford all the data purchases we wanted to make originally, and live trading being way harder than we expected. Thanks for asking!

To see the full conversation thread, click here.

Common Types of Trading Algorithms

March 7th, 2014   Posted by: Alisa Deychman

This is a short overview of common types of quantitative finance algorithms that are traded today.  Of course, this is only an overview, and not comprehensive! Let me know if you think there are other algo types I should cover.

  • Momentum - The trend is your friend
    Momentum investors look for the market to move significantly in one direction on high volume and join the parade. They try to ride the “hot stocks” and sell the “cold ones”. Stock prices can surge suddenly and continue to rise as people try to reap profit from the upward trend. This stems from the rate at which information is released and from people’s herd mentality. An insider (an executive, supplier, or stakeholder) may have information on a company and share it, causing the stock to move; as the news cascades through the channels, more traders hear of it and the price is driven up further. A simple example of this strategy is to buy a stock when its recent price is above a moving average and sell it when it falls below. Another simple strategy is to rank the sectors and buy the top stocks when their trailing moving average exceeds a threshold. See example strategy
  • Mean Reversion - What goes up must come down
    Mean reversion investors assume that the price of the stock will over time revert back to its long-time average price.  They use stock price analysis to determine the trading bounds of statistical significance. If the stock is trading significantly above the moving average, they will short it. On the other hand, if the stock is trending significantly below its moving average, they will buy it. See example strategy
  • Valuation - Bargain Shopping
    Valuation strategies use fundamental analysis to identify stocks trading at a discount (or premium) and buy (or sell) them accordingly. Investors determine the fundamental value of the stock and compare it to its market price. They scrutinize companies’ financial statements for revenue, gross margin, operating cash flow, EBITDA, and pro forma earnings, to name a few. In a simple example, investors can use the price-to-earnings ratio as a proxy for value, where a low P/E looks ‘cheap’ and a high P/E looks ‘expensive’. They can then rank a universe 1-100 based on the P/E ratio and take either of two positions: long only or market neutral. In a long-only strategy, the investor buys the bottom decile, which is the lowest-P/E segment. The market-neutral strategy buys the bottom decile while simultaneously selling the top decile. See example strategy
  • Seasonality - Sell in May and go away
    Investors create strategies that depend on the time of year. It’s well documented that markets tend to have better returns at the end of the year and during the summer months, while September is usually a month with lower returns. In order to avoid capital loss, some investors choose to sell their positions with losses at the end of December to benefit from tax leniency. In January, investors return in triumph and purchase small-cap and value stocks, driving up their prices. Stock prices also trend differently around holidays and quarter close periods. A simple strategy is to buy and hold equities (SPY) from October – April and then rotate to buy and hold bonds (BSV) from May – September. See example strategy
  • Sentiment - Buy the rumor, sell the news
    Sentiment analysis trading derives from crowd psychology: investors stay up-to-date on recent news and purchase stocks ahead of the crowd’s predicted reaction. They attempt to capture short-term price changes and reap the quick benefits. Investors may monitor sources including Google search trends, media outlets, blogs/forums, and Twitter posts. See example strategy
  • Fundamental Investing
    This is a way of evaluating the true intrinsic value of a stock by examining macro-level factors such as economic indicators, industry and sector comparisons, and companies’ financial statements. Calculations derived from real data attempt to model the stock’s true value, which is then compared to the stock’s market price – driving the decision to buy or sell. Example data points for fundamental analysis include company revenues, earnings, future growth, return on equity, and profit margins.
  • Technical Investing
    This method examines past market activity for changes in the stock’s price and volume, on the belief that historical performance is indicative of future results. Investors use charts, statistics, and other tools to discover patterns in the data and predict future price movements. This style of investing does not analyze the intrinsic value of the stock, but rather the future movement of the security. To add technical analysis to your Quantopian code, see the ta-lib open source library.
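To make the first two strategy types concrete, here is a minimal sketch of the moving-average momentum rule and a z-score mean-reversion rule described above. This is plain Python with made-up prices, not Quantopian API code.

```python
# Sketch of the two simplest signals described above: a moving-average
# momentum rule and a z-score mean-reversion rule. The price series and
# window lengths are hypothetical, chosen only for illustration.

def moving_average(prices, window):
    """Trailing simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def momentum_signal(prices, window=5):
    """+1 (buy) when the latest price is above its moving average,
    -1 (sell) when below, 0 otherwise."""
    ma = moving_average(prices, window)
    if prices[-1] > ma:
        return 1
    if prices[-1] < ma:
        return -1
    return 0

def mean_reversion_signal(prices, window=5, threshold=1.0):
    """Opposite logic: short (-1) when the price is significantly above
    its average, buy (+1) when significantly below, measured in
    standard deviations over the trailing window."""
    recent = prices[-window:]
    ma = sum(recent) / window
    var = sum((p - ma) ** 2 for p in recent) / window
    std = var ** 0.5
    if std == 0:
        return 0
    z = (prices[-1] - ma) / std
    if z > threshold:
        return -1
    if z < -threshold:
        return 1
    return 0

prices = [10.0, 10.2, 10.1, 10.4, 10.6, 11.5]   # hypothetical closes
print(momentum_signal(prices))        # price above its average -> 1
print(mean_reversion_signal(prices))  # stretched above average -> -1
```

Note how the same price path produces opposite signals: the momentum trader buys the surge, while the mean-reversion trader shorts it.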

Real Money

February 25th, 2014   Posted by: fawce

It’s been about a month now since Quantopian opened live trading up to a public/private beta – we figured it’s about time to share an update on how things are going.

First, I’d like to say thank you to all the Quantopian community members who have joined the pilot and taken the time to provide invaluable feedback, insight and encouragement. The community is, without a doubt, the most amazing part of Quantopian – we couldn’t do this without you, and it is a privilege to work with such talented quants, hackers, scientists and analysts on a daily basis.

I’d like to share two of the key metrics we are starting to track for live trading:

              80 – Number of Quantopians who have connected their brokerage account.

              20 – Number of live (IB-backed) algorithms running today.

I’m delighted to report the first real dollars directed by Quantopian-hosted algorithms. Making the leap from simulation to real trading is a huge deal. I’m incredibly grateful that our beta group has trusted us with their capital. I’d like to congratulate each pilot trader who has launched a real-money algo and claimed their free lifetime subscription so far —  the rest of you have until March 31st to join that club.

We have also been listening closely to conversations in the larger community about live trading– it is great to see spirited and open discussions like this one unfold and evolve on the forums. My takeaway is that the key questions about live trading on Quantopian can be divided into three thematic buckets: Trust, Technology and Trading capital.

Trust: Do I trust my research and my code enough to trade? Do I trust Quantopian to protect my IP and place trades as I expect?

Technology: Are all the nuts and bolts in place to implement my algorithm the way I need it?

Trading capital: Do I have the trading capital and the risk tolerance to live trade my algorithm?

The commonalities across these three themes are that first, the answers are personal, rather than universal and second, the answers are going to keep changing (hopefully from ‘Not yet.’ to ‘Yes!’) as we build the platform out together.

Finally I wanted to share a screenshot of my own live money algo performance (we shared the code for this a few weeks ago here). This is a conservative, market-tracking strategy that holds an equal-weighted portfolio of 9 sector ETFs and rebalances as often as daily. I’ve been running this algo live since Jan 23rd, when I started with a capital base of just under $30,000; I grabbed this update after the market closed yesterday.
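The linked post has the actual algorithm; as a rough sketch of the equal-weight rebalancing idea itself (plain Python with hypothetical position values, not the code we run):

```python
# Sketch of equal-weight rebalancing across a basket of sector ETFs.
# Position values are made up; a real algorithm would pull them from
# the brokerage account and place orders through the trading API.

def rebalance_orders(position_values, cash=0.0):
    """Given current dollar value per holding, return the dollar amount
    to buy (+) or sell (-) of each so every position ends equal-weight."""
    total = sum(position_values.values()) + cash
    target = total / len(position_values)
    return {sym: target - value for sym, value in position_values.items()}

# A hypothetical three-ETF portfolio that has drifted from equal weight:
positions = {"XLF": 12000.0, "XLE": 9000.0, "XLK": 9000.0}
orders = rebalance_orders(positions)
print(orders)  # sell $2000 of XLF, buy $1000 each of XLE and XLK
```

Buys and sells net to zero (ignoring commissions), which is why a strategy like this can rebalance daily without touching cash.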

[Screenshot: V2 of Jess’s live algo performance]

Quant Strategies Implemented by the Quantopian Community

February 10th, 2014   Posted by: Jess Stauth

Last week I gave a quant finance meet-up talk at the Hacker Dojo in Mountain View, CA. The format was inspired by some analysis I did on the types of algorithms shared and cloned in the Quantopian community – initially I wanted to ask: What are the most popular strategies coded up on Quantopian? To answer this question I ranked all public forum posts three ways, first on number of replies, second on number of views, and third on number of times cloned. I averaged these scores and re-ranked the list to come up with the top 25 ‘Most popular posts of all time’. (NB: I did not do any correction for the date of the original post, so the amount of time the thread has been alive has not been normalized.)
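The rank-averaging just described can be sketched in a few lines. The post names and counts below are hypothetical stand-ins, not the real forum data:

```python
# Sketch of the combined ranking described above: rank posts separately
# by replies, views, and clones, average the three ranks, and re-sort.
# Names and counts are illustrative only.

def combo_rank(posts):
    """posts: {name: (replies, views, clones)} -> names sorted by the
    average of their per-metric ranks (rank 1 = highest count)."""
    names = list(posts)
    avg = {n: 0.0 for n in names}
    for metric in range(3):
        ordered = sorted(names, key=lambda n: posts[n][metric], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            avg[name] += rank / 3.0
    return sorted(names, key=lambda n: avg[n])

posts = {
    "A": (64, 32121, 821),
    "B": (16, 18701, 2930),
    "C": (5, 8816, 457),
}
print(combo_rank(posts))  # A leads on two metrics, B on clones, C trails
```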

Combo Rank | Post | Reply Count | View Count | Clone Count
1 | Google Search Terms predict market movements | 64 | 32121 | 821
2 | OLMAR implementation – fixed bug | 64 | 26216 | 701
3 | Easy Volatility Investing by Tony Cooper @ Double-Digit Numerics | 57 | 15211 | 846
4 | discuss the sample algorithm | 16 | 18701 | 2930
5 | Global Minimum Variance Portfolio | 28 | 10230 | 702
6 | ML – Stochastic Gradient Descent Method Using Hinge Loss Function | 10 | 20421 | 973
7 | Mebane Faber Relative Strength Strategy with MA Rule | 22 | 11199 | 622
8 | OLMAR w/ NASDAQ 100 & dollar-volume | 31 | 7766 | 701
9 | Using the CNN Fear & Greed Index as a trading signal | 22 | 9914 | 367
10 | New Sample Algorithm | 33 | 8336 | 328
11 | Bollinger Bands With Trading | 18 | 8390 | 566
12 | Brent/WTI Spread Fetcher Example | 17 | 10892 | 327
13 | Ernie Chan’s Gold Pairs Trade | 15 | 10420 | 329
14 | Ranking and Trading on “Days to Cover” | 4 | 24976 | 384
15 | Determining price direction using exponential and log-normal distributions | 9 | 9781 | 624
16 | Time to “sell in may and go away”? | 27 | 8231 | 263
18 | Simple Mean Reversion Strategy | 6 | 11861 | 275
19 | Neural Network that tests for mean-reversion or momentum trending | 4 | 10101 | 407
20 | Momentum Trade | 5 | 8816 | 457
21 | Using weather as a trading signal | 6 | 11958 | 199
22 | Global market rotation strategy | 53 | 7629 | 95
23 | trading earnings surprises with Estimize data | 34 | 7506 | 130
24 | Trading Strategy: Mean-reversion | 13 | 8252 | 216
25 | Turtle Trading Strategy | 11 | 8012 | 318
TOTALS | | 569 | 306940 | 13581

Starting from this list, I worked backwards and used examples from the Quantopian community to introduce 5 basic quant strategy types: Mean Reversion, Momentum, Value, Sentiment and Seasonality. While this list is not technically ‘mutually exclusive and collectively exhaustive’, it covers a large fraction of intraday to lower frequency quant strategies and provides a good overview of the way equity focused quants think about predicting market prices. I went back to my Top 25 list and categorized each algo into one of these five buckets and then created this pie chart based on the aggregated number of views for each strategy type.
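The aggregation behind the pie chart can be sketched like this. The categories and view counts below are illustrative, not the real Top 25 numbers:

```python
# Sketch of the pie-chart aggregation described above: sum view counts
# per strategy bucket and convert them to percentage shares. The
# (category, views) pairs are hypothetical.

from collections import defaultdict

def views_by_category(algos):
    """algos: list of (category, views) -> {category: share percent}."""
    totals = defaultdict(int)
    for category, views in algos:
        totals[category] += views
    grand = sum(totals.values())
    return {c: round(100.0 * v / grand, 1) for c, v in totals.items()}

algos = [("Momentum", 32121), ("Mean Reversion", 26216),
         ("Sentiment", 9914), ("Momentum", 11199)]
shares = views_by_category(algos)
print(shares)  # each category's percentage of total views
```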

 

[Pie chart: Top 25 algorithms by strategy category, weighted by views]

 

There are a number of interesting conclusions to be drawn from this initial overview of community activity. Perhaps the most obvious and predictable of these is that price based strategies are currently in the lead by a large margin – due, I expect, to the easy access to minute-level equity pricing and the accessibility of the logic for momentum and mean-reversion. Indeed there were no value-based strategies that made their way into the Top 25 – which in my view represents a key opportunity space right now.

More subtle and, from my admittedly biased point of view, more compelling is the diversity and quality of content and collaboration in the public sphere. Having joined the Quantopian team from a large corporate setting, where I worked with a small group of institutional clients, seeing that the Top 25 algos have been cloned over 13,000 times – an average of over 500 clones per strategy – is… well, it’s pretty damn cool.

Below you can find the slide deck from my presentation:

DIY Quant Strategies on Quantopian from Jess Stauth

Paper Trading With Interactive Brokers – Open Beta Launch

January 22nd, 2014   Posted by: Dan Dunn

If you have a brokerage account with Interactive Brokers* (IB), you can integrate Quantopian with your IB account and start paper trading today.

(If you don’t have an IB account, that’s OK. You can still paper trade with Quantopian! And, if you’d also like to have an IB account, you can create an account on their website.)

Do You Want to Trade With Real Money, Not Paper?

Our pilot program for real money trading is still in private beta. If you haven’t requested a spot in our real money pilot program, you can do that by filling out this short form.

How Much Does This Cost?

Paper trading and live trading through IB* are both free while we run this beta program. In the future we plan to charge a flat monthly fee per live algorithm.

We also have a special offer. If you get your algorithm up and running with real money by 3/31/14, we’ll give you a lifetime free subscription**.

Live Trading Pilot Program

Our pilot live trading program has been growing rapidly in the last few weeks, and we’re excited to grow the program even more by opening up paper trading to everyone.

We think that live trading on Quantopian is going to change the way you trade forever.

  • Quantopian lets you test before you invest. Your investment strategy can be vetted before committing any money.
  • Quantopian helps you make decisions based on data and execute those decisions without emotion.
  • Quantopian helps you focus on the big picture and create new ideas, instead of operating a manual, tedious system.

* Interactive Brokers LLC is not affiliated with and does not endorse or recommend Quantopian, Inc. Interactive Brokers provides execution and clearing services to customers who integrate their Interactive Brokers account with their Quantopian account. For more information regarding Interactive Brokers LLC, please visit www.interactivebrokers.com.

** Free, life-time subscription offer is for one algorithm, up to $100,000 in account balance. Additional algorithms and larger account balances are not covered by this offer.

P.S. Attached is a sample algorithm that’s geared for live trading. It takes a list of stocks and rebalances them every day. You can paper trade it today:

  1. Clone the algo below
  2. Run a “Full Backtest”
  3. When the backtest finishes, “Live Trade Algorithm”

[Attached: sample live-trading algorithm]

Recently I was asked to comment on a piece in the Economist about the rise and supposed fall of quant investing in the last ten years.

 

The thesis of the article, which equates ‘quant’ with trend following or technical analysis, seems to be: See, those pointy-headed geeks aren’t as smart as they think they are! Sure, they racked up outsized returns in the mid-2000s and the market downturn of ’08. But now it’s different, the market has changed, their tricks are broken, and since it’s impossible to understand what quants are really doing anyway, there’s also no way to predict whether their techniques will ever work again.

 

The logic of this argument is dubious at best. It is overly simplistic, relegating the entirety of data-driven systematic investment to the trend-following label, while providing no convincing counterpoint or real basis for comparing the relative performance of systematic versus discretionary investing, or ‘stock picking’, over some meaningful historical time frame (you be the judge: the full list of comparable trailing performance for all HFRI funds is available here, and a more in-depth analysis of hedge fund underperformance in general here). In short, the piece suffers from a clear case of stumbling into a “complexity valley”. Former GETCO trader Nancy Hua gave a nice explanation of this phenomenon as it relates to quant finance in a recent Quora post:

 

“[Quant] suffers from a complexity valley: If people don’t know anything about it, when they hear about it in passing, they don’t understand it and they don’t think much of it; if people think about it a medium amount and read news articles about it, when they are armchair philosophizing they hate it; when people actively participate in algorithmic trading and electronic markets, they get obsessed with it and want to integrate it with every aspect of their existing trading.”

 

Does the Economist really intend to take the position that we’ve given the whole ‘computers in investing’ thing a shot and it just didn’t really work out? This data-driven analysis stuff is just not as satisfying as a good ol’ fashioned hunch! The Economist is one of my favorite magazines, full of bold analysis and expertise. This particular article isn’t up to The Economist’s usual standard.

 

Far from being a passing fad, I’d argue that the technological stars are aligning for systematic investing (done right*) to bring a massive wave of disintermediation to the financial industry. As individual investors reach the other side of this particular complexity valley and recognize the accessibility and economy of automated investing, they are going to wonder why 2 and 20 on an ONGOING BASIS ever seemed even remotely reasonable for a job that, thought through once, can absolutely be – indeed, already IS – run day to day by computers.

 

*Take a look at this simple sector rotation strategy for a nice example borrowed from Meb Faber’s research page.


Post-mortem of 2013-11-15 security breach

December 13th, 2013   Posted by: jik

A natural outgrowth of what we do at Quantopian is that our users tend to be technically sophisticated and love data. Therefore, when we have a significant event, such as the recent security breach, we are as open and transparent as possible about what transpired. In that spirit, we present this detailed analysis of the security breach and what we’ve since done to strengthen our site security.

In presenting such an analysis, there is a fine line between being sufficiently open and transparent with our users and with the maintainers of other web sites, who might benefit from our lessons learned, and providing information that could aid future attacks. While we don’t believe in relying on security by obscurity, we also don’t think it’s a good idea to tell the bad guys exactly where to focus their efforts. Therefore, some details have been omitted.

Background

Essential to our mission as an advanced algorithmic trading platform is the fact that our servers execute arbitrary Python code written by our users. Therefore, while we share with every other web application the requirement to code our application securely, we have an additional, challenging requirement most sites don’t: preventing our users’ code from compromising our security. This makes security more challenging and also makes our site a more attractive target.

We run users’ algorithms in a sandbox which is isolated from the rest of our application in numerous ways, including:

  • separate process;
  • separate Python namespace;
  • limits on available Python modules; and
  • chroot().

The algorithm sandbox obviously cannot be entirely isolated from the rest of our application, because we need to send pricing and universe data to the algorithm and receive in return the logging data and results that it generates.
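For illustration only (this toy is emphatically not our actual sandbox implementation), one of the layers above, namespace isolation, can be sketched as running user code with a stripped-down set of builtins so that dangerous names are simply not available:

```python
# Toy illustration of namespace isolation, one of the layers listed
# above. This is NOT the real sandbox: it omits process separation,
# chroot, and import filtering, and restricted-exec alone is not a
# complete defense. It only shows the general idea.

SAFE_BUILTINS = {"len": len, "range": range, "sum": sum,
                 "min": min, "max": max, "abs": abs}

def run_sandboxed(user_code, inputs):
    """Execute user_code with restricted builtins; return its namespace."""
    namespace = {"__builtins__": SAFE_BUILTINS, "data": inputs}
    exec(user_code, namespace)
    return namespace

# Benign user code works with the allowed builtins:
ns = run_sandboxed("result = sum(data) / len(data)", [1.0, 2.0, 3.0])
print(ns["result"])  # 2.0

# Dangerous names like open() were never put in the namespace:
try:
    run_sandboxed("f = open('/etc/passwd')", [])
except NameError as e:
    print("blocked:", e)
```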

Malicious users attempt on a regular basis to break out of our algorithm sandbox. We actively monitor them, evaluate them on a case-by-case basis, and take additional steps when necessary.

Timeline

On the afternoon of Thursday, November 14, our monitoring systems alerted us to the fact that an attacker was attempting to escape from the sandbox using a technique we knew about and had already blocked. Many other users had tried the same technique unsuccessfully, so we weren’t particularly worried.

However, two things about this particular attacker surprised us. First of all, he was more persistent than most; he tried pretty much every conceivable variety of this particular attack, when most attackers give up after it fails a few times. Second, because of his persistence, he actually found a minor chink in our armor: he was able to get a peek at some internals of our sandbox due to a typographical error in one of our source files.

The information he was able to retrieve was relatively inconsequential, but we were obviously alarmed by the unintended exposure of that data. We immediately blocked the attacker’s access to the site, tracked down the root cause of the vulnerability, fixed it, and released the fix.

Shortly after we blocked the attacker, he circumvented the block, returned to the site, and kept working on the attack, so we blocked him again. After that, he tried unsuccessfully for a while to get back into the site, and then went away. We thought that was the end of it, but unfortunately we were wrong.

On Friday morning, the attacker returned. He tried several techniques unsuccessfully, but then he found one that worked, a method for partially circumventing the limitations on Python modules accessible from user algorithm code. We had known about this particular vulnerability before his attack, and it was on our roadmap to fix soon, but we hadn’t yet done so because we thought that it was limited in scope and it was unlikely that anyone would find it. With 20/20 hindsight, we were clearly wrong.

After detecting the attacker’s return, we began to analyze what he had been able to accomplish, and it quickly became clear that the breach was significant. We shut down the application, notified our users on our blog, and set to completing our analysis to determine the precise magnitude of the compromise and what needed to be done to eliminate the vulnerabilities that had enabled it.

Within an hour of detecting the compromise, we were able to identify the root cause and a remediation plan which we immediately began implementing. Within a few hours of detecting the compromise, we were able to confirm that the attacker had not accessed any user data.

We left the application shut down all day Friday and Saturday and most of the day Sunday while we implemented and tested the enhancements needed to prevent this kind of attack in the future. After developing and testing the fixes, we brought the site back up at around 8:00pm on Sunday night.

The attacker made several more unsuccessful attempts to compromise the site on Sunday night.

On Monday morning, we had a smooth market open for our live traders and paper traders.

We continue to aggressively monitor the application.

The vulnerability

Before this breach, access to Python modules and methods within user algorithm code was limited by three different mechanisms:

  1. Any module not already imported into the Python namespace before executing the user’s algorithm could not be used within the algorithm, since after the chroot() the files necessary to import said module are no longer available.
  2. Import statements were filtered and only certain, whitelisted modules could be imported.
  3. Keywords for potentially dangerous methods and module attributes were banned.

The flaw in this logic, which the attacker discovered, is that if a whitelisted module imports a module that is not whitelisted, then that module is accessible as an attribute of the whitelisted module’s object. For example:

>>> import pytz
>>> import sys
>>> print sys
<module 'sys' (built-in)>
>>> print pytz.sys
<module 'sys' (built-in)>
>>>

We couldn’t add every single module name to the keyword blacklist, since that would have prevented users from using too many common variable names. And even if we had been able to do that, a module can import a blacklisted module under a different name, e.g., “import sys as sysmodule”, and we would have had to fully audit all whitelisted modules for such references, as well as re-audit them each time we upgraded our module versions, which wasn’t practical.

The attacker accessed the sys module through another, whitelisted module and used it to gain access to other Python modules containing sensitive data. Fortunately, however, he did not access any user data. He also didn’t get our data encryption keys, which we guard extremely carefully for obvious reasons, so even if he had accessed user data he would not have been able to decrypt it.

How we fixed it

We’ve made a number of changes to our application to eliminate this particular vulnerability and mitigate the risk of other vulnerabilities as well. These include:

  • Our module whitelist / blacklist functionality is much more powerful now and includes both compile-time and run-time enforcement. With this change there is no longer any way — that we know of — for algorithm code to break out of the algorithm sandbox.
  • Nevertheless, since good security is all about layers, we’ve removed much of the sensitive data that could be accessed by code that does somehow manage to escape the sandbox, and we are planning additional changes to remove even more.
  • We now have automated processes in place to detect when new attributes have been added to a module we allow, so that we can audit them and determine whether they should be whitelisted or blacklisted.
  • We’ve enhanced our monitoring to alert us more quickly and more aggressively to suspicious user algorithm code, and we are continuing to expand and improve our monitoring and alerting capabilities.
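The automated attribute audit in the third bullet can be sketched roughly as follows (the function names are ours, purely for illustration). Here the standard library’s os module, which imports sys internally, stands in for a whitelisted third-party module like pytz:

```python
import os
import types

def reachable_modules(mod):
    """Module-valued attributes of `mod` -- objects an algorithm could
    pivot through, the way the attacker pivoted through pytz.sys."""
    return {name for name in dir(mod)
            if isinstance(getattr(mod, name), types.ModuleType)}

def audit(mod, known):
    """Return module attributes that have appeared since the last audit,
    so they can be reviewed and whitelisted or blacklisted."""
    return reachable_modules(mod) - known

# `os` imports `sys` at its top level, so os.sys is reachable from os --
# a first audit against an empty baseline flags it for review.
flagged = audit(os, known=set())
print("sys" in flagged)  # True
```

Running such an audit whenever module versions are upgraded turns the impractical manual re-audit described above into a diff review of only the attributes that changed.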

Conclusion

In closing, we first and foremost want to reiterate that no user data were compromised during this incident.

We take very seriously our responsibility to safeguard our members’ intellectual property. Security is an ongoing process, not a one-time thing, and we will continue to evolve our practices to stay current with the state of the art. We believe the best way to earn your trust is to be open and transparent, and we will continue to do that even when we are sharing unpleasant news.

If you have any questions or concerns, please let us know. We always reply to email received at [email protected]. We monitor [email protected] for emails concerning our security. You are always welcome to reach me personally at [email protected].

Sincerely,

Jonathan Kamens
Vice President of Operations

For those of you who weren’t able to join us in NYC last month here are some highlight clips from our event.

We kicked off with an intro from Fawce. First here’s the general overview including a look at the live trading dashboard:

Fawce on the two most common questions people ask him about Quantopian:

Gary Chan gave our keynote talk – in this clip he explains his motivation for sharing his work in a public meetup forum:

I really liked the examples Gary used for what his definitions of Easy vs. Hard are. Spoiler alert: Getting into Harvard is hard. Opening a successful restaurant is hard. Algorithmic trading is (by comparison) easy.

Gary walks through his trading infrastructure and toolkit:

Finally Gary shares the results of his actual backtests:

You can view the full slide deck from Gary’s presentation here:

You can sign up for future Meetups in NYC, Boston, and San Francisco. We’d love to see you in the crowd next time.

Probabilistic Programming in Quantitative Finance

December 3rd, 2013   Posted by: Thomas Wiecki

Probabilistic Programming (aka Bayesian statistics) allows flexible construction of statistical models to gain insight from data. Estimation of best fitting parameter values — as well as uncertainty in these estimations — can be automated by sampling algorithms such as Markov chain Monte Carlo (MCMC). The high interpretability and flexibility of this approach has already led to a huge paradigm shift in scientific fields ranging from Cognitive Science to Data Science. In this blog post I will highlight what Probabilistic Programming can offer for algorithmic traders.

I recently gave a presentation about Bayesian Data Analysis in PyMC3 at PyData NYC’13 with a special focus on financial applications which should give a good primer — check it out here:

Bayesian Data Analysis with PyMC3 – Thomas Wiecki from PyData on Vimeo.

Inferring latent processes in Quantitative Finance

Bayesian statistics has many benefits, some of which I discuss in my talk. Most relevant to Quantitative Finance, however, is the fact that you can very flexibly model latent, unobservable processes and how they relate to observable events. As an example, we could model fear of investors. We can’t measure it directly but it certainly has a bearing on market behavior and the stock price.

Bayes’ formula allows us to then infer backwards: given the observable data (e.g. the stock price), what is the probability of the latent construct (e.g. investor fear)?
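As a toy illustration of that inversion — the numbers below are invented purely for the example — suppose we want the probability that investors are fearful given that we observed a sharp drop in the stock:

```python
# Toy Bayesian inversion: infer the latent state "investors are fearful"
# from the observed event "the stock dropped sharply today".
p_fear = 0.2                 # prior probability of the latent state
p_drop_given_fear = 0.6      # likelihood of a sharp drop when fearful
p_drop_given_calm = 0.1      # likelihood of a sharp drop otherwise

# Total probability of observing a sharp drop (law of total probability).
p_drop = p_drop_given_fear * p_fear + p_drop_given_calm * (1 - p_fear)

# Bayes' formula: P(fear | drop) = P(drop | fear) * P(fear) / P(drop)
p_fear_given_drop = p_drop_given_fear * p_fear / p_drop

print(round(p_fear_given_drop, 3))  # 0.6
```

Observing the drop tripled our belief in the latent state, from 20% to 60%. Real models replace this two-state toy with continuous latent processes, which is where MCMC sampling comes in.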

Stochastic volatility model

One elegant example of how Probabilistic Programming can infer unobservable quantities of the stock market is the stochastic volatility model. Volatility is an important concept of quantitative finance as it relates to risk. Unfortunately, as Tony Cooper reminds us:

“Volatility is a strange thing – it exists but you can’t measure it.”

Tony Cooper on Quantopian forums.

If we can’t measure something, the next best thing we can do is to try and model it. One way to do this in a probabilistic framework is the concept of stochastic volatility: if we assume that returns are normally distributed, volatility is captured by the standard deviation of that normal distribution. Letting the standard deviation itself vary randomly over time gives rise to stochastic volatility. Intuitively, we would expect the standard deviation to be high during times of market turmoil like the 2008 crash.

So the obvious thing to do would be to look at the rolling standard deviation of returns. But this is unsatisfying for several reasons:

  • it often lags behind changes in volatility;
  • it is a rather unstable measure;
  • it depends strongly on the window size, with no principled way of choosing one.

These properties are illustrated in the plot below: the larger the window, the greater the lag; the smaller the window, the more unstable the estimate.

[Figure: rolling standard deviation of S&P 500 returns at several window sizes]
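The window-size trade-off is easy to reproduce with a short numpy computation on synthetic returns (the regimes, window lengths, and day counts below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
# Synthetic daily returns: calm regime, a volatile stretch, calm again.
returns = np.concatenate([rng.normal(0, 0.01, 200),   # days 0-199: calm
                          rng.normal(0, 0.04, 50),    # days 200-249: turmoil
                          rng.normal(0, 0.01, 200)])  # days 250-449: calm

def rolling_std(x, window):
    """Trailing standard deviation over a fixed window; element k is the
    estimate as of day k + window - 1."""
    return np.array([x[i - window:i].std()
                     for i in range(window, len(x) + 1)])

short = rolling_std(returns, 20)
long_ = rolling_std(returns, 100)

# 80 days after the turmoil ends, the 20-day window has returned to the
# calm level, while the 100-day window still contains volatile days and
# keeps reporting elevated volatility -- the lag described above.
t = 330
print(short[t - 20 + 1], long_[t - 100 + 1])
```

Shrinking the window reduces the lag but makes the estimate jumpier, and nothing in the method tells us which window is "right" — the model described next estimates the smoothness from the data instead.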

As we will see, the stochastic volatility model does a better job on all three counts. But before we look at it, we need one more insight into the nature of volatility: it tends to cluster. This is nicely demonstrated by the returns of the S&P 500 above: during the 2008 financial crisis there is a lot of volatility in the stock market (huge positive and negative daily returns) that gradually decreases over time.

So how do we model this clustering property? The stochastic volatility model assumes that the standard deviation of the returns follows a random-walk process. You can read the Wikipedia article I linked, but essentially this process allows for slow, gradual changes over time.

What is interesting is that we can model the standard deviation itself as following a random walk. Intuitively, we allow the standard deviation to change over time, but only ever so slightly at each time point. For the mathematical and implementation details of the model, see this IPython notebook, which uses PyMC3 (a new, flexible probabilistic programming framework for Python).
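The generative side of this model can be written in a few lines. The sketch below works on the log scale so the standard deviation stays positive, a common parameterization; the step size and series length are arbitrary:

```python
import numpy as np

rng = np.random.RandomState(42)
n_days = 500
step = 0.05  # volatility of the volatility random walk (illustrative)

# Latent process: log standard deviation follows a Gaussian random walk,
# so it changes only slightly from one day to the next -- this is what
# produces volatility clustering.
log_sigma = np.cumsum(rng.normal(0, step, n_days)) + np.log(0.01)
sigma = np.exp(log_sigma)

# Observed process: each day's return is Normal with that day's sigma.
returns = rng.normal(0, sigma)

# Fitting the model (e.g., with PyMC3's MCMC samplers) runs this
# construction in reverse: given only `returns`, infer `sigma`.
```

Everything the sampler needs — the smoothness of the walk and the daily volatilities — is estimated jointly from the data, with no window size to choose.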

The plot below shows the latent volatility (in orange) inferred from the model based on the market data (in mint). The orange lines represent the standard deviation of the Normal distribution we assume for the daily returns.

stoch_vol

As you can see, the estimated volatility is much more robust. Moreover, all parameters are estimated from the data while we would have to find a reasonable window length for the rolling standard deviation ourselves. Finally, we do not just get a single estimate as with the rolling standard deviation but rather many solutions that are likely. This provides us with a measure of the uncertainty in our estimates and is represented by the width of the orange line above.

In summary, Probabilistic Programming is an extremely powerful framework for Quantitative Finance, as it provides:

  • a principled framework for modeling latent, unobservable causes, as demonstrated by the stochastic volatility model;
  • a measure of uncertainty in our estimates;
  • powerful sampling methods that allow automatic estimation of highly complex models.

We are currently working on making PyMC3 available on Quantopian to allow usage of this type of model — so sign up and start getting familiar with our platform, discuss on the forums, and follow @Quantopian and me on Twitter. If you are interested in Probabilistic Programming I also sometimes blog about it here.
