Artificial Intelligence for Trading Nanodegree
Alpha Research and Factor Modelling
Project: Multi-Factor Model
Table of Contents
- Project Overview
- Statistical Risk Model
- Alpha Factors
- The Combined Alpha Factor
- Evaluate Alpha Factors
- Optimal Portfolio Constrained by Risk Model
In this project, I will build a statistical risk model using PCA. I’ll use this model to build a portfolio along with 5 alpha factors. I’ll create these factors, then evaluate them using factor-weighted returns, quantile analysis, Sharpe ratio, and turnover analysis. At the end of the project, I’ll optimize the portfolio with the risk model and the factors, using multiple optimization formulations.
Udacity doesn't have a license to redistribute the data to us. They are working on alternatives to this problem.
Statistical Risk Model
Portfolio risk is calculated using this formula:

$$\sqrt{X^T (B F B^T + S) X}$$

where:
- X is the portfolio weights (weights assigned to each stock)
- B is the factor betas (exposure of factors)
- F is the factor covariance matrix (combined with factor betas gives systematic risk)
- S is the idiosyncratic variance matrix (specific risk)
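Assuming the standard factor-model form of the risk formula, sqrt(X^T (B F B^T + S) X), the calculation can be sketched in NumPy as follows. The function name `portfolio_risk` and all numbers are made up for illustration; in the project, B, F and S would come from the PCA risk model.

```python
import numpy as np

def portfolio_risk(X, B, F, S):
    """Predicted portfolio risk: sqrt(X^T (B F B^T + S) X)."""
    X = np.asarray(X, dtype=float)
    # N x N asset covariance implied by the factor model.
    total_cov = B @ F @ B.T + S
    return float(np.sqrt(X @ total_cov @ X))

# Toy example: 3 assets, 2 factors (hypothetical values).
X = np.array([0.5, 0.3, 0.2])        # portfolio weights
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # factor betas (assets x factors)
F = np.diag([0.04, 0.01])            # factor covariance matrix
S = np.diag([0.01, 0.02, 0.03])      # idiosyncratic variance matrix

risk = portfolio_risk(X, B, F, S)
```

The term B F B^T contributes the systematic part of the variance and S contributes the specific part, so the two can also be inspected separately.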
After calculating the portfolio risk, the following five alpha factors were created:
- Momentum 1 Year Factor 
Each factor has a hypothesis that goes with it. For this factor, it is "Higher past 12-month (252 days) returns are proportional to future return". Using that hypothesis, we've generated this code:
    from zipline.pipeline.factors import Returns

    def momentum_1yr(window_length, universe, sector):
        return Returns(window_length=window_length, mask=universe) \
            .demean(groupby=sector) \
            .rank() \
            .zscore()
- Mean Reversion 5 Day Sector Neutral Factor 
I have implemented mean_reversion_5day_sector_neutral using the hypothesis "Short-term outperformers (underperformers) compared to their sector will revert." The function takes the returns data from the universe, demeans it using the sector data as the partition, ranks it, converts it to a z-score, and negates the result.
    def mean_reversion_5day_sector_neutral(window_length, universe, sector):
        """
        Generate the mean reversion 5 day sector neutral factor

        Parameters
        ----------
        window_length : int
            Returns window length
        universe : Zipline Filter
            Universe of stocks filter
        sector : Zipline Classifier
            Sector classifier

        Returns
        -------
        factor : Zipline Factor
            Mean reversion 5 day sector neutral factor
        """
        return -Returns(window_length=window_length, mask=universe) \
            .demean(groupby=sector) \
            .rank() \
            .zscore()
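To make the demean, rank, z-score chain concrete, here is a minimal pandas sketch of the same steps on a single day's cross-section. It is not the zipline implementation: the helper name `sector_neutral_score`, the tickers and the returns are hypothetical, and the z-score uses the population standard deviation (ddof=0), which is an assumption about zipline's default.

```python
import pandas as pd

def sector_neutral_score(returns, sectors):
    """Demean by sector, rank, then z-score one day's cross-section."""
    # Subtract each stock's sector mean so sectors are compared on equal footing.
    demeaned = returns - returns.groupby(sectors).transform("mean")
    ranked = demeaned.rank()
    # Assumption: population standard deviation (ddof=0).
    return (ranked - ranked.mean()) / ranked.std(ddof=0)

returns = pd.Series([0.10, 0.02, 0.05, -0.01], index=["A", "B", "C", "D"])
sectors = pd.Series(["tech", "tech", "energy", "energy"], index=returns.index)

momentum_like = sector_neutral_score(returns, sectors)
# The mean reversion factor is the same score with the sign flipped.
mean_reversion_like = -momentum_like
```

Stock A beats its sector by the widest margin, so it gets the highest momentum-style score and the lowest mean-reversion-style score.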
- Mean Reversion 5 Day Sector Neutral Smoothed Factor
Taking the output of the previous factor, we create a smoothed version.
mean_reversion_5day_sector_neutral_smoothed generates a mean reversion 5 day sector neutral smoothed factor. It calls mean_reversion_5day_sector_neutral to get the unsmoothed factor, then uses SimpleMovingAverage to smooth it. Rank and zscore have to be applied again after smoothing.
    from zipline.pipeline.factors import SimpleMovingAverage

    def mean_reversion_5day_sector_neutral_smoothed(window_length, universe, sector):
        """
        Generate the mean reversion 5 day sector neutral smoothed factor

        Parameters
        ----------
        window_length : int
            Returns window length
        universe : Zipline Filter
            Universe of stocks filter
        sector : Zipline Classifier
            Sector classifier

        Returns
        -------
        factor : Zipline Factor
            Mean reversion 5 day sector neutral smoothed factor
        """
        mean_reversion = mean_reversion_5day_sector_neutral(window_length, universe, sector)
        return SimpleMovingAverage(inputs=[mean_reversion], window_length=window_length) \
            .rank() \
            .zscore()
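The effect of smoothing can be illustrated outside zipline with a small pandas sketch: take a rolling mean of the daily scores, then rank and z-score each day again. The helper `smooth_and_rescore`, the asset names and the scores are hypothetical, and the ddof=0 z-score is an assumption.

```python
import pandas as pd

def smooth_and_rescore(scores, window):
    """Rolling-mean smoothing followed by a per-day rank and z-score."""
    smoothed = scores.rolling(window).mean()
    ranked = smoothed.rank(axis=1)
    centered = ranked.sub(ranked.mean(axis=1), axis=0)
    return centered.div(ranked.std(axis=1, ddof=0), axis=0)

# Rows are days, columns are assets (made-up factor scores).
scores = pd.DataFrame({
    "A": [1.0, 1.0, -0.5],
    "B": [0.0, 0.0, 0.0],
    "C": [-1.0, -1.0, 0.5],
})
smoothed_scores = smooth_and_rescore(scores, window=2)
```

On the last day the raw score favors C, but the two-day average still favors A, which shows how smoothing slows down rank changes. The first row is all NaN because the rolling window is not yet full.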
- Overnight Sentiment Factor 
For this factor, we're using the hypothesis from the paper Overnight Returns and Firm-Specific Investor Sentiment.
    import numpy as np
    from zipline.pipeline.data import USEquityPricing

    class CTO(Returns):
        """
        Computes the overnight return, per hypothesis from
        https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2554010
        """
        inputs = [USEquityPricing.open, USEquityPricing.close]

        def compute(self, today, assets, out, opens, closes):
            """
            The opens and closes matrix is 2 rows x N assets, with the most recent at the bottom.
            As such, opens[-1] is the most recent open, and closes[0] is the earlier close
            """
            out[:] = (opens[-1] - closes[0]) / closes[0]

    class TrailingOvernightReturns(Returns):
        """
        Sum of trailing 1m O/N returns
        """
        window_safe = True

        def compute(self, today, asset_ids, out, cto):
            out[:] = np.nansum(cto, axis=0)

    def overnight_sentiment(cto_window_length, trail_overnight_returns_window_length, universe):
        cto_out = CTO(mask=universe, window_length=cto_window_length)
        return TrailingOvernightReturns(
                inputs=[cto_out],
                window_length=trail_overnight_returns_window_length) \
            .rank() \
            .zscore()
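As a plain NumPy illustration of what the two custom factors compute, the sketch below derives close-to-open returns from small, made-up price arrays and sums them over the trailing window. The variable names and prices are hypothetical and the zipline machinery is left out.

```python
import numpy as np

# Rows are consecutive days, columns are assets (hypothetical prices).
opens = np.array([[10.0, 20.0],
                  [10.5, 19.0],
                  [11.0, 19.5]])
closes = np.array([[10.0, 20.0],
                   [10.0, 20.0],
                   [11.0, 19.0]])

# Overnight return: today's open relative to yesterday's close.
overnight = (opens[1:] - closes[:-1]) / closes[:-1]

# Trailing sum of overnight returns per asset, ignoring missing values.
trailing = np.nansum(overnight, axis=0)
```

Here the first asset gaps up on both mornings and the second gaps down, so their trailing sums have opposite signs before any ranking is applied.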
- Overnight Sentiment Smoothed
Just like the mean reversion factor, we'll also smooth this factor.
    def overnight_sentiment_smoothed(cto_window_length, trail_overnight_returns_window_length, universe):
        unsmoothed_factor = overnight_sentiment(cto_window_length, trail_overnight_returns_window_length, universe)
        return SimpleMovingAverage(
                inputs=[unsmoothed_factor],
                window_length=trail_overnight_returns_window_length) \
            .rank() \
            .zscore()
Combined Alpha Factor
With all the factor implementations done, let's add them to a zipline pipeline.
    from zipline.pipeline import Pipeline
    from zipline.pipeline.factors import AverageDollarVolume

    # project_helper, engine, factor_start_date and universe_end_date
    # come from the project setup, which is not shown here.
    universe = AverageDollarVolume(window_length=120).top(500)
    sector = project_helper.Sector()

    pipeline = Pipeline(screen=universe)
    pipeline.add(
        momentum_1yr(252, universe, sector),
        'Momentum_1YR')
    pipeline.add(
        mean_reversion_5day_sector_neutral(5, universe, sector),
        'Mean_Reversion_5Day_Sector_Neutral')
    pipeline.add(
        mean_reversion_5day_sector_neutral_smoothed(5, universe, sector),
        'Mean_Reversion_5Day_Sector_Neutral_Smoothed')
    pipeline.add(
        overnight_sentiment(2, 5, universe),
        'Overnight_Sentiment')
    pipeline.add(
        overnight_sentiment_smoothed(2, 5, universe),
        'Overnight_Sentiment_Smoothed')

    all_factors = engine.run_pipeline(pipeline, factor_start_date, universe_end_date)
    # all_factors.head()
Evaluate Alpha Factors
Note: We're evaluating the alpha factors using a delay of 1.
Let's view the factor returns over time. They appear to move up and to the right.
It is not enough to look just at the factor-weighted return. A good alpha is also monotonic in quantiles. Let's look at the factor returns by quantile, in basis points.
- None of these alphas are strictly monotonic, which raises the question of why. Further research and refinement of the alphas are needed. What is it about these alphas that leads the highest-ranking stocks in every alpha except MR 5D smoothed to not perform the best?
- The majority of the return is coming from the short side in all these alphas. The negative return in quintile 1 is very large in all alphas. This could also be a cause for concern because when you short stocks, you need to locate the short; shorts can be expensive or not available at all.
- If you look at the magnitude of the return spread (i.e., Q1 minus Q5), we are working with daily returns in the neighborhood of 0.03%, i.e., 3 basis points, before all transaction costs, shorting costs, etc. Assuming 252 trading days in a year, that's a 7.56% annualized return before compounding. Transaction costs may cut this in half. As such, it should be clear that these alphas can only survive in an institutional setting and that leverage will likely need to be applied in order to achieve an attractive return.
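The annualization arithmetic in the last point can be checked in a few lines. The 3 basis point spread, the 252 trading days and the halving by transaction costs come from the text; the compounded figure is added only for comparison.

```python
daily_spread = 0.0003      # 3 basis points per day
trading_days = 252

# Simple (non-compounded) annualization, as used in the text.
simple_annual = daily_spread * trading_days

# Compounded annualization, shown for comparison only.
compounded_annual = (1 + daily_spread) ** trading_days - 1

# Rough effect of transaction costs cutting the return in half.
after_costs = simple_annual / 2

print(f"simple: {simple_annual:.2%}, "
      f"compounded: {compounded_annual:.2%}, "
      f"after costs: {after_costs:.2%}")
```

Compounding raises the figure only slightly at this return level, so the simple product is a reasonable back-of-the-envelope estimate.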
Without doing a full and formal backtest, we can analyze how stable the alphas are over time. Stability in this sense means that from period to period, the alpha ranks do not change much. Since trading is costly, we always prefer, all other things being equal, that the ranks do not change significantly per period. We can measure this with the factor rank autocorrelation (FRA).
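A minimal sketch of the FRA idea, assuming the factor values sit in a pandas DataFrame with one row per date and one column per asset: rank the assets each day, then correlate each day's ranks with the previous day's. The function name and the data are hypothetical, and this is not necessarily how the project's helper library computes it.

```python
import pandas as pd

def factor_rank_autocorrelation(factor):
    """Correlation between each day's asset ranks and the previous day's."""
    ranks = factor.rank(axis=1)
    # Row-wise Pearson correlation of ranks, i.e. a rank correlation.
    return ranks.corrwith(ranks.shift(1), axis=1)

factor = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [1.0, 2.0, 3.0],    # same ordering as the day before
     [3.0, 2.0, 1.0]],   # ordering fully reversed
    index=pd.date_range("2016-01-04", periods=3),
    columns=["A", "B", "C"],
)
fra = factor_rank_autocorrelation(factor)
```

A value near 1 means the ranks barely moved, which implies low turnover; a value near -1 means the ranking flipped. The first date has no previous day, so its value is NaN.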
Sharpe Ratio of the Alphas
The last analysis we'll do on the factors is the Sharpe ratio. The function sharpe_ratio calculates the Sharpe ratio of the factor returns.
    def sharpe_ratio(factor_returns, annualization_factor):
        """
        Get the sharpe ratio for each factor for the entire period

        Parameters
        ----------
        factor_returns : DataFrame
            Factor returns for each factor and date
        annualization_factor: float
            Annualization Factor

        Returns
        -------
        sharpe_ratio : Pandas Series of floats
            Sharpe ratio
        """
        # Column-wise mean and sample standard deviation (ddof=1), one value per factor.
        return annualization_factor * factor_returns.mean() / factor_returns.std(ddof=1)
Let's see what the Sharpe ratios for the factors are. Generally, a Sharpe ratio near 1.0 or higher is acceptable for a single alpha in this universe.
A Sharpe ratio of 1.13 for the momentum factor is good. Looking at the autocorrelation plots, the FRA for the momentum factor is already stable, so smoothing it would not make a significant difference.
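As a usage sketch, the Sharpe ratio calculation can be applied to a small, made-up DataFrame of daily factor returns. The annualization factor of sqrt(252) assumes daily returns and 252 trading days per year; the column names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def sharpe_ratio(factor_returns, annualization_factor):
    """Annualized Sharpe ratio per column of factor returns."""
    return annualization_factor * factor_returns.mean() / factor_returns.std(ddof=1)

# Hypothetical daily factor returns for two alphas.
factor_returns = pd.DataFrame({
    "alpha_a": [0.001, 0.003, 0.002],
    "alpha_b": [0.002, -0.002, 0.000],
})

# Assumption: daily data, so annualize with sqrt(252).
sharpe = sharpe_ratio(factor_returns, np.sqrt(252))
```

With only three observations the numbers are not meaningful; the point is that the result is one Sharpe ratio per factor column.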
The Combined Alpha Vector
To use these alphas in a portfolio, we need to combine them somehow so we get a single score per stock. This is an area where machine learning can be very helpful. In this module, however, we will take the simplest approach of combination: simply averaging the scores from each alpha.
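Averaging the scores is a one-liner in pandas. The sketch below assumes the alpha scores for a single date are arranged with one row per stock and one column per alpha; the tickers, column names and values are made up.

```python
import pandas as pd

# Hypothetical z-scored alphas for one date.
alphas = pd.DataFrame(
    {"factor_1": [1.2, -0.3, -0.9],
     "factor_2": [0.8, 0.1, -0.9]},
    index=["AAA", "BBB", "CCC"],
)

# Combined alpha vector: equal-weight average across the alphas.
alpha_vector = alphas.mean(axis=1)
```

Because each alpha is already z-scored, an equal-weight average puts them on a comparable scale without further normalization.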
Optimal Portfolio Constrained by Risk Model
Objective and Constraints
This is the list of constraints that we will optimize against:
Where x is the portfolio weights, B is the factor betas, and r is the portfolio risk.
The first constraint is that the predicted risk be less than some maximum limit. The second and third constraints are on the maximum and minimum portfolio factor exposures. The fourth constraint is the "market neutral" constraint: the sum of the weights must be zero. The fifth constraint is the leverage constraint: the sum of the absolute value of the weights must be less than or equal to 1.0. The last are some minimum and maximum limits on individual holdings.
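Written out from the description above, the constraints might look like the following. The symbols x, B and r are the ones defined in the text; the limit names r_max, f_min, f_max, w_min and w_max are placeholders introduced here, not names from the project.

```latex
\begin{aligned}
r &\le r_{\max} && \text{(predicted risk limit)} \\
B^T x &\le f_{\max} && \text{(maximum factor exposure)} \\
B^T x &\ge f_{\min} && \text{(minimum factor exposure)} \\
\sum_i x_i &= 0 && \text{(market neutral)} \\
\sum_i |x_i| &\le 1 && \text{(leverage)} \\
w_{\min} \le x_i &\le w_{\max} && \text{(limits on individual holdings)}
\end{aligned}
```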
Weights generated after applying those constraints:
Yikes. It put most of the weight in a few stocks.
Optimize with a Regularization Parameter
This is the weights distribution after applying regularization to the objective function.
Nice. Well diversified.
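The text does not spell out the regularized objective. One common form, shown here only as an assumption, adds an L2 penalty on the weights to an objective that maximizes exposure to the combined alpha vector. Here α is the combined alpha vector and λ is a hypothetical regularization strength.

```latex
\min_{x} \; -\alpha^T x + \lambda \lVert x \rVert_2
```

A larger λ penalizes concentrated positions more heavily, which is consistent with the more diversified weights described above.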
Optimize with Strict Factor Constraints and Target Weighting
Another common formulation is to take a predefined target weighting (e.g., a quantile portfolio), and solve to get as close to that portfolio as possible while respecting portfolio-level constraints.
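As a rough sketch of this formulation, the snippet below builds one possible target weighting from an alpha vector (demeaned, then scaled so that the absolute weights sum to 1) and measures how far a candidate portfolio is from it. Both helper names, the scaling choice and the numbers are assumptions for illustration; an optimizer would minimize this distance subject to the portfolio-level constraints.

```python
import numpy as np

def target_weights(alpha):
    """Hypothetical target: demeaned alpha scaled to unit gross exposure."""
    alpha = np.asarray(alpha, dtype=float)
    demeaned = alpha - alpha.mean()
    return demeaned / np.abs(demeaned).sum()

def distance_to_target(x, x_star):
    """L2 distance between candidate weights x and target weights x_star."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - x_star))

alpha = [1.0, 0.0, -1.0, 2.0]         # made-up combined alpha scores
x_star = target_weights(alpha)

# An all-cash candidate, just to show the distance calculation.
candidate = np.zeros(4)
gap = distance_to_target(candidate, x_star)
```

By construction the target is market neutral (weights sum to zero) and uses a leverage of exactly 1, so it already satisfies two of the constraints discussed earlier.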
This project used Python 3.6.3. The necessary libraries are mentioned in