# Intro to Python Workshop #

## Pandas, Numpy and Matplotlib for Financial Data Analysis ## 

### Intro to Pandas ###

Before we can use the functions of a third-party module, we need to import it. Once imported, its functions and methods are available to use.


In [None]:
import pandas_datareader.data as web
import datetime as dt
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib as mpl

We create `datetime` objects marking the start and end of the analysis period using the `datetime` class of the `datetime` module.

In [None]:
start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 12, 31)
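If the dates are available as strings instead, they can be parsed into datetimes. This is a minimal sketch using the standard library's `datetime.strptime` with an explicit format, and pandas' `pd.to_datetime`, which infers common formats and also accepts whole lists of strings. The specific date strings are illustrative.

```python
import datetime as dt

import pandas as pd

# Parse a date string with an explicit format (year-month-day)
start = dt.datetime.strptime('2017-01-01', '%Y-%m-%d')

# pandas infers common formats, for single strings or whole lists of strings
end = pd.to_datetime('2019-12-31')
dates = pd.to_datetime(['2017-01-03', '2017-01-04'])

print(start, end, dates[0].day_name())
```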

We can create a list of stocks whose data we want to analyze.

In [None]:
stock_tickers = ['AMZN', 'MSFT', 'AAPL', 'GOOG', 'IBM']

We load daily price data for the first ticker in the list from Yahoo Finance using the `DataReader` function of `pandas_datareader`.

In [None]:
df = web.DataReader(stock_tickers[0], 'yahoo', start, end)
df.head()

In [None]:
df.info()

Pandas DataFrames have an index for both rows and columns.

In [None]:
df.index

In [None]:
df.columns

`.loc` (label-based) and `.iloc` (integer position-based) are the recommended methods for indexing, slicing and filtering DataFrames.

In [None]:
df.loc[df.index<='2017-10-25', ['Adj Close']]

In [None]:
df.iloc[:5,-1]
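To see the difference between the two accessors without downloading data, here is a small sketch on a hypothetical toy DataFrame (the prices and volumes are made up, not real market data). `.loc` filters with a boolean mask on the date index and selects a column by name; `.iloc` selects by row and column position.

```python
import pandas as pd

# Hypothetical toy price data (not real market data) indexed by business day
idx = pd.bdate_range('2017-10-20', periods=6)
toy = pd.DataFrame({'Adj Close': [100.0, 101.5, 99.8, 102.3, 103.0, 101.7],
                    'Volume': [1200, 1500, 900, 1700, 1300, 1100]}, index=idx)

# .loc selects by label: a boolean mask on the date index plus a column name
by_label = toy.loc[toy.index <= '2017-10-25', ['Adj Close']]

# .iloc selects by integer position: first 3 rows, last column
by_position = toy.iloc[:3, -1]

print(by_label)
print(by_position)
```

Passing a list (`['Adj Close']`) keeps the result a DataFrame, while a single position or name (`-1`) returns a Series.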

From the imported data, we will only use two columns: `Adj Close` and `Volume`.

In [None]:
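# .copy() makes an independent DataFrame, so adding columns later does not raise SettingWithCopyWarning
df = df[['Adj Close', 'Volume']].copy()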

We will use the matplotlib module to plot the stock price over time as a line chart.

In [None]:
mpl.rc('figure', figsize=(10, 5))
plt.plot(df['Adj Close'])
plt.title('Daily adjusted closing price of ' + stock_tickers[0])
plt.show()

### Financial Data Analysis ### 

Because the index is in datetime format, we can extract components such as the year, month and weekday from each date and assign them back to the DataFrame as new columns.


In [None]:
df['Year']=df.index.year
df['Month']=df.index.month
df['Weekday']=df.index.weekday
df.head()

`.groupby`, `.apply` and `.agg` are among the most widely used pandas methods for data analysis. For example, we can compute the average adjusted closing price for each year.

In [None]:
df.groupby('Year')['Adj Close'].mean()
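`.agg` can compute several summaries per group in one call, and `.apply` runs an arbitrary function on each group. The sketch below uses hypothetical toy data (not real prices) and pandas' named-aggregation syntax; the within-year return computed with `.apply` is an illustrative statistic, not one used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical toy data spanning two years (not real prices)
idx = pd.to_datetime(['2017-06-01', '2017-06-02', '2018-06-01', '2018-06-04'])
toy = pd.DataFrame({'Adj Close': [10.0, 12.0, 20.0, 24.0],
                    'Volume': [100, 300, 200, 400]}, index=idx)
toy['Year'] = toy.index.year

# .agg with named aggregations: output column name = (input column, function)
summary = toy.groupby('Year').agg(
    mean_price=('Adj Close', 'mean'),
    max_price=('Adj Close', 'max'),
    total_volume=('Volume', 'sum'),
)

# .apply runs a custom function on each group: return from first to last day of the year
yearly_return = toy.groupby('Year')['Adj Close'].apply(lambda s: s.iloc[-1] / s.iloc[0] - 1)

print(summary)
print(yearly_return)
```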

Let's calculate the daily return of this stock: the percentage change in the adjusted closing price from one trading day to the next.

In [None]:
df['returns'] = df['Adj Close'] / df['Adj Close'].shift(1) - 1
plt.plot(df['returns'])
plt.title('Daily return of ' + stock_tickers[0])
plt.show()
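The formula above, $r_t = P_t / P_{t-1} - 1$, is what pandas' built-in `pct_change()` computes. This small sketch on a made-up price series checks that the two agree, and also shows log returns, $\ln(P_t / P_{t-1})$, an alternative (not used in this notebook) whose daily values add up to the log of the total price change.

```python
import numpy as np
import pandas as pd

# Hypothetical prices (not real data)
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# Manual formula used above: P_t / P_{t-1} - 1 (the first value is NaN)
manual = prices / prices.shift(1) - 1

# Built-in equivalent
builtin = prices.pct_change()

# Log returns: their sum equals log(last price / first price)
log_returns = np.log(prices / prices.shift(1))

print(manual.round(4).tolist())
```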

A __(simple) moving average (MA)__ is a widely used indicator in technical analysis that helps smooth out price action by filtering out the “noise” from random short-term price fluctuations. It is a trend-following, or lagging, indicator because it is based on past prices.

The length of the moving average to use depends on the trading objectives: shorter moving averages are used for short-term trading, while longer moving averages are more suited to long-term investors. The 50-day and 200-day MAs are widely followed by investors and traders, and breaks above and below these moving averages are considered important trading signals.

Source: https://www.investopedia.com/terms/m/movingaverage.asp

In [None]:
df['MA_50d'] = df['Adj Close'].rolling(window=50).mean()
df['MA_100d'] = df['Adj Close'].rolling(window=100).mean()
df.tail()
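An $n$-day simple moving average at day $t$ is the plain average of the last $n$ prices, $\mathrm{MA}_t = \frac{1}{n}\sum_{i=0}^{n-1} P_{t-i}$. The sketch below uses a short made-up series and a 3-day window (instead of 50 or 100) to check that `rolling(...).mean()` matches this formula, and shows why the first $n-1$ values are NaN, which is also why the 50-day MA column has missing values at the start of `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical prices (not real data) and a short window for readability
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
window = 3

# rolling().mean() averages the current value and the previous window-1 values
sma = prices.rolling(window=window).mean()

# The same average computed by hand for the last day
manual_last = prices.iloc[-window:].mean()

# The first window-1 entries are NaN because there is not yet enough history
print(sma.tolist())
```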

Now, let's plot the adjusted closing price along with the two moving averages we calculated above in one plot. 

In [None]:
df['Adj Close'].plot(label=stock_tickers[0])
df['MA_50d'].plot(label='50-day moving avg', linewidth=3)
df['MA_100d'].plot(label='100-day moving avg',  linewidth=3, alpha=.5)
plt.legend()
plt.show()
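The text above mentions that breaks above and below a moving average are treated as signals. Here is a minimal sketch of how such crossings could be detected, assuming a hypothetical price path and a 3-day window for readability; with the notebook's data the same logic would apply to `df['Adj Close']` and `df['MA_50d']`. It only flags crossings and is not a tested trading strategy.

```python
import pandas as pd

# Hypothetical price path (not real data) that dips and then recovers
prices = pd.Series([10, 9, 8, 7, 8, 10, 12, 13, 11, 9], dtype=float)
ma = prices.rolling(window=3).mean()

# True where the price is above its moving average (False where MA is NaN)
above = prices > ma
previous = above.shift(1, fill_value=False)

# Ignore days where today's or yesterday's MA is still in its NaN warm-up period
valid = ma.notna() & ma.shift(1).notna()

# A crossover is a day where 'above' flips relative to the previous day
cross_up = above & ~previous & valid
cross_down = ~above & previous & valid

print('crossed above MA on:', cross_up[cross_up].index.tolist())
print('crossed below MA on:', cross_down[cross_down].index.tolist())
```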

__Volatility__ is a statistical measure of the dispersion of returns for a given security or market index. In most cases, the higher the volatility, the riskier the security. Volatility is often measured as either the standard deviation or the variance of returns from that same security or market index.

Source: https://www.investopedia.com/terms/v/volatility.asp

In [None]:
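# volatility is the dispersion of returns, so take the rolling standard deviation of daily returns
df['volatility'] = df['returns'].rolling(window=50).std()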
plt.plot(df['volatility'])
plt.title('50-Day volatility of ' + stock_tickers[0])
plt.show()
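Daily volatility is often scaled to an annual figure by multiplying by the square root of the number of trading days in a year. This sketch assumes the common convention of 252 trading days and uses made-up daily returns; it also shows a short rolling window so the NaN warm-up is easy to see. Note that pandas' `.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns (not real data)
returns = pd.Series([0.01, -0.02, 0.015, 0.0, -0.005, 0.02])

# Rolling standard deviation of returns over a 3-day window
rolling_vol = returns.rolling(window=3).std()

# Annualize, assuming about 252 trading days per year (a common convention)
TRADING_DAYS = 252
annualized_vol = returns.std() * np.sqrt(TRADING_DAYS)

print(rolling_vol.round(4).tolist())
print(f'annualized volatility: {annualized_vol:.2%}')
```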

### Predicting Stock Prices ### 

We can build a linear regression model that predicts the next day's adjusted closing price from the trading volume and the 50-day moving average, and then see how it performs by comparing its predictions against the actual prices. Let's prepare the data for this task.

In [None]:
df['prediction'] = df['Adj Close'].shift(-1)
df.tail(10)
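`shift(-1)` moves every value up one row, so each date's `prediction` value is the following trading day's price: the target the model learns to predict. A small sketch with made-up prices shows this, and why the last row's target is NaN and has to be dropped before fitting.

```python
import pandas as pd

# Hypothetical prices (not real data) on four consecutive business days
idx = pd.bdate_range('2019-12-23', periods=4)
toy = pd.DataFrame({'Adj Close': [100.0, 101.0, 102.0, 103.0]}, index=idx)

# Each date now holds the NEXT day's price as its target
toy['prediction'] = toy['Adj Close'].shift(-1)

# The last date has no next day in the data, so its target is NaN
print(toy)
```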

In [None]:
X = df[['Volume', 'MA_50d', 'prediction']].copy()  # other candidate features: 'returns', 'Month'
X.dropna(inplace=True)
y = X[['prediction']]
X.drop('prediction', axis=1, inplace=True)

In [None]:
npred = 200
X_train = X[:-npred]
X_test = X[-npred:]
y_train = y[:-npred]
y_test = y[-npred:]
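With time series, the train/test split should respect time order: the model is trained on earlier dates and tested on the final `npred` dates, so it never sees the future during training. The slicing above does this by position; the sketch below uses the more explicit `.iloc` on a hypothetical toy frame to check that every training date precedes every test date.

```python
import pandas as pd

# Hypothetical feature frame (not real data) on 10 business days
idx = pd.bdate_range('2019-01-01', periods=10)
X = pd.DataFrame({'feature': range(10)}, index=idx)
npred = 3

# Positional split without shuffling: earlier rows train, the last npred rows test
X_train, X_test = X.iloc[:-npred], X.iloc[-npred:]

print(X_train.index.max(), '<', X_test.index.min())
```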

In [None]:
from sklearn.linear_model import LinearRegression 

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

In [None]:
y_pred = pd.DataFrame(y_pred, index=y_test.index, columns=['prediction'])

In [None]:
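# Both series are indexed by the date the prediction is made, so they line up without shifting
plt.plot(df.loc[df.index >= '2018-01-01', 'prediction'], label='Actual next-day price')
plt.plot(y_pred, label='Predicted next-day price')
plt.legend()
plt.show()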
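A plot alone makes it hard to judge performance. One common check, sketched below, is to compare the model's root mean squared error (RMSE) with a naive baseline that predicts tomorrow's price equals today's price. The arrays are hypothetical values, not results from this notebook; with the notebook's data one would pass `y_test['prediction']` as the actual values, `y_pred['prediction']` as the model predictions, and `df.loc[y_test.index, 'Adj Close']` as today's prices. The `rmse` helper is defined here for illustration.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical values (not real results): today's price, tomorrow's actual price,
# and a model's prediction of tomorrow's price
today = np.array([100.0, 102.0, 101.0, 103.0])
actual_next = np.array([102.0, 101.0, 103.0, 104.0])
model_pred = np.array([101.0, 101.5, 102.0, 103.0])

# Naive baseline: tomorrow's price is predicted to equal today's price
naive_rmse = rmse(actual_next, today)
model_rmse = rmse(actual_next, model_pred)

print(f'model RMSE: {model_rmse:.3f}, naive RMSE: {naive_rmse:.3f}')
```

A model that cannot beat this naive baseline adds little beyond "tomorrow looks like today", which is a useful sanity check for daily price prediction.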

Now that we have results for the first stock in the list, we can apply the same methodology to the rest of the stocks in the `stock_tickers` list. Rather than repeating this code for each stock, we can wrap these operations in a function.

In [None]:
import pandas_datareader.data as web
import datetime as dt
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib as mpl

from sklearn.linear_model import LinearRegression 

start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 12, 31)

# list of companies whose stock you want to analyze
stock_tickers = ['AMZN', 'MSFT', 'AAPL', 'GOOG', 'IBM']

In [None]:
def lrpredictions(stockticker, start, end, npred, show_result=False):
    
    # preprocessing
    df = web.DataReader(stockticker, 'yahoo', start, end)
    df = df[['Adj Close', 'Volume']].copy()
    df['Year']=df.index.year
    df['Month']=df.index.month
    df['Weekday']=df.index.weekday
    df['returns'] = df['Adj Close'] / df['Adj Close'].shift(1) -1
    df['MA_50d'] = df['Adj Close'].rolling(window=50).mean()
    df['MA_100d'] = df['Adj Close'].rolling(window=100).mean()
    df['volatility'] = df['returns'].rolling(window=50).std()
    df['prediction'] = df['Adj Close'].shift(-1)
  
    # data for linear regression
    X = df[['Volume', 'MA_50d', 'prediction']].copy()
    X.dropna(inplace=True)
    y = X[['prediction']]
    X.drop('prediction', axis=1, inplace=True)

    # split train and test sets
    X_train = X[:-npred]
    X_test = X[-npred:]
    y_train = y[:-npred]
    y_test = y[-npred:]

    # linear regression model fitting and prediction
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)

    # plot actual vs. predicted next-day prices (both indexed by the prediction date)
    y_pred = pd.DataFrame(y_pred, index=y_test.index, columns=['prediction'])
    mpl.rc('figure', figsize=(10, 5))
    plt.plot(df.loc[df.index >= '2018-01-01', 'prediction'], label='Actual')
    plt.plot(y_pred, label='Predicted')
    plt.title('Actual vs. Prediction for ' + stockticker)
    plt.ylabel('Stock price')
    plt.legend()
    plt.show()

    if show_result:
        return [X, y, y_pred]

In [None]:
lrpredictions(stock_tickers[1], start, end, 200)

We can use a loop to run the analysis for every stock in the list at once.

In [None]:
for ticker in stock_tickers:
    lrpredictions(ticker, start, end, 200)

# Contact for Questions: tdmdal@rotman.utoronto.ca