Intro to R Time Series (and a few Finance Packages)

Author

Jay Cao

Published

April 3, 2024

1. Introduction

How would you store a time series dataset in R? Given the data structures we have seen so far, you can store it in at least two ways.

  1. store it as a vector or matrix (name or rowname can be used for timestamps if needed)
  2. store it as a dataframe/tibble (timestamps can be a column variable).

These are both reasonable ways of storing time series. However, more than just storing time series data, we also need to process them efficiently, i.e. we need associated time-aware functions that can process time series (e.g. taking lags and differences, subsetting based on time, etc.). As a result, people working with time series in R tend to use specialized time-series data structures because many time-aware functions have been developed for them over the years. Just like matrices and dataframes, these specialized time-series data structures are built upon basic data structures together with special attributes and tailored functions/methods for time series. Several time series classes(/data structures) are popular and widely supported by many econometrics and finance packages. See CRAN time series task view for details.

  1. ts class. ts is a class for equi-spaced time series using numeric timestamps. It’s supported in base R (out-of-the-box R)
  2. zoo class. zoo provides infrastructure for regular- and irregular-spaced time series using arbitrary classes for the timestamps (there are many R classes for dates and times)
  3. xts class. xts extends zoo. It provides uniform handling of R’s time-based data classes.
  4. many more classes such as timeSeries in timeSeries package, tis in tis package, etc.
  5. recently, tibble-like time series classes and associated packages started to gain attraction, for example, the tsibble class from tsibble package. The tsibble package in turn belongs to a big family of time series toolsets called tidyverts. (Recall tibble is almost like dataframe, and it’s the data structure used by the Tidyverse eco-system.)
  6. “time-aware” tibble. This is the approach the tidyquant package takes. Tidyquant uses tibble as its main data structure, but it makes tibble “time-aware” whenever needed. For example, it converts tibble to xts when doing portfolio performance analysis as the underlying function call is taken from PerformanceAnalytics package, which operates on xts data structure.

I will leave you to learn the (5) and (6) above on your own as they both follow the tidyverse principle, which you are already familiar with.

In this notebook, I will (briefly) discuss ts and xts classes. They are the core data structures used by many wonderful time series and finance R packages. You will see that the way we operate on ts and xts objects is quite different from what we learned so far, i.e. the tidyverse way of manipulating dataframes/tibbles. Along learning ts and xts, we will also get a chance to check out a few useful R packages for time series and financial analysis, namely forecast, quantmod, PerformanceAnalytics, and RQuantLib.

Note: Although the forecast package is now retired in favour of the fable package, which works with tsibble (see 5 above), that doesn’t mean the xts data structure is retired. We use the forecast package here to learn xts. In any case, forecast is still a great R package nevertheless.

2. ts class (and forecast package).

2.1. Basics

ts is a basic class for equi-spaced time series supported by base R. Essentially, it’s a vector or matrix with class attribute of “ts” (and an additional attribute). In the matrix case, each column is assumed to contain a single univariate time series. A ts object concisely stores the time series index using three “time series parameters” (tsp): start, end, and frequency.

We can create a ts object using the ts() function.

ts_obj <- ts(1:10, frequency = 4, start = c(2017, 2)) # 2nd Quarter of 1917
ts_obj
     Qtr1 Qtr2 Qtr3 Qtr4
2017         1    2    3
2018    4    5    6    7
2019    8    9   10     

Take a look at ts data structure

# take a look at the data structure
print(typeof(ts_obj))
[1] "integer"
print(class(ts_obj))
[1] "ts"
print(attributes(ts_obj))
$tsp
[1] 2017.25 2019.50    4.00

$class
[1] "ts"
str(ts_obj)
 Time-Series [1:10] from 2017 to 2020: 1 2 3 4 5 6 7 8 9 10

In most cases, you will load data from an external source (files or online data). Let’s load a time series dataset (stored online) and turn it into ts. The dataset records the number of births per month in New York city, from January 1946 to December 1959.

# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))
birth_ts
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
        Nov    Dec
1946 21.672 21.870
1947 21.759 22.073
1948 21.059 21.573
1949 21.519 22.025
1950 22.084 22.991
1951 22.964 23.981
1952 23.162 24.707
1953 25.246 25.180
1954 24.712 25.688
1955 25.693 26.881
1956 26.291 26.987
1957 26.634 27.735
1958 25.912 26.619
1959 26.992 27.897
class(birth_ts)
[1] "ts"

2.2 ts-aware base R functions

We can process and analyze ts objects just by using base R functions. Let’s try a few of them.

# plot the time series
plot(birth_ts)

plot() is a generic function that plots different graphs depending on its input. The plot() here actually invokes plot.ts(), the plotting method for ts objects.

# plot the first difference of birth_ts
plot.ts(diff(birth_ts))

# time() create a vector of time from the time series
time(birth_ts)
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
1946 1946.000 1946.083 1946.167 1946.250 1946.333 1946.417 1946.500 1946.583
1947 1947.000 1947.083 1947.167 1947.250 1947.333 1947.417 1947.500 1947.583
1948 1948.000 1948.083 1948.167 1948.250 1948.333 1948.417 1948.500 1948.583
1949 1949.000 1949.083 1949.167 1949.250 1949.333 1949.417 1949.500 1949.583
1950 1950.000 1950.083 1950.167 1950.250 1950.333 1950.417 1950.500 1950.583
1951 1951.000 1951.083 1951.167 1951.250 1951.333 1951.417 1951.500 1951.583
1952 1952.000 1952.083 1952.167 1952.250 1952.333 1952.417 1952.500 1952.583
1953 1953.000 1953.083 1953.167 1953.250 1953.333 1953.417 1953.500 1953.583
1954 1954.000 1954.083 1954.167 1954.250 1954.333 1954.417 1954.500 1954.583
1955 1955.000 1955.083 1955.167 1955.250 1955.333 1955.417 1955.500 1955.583
1956 1956.000 1956.083 1956.167 1956.250 1956.333 1956.417 1956.500 1956.583
1957 1957.000 1957.083 1957.167 1957.250 1957.333 1957.417 1957.500 1957.583
1958 1958.000 1958.083 1958.167 1958.250 1958.333 1958.417 1958.500 1958.583
1959 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417 1959.500 1959.583
          Sep      Oct      Nov      Dec
1946 1946.667 1946.750 1946.833 1946.917
1947 1947.667 1947.750 1947.833 1947.917
1948 1948.667 1948.750 1948.833 1948.917
1949 1949.667 1949.750 1949.833 1949.917
1950 1950.667 1950.750 1950.833 1950.917
1951 1951.667 1951.750 1951.833 1951.917
1952 1952.667 1952.750 1952.833 1952.917
1953 1953.667 1953.750 1953.833 1953.917
1954 1954.667 1954.750 1954.833 1954.917
1955 1955.667 1955.750 1955.833 1955.917
1956 1956.667 1956.750 1956.833 1956.917
1957 1957.667 1957.750 1957.833 1957.917
1958 1958.667 1958.750 1958.833 1958.917
1959 1959.667 1959.750 1959.833 1959.917
# subset the series to make it start from 1947 (this is just to show you how to slice a ts)
# plot the new series and add a regression line (y is birth_ts_start1947 and x is time)
birth_ts_start1947 <- window(birth_ts, start = 1947)
plot(birth_ts_start1947)
abline(reg = lm(birth_ts_start1947 ~ time(birth_ts_start1947)))

# print out the cycle of the time series
cycle(birth_ts)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1946   1   2   3   4   5   6   7   8   9  10  11  12
1947   1   2   3   4   5   6   7   8   9  10  11  12
1948   1   2   3   4   5   6   7   8   9  10  11  12
1949   1   2   3   4   5   6   7   8   9  10  11  12
1950   1   2   3   4   5   6   7   8   9  10  11  12
1951   1   2   3   4   5   6   7   8   9  10  11  12
1952   1   2   3   4   5   6   7   8   9  10  11  12
1953   1   2   3   4   5   6   7   8   9  10  11  12
1954   1   2   3   4   5   6   7   8   9  10  11  12
1955   1   2   3   4   5   6   7   8   9  10  11  12
1956   1   2   3   4   5   6   7   8   9  10  11  12
1957   1   2   3   4   5   6   7   8   9  10  11  12
1958   1   2   3   4   5   6   7   8   9  10  11  12
1959   1   2   3   4   5   6   7   8   9  10  11  12
# find mean number of births by year and display the year on year trend
# type = "b" mean plot both point and line
plot(aggregate(birth_ts, FUN = mean), type = "b")

# acf of the series
acf(birth_ts)

ACF - Autocorrelation Function. ACF calculates the correlation of a time series with its own lagged values. (see wiki)

The above plot starts with lag 0, and then lag 1, 2 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.

The ACF of our birth time series decreases slowly because it is not stationary. (It has an obvious upward trend and seasonality.)

You can plot PACF easily using pacf() function.

# pacf on the series
pacf(diff(birth_ts))

PACF - Partial ACF. PACF “gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags.” (wiki).

The above plot starts with lag 1, and then lag 2, 3 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.

ACF and PACF help identify parameters for time series models, e.g., p, i, q in AR(p), MA(q), or ARIMA(p, i, q) model.

2.3. the forecast package

From the previous time-series plots, we see the series has

  1. an upward year-by-year trend
  2. a seasonal component with cycle of 1 year
  3. a relatively stable variance over time

Let’s fit the series with a seasonal ARIMA model arima(p, i, q)(P, I, Q)[S]. Instead of manually identify the model parameters (through ACF and PACF graphs) and fit the model using base R functions, we will use the auto.arima() function from the forcast package.

library(forecast)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
# fit a seasonal ARIMA model arima(p, i, q)(P, I, Q)[S]
fit <- auto.arima(birth_ts, approximation = FALSE)
fit
Series: birth_ts 
ARIMA(3,1,0)(2,1,1)[12] 

Coefficients:
          ar1      ar2      ar3     sar1     sar2     sma1
      -0.0784  -0.2068  -0.1828  -0.4316  -0.2916  -0.6614
s.e.   0.0815   0.0800   0.0820   0.1186   0.1096   0.1170

sigma^2 = 0.3971:  log likelihood = -154.75
AIC=323.51   AICc=324.27   BIC=344.81

Predict based on the fitted model.

# predict 36 months ahead
pred <- forecast(fit, h = 36)
autoplot(pred)

We can chain the process using pipe operator (%>%) as well.

birth_ts %>%
  auto.arima(approximation = FALSE) %>%
  forecast(h = 36) %>%
  autoplot()

Decomposing and visualizing trend, seasonality, and remaining errors can be done easily using base R decompose() and forecast package’s autoplot().

birth_ts %>%
  decompose(type="multiplicative") %>% 
  autoplot()

You certainly should read forecast package document to understand the theory behind auto.arima() and decompose().

  1. forecast package website
  2. Automatic Time Series Forcasting: the forecast Package for R
  3. Classical decomposition chapter in the book Forecasting: Principles and Practice.

Forecast package has many more useful functions (e.g. better ACF and PACF plots), I’ll leave you to explore them on your own.

3. zoo and xts classes (and quantmod & PerformanceAnalytics packages)

3.1. Basics

zoo can handle regular- and irregular-spaced time series using arbitrary classes for timestamps. xts extends zoo. The main benefit of xts is its seamless compatibility with other packages using different time-series classes (timeSeries, zoo, …).” (see xts faq.)

You can think of xts object as a matrix plus a time index.

library(xts)
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
# create a 3x2 matrix
x <- matrix(1:6, ncol = 2)
print(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
# create time index
idx <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-05"))
print(idx)
[1] "2019-01-01" "2019-01-02" "2019-01-05"
# create an xts objet
xts_obj <- xts(x, order.by = idx)
xts_obj
           [,1] [,2]
2019-01-01    1    4
2019-01-02    2    5
2019-01-05    3    6

Let’s take a look at the xts data structure.

print(typeof(xts_obj))
[1] "integer"
print(class(xts_obj))
[1] "xts" "zoo"
str(xts_obj)
An xts object on 2019-01-01 / 2019-01-05 containing: 
  Data:    integer [3, 2]
  Index:   Date [3] (TZ: "UTC")
attributes(xts_obj)
$dim
[1] 3 2

$index
[1] 1546300800 1546387200 1546646400
attr(,"tzone")
[1] "UTC"
attr(,"tclass")
[1] "Date"

$class
[1] "xts" "zoo"

Type ?xts in the R console to see other options in the xts() function/constructor.

You can get back the matrix and time index in an xts object by deconstructing it.

# coredata() extracts the matrix
coredata(xts_obj)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
# index() extracts the time index
index(xts_obj)
[1] "2019-01-01" "2019-01-02" "2019-01-05"

Again, in most cases, you will load your data from external sources. Let’s load the same New York City birth data, and turn it into an xts object.

# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")

# first turn it to ts. ts constructor only need start and frequecy inputs so it's easy. 
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))

# now turn it to xts using the as.xts() function
birth_xts <- as.xts(birth_ts)
head(birth_xts)
           [,1]
Jan 1946 26.663
Feb 1946 23.598
Mar 1946 26.931
Apr 1946 24.740
May 1946 25.806
Jun 1946 24.364

To deal with irregular spaced time series, you can first import the data using zoo.read() and obtain a zoo object and then turn the zoo object into xts object using as.xts().

msft_zoo <- read.zoo("https://raw.githubusercontent.com/eijoac/datasets/master/msft_stock.csv", 
                     index.column = 1, sep = ",", format = "%Y-%m-%d",
                     read = read.csv)
msft_xts <- as.xts(msft_zoo)
head(msft_xts)
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2007-01-03     29.91     30.25    29.40      29.86    76935100      22.57483
2007-01-04     29.70     29.97    29.44      29.81    45774500      22.53703
2007-01-05     29.63     29.75    29.45      29.64    44607200      22.40850
2007-01-08     29.65     30.10    29.53      29.93    50220200      22.62775
2007-01-09     30.00     30.18    29.73      29.96    44636600      22.65043
2007-01-10     29.80     29.89    29.43      29.66    55017400      22.42362
# you can export an xts object to a csv file using write.zoo() as xts object is a zoo object as well
# write.zoo(msft_xts, sep = ",", file = "msft_stock.csv")

The point of having a special data structure xts is not just to store time series, functions tailored for xts objects also make processing them easy. For example, subsetting is time-aware.

# get data from 2018
head(msft_xts["2018"])
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2018-01-02     86.13     86.31    85.50      85.95    22483800      84.48741
2018-01-03     86.06     86.51    85.97      86.35    26061400      84.88061
2018-01-04     86.59     87.66    86.57      87.11    21912000      85.62768
2018-01-05     87.66     88.41    87.43      88.19    23407100      86.68930
2018-01-08     88.20     88.58    87.60      88.28    22113000      86.77776
2018-01-09     88.65     88.73    87.86      88.22    19484300      86.71879
# get data from last week of the series
last(msft_xts, "1 week")
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2019-02-04    102.87    105.80   102.77     105.74    31315100        105.74
2019-02-05    106.06    107.27   105.96     107.22    27325400        107.22
2019-02-06    107.00    107.00   105.53     106.03    20609800        106.03
2019-02-07    105.19    105.59   104.29     105.27    29747000        105.27
# get data for last day
last(msft_xts, 1)
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2019-02-07    105.19    105.59   104.29     105.27    29747000        105.27
# get data for all Monday in 2019
msft_xts[.indexyear(msft_xts) == (2019 - 1900) & .indexwday(msft_xts) == 1]
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2019-01-07    101.64    103.27   100.98     102.06    35656100        102.06
2019-01-14    101.90    102.87   101.26     102.05    28437100        102.05
2019-01-28    106.26    106.48   104.66     105.08    29476700        105.08
2019-02-04    102.87    105.80   102.77     105.74    31315100        105.74

There are many more ways to manipulate xts object (subset, merge, get lead & lag). Explore them on your own. Here is a good start.

3.2. Finance packages that work well with xts

Let’s first see if auto.arima() in the forecast package works on xts?

birth_xts %>%
  auto.arima(approximation = FALSE) %>%
  forecast(h = 36) %>%
  autoplot()

It looks like it works. However, why is the x-axis numbers, not dates? (Discussion here)

Many finance packages work well with xts. For example, PerformanceAnalytics is the go-to package for performance and risk analysis of financial instruments or portfolios. It understand xts.

Let’s calculate annual returns for Microsoft stocks.

library(PerformanceAnalytics)

Attaching package: 'PerformanceAnalytics'
The following object is masked from 'package:graphics':

    legend
# calculate annual returns

# step 1. convert OHLC (Open, High, Low, Close) to annual frequency
msft_yearly <- to.yearly(msft_xts) 
print(msft_yearly)
           msft_xts.Open msft_xts.High msft_xts.Low msft_xts.Close
2007-12-31         29.91         37.50        26.60          35.60
2008-12-31         35.79         35.96        17.50          19.44
2009-12-31         19.53         31.50        14.87          30.48
2010-12-31         30.62         31.58        22.73          27.91
2011-12-30         28.05         29.46        23.65          25.96
2012-12-31         26.55         32.95        26.26          26.71
2013-12-31         27.25         38.98        26.28          37.41
2014-12-31         37.35         50.05        34.63          46.45
2015-12-31         46.66         56.85        39.72          55.48
2016-12-30         54.32         64.10        48.04          62.14
2017-12-29         62.79         87.50        61.95          85.54
2018-12-31         86.13        116.18        83.83         101.57
2019-02-07         99.55        107.90        97.20         105.27
           msft_xts.Volume msft_xts.Adjusted
2007-12-31     15661695600          27.28008
2008-12-31     21296026200          15.17165
2009-12-31     15732659600          24.34551
2010-12-31     15892924000          22.75706
2011-12-30     15292282000          21.72942
2012-12-31     11984490100          22.98949
2013-12-31     12251098000          33.17337
2014-12-31      8399600600          42.31747
2015-12-31      9049822100          51.92010
2016-12-30      7812683200          59.74847
2017-12-29      5630979800          84.08439
2018-12-31      7928965600         101.57000
2019-02-07       858745800         105.27000
# step 2. calculate annual return using annual close price
msft_annual_return <- CalculateReturns(msft_yearly$msft_xts.Close)
msft_annual_return
           msft_xts.Close
2007-12-31             NA
2008-12-31    -0.45393253
2009-12-31     0.56790115
2010-12-31    -0.08431759
2011-12-30    -0.06986747
2012-12-31     0.02889060
2013-12-31     0.40059908
2014-12-31     0.24164665
2015-12-31     0.19440256
2016-12-30     0.12004324
2017-12-29     0.37656908
2018-12-31     0.18739769
2019-02-07     0.03642805

Let’s try one more exercise. Calculate portfolio returns. (This time, We will use quantmod package to load Microsoft and Apple stock data from Yahoo finance.)

library(quantmod)
Loading required package: TTR
# get data. Both are xts objects
aapl <- getSymbols("AAPL", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)
msft <- getSymbols("MSFT", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)

getSymbols() loads data into an xts object.

# aapl is an xts object
class(aapl)
[1] "xts" "zoo"
# combine aapl and msft close price as my portfolio price
my_portfolio <- cbind(aapl$AAPL.Close, msft$MSFT.Close)
colnames(my_portfolio) <- c("aapl", "msft")
head(my_portfolio)
              aapl  msft
2015-12-31 26.3150 55.48
2016-01-04 26.3375 54.80
2016-01-05 25.6775 55.05
2016-01-06 25.1750 54.05
2016-01-07 24.1125 52.17
2016-01-08 24.2400 52.33
# calcuate daily return
return_daily <- Return.calculate(my_portfolio)
return_daily <- return_daily[-1, ] # remove the first row (NA)
head(return_daily)
                    aapl          msft
2016-01-04  0.0008549908 -0.0122566747
2016-01-05 -0.0250593206  0.0045620439
2016-01-06 -0.0195696831 -0.0181653045
2016-01-07 -0.0422045693 -0.0347826289
2016-01-08  0.0052877362  0.0030669670
2016-01-11  0.0161922412 -0.0005733345
# initial weights of my portfolio
init_weights <- c(0.5, 0.5)

# buy and hold
pf_bh <- Return.portfolio(return_daily, weights = init_weights, verbose = TRUE)

# rebalancing quaterly
pf_rebal <- Return.portfolio(return_daily, weights = init_weights, rebalance_on = "weeks", verbose = TRUE)

# plot daily returns for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$returns)
plot(pf_rebal$returns)

# plot end of period values for each assets and for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$EOP.Value)
plot(pf_rebal$EOP.Value)

PerformanceAnalytics package can certain do a lot more (see its (document)[https://cran.r-project.org/web/packages/PerformanceAnalytics/PerformanceAnalytics.pdf]). Explore it more on your own.

4. dataframe/tibble with time series

As the tidyverse eco-system becomes more and more popular. Developers start to bridge the gap between the traditional time series world and the tidyverse world. See the links below for a few examples.

  1. tissible: a new time series class (tbl_ts) built on tibble.
  2. fable: an extension of forecast package to work with tissible
  3. tidyquant: “integrates the best resources for collecting and analyzing financial data, zoo, xts, quantmod, TTR, and PerformanceAnalytics, with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between each.”

txbox provides conversion between many time series data structures, an attempt to unite time series data structure in R.

5. a few more R finance packages

We have seen forecast, quantmod, and PerformanceAnalytics packages so far. Refer to R taskview, in particular, finance taskview and time series taskview for an overview on what R has to offer for financial analysis. Below are a brief intro to quantmod and RQuantLib packages.

5.1 quantmod

quantmod is an R package designed to assist quantitative traders in “development, testing, and deployment of statistically based trading models.” We will take a quick look at it.

library(quantmod)

# get MSFT OHLC data from yahoo
getSymbols("MSFT", src="yahoo")
[1] "MSFT"
# display a few row of MSFT dataset
head(MSFT)
           MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
2007-01-03     29.91     30.25    29.40      29.86    76935100      21.39742
2007-01-04     29.70     29.97    29.44      29.81    45774500      21.36160
2007-01-05     29.63     29.75    29.45      29.64    44607200      21.23977
2007-01-08     29.65     30.10    29.53      29.93    50220200      21.44759
2007-01-09     30.00     30.18    29.73      29.96    44636600      21.46909
2007-01-10     29.80     29.89    29.43      29.66    55017400      21.25410
# check its data structure
str(MSFT)
An xts object on 2007-01-03 / 2024-04-02 containing: 
  Data:    double [4341, 6]
  Columns: MSFT.Open, MSFT.High, MSFT.Low, MSFT.Close, MSFT.Volume ... with 1 more column
  Index:   Date [4341] (TZ: "UTC")
  xts Attributes:
    $ src    : chr "yahoo"
    $ updated: POSIXct[1:1], format: "2024-04-03 12:34:29"
# plot using the build-in chartSeries() function
chartSeries(MSFT, theme = chartTheme("white"), subset = "last 6 months")

# add MACD plot, https://en.wikipedia.org/wiki/MACD
addMACD()

# add Bollinger bands, https://en.wikipedia.org/wiki/Bollinger_Bands
addBBands()

# plot the weekly return using plot() function, open price
plot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), main = "MSFT Weekly Return (Open Price)")

weeklyReturn(MSFT['2018-01::']) returns an xts object, ggplot() can handle it too.

# plot the weekly returns using ggplot()
library(ggplot2)
ggplot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), aes(x = Index, y = weekly.returns)) + 
  geom_line()

5.2 RQuantLib

QuantLib is a comprehensive free/open-source library for quantitative finance. RQuantLib makes part of QuantLib accessible from R (i.e. RQuantLib is an R interface to QuantLib). It provides many functions for option pricing and fixed income. It also provides a few utility functions (such as isHoliday(), isWeekend(), etc.).

Let’s try a few functions in the RQuantLib package.

library(RQuantLib)

# European Option pricing based on BS formula
# EuropeanOption(type, underlying, strike, dividendYield, 
#                riskFreeRate, maturity, volatility, discreteDividends, 
#                discreteDividendsTimeUntil)
EO <- EuropeanOption("call", 100, 100, 0.01, 0.03, 0.5, 0.4)
EO
Concise summary of valuation for EuropeanOption 
   value    delta    gamma     vega    theta      rho   divRho 
 11.6365   0.5673   0.0138  27.6336 -11.8390  22.5475 -28.3657 
# calculate European Option Implied Volatility
# EuropeanOptionImpliedVolatility(type, value, underlying, strike, dividendYield, 
#                                 riskFreeRate, maturity, volatility)
# note: the volatility argument is the Initial guess for the volatility of the underlying stock
EOImpVol <- EuropeanOptionImpliedVolatility("call", value=EO$value+0.50, 100, 100, 0.01, 0.03, 0.5, 0.4)
EOImpVol
[1] 0.4181015
attr(,"class")
[1] "EuropeanOptionImpliedVolatility" "ImpliedVolatility"              
isHoliday(calendar = "Canada", dates = as.Date("2019-07-01"))
[1] TRUE
isHoliday(calendar = "UnitedStates", dates = as.Date("2019-07-01"))
[1] FALSE

References

  1. Using R for Time Series Analysis from a Little Book of R for Time Series.
  2. Forecasting: Principles and Practice 2nd Ed (uses the forecast package)
  3. Forecasting: Principles and Practice 3nd Ed (uses the fable package)
  4. Daniel Lee’s Financial Modeling