How would you store a time series dataset in R? Given the data structures we have seen so far, you can store it in at least two ways.
These are both reasonable ways of storing time series. However, more than just storing time series data, we also need to process them efficiently, i.e. we need associated time-aware functions that can process time series (e.g. taking lags and differences, subsetting based on time, etc.). That’s why people working with time series in R tend to use specialized time-series data structures. Just like matrices and dataframes, they are built upon basic data structures together with special attributes and tailored functions/methods for time series. Several time series classes(/data structures) are popular and widely supported by many econometrics and finance packages. See CRAN time series task view for details.
ts
class. ts
is a class for equi-spaced time series using numeric timestamps. It’s supported in base R (out-of-the-box R)zoo
class. zoo
provides infrastructure for regular- and irregular-spaced time series using arbitrary classes for the timestamps (there are many R classes for dates and times)xts
class. xts
extends zoo
. It provides uniform handling of R’s time-based data classes.timeSeries
in timeSeries package, tis
in tis package, etc.tsibble
class. (Recall tibble
is almost like dataframe
, and it’s the data structure used by the tidyverse
eco-system.)We will briefly discuss ts
and xts
classes. You will see that the way we operate on ts
and xts
objects is quite different from what we learned so far, i.e. the tidyverse
way of manipulating dataframes/tibbles. Along learning ts
and xts
, we will also get a chance to check out a few useful R packages for financial analysis.
ts
class (and forecast
package).ts
is a basic class for equi-spaced time series supported by base R. Essentially, it’s a vector or matrix with class attribute of “ts” (and an additional attribute). In the matrix case, each column is assumed to contain a single univariate time series. A ts
object concisely stores the time series index using three “time series parameters” (tsp): start
, end
, and frequency
.
We can create a ts
object using the ts()
function.
ts_obj <- ts(1:10, frequency = 4, start = c(2017, 2)) # 2nd Quarter of 1917
ts_obj
## Qtr1 Qtr2 Qtr3 Qtr4
## 2017 1 2 3
## 2018 4 5 6 7
## 2019 8 9 10
Take a look at ts
data structure
# take a look at the data structure
print(typeof(ts_obj))
## [1] "integer"
print(class(ts_obj))
## [1] "ts"
print(attributes(ts_obj))
## $tsp
## [1] 2017.25 2019.50 4.00
##
## $class
## [1] "ts"
str(ts_obj)
## Time-Series [1:10] from 2017 to 2020: 1 2 3 4 5 6 7 8 9 10
In most cases, you will load data from an external source (files or online data). Let’s load a time series dataset (stored online) and turn it into ts
. The dataset records the number of births per month in New York city, from January 1946 to December 1959.
# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))
birth_ts
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
## Nov Dec
## 1946 21.672 21.870
## 1947 21.759 22.073
## 1948 21.059 21.573
## 1949 21.519 22.025
## 1950 22.084 22.991
## 1951 22.964 23.981
## 1952 23.162 24.707
## 1953 25.246 25.180
## 1954 24.712 25.688
## 1955 25.693 26.881
## 1956 26.291 26.987
## 1957 26.634 27.735
## 1958 25.912 26.619
## 1959 26.992 27.897
class(birth_ts)
## [1] "ts"
ts
-aware base R functionsWe can process and analyze ts
objects just by using base R functions. Let’s try a few of them.
# plot the time series
plot(birth_ts)
plot()
is a generic function that plots different graphs depending on its input. The plot()
here actually invokes plot.ts()
, the plotting method for ts
objects.
# plot the first difference of birth_ts
plot.ts(diff(birth_ts))
# time() create a vector of time from the time series
time(birth_ts)
## Jan Feb Mar Apr May Jun Jul
## 1946 1946.000 1946.083 1946.167 1946.250 1946.333 1946.417 1946.500
## 1947 1947.000 1947.083 1947.167 1947.250 1947.333 1947.417 1947.500
## 1948 1948.000 1948.083 1948.167 1948.250 1948.333 1948.417 1948.500
## 1949 1949.000 1949.083 1949.167 1949.250 1949.333 1949.417 1949.500
## 1950 1950.000 1950.083 1950.167 1950.250 1950.333 1950.417 1950.500
## 1951 1951.000 1951.083 1951.167 1951.250 1951.333 1951.417 1951.500
## 1952 1952.000 1952.083 1952.167 1952.250 1952.333 1952.417 1952.500
## 1953 1953.000 1953.083 1953.167 1953.250 1953.333 1953.417 1953.500
## 1954 1954.000 1954.083 1954.167 1954.250 1954.333 1954.417 1954.500
## 1955 1955.000 1955.083 1955.167 1955.250 1955.333 1955.417 1955.500
## 1956 1956.000 1956.083 1956.167 1956.250 1956.333 1956.417 1956.500
## 1957 1957.000 1957.083 1957.167 1957.250 1957.333 1957.417 1957.500
## 1958 1958.000 1958.083 1958.167 1958.250 1958.333 1958.417 1958.500
## 1959 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417 1959.500
## Aug Sep Oct Nov Dec
## 1946 1946.583 1946.667 1946.750 1946.833 1946.917
## 1947 1947.583 1947.667 1947.750 1947.833 1947.917
## 1948 1948.583 1948.667 1948.750 1948.833 1948.917
## 1949 1949.583 1949.667 1949.750 1949.833 1949.917
## 1950 1950.583 1950.667 1950.750 1950.833 1950.917
## 1951 1951.583 1951.667 1951.750 1951.833 1951.917
## 1952 1952.583 1952.667 1952.750 1952.833 1952.917
## 1953 1953.583 1953.667 1953.750 1953.833 1953.917
## 1954 1954.583 1954.667 1954.750 1954.833 1954.917
## 1955 1955.583 1955.667 1955.750 1955.833 1955.917
## 1956 1956.583 1956.667 1956.750 1956.833 1956.917
## 1957 1957.583 1957.667 1957.750 1957.833 1957.917
## 1958 1958.583 1958.667 1958.750 1958.833 1958.917
## 1959 1959.583 1959.667 1959.750 1959.833 1959.917
# subset the series to make it start from 1947 (this is just to show you how to slice a ts)
# plot the new series and add a regression line (y is birth_ts_start1947 and x is time)
birth_ts_start1947 <- window(birth_ts, start = 1947)
plot(birth_ts_start1947)
abline(reg = lm(birth_ts_start1947 ~ time(birth_ts_start1947)))
# print out the cycle of the time series
cycle(birth_ts)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1946 1 2 3 4 5 6 7 8 9 10 11 12
## 1947 1 2 3 4 5 6 7 8 9 10 11 12
## 1948 1 2 3 4 5 6 7 8 9 10 11 12
## 1949 1 2 3 4 5 6 7 8 9 10 11 12
## 1950 1 2 3 4 5 6 7 8 9 10 11 12
## 1951 1 2 3 4 5 6 7 8 9 10 11 12
## 1952 1 2 3 4 5 6 7 8 9 10 11 12
## 1953 1 2 3 4 5 6 7 8 9 10 11 12
## 1954 1 2 3 4 5 6 7 8 9 10 11 12
## 1955 1 2 3 4 5 6 7 8 9 10 11 12
## 1956 1 2 3 4 5 6 7 8 9 10 11 12
## 1957 1 2 3 4 5 6 7 8 9 10 11 12
## 1958 1 2 3 4 5 6 7 8 9 10 11 12
## 1959 1 2 3 4 5 6 7 8 9 10 11 12
# find mean number of births by year and display the year on year trend
# type = "b" mean plot both point and line
plot(aggregate(birth_ts, FUN = mean), type = "b")
# acf of the series
acf(birth_ts)
ACF - Autocorrelation Function. ACF calculates the correlation of a time series with its own lagged values. (see here)
The above plot starts with lag 0, and then lag 1, 2 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.
The ACF of our birth time series decreases slowly because it is not stationary. (It has an obvious upward trend and seasonality.)
You can plot PACF easily using pacf() function.
# pacf on the series
pacf(diff(birth_ts))
PACF - Partial ACF. PACF “gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags.” (wiki.
The above plot starts with lag 1, and then lag 2, 3 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.
ACF and PACF help identify parameters for time series models, e.g., p, i, q in AR(p), MA(q), or ARIMA(p, i, q) model.
forecast
packageFrom the early time-series plot, we see the series has
Let’s fit the series with a seasonal ARIMA model arima(p, i, q)(P, I, Q)[S]
. Instead of manually identify the model parameters (through ACF and PACF graphs) and fit the model using base R functions, we will use the auto.arima()
function from the forcast
package.
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# fit a seasonal ARIMA model arima(p, i, q)(P, I, Q)[S]
fit <- auto.arima(birth_ts, approximation = FALSE)
fit
## Series: birth_ts
## ARIMA(3,1,0)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 sar1 sar2 sma1
## -0.0784 -0.2068 -0.1828 -0.4316 -0.2916 -0.6614
## s.e. 0.0815 0.0800 0.0820 0.1186 0.1096 0.1170
##
## sigma^2 estimated as 0.3971: log likelihood=-154.75
## AIC=323.51 AICc=324.27 BIC=344.81
Predict based on the fitted model.
# predict 36 months ahead
pred <- forecast(fit, h = 36)
autoplot(pred)
We can chain the process using pipe operator (%>%
) as well.
birth_ts %>%
auto.arima(approximation = FALSE) %>%
forecast(h = 36) %>%
autoplot()
Decomposing and visualizing trend, seasonality, and remaining errors can be done easily using base R decompose()
and forecast package’s autoplot()
.
birth_ts %>%
decompose(type="multiplicative") %>%
autoplot()
You certainly should read forecast
package document to understand the theory behind auto.arima()
and decompose()
.
Forecast package has many more useful functions (e.g. better ACF and PACF plots), I’ll leave you to explore them on your own.
zoo
and xts
classes (and quantmod
& PerformanceAnalytics
packages)zoo
can handle regular- and irregular-spaced time series using arbitrary classes for timestamps. xts
extends zoo
. The main benefit of xts
is its seamless compatibility with other packages using different time-series classes (timeSeries
, zoo
, …)." (see xts
faq.)
You can think of xts
object as a matrix plus a time index.
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# create a 3x2 matrix
x <- matrix(1:6, ncol = 2)
print(x)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# create time index
idx <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-05"))
print(idx)
## [1] "2019-01-01" "2019-01-02" "2019-01-05"
# create an xts objet
xts_obj <- xts(x, order.by = idx)
xts_obj
## [,1] [,2]
## 2019-01-01 1 4
## 2019-01-02 2 5
## 2019-01-05 3 6
Let’s take a look at the xts
data structure.
print(typeof(xts_obj))
## [1] "integer"
print(class(xts_obj))
## [1] "xts" "zoo"
str(xts_obj)
## An 'xts' object on 2019-01-01/2019-01-05 containing:
## Data: int [1:3, 1:2] 1 2 3 4 5 6
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
attributes(xts_obj)
## $dim
## [1] 3 2
##
## $index
## [1] 1546300800 1546387200 1546646400
## attr(,"tzone")
## [1] "UTC"
## attr(,"tclass")
## [1] "Date"
##
## $class
## [1] "xts" "zoo"
Type ?xts
in the R console to see other options in the xts()
function/constructor.
You can get back the matrix and time index in an xts object by deconstructing it.
# coredata() extracts the matrix
coredata(xts_obj)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# index() extracts the time index
index(xts_obj)
## [1] "2019-01-01" "2019-01-02" "2019-01-05"
Again, in most cases, you will load your data from external sources. Let’s load the same New York City birth data, and turn it into an xts object.
# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
# first turn it to ts. ts constructor only need start and frequecy inputs so it's easy.
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))
# now turn it to xts using the as.xts() function
birth_xts <- as.xts(birth_ts)
head(birth_xts)
## [,1]
## Jan 1946 26.663
## Feb 1946 23.598
## Mar 1946 26.931
## Apr 1946 24.740
## May 1946 25.806
## Jun 1946 24.364
To deal with irregular spaced time series, you can first import the data using zoo.read()
and obtain a zoo
object and then turn the zoo
object into xts
object using as.xts()
.
msft_zoo <- read.zoo("https://raw.githubusercontent.com/eijoac/datasets/master/msft_stock.csv",
index.column = 1, sep = ",", format = "%Y-%m-%d",
read = read.csv)
msft_xts <- as.xts(msft_zoo)
head(msft_xts)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2007-01-03 29.91 30.25 29.40 29.86 76935100
## 2007-01-04 29.70 29.97 29.44 29.81 45774500
## 2007-01-05 29.63 29.75 29.45 29.64 44607200
## 2007-01-08 29.65 30.10 29.53 29.93 50220200
## 2007-01-09 30.00 30.18 29.73 29.96 44636600
## 2007-01-10 29.80 29.89 29.43 29.66 55017400
## MSFT.Adjusted
## 2007-01-03 22.57483
## 2007-01-04 22.53703
## 2007-01-05 22.40850
## 2007-01-08 22.62775
## 2007-01-09 22.65043
## 2007-01-10 22.42362
# you can export an xts object to a csv file using write.zoo() as xts object is a zoo object as well
# write.zoo(msft_xts, sep = ",", file = "msft_stock.csv")
The point of having a special data structure xts
is not just to store time series, functions tailored for xts
objects also make processing them easy. For example, subsetting is time-aware.
# get data from 2018
head(msft_xts["2018"])
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2018-01-02 86.13 86.31 85.50 85.95 22483800
## 2018-01-03 86.06 86.51 85.97 86.35 26061400
## 2018-01-04 86.59 87.66 86.57 87.11 21912000
## 2018-01-05 87.66 88.41 87.43 88.19 23407100
## 2018-01-08 88.20 88.58 87.60 88.28 22113000
## 2018-01-09 88.65 88.73 87.86 88.22 19484300
## MSFT.Adjusted
## 2018-01-02 84.48741
## 2018-01-03 84.88061
## 2018-01-04 85.62768
## 2018-01-05 86.68930
## 2018-01-08 86.77776
## 2018-01-09 86.71879
# get data from last week of the series
last(msft_xts, "1 week")
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2019-02-04 102.87 105.80 102.77 105.74 31315100
## 2019-02-05 106.06 107.27 105.96 107.22 27325400
## 2019-02-06 107.00 107.00 105.53 106.03 20609800
## 2019-02-07 105.19 105.59 104.29 105.27 29747000
## MSFT.Adjusted
## 2019-02-04 105.74
## 2019-02-05 107.22
## 2019-02-06 106.03
## 2019-02-07 105.27
# get data for last day
last(msft_xts, 1)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2019-02-07 105.19 105.59 104.29 105.27 29747000
## MSFT.Adjusted
## 2019-02-07 105.27
# get data for all Monday in 2019
msft_xts[.indexyear(msft_xts) == (2019 - 1900) & .indexwday(msft_xts) == 1]
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2019-01-07 101.64 103.27 100.98 102.06 35656100
## 2019-01-14 101.90 102.87 101.26 102.05 28437100
## 2019-01-28 106.26 106.48 104.66 105.08 29476700
## 2019-02-04 102.87 105.80 102.77 105.74 31315100
## MSFT.Adjusted
## 2019-01-07 102.06
## 2019-01-14 102.05
## 2019-01-28 105.08
## 2019-02-04 105.74
There are many more ways to manipulate xts
object (subset, merge, get lead & lag). Explore them on your own. Here is a good start, https://rpubs.com/mohammadshadan/288218.
xts
Let’s first see if auto.arima()
in the forecast
package works on xts?
birth_xts %>%
auto.arima(approximation = FALSE) %>%
forecast(h = 36) %>%
autoplot()
It looks like it works. However, why is the x-axis numbers, not dates? (Discussion here)
Many finance packages work well with xts
. For example, PerformanceAnalytics
is the go-to package for performance and risk analysis of financial instruments or portfolios. It understand xts
.
Let’s calculate annual returns for Microsoft stocks.
library(PerformanceAnalytics)
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
# calculate annual returns
# step 1. convert OHLC (Open, High, Low, Close) to annual frequency
msft_yearly <- to.yearly(msft_xts)
print(msft_yearly)
## msft_xts.Open msft_xts.High msft_xts.Low msft_xts.Close
## 2007-12-31 29.91 37.50 26.60 35.60
## 2008-12-31 35.79 35.96 17.50 19.44
## 2009-12-31 19.53 31.50 14.87 30.48
## 2010-12-31 30.62 31.58 22.73 27.91
## 2011-12-30 28.05 29.46 23.65 25.96
## 2012-12-31 26.55 32.95 26.26 26.71
## 2013-12-31 27.25 38.98 26.28 37.41
## 2014-12-31 37.35 50.05 34.63 46.45
## 2015-12-31 46.66 56.85 39.72 55.48
## 2016-12-30 54.32 64.10 48.04 62.14
## 2017-12-29 62.79 87.50 61.95 85.54
## 2018-12-31 86.13 116.18 83.83 101.57
## 2019-02-07 99.55 107.90 97.20 105.27
## msft_xts.Volume msft_xts.Adjusted
## 2007-12-31 15661695600 27.28008
## 2008-12-31 21296026200 15.17165
## 2009-12-31 15732659600 24.34551
## 2010-12-31 15892924000 22.75706
## 2011-12-30 15292282000 21.72942
## 2012-12-31 11984490100 22.98949
## 2013-12-31 12251098000 33.17337
## 2014-12-31 8399600600 42.31747
## 2015-12-31 9049822100 51.92010
## 2016-12-30 7812683200 59.74847
## 2017-12-29 5630979800 84.08439
## 2018-12-31 7928965600 101.57000
## 2019-02-07 858745800 105.27000
# step 2. calculate annual return using annual close price
msft_annual_return <- CalculateReturns(msft_yearly$msft_xts.Close)
msft_annual_return
## msft_xts.Close
## 2007-12-31 NA
## 2008-12-31 -0.45393253
## 2009-12-31 0.56790115
## 2010-12-31 -0.08431759
## 2011-12-30 -0.06986747
## 2012-12-31 0.02889060
## 2013-12-31 0.40059908
## 2014-12-31 0.24164665
## 2015-12-31 0.19440256
## 2016-12-30 0.12004324
## 2017-12-29 0.37656908
## 2018-12-31 0.18739769
## 2019-02-07 0.03642805
Let’s try one more exercise. Calculate portfolio returns. (This time, We will use quantmod
package to load Microsoft and Apple stock data from Yahoo finance.)
library(quantmod)
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
# get data. Both are xts objects
aapl <- getSymbols("AAPL", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
msft <- getSymbols("MSFT", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)
getSymbols() loads data into an xts
object.
# aapl is an xts object
class(aapl)
## [1] "xts" "zoo"
# combine aapl and msft close price as my portfolio price
my_portfolio <- cbind(aapl$AAPL.Close, msft$MSFT.Close)
colnames(my_portfolio) <- c("aapl", "msft")
head(my_portfolio)
## aapl msft
## 2015-12-31 105.26 55.48
## 2016-01-04 105.35 54.80
## 2016-01-05 102.71 55.05
## 2016-01-06 100.70 54.05
## 2016-01-07 96.45 52.17
## 2016-01-08 96.96 52.33
# calcuate daily return
return_daily <- Return.calculate(my_portfolio)
return_daily <- return_daily[-1, ] # remove the first row (NA)
head(return_daily)
## aapl msft
## 2016-01-04 0.0008549876 -0.0122566871
## 2016-01-05 -0.0250593170 0.0045620439
## 2016-01-06 -0.0195696818 -0.0181653046
## 2016-01-07 -0.0422045693 -0.0347826278
## 2016-01-08 0.0052877347 0.0030669735
## 2016-01-11 0.0161922444 -0.0005733422
# initial weights of my portfolio
init_weights <- c(0.5, 0.5)
# buy and hold
pf_bh <- Return.portfolio(return_daily, weights = init_weights, verbose = TRUE)
# rebalancing quaterly
pf_rebal <- Return.portfolio(return_daily, weights = init_weights, rebalance_on = "weeks", verbose = TRUE)
# plot daily returns for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$returns)
plot(pf_rebal$returns)
# plot end of period values for each assets and for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$EOP.Value)
plot(pf_rebal$EOP.Value)
PerformanceAnalytics
package can certain do a lot more (https://github.com/braverock/PerformanceAnalytics). Explore it more on your own.
As the tidyverse
eco-system becomes more and more popular. Developers start to bridge the gap between the traditional time series world and the tidyverse
world. See the links below for a few examples.
tbl_ts
) built on tibble
.forecast
package to work with tissible
zoo
, xts
, quantmod
, TTR
, and PerformanceAnalytics
, with the tidy data infrastructure of the tidyverse
, allowing for seamless interaction between each.”"txbox provides conversion between many time series data structures, an attempt to unite time series data structure in R.
We have seen forecast
, quantmod
, and PerformanceAnalytics
packages so far. Refer to R taskview, in particular, finance taskview and time series taskview for an overview on what R has to offer for financial analysis. Below are a brief intro to quantmod
and RQuantLib
packages.
quantmod is an R package designed to assist quantitative traders in “development, testing, and deployment of statistically based trading models.” We will take a quick look at it.
library(quantmod)
# get MSFT OHLC data from yahoo
getSymbols("MSFT", src="yahoo")
## [1] "MSFT"
# display a few row of MSFT dataset
head(MSFT)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
## 2007-01-03 29.91 30.25 29.40 29.86 76935100
## 2007-01-04 29.70 29.97 29.44 29.81 45774500
## 2007-01-05 29.63 29.75 29.45 29.64 44607200
## 2007-01-08 29.65 30.10 29.53 29.93 50220200
## 2007-01-09 30.00 30.18 29.73 29.96 44636600
## 2007-01-10 29.80 29.89 29.43 29.66 55017400
## MSFT.Adjusted
## 2007-01-03 22.18531
## 2007-01-04 22.14815
## 2007-01-05 22.02185
## 2007-01-08 22.23731
## 2007-01-09 22.25961
## 2007-01-10 22.03671
# check its data structure
str(MSFT)
## An 'xts' object on 2007-01-03/2020-02-24 containing:
## Data: num [1:3308, 1:6] 29.9 29.7 29.6 29.6 30 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2020-02-25 11:33:37"
# plot using the build-in chartSeries() function
chartSeries(MSFT, theme = chartTheme("white"), subset = "last 6 months")
# add MACD plot, https://en.wikipedia.org/wiki/MACD
addMACD()
# add Bollinger bands, https://en.wikipedia.org/wiki/Bollinger_Bands
addBBands()
# plot the weekly return using plot() function, open price
plot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), main = "MSFT Weekly Return (Open Price)")
weeklyReturn(MSFT['2018-01::'])
returns an xts
object, ggplot()
can handle it too.
# plot the weekly returns using ggplot()
library(ggplot2)
ggplot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), aes(x = Index, y = weekly.returns)) +
geom_line()
QuantLib is a comprehensive free/open-source library for quantitative finance. RQuantLib makes part of QuantLib accessible from R (i.e. RQuantLib is an R interface to QuantLib). It provides many functions for option pricing and fixed income. It also provides a few utility functions (such as isHoliday()
, isWeekend()
, etc.).
Let’s try a few functions in RQuantLib
package.
library(RQuantLib)
# European Option pricing based on BS formula
# EuropeanOption(type, underlying, strike, dividendYield,
# riskFreeRate, maturity, volatility, discreteDividends,
# discreteDividendsTimeUntil)
EO <- EuropeanOption("call", 100, 100, 0.01, 0.03, 0.5, 0.4)
EO
## Concise summary of valuation for EuropeanOption
## value delta gamma vega theta rho divRho
## 11.6365 0.5673 0.0138 27.6336 -11.8390 22.5475 -28.3657
# calculate European Option Implied Volatility
# EuropeanOptionImpliedVolatility(type, value, underlying, strike, dividendYield,
# riskFreeRate, maturity, volatility)
# note: the volatility argument is the Initial guess for the volatility of the underlying stock
EOImpVol <- EuropeanOptionImpliedVolatility("call", value=EO$value+0.50, 100, 100, 0.01, 0.03, 0.5, 0.4)
EOImpVol
## [1] 0.4181015
## attr(,"class")
## [1] "EuropeanOptionImpliedVolatility" "ImpliedVolatility"
isHoliday(calendar = "Canada", dates = as.Date("2019-07-01"))
## [1] TRUE
isHoliday(calendar = "UnitedStates", dates = as.Date("2019-07-01"))
## [1] FALSE