How would you store a time series dataset in R? Given the data structures we have seen so far, you can store it in at least two ways.
These are both reasonable ways of storing time series. However, more than just storing time series data, we also need to process them efficiently, i.e. we need associated time-aware functions that can process time series (e.g. taking lags and differences, subsetting based on time, etc.). That’s why people working with time series in R tend to use specialized time-series data structures. Just like matrices and dataframes, they are built upon basic data structures together with special attributes and tailored functions/methods for time series. Several time series classes(/data structures) are popular and widely supported by many econometrics and finance packages. See CRAN time series task view for details.
ts
class. ts
is a class for equi-spaced
time series using numeric timestamps. It’s supported in base R
(out-of-the-box R)zoo
class. zoo
provides infrastructure for
regular- and irregular-spaced time series using arbitrary classes for
the timestamps (there are many R classes for dates and times)xts
class. xts
extends zoo
.
It provides uniform handling of R’s time-based data classes.timeSeries
in timeSeries
package, tis
in tis
package, etc.tsibble
class
from tsibble package. The
tsibble package in turn belongs to a big family of time series toolsets
called tidyverts. (Recall
tibble
is almost like dataframe
, and it’s the
data structure used by the Tidyverse eco-system.)tibble
as its main data
structure, but it makes tibble
“time-aware” whenever
needed. For example, it converts tibble to xts
when doing
portfolio performance analysis as the underlying function call is taken
from PerformanceAnalytics
package, which operates on xts
data structure.I will leave you to learn the (5) and (6) above on your own as they both follow the tidyverse principle, which you are already familiar with.
In this notebook, I will (briefly) discuss ts
and
xts
classes. They are the core data structures used by many
wonderful time series and finance R packages. You will see that the way
we operate on ts
and xts
objects is quite
different from what we learned so far, i.e. the tidyverse
way of manipulating dataframes/tibbles. Along learning ts
and xts
, we will also get a chance to check out a few
useful R packages for time series and financial analysis, namely forecast, quantmod, PerformanceAnalytics,
and RQuantLib.
Note: Although the forecast package is now
retired in favour of the fable
package, which works with tsibble
(see 5 above), that
doesn’t mean the xts
data structure is retired. We use the
forecast package
here to learn xts
. In any case, forecast is still a
great R package nevertheless.
ts
class (and forecast
package).ts
is a basic class for equi-spaced time series
supported by base R. Essentially, it’s a vector or matrix with class
attribute of “ts” (and an additional attribute). In the matrix case,
each column is assumed to contain a single univariate time series. A
ts
object concisely stores the time series index using
three “time series parameters” (tsp): start
,
end
, and frequency
.
We can create a ts
object using the ts()
function.
ts_obj <- ts(1:10, frequency = 4, start = c(2017, 2)) # 2nd Quarter of 1917
ts_obj
## Qtr1 Qtr2 Qtr3 Qtr4
## 2017 1 2 3
## 2018 4 5 6 7
## 2019 8 9 10
Take a look at ts
data structure
# take a look at the data structure
print(typeof(ts_obj))
## [1] "integer"
print(class(ts_obj))
## [1] "ts"
print(attributes(ts_obj))
## $tsp
## [1] 2017.25 2019.50 4.00
##
## $class
## [1] "ts"
str(ts_obj)
## Time-Series [1:10] from 2017 to 2020: 1 2 3 4 5 6 7 8 9 10
In most cases, you will load data from an external source (files or
online data). Let’s load a time series dataset (stored online) and turn
it into ts
. The dataset records the number of births per
month in New York city, from January 1946 to December 1959.
# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))
birth_ts
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012
## Nov Dec
## 1946 21.672 21.870
## 1947 21.759 22.073
## 1948 21.059 21.573
## 1949 21.519 22.025
## 1950 22.084 22.991
## 1951 22.964 23.981
## 1952 23.162 24.707
## 1953 25.246 25.180
## 1954 24.712 25.688
## 1955 25.693 26.881
## 1956 26.291 26.987
## 1957 26.634 27.735
## 1958 25.912 26.619
## 1959 26.992 27.897
class(birth_ts)
## [1] "ts"
ts
-aware base R functionsWe can process and analyze ts
objects just by using base
R functions. Let’s try a few of them.
# plot the time series
plot(birth_ts)
plot()
is a generic function that plots different graphs
depending on its input. The plot()
here actually invokes
plot.ts()
, the plotting method for ts
objects.
# plot the first difference of birth_ts
plot.ts(diff(birth_ts))
# time() create a vector of time from the time series
time(birth_ts)
## Jan Feb Mar Apr May Jun Jul Aug
## 1946 1946.000 1946.083 1946.167 1946.250 1946.333 1946.417 1946.500 1946.583
## 1947 1947.000 1947.083 1947.167 1947.250 1947.333 1947.417 1947.500 1947.583
## 1948 1948.000 1948.083 1948.167 1948.250 1948.333 1948.417 1948.500 1948.583
## 1949 1949.000 1949.083 1949.167 1949.250 1949.333 1949.417 1949.500 1949.583
## 1950 1950.000 1950.083 1950.167 1950.250 1950.333 1950.417 1950.500 1950.583
## 1951 1951.000 1951.083 1951.167 1951.250 1951.333 1951.417 1951.500 1951.583
## 1952 1952.000 1952.083 1952.167 1952.250 1952.333 1952.417 1952.500 1952.583
## 1953 1953.000 1953.083 1953.167 1953.250 1953.333 1953.417 1953.500 1953.583
## 1954 1954.000 1954.083 1954.167 1954.250 1954.333 1954.417 1954.500 1954.583
## 1955 1955.000 1955.083 1955.167 1955.250 1955.333 1955.417 1955.500 1955.583
## 1956 1956.000 1956.083 1956.167 1956.250 1956.333 1956.417 1956.500 1956.583
## 1957 1957.000 1957.083 1957.167 1957.250 1957.333 1957.417 1957.500 1957.583
## 1958 1958.000 1958.083 1958.167 1958.250 1958.333 1958.417 1958.500 1958.583
## 1959 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417 1959.500 1959.583
## Sep Oct Nov Dec
## 1946 1946.667 1946.750 1946.833 1946.917
## 1947 1947.667 1947.750 1947.833 1947.917
## 1948 1948.667 1948.750 1948.833 1948.917
## 1949 1949.667 1949.750 1949.833 1949.917
## 1950 1950.667 1950.750 1950.833 1950.917
## 1951 1951.667 1951.750 1951.833 1951.917
## 1952 1952.667 1952.750 1952.833 1952.917
## 1953 1953.667 1953.750 1953.833 1953.917
## 1954 1954.667 1954.750 1954.833 1954.917
## 1955 1955.667 1955.750 1955.833 1955.917
## 1956 1956.667 1956.750 1956.833 1956.917
## 1957 1957.667 1957.750 1957.833 1957.917
## 1958 1958.667 1958.750 1958.833 1958.917
## 1959 1959.667 1959.750 1959.833 1959.917
# subset the series to make it start from 1947 (this is just to show you how to slice a ts)
# plot the new series and add a regression line (y is birth_ts_start1947 and x is time)
birth_ts_start1947 <- window(birth_ts, start = 1947)
plot(birth_ts_start1947)
abline(reg = lm(birth_ts_start1947 ~ time(birth_ts_start1947)))
# print out the cycle of the time series
cycle(birth_ts)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1946 1 2 3 4 5 6 7 8 9 10 11 12
## 1947 1 2 3 4 5 6 7 8 9 10 11 12
## 1948 1 2 3 4 5 6 7 8 9 10 11 12
## 1949 1 2 3 4 5 6 7 8 9 10 11 12
## 1950 1 2 3 4 5 6 7 8 9 10 11 12
## 1951 1 2 3 4 5 6 7 8 9 10 11 12
## 1952 1 2 3 4 5 6 7 8 9 10 11 12
## 1953 1 2 3 4 5 6 7 8 9 10 11 12
## 1954 1 2 3 4 5 6 7 8 9 10 11 12
## 1955 1 2 3 4 5 6 7 8 9 10 11 12
## 1956 1 2 3 4 5 6 7 8 9 10 11 12
## 1957 1 2 3 4 5 6 7 8 9 10 11 12
## 1958 1 2 3 4 5 6 7 8 9 10 11 12
## 1959 1 2 3 4 5 6 7 8 9 10 11 12
# find mean number of births by year and display the year on year trend
# type = "b" mean plot both point and line
plot(aggregate(birth_ts, FUN = mean), type = "b")
# acf of the series
acf(birth_ts)
ACF - Autocorrelation Function. ACF calculates the correlation of a time series with its own lagged values. (see here)
The above plot starts with lag 0, and then lag 1, 2 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.
The ACF of our birth time series decreases slowly because it is not stationary. (It has an obvious upward trend and seasonality.)
You can plot PACF easily using pacf() function.
# pacf on the series
pacf(diff(birth_ts))
PACF - Partial ACF. PACF “gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags.” (wiki.
The above plot starts with lag 1, and then lag 2, 3 and so on. The x-axis is in year unit, i.e., 1 is 12th/1-year lag.
ACF and PACF help identify parameters for time series models, e.g., p, i, q in AR(p), MA(q), or ARIMA(p, i, q) model.
forecast
packageFrom the early time-series plot, we see the series has
Let’s fit the series with a seasonal ARIMA model
arima(p, i, q)(P, I, Q)[S]
. Instead of manually identify
the model parameters (through ACF and PACF graphs) and fit the model
using base R functions, we will use the auto.arima()
function from the forcast
package.
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# fit a seasonal ARIMA model arima(p, i, q)(P, I, Q)[S]
fit <- auto.arima(birth_ts, approximation = FALSE)
fit
## Series: birth_ts
## ARIMA(3,1,0)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 sar1 sar2 sma1
## -0.0784 -0.2068 -0.1828 -0.4316 -0.2916 -0.6614
## s.e. 0.0815 0.0800 0.0820 0.1186 0.1096 0.1170
##
## sigma^2 = 0.3971: log likelihood = -154.75
## AIC=323.51 AICc=324.27 BIC=344.81
Predict based on the fitted model.
# predict 36 months ahead
pred <- forecast(fit, h = 36)
autoplot(pred)
We can chain the process using pipe operator (%>%
) as
well.
birth_ts %>%
auto.arima(approximation = FALSE) %>%
forecast(h = 36) %>%
autoplot()
Decomposing and visualizing trend, seasonality, and remaining errors
can be done easily using base R decompose()
and forecast
package’s autoplot()
.
birth_ts %>%
decompose(type="multiplicative") %>%
autoplot()
You certainly should read forecast
package document to
understand the theory behind auto.arima()
and
decompose()
.
Forecast package has many more useful functions (e.g. better ACF and PACF plots), I’ll leave you to explore them on your own.
zoo
and xts
classes (and
quantmod
& PerformanceAnalytics
packages)zoo
can handle regular- and irregular-spaced time series
using arbitrary classes for timestamps. xts
extends
zoo
. The main benefit of xts
is its seamless
compatibility with other packages using different time-series classes
(timeSeries
, zoo
, …).” (see xts
faq.)
You can think of xts
object as a matrix plus a time
index.
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# create a 3x2 matrix
x <- matrix(1:6, ncol = 2)
print(x)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# create time index
idx <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-05"))
print(idx)
## [1] "2019-01-01" "2019-01-02" "2019-01-05"
# create an xts objet
xts_obj <- xts(x, order.by = idx)
xts_obj
## [,1] [,2]
## 2019-01-01 1 4
## 2019-01-02 2 5
## 2019-01-05 3 6
Let’s take a look at the xts
data structure.
print(typeof(xts_obj))
## [1] "integer"
print(class(xts_obj))
## [1] "xts" "zoo"
str(xts_obj)
## An 'xts' object on 2019-01-01/2019-01-05 containing:
## Data: int [1:3, 1:2] 1 2 3 4 5 6
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
attributes(xts_obj)
## $dim
## [1] 3 2
##
## $index
## [1] 1546300800 1546387200 1546646400
## attr(,"tzone")
## [1] "UTC"
## attr(,"tclass")
## [1] "Date"
##
## $class
## [1] "xts" "zoo"
Type ?xts
in the R console to see other options in the
xts()
function/constructor.
You can get back the matrix and time index in an xts object by deconstructing it.
# coredata() extracts the matrix
coredata(xts_obj)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# index() extracts the time index
index(xts_obj)
## [1] "2019-01-01" "2019-01-02" "2019-01-05"
Again, in most cases, you will load your data from external sources. Let’s load the same New York City birth data, and turn it into an xts object.
# scan() is a base R function to read data into a vector or list.
birth <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
# first turn it to ts. ts constructor only need start and frequecy inputs so it's easy.
birth_ts <- ts(birth, frequency = 12, start = c(1946, 1))
# now turn it to xts using the as.xts() function
birth_xts <- as.xts(birth_ts)
head(birth_xts)
## [,1]
## Jan 1946 26.663
## Feb 1946 23.598
## Mar 1946 26.931
## Apr 1946 24.740
## May 1946 25.806
## Jun 1946 24.364
To deal with irregular spaced time series, you can first import the
data using zoo.read()
and obtain a zoo
object
and then turn the zoo
object into xts
object
using as.xts()
.
msft_zoo <- read.zoo("https://raw.githubusercontent.com/eijoac/datasets/master/msft_stock.csv",
index.column = 1, sep = ",", format = "%Y-%m-%d",
read = read.csv)
msft_xts <- as.xts(msft_zoo)
head(msft_xts)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2007-01-03 29.91 30.25 29.40 29.86 76935100 22.57483
## 2007-01-04 29.70 29.97 29.44 29.81 45774500 22.53703
## 2007-01-05 29.63 29.75 29.45 29.64 44607200 22.40850
## 2007-01-08 29.65 30.10 29.53 29.93 50220200 22.62775
## 2007-01-09 30.00 30.18 29.73 29.96 44636600 22.65043
## 2007-01-10 29.80 29.89 29.43 29.66 55017400 22.42362
# you can export an xts object to a csv file using write.zoo() as xts object is a zoo object as well
# write.zoo(msft_xts, sep = ",", file = "msft_stock.csv")
The point of having a special data structure xts
is not
just to store time series, functions tailored for xts
objects also make processing them easy. For example, subsetting is
time-aware.
# get data from 2018
head(msft_xts["2018"])
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2018-01-02 86.13 86.31 85.50 85.95 22483800 84.48741
## 2018-01-03 86.06 86.51 85.97 86.35 26061400 84.88061
## 2018-01-04 86.59 87.66 86.57 87.11 21912000 85.62768
## 2018-01-05 87.66 88.41 87.43 88.19 23407100 86.68930
## 2018-01-08 88.20 88.58 87.60 88.28 22113000 86.77776
## 2018-01-09 88.65 88.73 87.86 88.22 19484300 86.71879
# get data from last week of the series
last(msft_xts, "1 week")
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2019-02-04 102.87 105.80 102.77 105.74 31315100 105.74
## 2019-02-05 106.06 107.27 105.96 107.22 27325400 107.22
## 2019-02-06 107.00 107.00 105.53 106.03 20609800 106.03
## 2019-02-07 105.19 105.59 104.29 105.27 29747000 105.27
# get data for last day
last(msft_xts, 1)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2019-02-07 105.19 105.59 104.29 105.27 29747000 105.27
# get data for all Monday in 2019
msft_xts[.indexyear(msft_xts) == (2019 - 1900) & .indexwday(msft_xts) == 1]
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2019-01-07 101.64 103.27 100.98 102.06 35656100 102.06
## 2019-01-14 101.90 102.87 101.26 102.05 28437100 102.05
## 2019-01-28 106.26 106.48 104.66 105.08 29476700 105.08
## 2019-02-04 102.87 105.80 102.77 105.74 31315100 105.74
There are many more ways to manipulate xts
object
(subset, merge, get lead & lag). Explore them on your own. Here is a
good start, https://rpubs.com/mohammadshadan/288218.
xts
Let’s first see if auto.arima()
in the
forecast
package works on xts?
birth_xts %>%
auto.arima(approximation = FALSE) %>%
forecast(h = 36) %>%
autoplot()
It looks like it works. However, why is the x-axis numbers, not dates? (Discussion here)
Many finance packages work well with xts
. For example,
PerformanceAnalytics
is the go-to package for performance
and risk analysis of financial instruments or portfolios. It understand
xts
.
Let’s calculate annual returns for Microsoft stocks.
library(PerformanceAnalytics)
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
# calculate annual returns
# step 1. convert OHLC (Open, High, Low, Close) to annual frequency
msft_yearly <- to.yearly(msft_xts)
print(msft_yearly)
## msft_xts.Open msft_xts.High msft_xts.Low msft_xts.Close
## 2007-12-31 29.91 37.50 26.60 35.60
## 2008-12-31 35.79 35.96 17.50 19.44
## 2009-12-31 19.53 31.50 14.87 30.48
## 2010-12-31 30.62 31.58 22.73 27.91
## 2011-12-30 28.05 29.46 23.65 25.96
## 2012-12-31 26.55 32.95 26.26 26.71
## 2013-12-31 27.25 38.98 26.28 37.41
## 2014-12-31 37.35 50.05 34.63 46.45
## 2015-12-31 46.66 56.85 39.72 55.48
## 2016-12-30 54.32 64.10 48.04 62.14
## 2017-12-29 62.79 87.50 61.95 85.54
## 2018-12-31 86.13 116.18 83.83 101.57
## 2019-02-07 99.55 107.90 97.20 105.27
## msft_xts.Volume msft_xts.Adjusted
## 2007-12-31 15661695600 27.28008
## 2008-12-31 21296026200 15.17165
## 2009-12-31 15732659600 24.34551
## 2010-12-31 15892924000 22.75706
## 2011-12-30 15292282000 21.72942
## 2012-12-31 11984490100 22.98949
## 2013-12-31 12251098000 33.17337
## 2014-12-31 8399600600 42.31747
## 2015-12-31 9049822100 51.92010
## 2016-12-30 7812683200 59.74847
## 2017-12-29 5630979800 84.08439
## 2018-12-31 7928965600 101.57000
## 2019-02-07 858745800 105.27000
# step 2. calculate annual return using annual close price
msft_annual_return <- CalculateReturns(msft_yearly$msft_xts.Close)
msft_annual_return
## msft_xts.Close
## 2007-12-31 NA
## 2008-12-31 -0.45393253
## 2009-12-31 0.56790115
## 2010-12-31 -0.08431759
## 2011-12-30 -0.06986747
## 2012-12-31 0.02889060
## 2013-12-31 0.40059908
## 2014-12-31 0.24164665
## 2015-12-31 0.19440256
## 2016-12-30 0.12004324
## 2017-12-29 0.37656908
## 2018-12-31 0.18739769
## 2019-02-07 0.03642805
Let’s try one more exercise. Calculate portfolio returns. (This time,
We will use quantmod
package to load Microsoft and Apple
stock data from Yahoo finance.)
library(quantmod)
## Loading required package: TTR
# get data. Both are xts objects
aapl <- getSymbols("AAPL", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)
msft <- getSymbols("MSFT", from = "2015-12-31", to = "2018-12-31", auto.assign = FALSE)
getSymbols() loads data into an xts
object.
# aapl is an xts object
class(aapl)
## [1] "xts" "zoo"
# combine aapl and msft close price as my portfolio price
my_portfolio <- cbind(aapl$AAPL.Close, msft$MSFT.Close)
colnames(my_portfolio) <- c("aapl", "msft")
head(my_portfolio)
## aapl msft
## 2015-12-31 26.3150 55.48
## 2016-01-04 26.3375 54.80
## 2016-01-05 25.6775 55.05
## 2016-01-06 25.1750 54.05
## 2016-01-07 24.1125 52.17
## 2016-01-08 24.2400 52.33
# calcuate daily return
return_daily <- Return.calculate(my_portfolio)
return_daily <- return_daily[-1, ] # remove the first row (NA)
head(return_daily)
## aapl msft
## 2016-01-04 0.0008549876 -0.0122566871
## 2016-01-05 -0.0250593261 0.0045620439
## 2016-01-06 -0.0195697011 -0.0181653046
## 2016-01-07 -0.0422045697 -0.0347826278
## 2016-01-08 0.0052877555 0.0030669735
## 2016-01-11 0.0161922442 -0.0005733422
# initial weights of my portfolio
init_weights <- c(0.5, 0.5)
# buy and hold
pf_bh <- Return.portfolio(return_daily, weights = init_weights, verbose = TRUE)
# rebalancing quaterly
pf_rebal <- Return.portfolio(return_daily, weights = init_weights, rebalance_on = "weeks", verbose = TRUE)
# plot daily returns for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$returns)
plot(pf_rebal$returns)
# plot end of period values for each assets and for both strategies
par(mfrow = c(2, 1), mar = c(2, 4, 2, 2))
plot(pf_bh$EOP.Value)
plot(pf_rebal$EOP.Value)
PerformanceAnalytics
package can certain do a lot more
(https://github.com/braverock/PerformanceAnalytics).
Explore it more on your own.
As the tidyverse
eco-system becomes more and more popular. Developers start to bridge
the gap between the traditional time series world and the
tidyverse
world. See the links below for a few
examples.
tbl_ts
) built on tibble
.forecast
package to work with tissible
zoo
, xts
, quantmod
,
TTR
, and PerformanceAnalytics
, with the tidy
data infrastructure of the tidyverse
, allowing for seamless
interaction between each.”txbox provides conversion between many time series data structures, an attempt to unite time series data structure in R.
We have seen forecast
, quantmod
, and
PerformanceAnalytics
packages so far. Refer to R taskview, in
particular, finance
taskview and time series
taskview for an overview on what R has to offer for financial
analysis. Below are a brief intro to quantmod
and
RQuantLib
packages.
quantmod is an R package designed to assist quantitative traders in “development, testing, and deployment of statistically based trading models.” We will take a quick look at it.
library(quantmod)
# get MSFT OHLC data from yahoo
getSymbols("MSFT", src="yahoo")
## [1] "MSFT"
# display a few row of MSFT dataset
head(MSFT)
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2007-01-03 29.91 30.25 29.40 29.86 76935100 21.57304
## 2007-01-04 29.70 29.97 29.44 29.81 45774500 21.53692
## 2007-01-05 29.63 29.75 29.45 29.64 44607200 21.41409
## 2007-01-08 29.65 30.10 29.53 29.93 50220200 21.62361
## 2007-01-09 30.00 30.18 29.73 29.96 44636600 21.64529
## 2007-01-10 29.80 29.89 29.43 29.66 55017400 21.42855
# check its data structure
str(MSFT)
## An 'xts' object on 2007-01-03/2023-03-17 containing:
## Data: num [1:4080, 1:6] 29.9 29.7 29.6 29.6 30 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2023-03-20 15:50:45"
# plot using the build-in chartSeries() function
chartSeries(MSFT, theme = chartTheme("white"), subset = "last 6 months")
# add MACD plot, https://en.wikipedia.org/wiki/MACD
addMACD()
# add Bollinger bands, https://en.wikipedia.org/wiki/Bollinger_Bands
addBBands()
# plot the weekly return using plot() function, open price
plot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), main = "MSFT Weekly Return (Open Price)")
weeklyReturn(MSFT['2018-01::'])
returns an
xts
object, ggplot()
can handle it too.
# plot the weekly returns using ggplot()
library(ggplot2)
ggplot(weeklyReturn(MSFT['2018-01::', 'MSFT.Open']), aes(x = Index, y = weekly.returns)) +
geom_line()
QuantLib is a comprehensive
free/open-source library for quantitative finance. RQuantLib
makes part of QuantLib accessible from R (i.e. RQuantLib is an R
interface to QuantLib). It provides many functions for option pricing
and fixed income. It also provides a few utility functions (such as
isHoliday()
, isWeekend()
, etc.).
Let’s try a few functions in RQuantLib
package.
library(RQuantLib)
# European Option pricing based on BS formula
# EuropeanOption(type, underlying, strike, dividendYield,
# riskFreeRate, maturity, volatility, discreteDividends,
# discreteDividendsTimeUntil)
EO <- EuropeanOption("call", 100, 100, 0.01, 0.03, 0.5, 0.4)
EO
## Concise summary of valuation for EuropeanOption
## value delta gamma vega theta rho divRho
## 11.6365 0.5673 0.0138 27.6336 -11.8390 22.5475 -28.3657
# calculate European Option Implied Volatility
# EuropeanOptionImpliedVolatility(type, value, underlying, strike, dividendYield,
# riskFreeRate, maturity, volatility)
# note: the volatility argument is the Initial guess for the volatility of the underlying stock
EOImpVol <- EuropeanOptionImpliedVolatility("call", value=EO$value+0.50, 100, 100, 0.01, 0.03, 0.5, 0.4)
EOImpVol
## [1] 0.4181015
## attr(,"class")
## [1] "EuropeanOptionImpliedVolatility" "ImpliedVolatility"
isHoliday(calendar = "Canada", dates = as.Date("2019-07-01"))
## [1] TRUE
isHoliday(calendar = "UnitedStates", dates = as.Date("2019-07-01"))
## [1] FALSE
Using R for Time Series Analysis from a Little Book of R for Time Series.
Forecasting: Principles and Practice 2nd Ed (uses the forecast package)
Forecasting: Principles and Practice 3nd Ed (uses the fable package)
Daniel Lee’s Financial Modeling