Introduction to R
R is an open-source programming language for statistical computing and graphics. R is extensible and has a large collection of high-quality, user-contributed packages that provide easy-to-use tools for common data analysis tasks. This mini-course will introduce you to the fundamentals of R programming, with a focus on data management, data visualization, and quantitative finance applications in R.
What To Prepare
Please install R and RStudio Desktop before the first session.
RStudio is the most popular IDE (integrated development environment) for R, and most R users write their code in it.
If you run into technical difficulties installing the software, you can instead create a free RStudio Cloud account and run R and RStudio in the cloud from your browser.
An alternative to RStudio Cloud is the UofT JupyterHub/RStudio system. Go to its home page, choose the RStudio option, and click Log in to start. You will need your UTORid.
We will also use Google Colab, and I assume you all have a Google account.
Google Colab lets you combine code and notes in a “notebook”. It is a great setup to get started with R programming.
UofT JupyterHub offers a similar notebook setting. Go to its home page, choose the Jupyter Notebook option, and click Log in to start. You will need your UTORid.
Part 1 (Overview & Basics)
- Slides (intro1-overview and intro2-basics)
- Motivating Examples
- Analyze portfolio performance (R Script: performance_analysis.R)
- Perform sentiment analysis on an earnings call transcript (sentiment dictionary vs language model)
- R script: earnings_call.R
- Microsoft Q2 2024 earnings call Word Cloud
- Recognize handwritten digits, a deep learning “Hello World” example
- R script 1 (using TensorFlow for R): dl_hello_world.R
- R script 2 (using Torch for R): dl_hello_world_torch.R
- Basic Data and Programming Structures
- Data Science Workflow - A Regression Example (Housing prices and clean air)
- Linear Regression - Base R (R Notebook)
- Linear Regression - Tidyverse & Others (R Notebook)
- Data Import
- Data Manipulation
- Modelling
- Report (regression report example)
- Additional materials on data and programming structures (from past workshops; optional)
- Data structures
- Programming structures
- Reading list
- R for Data Science (Chapter 1 Intro, 4 Workflow: basics, 5 Data transformation, 6 Workflow: scripts, 8 Workflow: projects, 10 Tibbles, 11 Data import, 18 Pipes, 19 Functions, 20 Vectors, and 21 Iteration.)
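The data science workflow listed in Part 1 (data import, data manipulation, modelling, report) can be sketched in a few lines of base R. This is only a minimal illustration, not the workshop's notebook: it assumes the built-in `mtcars` dataset as a stand-in for the housing data, and the variable names `cars`, `fit`, and `fit_summary` are chosen here for illustration.

```r
# Regression workflow sketch: import -> manipulate -> model -> report.
# ASSUMPTION: the built-in mtcars data stands in for the workshop's housing data.

# 1. Data import: mtcars ships with R; a real project would typically
#    read a file instead, e.g. with read.csv().
cars <- mtcars

# 2. Data manipulation: keep three columns and add a log-transformed outcome.
cars <- cars[, c("mpg", "wt", "hp")]
cars$log_mpg <- log(cars$mpg)

# 3. Modelling: regress log fuel economy on weight and horsepower.
fit <- lm(log_mpg ~ wt + hp, data = cars)

# 4. Report: print the coefficient table and the R-squared.
fit_summary <- summary(fit)
print(round(coef(fit_summary), 4))
cat("R-squared:", round(fit_summary$r.squared, 3), "\n")
```

The same four steps carry over to the tidyverse version; only the functions used at each step change.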
Part 2 (Data Manipulation)
- Slides
- Code
- Reading list
- R for Data Science (Chapter 5 Data transformation, 12 Tidy data, and 13 Relational data.)
- Handling larger-than-memory data on your laptop using Arrow (optional)
- Arrow, Parquet, DuckDB, and dbplyr, the Arrow chapter in R for Data Science (2nd ed.)
- Efficient Data Analysis on Larger-than-Memory Data with DuckDB and Arrow by Thomas Mock
- Doing More with Data: An Introduction to Arrow for R Users by Danielle Navarro
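The core ideas in the Part 2 reading list (transforming data and joining related tables) can be illustrated with a small sketch. It uses base R so that it runs without installing any packages, whereas the readings use the tidyverse, so treat it as an approximation of the same steps. The data frames `trades` and `sectors` and all their values are invented for illustration.

```r
# Data manipulation sketch: transform, filter, join, summarise.
# ASSUMPTION: toy data invented for illustration only.
trades <- data.frame(
  ticker = c("AAA", "BBB", "AAA", "CCC", "BBB"),
  shares = c(100, 50, 200, 75, 25),
  price  = c(10, 20, 11, 5, 22)
)
sectors <- data.frame(
  ticker = c("AAA", "BBB", "CCC"),
  sector = c("Tech", "Energy", "Tech")
)

# Transform: add a computed column.
trades$value <- trades$shares * trades$price

# Filter: keep trades worth at least 1000.
large <- trades[trades$value >= 1000, ]

# Relational data: left join the sector lookup table onto the trades.
joined <- merge(large, sectors, by = "ticker", all.x = TRUE)

# Summarise: total trade value per sector.
by_sector <- aggregate(value ~ sector, data = joined, FUN = sum)
print(by_sector)
```

In dplyr, the corresponding verbs would be `mutate()`, `filter()`, `left_join()`, and `group_by()` with `summarise()`.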
Part 3 (Visualization)
- Slides
- Code (for ggplot2)
- R notebook:
- Quarto: part3_visualization.qmd (Let’s take this opportunity to learn about Quarto.)
- Code (for tidyquant charting, dygraphs, and R Markdown flexdashboard)
- R script: finance_charting.R
- R Markdown: time_series_flexboard.Rmd (HTML output: time_series_flexboard.html; Let’s take this opportunity to learn about R Markdown.)
- Reading list
- R for Data Science (Chapter 3 Data visualization, 7 Exploratory Data Analysis (EDA), and 28 Graphics for communication.)
- Inspiration
- Tanya Shapiro’s GitHub Gallery
- Fronkonstin
- Data Imaginist
Part 4 (Tidymodels, Time Series and Some R Finance Packages)
- Slides (1. tidymodels; 2. time series and finance packages)
- Code
- Tidymodels (Quarto: tidymodels_intro.qmd; HTML: tidymodels_intro.html)
- Time series and finance packages (Quarto: part4_timeseries_finance_pkg.qmd; HTML: part4_timeseries_finance_pkg.html)
Resources
- From Zero to Hero
- Step 1: Hands-on Programming with R (start here if you have never programmed before)
- Step 2: R for Data Science (data science with R’s Tidyverse ecosystem; 2nd ed here.)
- Step 3: Advanced R (master R)
- R Graphics
- R Graphics Cookbook
- ggplot2: Elegant Graphics for Data Analysis
- The R Graph Gallery (R graph samples with code)
- R Econometrics & Finance
- Tidy Finance with R
- Introduction to Econometrics with R
- Forecasting: Principles and Practice (2nd ed.; 3rd ed.)
- Financial Engineering Analytics: A Practice Manual Using R
- Financial Risk Modelling and Portfolio Optimization with R (free access via UofT library)
- Statistics and Data Analysis for Financial Engineering with R examples (book download; book site)
- R Machine Learning
- An Introduction to Statistical Learning with Applications in R (you can download the book and its R code)
- TensorFlow for R (deep learning with R)
- Torch for R (deep learning with R)
- Others
- A Short R Tutorial by Germán Rodríguez
- Introductory Econometrics Examples (data and examples from Wooldridge)
- STAT545 by Jenny Bryan: Data wrangling, exploration, and analysis with R
- Programming with R (from Software Carpentry)
- DoSStoolkit (self-paced interactive learning modules from UofT Dept. of Statistical Science)
- R Cheat Sheets (cheat sheets for many popular R packages)
- Many more R books here