1. Intro

In this boot camp, we will start with the basics of R. While it’s not hard to load a dataset and start doing simple analysis in R right away, learning the basics will benefit you in the long term.

2. Data Structure

Since you have all taken DataCamp’s Introduction to R course, this section will be mostly a review. We’ll introduce a few new concepts and emphasize how the different data structures are related.

R’s basic data structures can be summarized as follows.

Dimension   Homogeneous     Heterogeneous
1D          Atomic vector   List
2D          Matrix          Data frame
nD          Array           -
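As a rough orientation, the sketch below builds one object of each kind from the table and checks its dimensions with dim(). The variable names and values are made up for illustration.

```r
v   <- c(1, 2, 3)                                # 1D, homogeneous
l   <- list(1, "a", TRUE)                        # 1D, heterogeneous
m   <- matrix(1:6, nrow = 2)                     # 2D, homogeneous
df  <- data.frame(id = 1:2, name = c("a", "b"))  # 2D, heterogeneous
arr <- array(1:24, dim = c(2, 3, 4))             # nD, homogeneous

# 1D structures have no dim attribute
print(dim(v))
## NULL
print(dim(m))
## [1] 2 3
print(dim(df))
## [1] 2 2
print(dim(arr))
## [1] 2 3 4
```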

2.1 Atomic vector

2.1.1 Creating atomic vectors

There are four common types of atomic vectors.

vec_character <- c("Hello,", "World!")
vec_integer <- c(1L, 2L, 3L)
vec_double <- c(1.1, 2.2, 3.3)
vec_logical <- c(TRUE, TRUE, FALSE)

# print() is optional here, but it sometimes gives a cleaner display format.
# In Databricks, only the last output of a cell may be displayed unless you call print().
print(vec_character)
## [1] "Hello," "World!"
print(vec_integer)
## [1] 1 2 3
print(vec_double)
## [1] 1.1 2.2 3.3
print(vec_logical)
## [1]  TRUE  TRUE FALSE

<- is the assignment operator. c() is a function in base R. It combines values into a vector.

You can retrieve an element of a vector using [] with a numeric index. R’s vector indexing starts at 1.

# retrieve the first element of vec_double
print(vec_double[1])
## [1] 1.1

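Selecting multiple elements of a vector is also easy. You can pass a vector of positions, a range, negative positions (to exclude elements), or a logical vector.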

# select multiple elements
print(vec_double[c(1, 3)])
## [1] 1.1 3.3
print(vec_double[1:2])
## [1] 1.1 2.2
print(vec_double[c(-1, -2)])
## [1] 3.3
print(vec_double[c(TRUE, FALSE, TRUE)])
## [1] 1.1 3.3
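A logical index does not have to be typed by hand: a comparison produces one. The short sketch below shows the idea. The threshold 2 and the name is_large are arbitrary choices for illustration.

```r
vec_double <- c(1.1, 2.2, 3.3)

# a comparison returns a logical vector of the same length
is_large <- vec_double > 2
print(is_large)
## [1] FALSE  TRUE  TRUE

# use it as an index to keep only the matching elements
print(vec_double[is_large])
## [1] 2.2 3.3

# which() converts the logical vector to numeric positions
print(which(is_large))
## [1] 2 3
```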

Exercises

  1. Define two vectors (1, 2, -3) and (4, -5, -6). Perform an element-wise multiplication, and sum up the positive elements in the resulting vector.
# Your code here
x1 <- c(1, 2, -3)
x2 <- c(4, -5, -6)

tp <- x1 * x2
sum(tp[tp > 0])
## [1] 22
  2. Create a double vector vec_d with values (2.0, 3.123e+2, 4.1), and try the following to see what happens: vec_d[0], vec_d[4], vec_d[-2], vec_d[c(TRUE, FALSE, NA)], vec_d[c(TRUE, FALSE)], and vec_d[].
# Test out here
vec_d <- c(2.0, 3.123e+2, 4.1)

print(vec_d[0])
## numeric(0)
print(vec_d[4])
## [1] NA
print(vec_d[-2])
## [1] 2.0 4.1
print(vec_d[c(TRUE, FALSE, NA)])
## [1]  2 NA
print(vec_d[c(TRUE, FALSE)])
## [1] 2.0 4.1
print(vec_d[])
## [1]   2.0 312.3   4.1

2.1.2 Scalar

R has no separate scalar type. A scalar is just a vector of length one.

a <- 1L
b <- c(1L)

print(a == b)
## [1] TRUE
print(a[1] == b[1])
## [1] TRUE
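One way to convince yourself of this is to treat a single value like any other vector. A minimal sketch:

```r
a <- 1L

# a single value still has a length and is reported as a vector
print(length(a))
## [1] 1
print(is.vector(a))
## [1] TRUE

# wrapping it in c() changes nothing
print(identical(a, c(1L)))
## [1] TRUE
```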

2.1.3 Display the structure of a vector (or object)

Given a vector (or an object in general), str() is a useful function to inspect what it is composed of.

str(vec_character)
##  chr [1:2] "Hello," "World!"
str(vec_integer)
##  int [1:3] 1 2 3
str(vec_double)
##  num [1:3] 1.1 2.2 3.3
str(vec_logical)
##  logi [1:3] TRUE TRUE FALSE

2.1.4 Coercion

Atomic vectors must be homogeneous: every element has the same type. If you mix character and numeric elements when creating an atomic vector, implicit type coercion happens and the numeric elements are converted to character. In general, values are coerced to the most flexible type present, in the order logical < integer < double < character.

str(c("hello", 1L))
##  chr [1:2] "hello" "1"
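The coercion order can be checked directly with typeof(). In the sketch below, each call combines two different types, and the result takes the more flexible of the two.

```r
# logical + integer -> integer
print(typeof(c(TRUE, 1L)))
## [1] "integer"

# integer + double -> double
print(typeof(c(1L, 2.5)))
## [1] "double"

# double + character -> character
print(typeof(c(2.5, "a")))
## [1] "character"
```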

Implicit coercion also happens when a function expects a different type. For example, sum() treats TRUE as 1 and FALSE as 0.

x <- c(TRUE, TRUE, FALSE)
sum(x)
## [1] 2
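Because logical values behave as 0 and 1 in arithmetic, sum() and mean() can count and summarize conditions. The sketch below uses a made-up vector named scores and an arbitrary cutoff of 60.

```r
scores <- c(55, 72, 90, 48)   # made-up values

# number of elements above the cutoff
print(sum(scores > 60))
## [1] 2

# the mean of a logical vector is the share of TRUE values
print(mean(scores > 60))
## [1] 0.5
```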

You can also convert a vector to a different type explicitly with the as.? family of functions: as.character(), as.numeric(), as.double(), as.integer(), and as.logical().

x <- c(TRUE, TRUE, FALSE)
str(as.numeric(x))
##  num [1:3] 1 1 0
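Explicit conversion can lose information or produce NA, so it is worth checking the result. The sketch below shows a few cases. The input values are arbitrary, and suppressWarnings() is used only to hide the warning that R would otherwise emit for the unparseable string.

```r
# doubles are truncated toward zero, not rounded
print(as.integer(c(3.9, -3.9)))
## [1]  3 -3

# strings that cannot be parsed as numbers become NA
print(suppressWarnings(as.integer(c("12", "abc"))))
## [1] 12 NA

# only certain strings are recognized as logical values
print(as.logical(c("TRUE", "T", "false", "yes")))
## [1]  TRUE  TRUE FALSE    NA
```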

2.1.5 Properties of vectors

Vectors have three properties: type, length, and attributes.

  • determine a vector’s type: typeof(), and the is.? function family, is.character(), is.double(), is.integer(), is.logical(), is.numeric(), and is.atomic().

  • determine a vector’s length: length()

  • determine a vector’s attributes: attr() and attributes()

Attributes provide additional metadata for a vector. We will talk more about them a bit later.

a <- 1:3

print(is.double(a))
## [1] FALSE
print(is.integer(a))
## [1] TRUE
print(length(a))
## [1] 3
print(attributes(a))
## NULL
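As a small preview of attributes, the sketch below attaches names and a custom attribute to a vector. The attribute name "unit" and its value are invented for illustration; R does not give them any special meaning.

```r
a <- 1:3

# names are stored as an attribute
names(a) <- c("x", "y", "z")
print(attributes(a))
## $names
## [1] "x" "y" "z"
## 

# attr() gets or sets a single attribute
attr(a, "unit") <- "cm"
print(attr(a, "unit"))
## [1] "cm"

# attributes do not change the underlying type or length
print(typeof(a))
## [1] "integer"
print(length(a))
## [1] 3
```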

2.1.6 NULL, NA and NaN

NULL has its own type, also called NULL. It has length 0 and is often used to represent an empty vector.

# NULL
print(typeof(NULL))
## [1] "NULL"
print(length(NULL))
## [1] 0
print(is.null(c()))
## [1] TRUE

NA represents a missing value. Each type of atomic vector has its own version of NA.

# NAs
na_char <- NA_character_
na_double <- NA_real_
na_int <- NA_integer_
na_logical <- NA

str(na_char)
##  chr NA
print(is.na(na_char))
## [1] TRUE
print(is.character(na_char))
## [1] TRUE
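Missing values propagate through most operations, which is a common source of surprises. The sketch below uses a made-up vector x to show why is.na() is needed to find them and how na.rm = TRUE drops them in sum().

```r
x <- c(1, NA, 3)

# comparing with NA gives NA, so x == NA cannot locate missing values
print(x == NA)
## [1] NA NA NA
print(is.na(x))
## [1] FALSE  TRUE FALSE

# NA propagates through arithmetic unless it is removed explicitly
print(sum(x))
## [1] NA
print(sum(x, na.rm = TRUE))
## [1] 4
```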

NaN stands for “Not a Number”. It is the result of an undefined numeric operation. For example, 0/0 gives NaN.

0/0
## [1] NaN

Be careful when using is.na() and is.nan(): is.na() returns TRUE for both NA and NaN, while is.nan() returns TRUE only for NaN.

print(is.na(NA))
## [1] TRUE
print(is.na(NaN))
## [1] TRUE
print(is.nan(NA))
## [1] FALSE
print(is.nan(NaN))
## [1] TRUE
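To contrast the three special values, the sketch below combines each of them into a vector. NULL has length 0 and disappears, while NA and NaN each keep a position.

```r
# NULL is dropped when combined into a vector
print(c(1, NULL, 3))
## [1] 1 3

# NA and NaN each occupy a position
print(c(1, NA, NaN))
## [1]   1  NA NaN
print(length(c(1, NA, NaN)))
## [1] 3
```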

2.2 List

A list is another kind of one-dimensional vector. Unlike an atomic vector, its elements can have different types.

l1 <- list(
  1:3, 
  "a", 
  c(TRUE, FALSE, TRUE), 
  c(2.3, 5.9),
  c(1L, 2L)
)

str(l1)
## List of 5
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9
##  $ : int [1:2] 1 2
typeof(l1)
## [1] "list"

A list can also contain other lists, which makes it a recursive structure.

l2 <- list(list(list(1)))
str(l2)
## List of 1
##  $ :List of 1
##   ..$ :List of 1
##   .. ..$ : num 1

Retrieving elements from a list is similar to retrieving elements from an atomic vector.

print(l1[1:2])
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
print(l1[c(1, 3)])
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1]  TRUE FALSE  TRUE
print(l1[c(TRUE, FALSE, FALSE, FALSE, TRUE)])
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 1 2

Note that [] always returns a list, even when you select a single element. To return an element as it is, use [[]]. [[]] returns exactly one element, so you normally pass it a single index.

print(l1[[2]])
## [1] "a"
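The difference matters as soon as you compute with the result. The sketch below, which uses a shortened version of l1 and the nested list l2 from above, compares the two operators and chains [[]] to reach into nested lists.

```r
l1 <- list(1:3, "a", c(TRUE, FALSE, TRUE))

# [] returns a list of length 1
print(typeof(l1[2]))
## [1] "list"

# [[]] returns the element itself
print(typeof(l1[[2]]))
## [1] "character"

# arithmetic works on l1[[1]]; sum(l1[1]) would raise an error
print(sum(l1[[1]]))
## [1] 6

# each [[1]] removes one level of nesting
l2 <- list(list(list(1)))
print(l2[[1]][[1]][[1]])
## [1] 1
```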

Exercises

  1. Retrieve the vector (2.3, 5.9) in the below list and sum up its elements.
l_ex <- list(
  1:3, 
  list(
    "a", 
    c(2.3, 5.9)
  )
)

# your code below
sum(l_ex[[2]][[2]])
## [1] 8.2
  2. What’s the difference between x <- 1:3, y <- list(x), and z <- as.list(x)?
# Test out here
x <- 1:3
y <- list(x)
z <- as.list(x)

str(x)
##  int [1:3] 1 2 3
str(y)
## List of 1
##  $ : int [1:3] 1 2 3
str(z)
## List of 3
##  $ : int 1
##  $ : int 2
##  $ : int 3
  3. What happens if you combine a list and a vector (using the c() function)?
# Test out here
a <- list(1, 2)
b <- c(3, 4)

# now combine a and b using c() function

str(c(a, b))
## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ : num 4