In this boot camp, we will start with the basics of R. While it’s not hard to load a dataset and start doing simple analysis in R right away, learning the basics will benefit you in the long term.
Since you have all taken Datacamp’s Introduction to R course, this section will be mostly a review. We’ll introduce a few new concepts and emphasize how the different data structures are related.
R’s basic data structures can be summarized as follows.
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1D | Atomic vector | List |
2D | Matrix | Data frame |
nD | Array | - |
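As a quick illustration of the table, the sketch below builds one small object of each kind using base R constructors. The variable names and values are made up for illustration only.

```r
# One small example per cell of the table (names and values are illustrative)
v   <- c(1.5, 2.5, 3.5)                         # 1D, homogeneous: atomic vector
l   <- list(1L, "a", TRUE)                      # 1D, heterogeneous: list
m   <- matrix(1:6, nrow = 2)                    # 2D, homogeneous: matrix
df  <- data.frame(id = 1:2, name = c("a", "b")) # 2D, heterogeneous: data frame
arr <- array(1:24, dim = c(2, 3, 4))            # nD, homogeneous: array

# dim() reports the size along each dimension
print(dim(m))    # 2 rows, 3 columns
print(dim(df))   # 2 rows, 2 columns
print(dim(arr))  # 2 x 3 x 4
```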
There are four common types of atomic vectors.
vec_character <- c("Hello,", "World!")
vec_integer <- c(1L, 2L, 3L)
vec_double <- c(1.1, 2.2, 3.3)
vec_logical <- c(TRUE, TRUE, FALSE)
# You don't need to use print(), but it sometimes gives a better display format.
# In Databricks, sometimes only the last output is displayed unless you use print().
print(vec_character)
## [1] "Hello," "World!"
print(vec_integer)
## [1] 1 2 3
print(vec_double)
## [1] 1.1 2.2 3.3
print(vec_logical)
## [1] TRUE TRUE FALSE
`<-` is the assignment operator. `c()` is a function in base R. It combines values into a vector.
You can retrieve an element of a vector using `[]` with a numeric index. R’s vector indexing starts at 1.
# retrieve the first element of vec_double
print(vec_double[1])
## [1] 1.1
Selecting multiple elements of a vector is also easy. You can use positive indices, negative indices (to exclude elements), or a logical vector.
# select multiple elements
print(vec_double[c(1, 3)])
## [1] 1.1 3.3
print(vec_double[1:2])
## [1] 1.1 2.2
print(vec_double[c(-1, -2)])
## [1] 3.3
print(vec_double[c(TRUE, FALSE, TRUE)])
## [1] 1.1 3.3
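Logical indexing is most useful when the logical vector comes from a comparison rather than being typed by hand. A minimal sketch, reusing the values of `vec_double` from above (the name `keep` is just for illustration):

```r
vec_double <- c(1.1, 2.2, 3.3)

# A comparison returns a logical vector of the same length
keep <- vec_double > 2
print(keep)

# Use that logical vector to select the matching elements
print(vec_double[keep])

# which() gives the positions of the TRUE values instead
print(which(vec_double > 2))
```

The same pattern, `x[x > 0]`, selects the positive elements of any numeric vector.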
Create two vectors with values (1, 2, -3) and (4, -5, -6). Perform an element-wise multiplication, and sum up the positive elements in the resulting vector.
# Your code here
x1 <- c(1, 2, -3)
x2 <- c(4, -5, -6)
tp <- x1 * x2
sum(tp[tp > 0])
## [1] 22
Create a vector `vec_d` with values (2.0, 3.123e+2, 4.1), and try the following to see what happens: `vec_d[0]`, `vec_d[4]`, `vec_d[-2]`, `vec_d[c(TRUE, FALSE, NA)]`, `vec_d[c(TRUE, FALSE)]`, and `vec_d[]`.
# Test out here
vec_d <- c(2.0, 3.123e+2, 4.1)
print(vec_d[0])
## numeric(0)
print(vec_d[4])
## [1] NA
print(vec_d[-2])
## [1] 2.0 4.1
print(vec_d[c(TRUE, FALSE, NA)])
## [1] 2 NA
print(vec_d[c(TRUE, FALSE)])
## [1] 2.0 4.1
print(vec_d[])
## [1] 2.0 312.3 4.1
A scalar in R is just a vector of length one.
a <- 1L
b <- c(1L)
print(a == b)
## [1] TRUE
print(a[1] == b[1])
## [1] TRUE
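One way to convince yourself of this is to treat a single number like any other vector. The sketch below uses a made-up value `s`:

```r
s <- 5

# A single number has length 1 and is a vector
print(length(s))
print(is.vector(s))

# It can be indexed like any other vector
print(s[1])

# Wrapping a single value in c() changes nothing
print(identical(1L, c(1L)))
```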
Given a vector (or an object in general), `str()` is a useful function to inspect what it is composed of.
str(vec_character)
## chr [1:2] "Hello," "World!"
str(vec_integer)
## int [1:3] 1 2 3
str(vec_double)
## num [1:3] 1.1 2.2 3.3
str(vec_logical)
## logi [1:3] TRUE TRUE FALSE
Atomic vectors must be homogeneous: all elements share a single type. If you mix character and numeric elements when creating an atomic vector, implicit type coercion happens, and the numeric elements are converted to character. In general, elements are coerced to the most flexible type present. From least to most flexible, the order is logical, integer, double, character.
str(c("hello", 1L))
## chr [1:2] "hello" "1"
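To see the whole coercion chain rather than just the character case, you can check `typeof()` on a few mixed vectors. This is a small sketch; the variable names are illustrative.

```r
# Each mix is coerced to the more flexible of the types involved
t1 <- typeof(c(TRUE, 1L))            # logical + integer
t2 <- typeof(c(1L, 2.5))             # integer + double
t3 <- typeof(c(2.5, "a"))            # double + character
t4 <- typeof(c(TRUE, 1L, 2.5, "a"))  # all four together

print(c(t1, t2, t3, t4))

# Logical values become the strings "TRUE"/"FALSE" when mixed with character
print(c(TRUE, "a"))
```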
Implicit coercion can also happen when a function expects a different type. For example, `sum()` treats `TRUE` as 1 and `FALSE` as 0.
x <- c(TRUE, TRUE, FALSE)
sum(x)
## [1] 2
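This behavior is handy for counting how many elements satisfy a condition. A minimal sketch with hypothetical scores and a hypothetical passing mark of 60:

```r
scores <- c(55, 72, 90, 48)  # made-up data
passed <- scores >= 60       # logical vector

# sum() counts the TRUE values
print(sum(passed))

# mean() gives the proportion of TRUE values
print(mean(passed))
```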
You can also explicitly convert a vector to a different type with the `as.*()` family of functions: `as.character()`, `as.numeric()`, `as.double()`, `as.integer()`, or `as.logical()`.
x <- c(TRUE, TRUE, FALSE)
str(as.numeric(x))
## num [1:3] 1 1 0
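Explicit conversion does not always succeed. The sketch below shows a few cases worth knowing; when a value cannot be converted, the result is `NA` (and `as.integer()` also raises a warning, which is suppressed here to keep the output clean).

```r
# Strings that look like numbers convert cleanly
ints <- as.integer(c("1", "2"))
print(ints)

# Doubles are truncated toward zero, not rounded
trunc_val <- as.integer(3.9)
print(trunc_val)

# Unrecognized strings become NA
lgl <- as.logical(c("TRUE", "yes"))
print(lgl)

# A non-numeric string becomes NA, with a warning
bad <- suppressWarnings(as.integer("abc"))
print(bad)
```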
Vectors have three properties: type, length, and attributes.
- To determine a vector’s type: `typeof()`, and the `is.*()` function family: `is.character()`, `is.double()`, `is.integer()`, `is.logical()`, `is.numeric()`, and `is.atomic()`.
- To determine a vector’s length: `length()`.
- To determine a vector’s attributes: `attr()` and `attributes()`.
Attributes provide additional metadata for a vector. We will talk more about them a bit later.
a <- 1:3
print(is.double(a))
## [1] FALSE
print(is.integer(a))
## [1] TRUE
print(length(a))
## [1] 3
print(attributes(a))
## NULL
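Note that `is.numeric()` is broader than `is.integer()` or `is.double()`: it is `TRUE` for both numeric types. A short sketch of how these checks relate:

```r
a <- 1:3

print(typeof(a))        # the underlying storage type
print(is.numeric(a))    # TRUE for integer vectors
print(is.numeric(1.5))  # and for double vectors
print(is.double(a))     # FALSE: 1:3 is stored as integer

# is.atomic() separates atomic vectors from lists
print(is.atomic(a))
print(is.atomic(list(1)))
```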
`NULL` has its own type, NULL. It has length 0, and it is used to represent an empty vector.
# NULL
print(typeof(NULL))
## [1] "NULL"
print(length(NULL))
## [1] 0
print(is.null(c()))
## [1] TRUE
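Because `NULL` has length 0, it disappears when combined with other values. This is different from `NA`, which occupies a position. A minimal sketch (variable names are illustrative):

```r
# NULL contributes nothing when combined
with_null <- c(NULL, 1, NULL)
print(with_null)
print(length(with_null))

# NA is a placeholder and keeps its position
with_na <- c(NA, 1)
print(with_na)
print(length(with_na))

# NA itself has length 1, NULL has length 0
print(length(NA))
print(length(NULL))
```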
`NA` represents a missing value. Each type of atomic vector has its own version of `NA`.
# NAs
na_char <- NA_character_
na_double <- NA_real_
na_int <- NA_integer_
na_logical <- NA
str(na_char)
## chr NA
print(is.na(na_char))
## [1] TRUE
print(is.character(na_char))
## [1] TRUE
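You can confirm the type of each `NA` constant with `typeof()`. The sketch below also shows that a plain `NA` is logical but is coerced to match the vector it is placed in, and that `NA` propagates through arithmetic unless you ask for it to be removed.

```r
# Each NA constant carries its own type
na_types <- c(typeof(NA), typeof(NA_integer_),
              typeof(NA_real_), typeof(NA_character_))
print(na_types)

# A plain NA is coerced to the type of the surrounding vector
print(typeof(c(1.5, NA)))

# Arithmetic with NA gives NA
print(NA + 1)

# Many functions accept na.rm = TRUE to ignore missing values
total <- sum(c(1, NA, 3), na.rm = TRUE)
print(total)
```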
`NaN` is Not a Number. For example, 0/0 would give you `NaN`.
0/0
## [1] NaN
Be careful when using `is.na()` and `is.nan()` on `NA` and `NaN`: `is.na()` returns `TRUE` for both, while `is.nan()` returns `TRUE` only for `NaN`.
print(is.na(NA))
## [1] TRUE
print(is.na(NaN))
## [1] TRUE
print(is.nan(NA))
## [1] FALSE
print(is.nan(NaN))
## [1] TRUE
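The same rules apply element by element on a vector. A short sketch, using a made-up vector `x`, that separates true `NA` values from `NaN` values:

```r
x <- c(1, NA, NaN, 4)

print(is.na(x))   # TRUE for both NA and NaN
print(is.nan(x))  # TRUE only for NaN

# Positions that are NA but not NaN
only_na <- is.na(x) & !is.nan(x)
print(only_na)

# na.rm = TRUE drops both NA and NaN
print(sum(x, na.rm = TRUE))
```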
A list is another kind of one-dimensional vector. Unlike an atomic vector, it can contain elements of different types.
l1 <- list(
1:3,
"a",
c(TRUE, FALSE, TRUE),
c(2.3, 5.9),
c(1L, 2L)
)
str(l1)
## List of 5
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
## $ : int [1:2] 1 2
typeof(l1)
## [1] "list"
A list can contain other lists as well. In other words, it is recursive.
l2 <- list(list(list(1)))
str(l2)
## List of 1
## $ :List of 1
## ..$ :List of 1
## .. ..$ : num 1
Retrieving elements from a list is similar to retrieving elements from a vector.
print(l1[1:2])
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] "a"
print(l1[c(1, 3)])
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] TRUE FALSE TRUE
print(l1[c(TRUE, FALSE, FALSE, FALSE, TRUE)])
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] 1 2
Note that using `[]` always returns a list. To return an element of a list as it is, use `[[]]`. `[[]]` returns a single element, so you specify a single index.
print(l1[[2]])
## [1] "a"
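The difference between `[]` and `[[]]` is easiest to see by checking what each one returns. A minimal sketch with illustrative names:

```r
l <- list(1:3, "a")

sub <- l[2]    # a list of length 1 that contains "a"
elt <- l[[2]]  # the character vector "a" itself

print(class(sub))
print(class(elt))

# For nested lists, chain [[ ]] to reach the innermost element
nested <- list(list(list(1)))
print(nested[[1]][[1]][[1]])
```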
Retrieve the vector (2.3, 5.9) in the list below and sum up its elements.
l_ex <- list(
1:3,
list(
"a",
c(2.3, 5.9)
)
)
# your code below
sum(l_ex[[2]][[2]])
## [1] 8.2
Compare `x <- 1:3`, `y <- list(x)` and `z <- as.list(x)`.
# Test out here
x <- 1:3
y <- list(x)
z <- as.list(x)
str(x)
## int [1:3] 1 2 3
str(y)
## List of 1
## $ : int [1:3] 1 2 3
str(z)
## List of 3
## $ : int 1
## $ : int 2
## $ : int 3
What happens when you combine a list and an atomic vector (using the `c()` function)?
# Test out here
a <- list(1, 2)
b <- c(3, 4)
# now combine a and b using c() function
a <- list(1, 2)
b <- c(3, 4)
str(c(a, b))
## List of 4
## $ : num 1
## $ : num 2
## $ : num 3
## $ : num 4
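As the output suggests, `c()` coerces the atomic vector to a list and produces one flat list. For contrast, the sketch below compares this with `list(a, b)`, which nests its arguments instead, and uses `unlist()` to flatten the combined list back into an atomic vector. The names `combined` and `wrapped` are illustrative.

```r
a <- list(1, 2)
b <- c(3, 4)

combined <- c(a, b)     # flat list: each element of b becomes a list element
wrapped  <- list(a, b)  # list of two elements: the list a and the vector b

print(length(combined))
print(length(wrapped))

# unlist() flattens a list of numbers back into an atomic vector
print(unlist(combined))
```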