Data Structures

Data structures are used to store data in an organized fashion in order to make data manipulation and other data operations more efficient.
R programming has five basic data structures, which are listed below:

  • Vector
  • List
  • Matrix
  • Data Frame
  • Factor

Vector

A vector is one of the basic data structures in R programming. It is homogeneous, which means that it contains only elements of the same data type. The data type can be numeric (double), integer, character, complex, or logical.
A vector in R programming is created using the c() function. If the elements passed are of different data types, they are coerced to the most general type present, in the order logical, then integer, then double, then character.
The typeof() function is used to check the data type of the vector, and class() function is used to check the class of a vector.
For example:

Vec1 <- c(44, 25, 64, 96, 30)
Vec2 <- c(1, FALSE, 9.8, "hello world")
typeof(Vec1)
typeof(Vec2)

Output:

[1] "double"
[1] "character"
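
The coercion order described above can be checked one step at a time. The following is a small sketch; the variable names (lgl_int, int_dbl, dbl_chr) are illustrative and do not come from the tutorial.

```r
# Each vector mixes two types; R coerces to the more general one.
lgl_int <- c(TRUE, 2L)     # logical + integer   -> integer
int_dbl <- c(2L, 3.5)      # integer + double    -> double
dbl_chr <- c(3.5, "a")     # double  + character -> character

print(typeof(lgl_int))
print(typeof(int_dbl))
print(typeof(dbl_chr))

# class() reports "numeric" for a double vector, while typeof() reports "double".
print(class(int_dbl))
```

Note that typeof() describes how the values are stored, whereas class() describes how R treats the object, so the two can differ for the same vector.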

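To discard the contents of a vector, assign NULL to it. The variable itself still exists and now holds NULL; the rm() function removes the variable entirely.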

Vec1 <- NULL
Vec2 <- NULL
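
Assigning NULL and removing a variable are not the same thing. The sketch below, using a hypothetical variable named tmp_vec, shows the difference with exists() and rm().

```r
tmp_vec <- c(1, 2, 3)

# After assigning NULL the name is still defined; it just holds NULL.
tmp_vec <- NULL
still_there <- exists("tmp_vec")
print(still_there)
print(is.null(tmp_vec))

# rm() removes the name itself from the current environment.
rm(tmp_vec)
gone <- !exists("tmp_vec")
print(gone)
```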

Accessing Vector Elements

Elements of a vector can be accessed by using their indexes. Square brackets [ ] are used to specify the indexes of the elements to be accessed. Indexing in R starts at 1.
For example:

x <- c("Jan","Feb","March","Apr","May","June","July")
y <- x[c(3,2,7)]
print(y)
Output:
[1] "March" "Feb"   "July"

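We can also use logical indexes, negative indexes, and numeric indexes made of 0s and 1s to access the elements of a vector. Note that a numeric 0 selects nothing and a numeric 1 selects the first element, so 0/1 indexing is not equivalent to logical indexing.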
For example:

x <- c("Jan","Feb","March","Apr","May","June","July")
y <- x[c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE)]
z <- x[c(-3,-7)]
# Named w rather than c, so that the c() function is not shadowed
w <- x[c(0,0,0,1,0,0,1)]
print(y)
print(z)
print(w)

Output:

[1] "Jan"   "March" "June"  "July"   (Elements at the TRUE positions are returned)
[1] "Jan"  "Feb"  "Apr"  "May"  "June"   (Elements at the negative indexes are dropped)
[1] "Jan" "Jan"   (Each 0 selects nothing; each 1 selects the first element)

Vector Arithmetic

We can perform addition, subtraction, multiplication, and division on vectors having the same number of elements in the following ways:

v1 <- c(4,6,7,31,45,9)
v2 <- c(54,1,10,86,14,57)
add.v <- v1+v2
print(add.v)
sub.v <- v1-v2
print(sub.v)
multi.v <- v1*v2
print(multi.v)
divi.v <- v1/v2
print(divi.v)

Output:

[1]  58   7  17 117  59  66
[1] -50   5  -3 -55  31 -48
[1]  216    6   70 2666  630  513
[1] 0.07407407 6.00000000 0.70000000 0.36046512 3.21428571 0.15789474

Recycling Vector Elements

If arithmetic operations are performed on vectors of unequal length, the elements of the shorter vector are recycled until it matches the length of the longer one. For example:

v1 <- c(8,7,6,5,0,1)
v2 <- c(7,15)
# v2 is recycled to c(7,15,7,15,7,15)
add.v <- v1+v2
print(add.v)
sub.v <- v1-v2
print(sub.v)

Output:

[1] 15 22 13 20  7 16
[1]   1  -8  -1 -10  -7 -14
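
Recycling also happens when the longer length is not a multiple of the shorter one, but R then raises a warning. The sketch below uses hypothetical vectors a and b to show both the result and the warning.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20)

# b is recycled as 10, 20, 10, 20, 10. Because 5 is not a multiple of 2,
# R also emits a warning, which is suppressed here to keep the output clean.
partial_sum <- suppressWarnings(a + b)
print(partial_sum)

# tryCatch() confirms that the plain expression really does warn.
warned <- tryCatch({ a + b; FALSE }, warning = function(w) TRUE)
print(warned)
```

Treating that warning as a sign of a likely length mismatch is usually safer than relying on partial recycling.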

Sorting a Vector

We can sort the elements of a vector by using the sort() function in the following way.

v <- c(4,78,-45,6,89,678)
sort.v <- sort(v)
print(sort.v)
#Sort the elements in the reverse order
revsort.v <- sort(v, decreasing = TRUE)
print(revsort.v) 
#Sorting character vectors
v <- c("Jan","Feb","March","April")
sort.v <- sort(v)
print(sort.v) 
#Sorting character vectors in reverse order
revsort.v <- sort(v, decreasing = TRUE)
print(revsort.v)

Output:

[1] -45   4   6  78  89 678
[1] 678  89  78   6   4 -45
[1] "April" "Feb"   "Jan"   "March"
[1] "March" "Jan"   "Feb"   "April"

List in R Programming

A list in R programming is a heterogeneous data structure, which means that it can contain elements of different data types. It accepts numbers, characters, lists, and even matrices and functions. It is created using the list() function.
For example:

list1<- list("Sam", "Green", c(8,2,67), TRUE, 51.99, 11.78,FALSE)
print(list1)

Output:

[[1]]
[1] "Sam"
[[2]]
[1] "Green"
[[3]]
[1]  8  2 67
[[4]]
[1] TRUE
[[5]]
[1] 51.99
[[6]]
[1] 11.78
[[7]]
[1] FALSE

Accessing Elements of a List

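Elements of a list can be accessed by using the indices of those elements. Single brackets [ ] return a list containing the selected element, which is why each output below starts with [[1]].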
For Example:

list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))
print(list2[1])
print(list2[2])
print(list2[3])

Output:

[[1]]
     [,1] [,2] [,3]          (First element of the list)
[1,]    3    5   -2
[2,]    9    1    8
[[1]]
[1] "Jan" "Feb" "Mar"        (Second element of the list)
[[1]]
[[1]][[1]]
[1] 3
[[1]][[2]]                   (Third element of the list)
[1] 4
[[1]][[3]]
[1] 5
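
Single and double brackets behave differently on a list. The following sketch, which uses a hypothetical list named list3, contrasts the two and shows how double brackets can be chained to reach nested values.

```r
list3 <- list(c("Jan", "Feb", "Mar"), list(3, 4, 5))

sub_list <- list3[1]     # single brackets: a list of length 1
element  <- list3[[1]]   # double brackets: the character vector itself

print(class(sub_list))
print(class(element))

# Double brackets can be chained to reach nested values.
second_month <- list3[[1]][2]
third_nested <- list3[[2]][[3]]
print(second_month)
print(third_nested)
```

Use [[ ]] when the element itself is needed, for example to pass it to a function that expects a vector or a matrix.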

Adding, Deleting elements of a List

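The simplest way to add an element is to assign a value to the index just past the end of the list, and an element is deleted by assigning NULL to its index.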
For example:

list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))
list2[4] <- "HELLO"
print(list2[4])

Output:

[[1]]
[1] "HELLO"

Similarly,

list2[4] <- NULL
print(list2[4])

Output:

[[1]]
NULL
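
The following is a short sketch of how positions behave when an element is removed or inserted somewhere other than the end. The list name month_list is illustrative; append() with its after argument is a base R function.

```r
month_list <- list("Jan", "Feb", "Mar", "Apr")

# Deleting the second element shifts the later ones down by one position.
month_list[2] <- NULL
length_after_delete <- length(month_list)
print(length_after_delete)

# append() with `after` inserts at a chosen position instead of the end.
month_list <- append(month_list, list("Feb"), after = 1)
print(unlist(month_list))
```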

Updating Elements of a List

To update a value in a list, use the following syntax:

list2[3] <- "Element Updated"
print(list2[3])

Output:

[[1]]
[1] "Element Updated"

Matrix in R Programming

A matrix in R programming is a 2-dimensional data structure that is homogeneous, which means that it only accepts elements of the same data type. Coercion takes place if elements of different data types are passed. It is created using the matrix() function.
The basic syntax to create a matrix is given below:
matrix(data, nrow, ncol, byrow, dimnames)
where,
data = the input element of a matrix given as a vector.
nrow = the number of rows to be created.
ncol = the number of columns to be created.
byrow = a logical value; if TRUE, the matrix is filled row by row instead of column by column (the default is FALSE).
dimnames = a list holding the row names and the column names to be assigned.
For example:

M1 <- matrix(c(1:9), nrow = 3, ncol =3, byrow= TRUE)
print(M1)

Output:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

M2 <- matrix(c(1:9), nrow = 3, ncol =3, byrow= FALSE)
print(M2)

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

By using row and column names, a matrix can be created as follows:

row_names <- c("row1", "row2", "row3")
col_names <- c("col1", "col2", "col3")
M3 <- matrix(c(1:9), nrow = 3, byrow = TRUE, dimnames = list(row_names, col_names))
print(M3)

Output:

     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9

Accessing Elements of a Matrix

To access the elements of a matrix, row and column indices are used. For example, the elements of the matrix M3 created above can be accessed as follows:

print(M3[1,1])
print(M3[3,3])
print(M3[2,3])

Output:

[1] 1 (Element at first row and first column)
[1] 9 (Element at third row and third column)
[1] 6 (Element at second row and third column)
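
Leaving an index empty selects a whole row or column, and names can be used in place of numbers. The sketch below rebuilds the same 3 x 3 matrix under the hypothetical name M4 so that it runs on its own.

```r
M4 <- matrix(1:9, nrow = 3, byrow = TRUE,
             dimnames = list(c("row1", "row2", "row3"),
                             c("col1", "col2", "col3")))

second_row   <- M4[2, ]             # whole second row
third_column <- M4[, 3]             # whole third column
by_name      <- M4["row3", "col1"]  # access by row and column name
sub_matrix   <- M4[1:2, c(1, 3)]    # rows 1-2, columns 1 and 3

print(second_row)
print(third_column)
print(by_name)
print(sub_matrix)
```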

Data Frame

A data frame in R programming is a 2-dimensional array-like structure that also resembles a table, in which each column contains values of one variable and each row contains one set of values from each column.
A data frame has the following characteristics:

  • The column names of a data frame should not be empty.
  • Row names should be unique.
  • Data stored in a data frame can be numeric, factor or character type.
  • Each column should contain the same number of data items.
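
These characteristics can be inspected directly. The sketch below uses a hypothetical data frame named small_df and shows that data.frame() rejects columns whose lengths cannot be reconciled.

```r
small_df <- data.frame(id = 1:3, label = c("a", "b", "c"))

print(dim(small_df))     # number of rows, then number of columns
print(names(small_df))   # column names
str(small_df)            # type of each column

# A column of length 2 cannot be combined with a column of length 3,
# so data.frame() signals an error, which is caught here.
bad <- tryCatch(
  data.frame(id = 1:3, label = c("a", "b")),
  error = function(e) "error"
)
print(bad)
```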

Creating a Data Frame

Use the following syntax for creating a data frame in R programming:

empid <- c(1:4)
empname <- c("Sam","Rob","Max","John")
empdept <- c("Sales","Marketing","HR","R & D")
emp.data <- data.frame(empid,empname,empdept)
print(emp.data)

Output:

  empid empname   empdept
1     1     Sam     Sales
2     2     Rob Marketing
3     3     Max        HR
4     4    John     R & D

Extracting Columns/Rows from a Data Frame

To extract a specific column from a data frame, use the following syntax:

result <- data.frame(emp.data$empname,emp.data$empdept)
print(result)

Output:

  emp.data.empname emp.data.empdept
1              Sam            Sales
2              Rob        Marketing
3              Max               HR
4             John            R & D

To extract specific rows from a data frame, use the following syntax:

result <- emp.data[1:2,]
print(result)

Output:

  empid empname   empdept
1     1     Sam     Sales
2     2     Rob Marketing

The following code extracts the first and third rows, restricted to the second and third columns.

result <- emp.data[c(1,3),c(2,3)]
print(result)

Output:

  empname empdept
1     Sam   Sales
3     Max      HR
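
Columns can also be selected by name, and rows by a logical condition, which tends to be easier to read than numeric positions. The sketch below uses a stand-alone copy of the employee data under the hypothetical name staff.

```r
staff <- data.frame(
  empid   = 1:4,
  empname = c("Sam", "Rob", "Max", "John"),
  empdept = c("Sales", "Marketing", "HR", "R & D")
)

# Rows 1 and 3, with columns chosen by name instead of position.
by_name <- staff[c(1, 3), c("empname", "empdept")]
print(by_name)

# [[ ]] returns a single column as a plain vector.
one_col <- staff[["empname"]]
print(one_col)

# A logical condition keeps only the matching rows.
hr_rows <- staff[staff$empdept == "HR", ]
print(hr_rows)
```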

Adding a Column to a Data Frame

To add a salary column to the above Data Frame, use the following syntax:

emp.data$salary <- c(20000,30000,40000,27000)
n <- emp.data
print(n)

Output:

  empid empname   empdept salary
1     1     Sam     Sales  20000
2     2     Rob Marketing  30000
3     3     Max        HR  40000
4     4    John     R & D  27000

Adding a Row to a Data Frame

To add new rows to an existing Data Frame, we need to create a new data frame, which contains the new rows, and then merge it with the existing data frame using the rbind() function. This way, we will get the final Data Frame.

Creating a New Data Frame:

emp.newdata <- data.frame(
  empid = c(5:7),
  empname = c("Frank","Tony","Eric"),
  empdept = c("IT","Operations","Finance"),
  salary = c(32000,51000,45000)
)

Merging the Created Data Frame with the Existing One:

emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

Output:

  empid empname    empdept salary
1     1     Sam      Sales  20000
2     2     Rob  Marketing  30000
3     3     Max         HR  40000
4     4    John      R & D  27000
5     5   Frank         IT  32000
6     6    Tony Operations  51000
7     7    Eric    Finance  45000
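
When rbind() combines data frames, it matches columns by name rather than by position, so the new rows need the same column names as the existing data frame. A minimal sketch with hypothetical data frames base_rows and extra_row:

```r
base_rows <- data.frame(id = 1:2, name = c("a", "b"))

# The columns are deliberately given in a different order here.
extra_row <- data.frame(name = "c", id = 3L)

combined <- rbind(base_rows, extra_row)
print(combined)

# A data frame with a different column name cannot be bound; rbind() errors.
mismatch <- tryCatch(
  rbind(base_rows, data.frame(id = 4L, label = "d")),
  error = function(e) "error"
)
print(mismatch)
```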

Factor

Factors in R programming are used in data analysis for statistical modeling. They are used to categorize the unique values in a column, such as “Male”, “Female”, “TRUE”, and “FALSE”, and store them as levels. They can be built from both strings and integers. They are useful for columns that have a limited number of unique values.
Factors can be created using the factor() function and they take vectors as inputs.
For example:

data <- c("Male","Female","Male","Child","Child","Male","Female","Female")
print(data)
factor.data <- factor(data)
print(factor.data)

Output:

[1] "Male"   "Female" "Male"   "Child"  "Child"  "Male"   "Female" "Female"
[1] Male   Female Male   Child  Child  Male   Female Female
Levels: Child Female Male
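
Internally, a factor stores one integer code per value plus the table of levels. The sketch below uses a hypothetical factor named grp to show the levels, the underlying codes, and a frequency count.

```r
grp <- factor(c("Male", "Female", "Male", "Child"))

# Levels are sorted alphabetically by default.
print(levels(grp))

# Each value is stored as the position of its level.
codes <- as.integer(grp)
print(codes)

# table() counts how often each level occurs.
counts <- table(grp)
print(counts)

# The levels argument sets an explicit order instead of the alphabetical one.
custom <- factor(c("Male", "Female", "Male", "Child"),
                 levels = c("Male", "Female", "Child"))
print(levels(custom))
```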

When a data frame is created with stringsAsFactors = TRUE (the default in R versions before 4.0.0), R treats text columns as categorical data and converts them to factors.
For example, in an emp.finaldata data frame built this way, empdept is a factor:

print(is.factor(emp.finaldata$empdept))
print(emp.finaldata$empdept)

Output:

[1] TRUE
[1] Sales      Marketing  HR         R & D      IT         Operations Finance
Levels: HR Marketing R & D Sales Finance IT Operations
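
Whether a text column becomes a factor depends on the stringsAsFactors argument of data.frame(). Passing it explicitly, as in this sketch with hypothetical data frames chr_df and fct_df, gives the same result on every R version.

```r
chr_df <- data.frame(dept = c("Sales", "HR"), stringsAsFactors = FALSE)
fct_df <- data.frame(dept = c("Sales", "HR"), stringsAsFactors = TRUE)

print(is.factor(chr_df$dept))   # the column stays a character vector
print(is.factor(fct_df$dept))   # the column is converted to a factor
print(levels(fct_df$dept))
```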

In this tutorial, we learned what data structures in R programming are, their different types, and how to perform simple data manipulation using data structures. In the next session, we are going to talk about Control Flow statements in R. Let’s meet there!
