Data Structures in R Programming

Data structures make accessing and operating on data easier, and they are often selected or designed to work with particular algorithms. In some cases, an algorithm's basic operations closely follow the design of the data structure it uses.

What are Data Structures in R Programming?

A data structure is a way of organizing data so that it can be used effectively. The aim is to reduce the space and time complexity of common tasks.

In any programming language, variables are needed to store data. When a variable is created, an area of memory is reserved to hold its value.

Data structures are the objects that are manipulated regularly in R. They are used to store data in an organized fashion to make data manipulation and other data operations more efficient. R has many data structures. The following section will discuss them in detail.

Vectors

A vector is one of the basic data structures in R. It is homogeneous, which means that it only contains elements of the same data type. The data type can be logical, integer, double (numeric), complex, or character.

Vectors are created by using the c() function. If the elements passed are of different data types, they are coerced to the most flexible type present, in the order logical, integer, double, character.

The typeof() function is used to check the data type of the vector, and the class() function is used to check the class of the vector.

Vec1 <- c(44, 25, 64, 96, 30)
Vec2 <- c(1, FALSE, 9.8, "hello world")
typeof(Vec1)
typeof(Vec2)

Output:

[1] "double"
[1] "character"
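The coercion order described above can be checked one step at a time. The following is a minimal sketch; the variable names are made up for illustration.

```r
# c() coerces mixed inputs to the most flexible type present.
v_lgl_int <- c(TRUE, 2L)     # logical + integer
v_int_dbl <- c(2L, 3.5)      # integer + double
v_dbl_chr <- c(3.5, "a")     # double + character

print(typeof(v_lgl_int))     # "integer"
print(typeof(v_int_dbl))     # "double"
print(typeof(v_dbl_chr))     # "character"

# class() reports "numeric" for a double vector, while typeof() reports "double".
print(class(v_int_dbl))      # "numeric"
```

Note that typeof() describes the internal storage type, whereas class() describes how R treats the object, which is why the two can differ for the same vector.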

To discard the contents of a vector, assign NULL to it. The variable itself remains and holds NULL; to remove the variable entirely, use rm() instead.

Vec1 <- NULL
Vec2 <- NULL

Methods to Access Vector Elements

Elements of a vector are accessed by their indexes, which start at 1 in R. Square brackets [ ] are used to specify the indexes of the elements to be accessed.

For example:

x <- c("Jan","Feb","March","Apr","May","June","July")
y <- x[c(3,2,7)]
print(y)

Output:

[1] "March" "Feb"   "July"

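Logical indexing, negative indexing, and numeric 0/1 indexing can also be used to access the elements of a vector. Note that numeric 0 and 1 are positions, not logical flags: a 0 selects nothing, and every 1 selects the first element.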

For example:

x <- c("Jan","Feb","March","Apr","May","June","July")
y <- x[c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE)]
z <- x[c(-3,-7)]
# named w rather than c to avoid masking the c() function
w <- x[c(0,0,0,1,0,0,1)]
print(y)
print(z)
print(w)

Output:

[1] "Jan"   "March" "June"  "July" (Elements at the TRUE positions are kept)
[1] "Jan"  "Feb"  "Apr"  "May"  "June" (Elements at positions 3 and 7 are dropped)
[1] "Jan" "Jan" (Each 1 selects the first element; each 0 selects nothing)
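Because a numeric 0/1 vector is read as positions, it behaves differently from a TRUE/FALSE mask. The sketch below contrasts the two; the names flags, by_position, by_mask, and by_which are illustrative only.

```r
x <- c("Jan","Feb","March","Apr","May","June","July")
flags <- c(0, 0, 0, 1, 0, 0, 1)

by_position <- x[flags]                # numeric: 0 is ignored, 1 picks element 1
by_mask     <- x[as.logical(flags)]    # logical: keeps positions 4 and 7
by_which    <- x[which(flags == 1)]    # same result as the logical mask

print(by_position)   # "Jan" "Jan"
print(by_mask)       # "Apr" "July"
print(by_which)      # "Apr" "July"
```

If 0/1 values are meant as a mask, converting them with as.logical() or which() before indexing gives the intended selection.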

Vector Arithmetic

You can perform element-wise addition, subtraction, multiplication, and division on vectors that have the same number of elements:

v1 <- c(4,6,7,31,45,9)
v2 <- c(54,1,10,86,14,57)
add.v <- v1+v2
print(add.v)
sub.v <- v1-v2
print(sub.v)
multi.v <- v1*v2
print(multi.v)
divi.v <- v1/v2
print(divi.v)

Output:

[1]  58   7  17 117  59  66
[1] -50   5  -3 -55  31 -48
[1]  216    6   70 2666  630  513
[1] 0.07407407 6.00000000 0.70000000 0.36046512 3.21428571 0.15789474

Recycling Vector Elements

If an arithmetic operation is performed on vectors of unequal length, the elements of the shorter vector are recycled (repeated from the start) until it matches the length of the longer one. For example:

v1 <- c(8,7,6,5,0,1)
v2 <- c(7,15)
# v2 is recycled to c(7,15,7,15,7,15)
add.v <- v1+v2
print(add.v)
sub.v <- v1-v2
print(sub.v)

Output:

[1] 15 22 13 20  7 16
[1]   1  -8  -1 -10  -7 -14
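Recycling can be made explicit with rep_len(), and it is worth knowing what happens when the longer length is not a multiple of the shorter one. This is a small sketch; v2_recycled and result are illustrative names.

```r
v1 <- c(8, 7, 6, 5, 0, 1)
v2 <- c(7, 15)

# rep_len() spells out what recycling does to the shorter vector.
v2_recycled <- rep_len(v2, length(v1))
print(v2_recycled)                             # 7 15 7 15 7 15
print(identical(v1 + v2, v1 + v2_recycled))    # TRUE

# If the longer length is not a multiple of the shorter one, R still
# recycles but raises a warning; it is suppressed here to keep the output clean.
result <- suppressWarnings(c(1, 2, 3, 4, 5) + c(10, 20))
print(result)                                  # 11 22 13 24 15
```

In practice that warning is often a sign of a length mismatch bug, so it is usually better to investigate it than to suppress it.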

Sorting a Vector

You can sort the elements of a vector by using the sort() function in the following way:

v <- c(4,78,-45,6,89,678)
sort.v <- sort(v)
print(sort.v)
#Sort the elements in the reverse order
revsort.v <- sort(v, decreasing = TRUE)
print(revsort.v)
#Sorting character vectors
v <- c("Jan","Feb","March","April")
sort.v <- sort(v)
print(sort.v)
#Sorting character vectors in reverse order
revsort.v <- sort(v, decreasing = TRUE)
print(revsort.v)

Output:

[1] -45   4   6  78  89 678
[1] 678  89  78   6   4 -45
[1] "April" "Feb"   "Jan"   "March"
[1] "March" "Jan"   "Feb"   "April"
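A related base function, order(), returns the positions that would sort a vector rather than the sorted values. It is not covered above, but it is handy when a second vector must be rearranged the same way. A minimal sketch, where tags is a hypothetical companion vector:

```r
v <- c(4, 78, -45, 6, 89, 678)
tags <- c("a", "b", "c", "d", "e", "f")   # hypothetical labels, one per value

idx <- order(v)      # positions of the elements in ascending order
print(idx)           # 3 1 4 2 5 6
print(v[idx])        # same values as sort(v)
print(tags[idx])     # "c" "a" "d" "b" "e" "f"
```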

Lists

A list is a non-homogeneous data structure, which implies that it can contain elements of different data types. It accepts numbers, characters, lists, and even matrices and functions inside it. It is created by using the list() function.

For example:

list1 <- list("Sam", "Green", c(8,2,67), TRUE, 51.99, 11.78, FALSE)
print(list1)

Output:

[[1]]
[1] "Sam"

[[2]]
[1] "Green"

[[3]]
[1]  8  2 67

[[4]]
[1] TRUE

[[5]]
[1] 51.99

[[6]]
[1] 11.78

[[7]]
[1] FALSE

Accessing the Elements of a List

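The elements of a list can be accessed by using the indices of those elements. Single brackets [ ] return a sub-list containing the element, while double brackets [[ ]] return the element itself.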

For example:

list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))
print(list2[1])
print(list2[2])
print(list2[3])

Output:

[[1]]
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
(First element of the list)

[[1]]
[1] "Jan" "Feb" "Mar"
(Second element of the list)

[[1]]
[[1]][[1]]
[1] 3

[[1]][[2]]
[1] 4

[[1]][[3]]
[1] 5
(Third element of the list)
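Each single-bracket call above returns a list of length one, which is why every result is printed under [[1]]. To work with the stored object directly, double brackets are needed. The following sketch shows the difference; sub_list and months are illustrative names.

```r
list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))

sub_list <- list2[2]      # still a list, of length 1
months   <- list2[[2]]    # the character vector itself

print(class(sub_list))    # "list"
print(class(months))      # "character"
print(months[3])          # "Mar"

# Double brackets can be chained with further indexing.
print(list2[[1]][2, 3])   # 8, row 2 and column 3 of the matrix
print(list2[[3]][[1]])    # 3, first element of the nested list
```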

Adding and Deleting the Elements of a List

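You can add an element to a list by assigning a value to a new index, and delete an element by assigning NULL to its index. The examples below add and then delete an element at the end of the list.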

For example:

list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))
list2[4] <- "HELLO"
print(list2[4])

Output:

[[1]]
[1] "HELLO"

Similarly,

list2[4] <- NULL
print(list2[4])

Output:

[[1]]
NULL
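Adding and deleting are not limited to the last position. As a sketch, base R's append() with its after argument inserts at a chosen position, and assigning NULL to any index removes that element and shifts the later ones down. The list items below is a made-up example.

```r
items <- list("a", "b", "c")

# Insert a new element after position 1.
items <- append(items, list("new"), after = 1)
print(length(items))    # 4
print(items[[2]])       # "new"

# Assigning NULL to an index deletes that element.
items[2] <- NULL
print(length(items))    # 3
print(items[[2]])       # "b"
```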

Updating the Elements of a List

To update a value in a list, use the following syntax:

list2[3] <- "Element Updated"
print(list2[3])

Output:

[[1]]
[1] "Element Updated"

Matrices

A matrix is a two-dimensional data structure that is homogeneous, meaning that it only accepts elements of the same data type. Coercion takes place if elements of different data types are passed. It is created by using the matrix() function.

The basic syntax to create a matrix is given below:

matrix(data, nrow, ncol, byrow, dimnames)

where,
data = the input elements of the matrix, given as a vector.
nrow = the number of rows to be created.
ncol = the number of columns to be created.
byrow = a logical value; if TRUE, the matrix is filled row by row instead of column by column (the default).
dimnames = a list holding the row names and column names to be assigned.

For example:

M1 <- matrix(c(1:9), nrow = 3, ncol = 3, byrow = TRUE)
print(M1)

Output:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

M2 <- matrix(c(1:9), nrow = 3, ncol = 3, byrow = FALSE)
print(M2)

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

By using row and column names, a matrix can be created as follows:

rownames = c("row1", "row2", "row3")
colnames = c("col1", "col2", "col3")
M3 <- matrix(c(1:9), nrow = 3, byrow = TRUE, dimnames = list(rownames, colnames))
print(M3)

Output:

     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9

Accessing the Elements of a Matrix

To access the elements of a matrix, use row and column indices in the form matrix[row, column]. For example, to access elements of the matrix M3 created above:

print(M3[1,1])
print(M3[3,3])
print(M3[2,3])

Output:

[1] 1 (Element at first row and first column)
[1] 9 (Element at third row and third column)
[1] 6 (Element at second row and third column)
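Beyond single cells, leaving an index blank selects a whole row or column, and the dimnames can be used in place of numeric positions. The sketch below rebuilds M3 so that it runs on its own.

```r
M3 <- matrix(1:9, nrow = 3, byrow = TRUE,
             dimnames = list(c("row1", "row2", "row3"),
                             c("col1", "col2", "col3")))

print(M3[2, ])                       # whole second row: 4 5 6
print(M3[, 3])                       # whole third column: 3 6 9
print(M3["row3", "col1"])            # 7, selected by name
print(M3[1:2, c("col1", "col3")])    # 2 x 2 sub-matrix
```

Selecting a single row or column returns a plain vector; passing drop = FALSE inside the brackets keeps the result as a matrix.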

Factor

Factors are used in data analysis for statistical modeling. They are used to categorize unique values in columns, such as “Male”, “Female”, “TRUE”, “FALSE”, etc., and store them as levels. They can store both strings and integers. They are useful in columns that have a limited number of unique values.

Factors can be created using the factor() function and they take vectors as inputs.
For example:

data <- c("Male","Female","Male","Child","Child","Male","Female","Female")
print(data)
factor.data <- factor(data)
print(factor.data)

Output:

[1] "Male"   "Female" "Male"   "Child"  "Child"  "Male"   "Female" "Female"
[1] Male   Female Male   Child  Child  Male   Female Female
Levels: Child Female Male
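Internally, a factor stores integer codes that point into its levels. The sketch below inspects those pieces for the same data; ordered.data is an illustrative name, and the custom level order is an arbitrary choice.

```r
data <- c("Male","Female","Male","Child","Child","Male","Female","Female")
factor.data <- factor(data)

print(levels(factor.data))        # "Child" "Female" "Male"
print(as.integer(factor.data))    # 3 2 3 1 1 3 2 2
print(table(factor.data))         # number of observations per level

# The levels argument sets a custom level order instead of the alphabetical one.
ordered.data <- factor(data, levels = c("Child", "Male", "Female"))
print(levels(ordered.data))       # "Child" "Male" "Female"
```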

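In versions of R before 4.0.0, data.frame() converted text columns to factors by default, treating them as categorical data. From R 4.0.0 onward, text columns stay as character vectors unless stringsAsFactors = TRUE is passed.

For example, when the emp.finaldata data frame (built in the Data Frame section below) has its text columns converted to factors, empdept is a factor:

print(is.factor(emp.finaldata$empdept))
print(emp.finaldata$empdept)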

Output:

[1] TRUE
[1] Sales      Marketing  HR         R & D      IT         Operations Finance
Levels: HR Marketing R & D Sales Finance IT Operations
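Whether a text column becomes a factor depends on the stringsAsFactors argument of data.frame(), whose default changed in R 4.0.0. Setting it explicitly avoids relying on the installed version. A minimal sketch with a made-up dept vector:

```r
dept <- c("Sales", "Marketing", "HR")

df_factor <- data.frame(dept, stringsAsFactors = TRUE)
df_char   <- data.frame(dept, stringsAsFactors = FALSE)

print(is.factor(df_factor$dept))    # TRUE
print(is.factor(df_char$dept))      # FALSE
print(levels(df_factor$dept))       # "HR" "Marketing" "Sales"
```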

Data Frame

A data frame is a two-dimensional, table-like structure in which each column contains the values of one variable and each row contains one set of values, one from each column.

A data frame has the following characteristics:

The column names of a data frame should not be empty.
The row names of a data frame should be unique.
The data stored in a data frame can be a numeric, factor, or character type.
Each column should contain the same number of data items.

Creating a Data Frame

You can use the following syntax for creating a data frame in R programming:

empid <- c(1:4)
empname <- c("Sam","Rob","Max","John")
empdept <- c("Sales","Marketing","HR","R & D")
emp.data <- data.frame(empid,empname,empdept)
print(emp.data)

Output:

Sr. No.	empid	empname	empdept
1	1	Sam	Sales
2	2	Rob	Marketing
3	3	Max	HR
4	4	John	R & D

Extracting Columns or Rows from a Data Frame

To extract specific columns from a data frame, use the following syntax:

result <- data.frame(emp.data$empname,emp.data$empdept)
print(result)

Output:

Sr. No.	emp.data.empname	emp.data.empdept
1	Sam	Sales
2	Rob	Marketing
3	Max	HR
4	John	R & D

To extract specific rows from a data frame, use the following syntax:

result <- emp.data[1:2,]
print(result)

Output:

Sr. No.	empid	empname	empdept
1	1	Sam	Sales
2	2	Rob	Marketing

The following code extracts the second and third columns of the first and third rows.

result <- emp.data[c(1,3),c(2,3)]
print(result)

Output:

Sr. No.	empname	empdept
1	Sam	Sales
3	Max	HR
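Rows and columns can also be selected by column name or by a logical condition rather than by position. The sketch below rebuilds emp.data so that it runs on its own; cols, later_ids, and one_col are illustrative names.

```r
emp.data <- data.frame(
  empid   = 1:4,
  empname = c("Sam", "Rob", "Max", "John"),
  empdept = c("Sales", "Marketing", "HR", "R & D")
)

cols      <- emp.data[, c("empname", "empdept")]   # columns by name, names preserved
later_ids <- emp.data[emp.data$empid > 2, ]        # rows matching a condition
one_col   <- emp.data[["empname"]]                 # a single column as a vector

print(cols)
print(later_ids)
print(one_col)
```

Selecting by name keeps the original column names, unlike rebuilding the data frame with data.frame(emp.data$empname, emp.data$empdept).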

Adding a Column to a Data Frame

To add a salary column to the above data frame, you can use the following syntax:

emp.data$salary <- c(20000,30000,40000,27000)
n <- emp.data
print(n)

Output:

Sr. No.	empid	empname	empdept	salary
1	1	Sam	Sales	20000
2	2	Rob	Marketing	30000
3	3	Max	HR	40000
4	4	John	R & D	27000

Adding a Row to a Data Frame

To add a new row(s) to an existing data frame, you need to create a new data frame that contains the new row(s), and then merge it with the existing data frame using the rbind() function.

Creating a New Data Frame

emp.newdata <- data.frame(
empid = c(5:7),
empname = c("Frank","Tony","Eric"),
empdept = c("IT","Operations","Finance"),
salary = c(32000,51000,45000)
)

Merging the New Data Frame with the Existing Data Frame

emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

Output:

Sr. No.	empid	empname	empdept	salary
1	1	Sam	Sales	20000
2	2	Rob	Marketing	30000
3	3	Max	HR	40000
4	4	John	R & D	27000
5	5	Frank	IT	32000
6	6	Tony	Operations	51000
7	7	Eric	Finance	45000
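For data frames, rbind() lines columns up by name, so both data frames need the same set of column names. The sketch below uses smaller made-up data frames to show that column order does not matter but a mismatched name fails; bad and outcome are illustrative names.

```r
emp.data    <- data.frame(empid = 1:2, empname = c("Sam", "Rob"))
emp.newdata <- data.frame(empname = "Frank", empid = 5L)   # columns in a different order

# Columns are matched by name, not by position.
combined <- rbind(emp.data, emp.newdata)
print(combined)

# A data frame whose column names differ cannot be bound.
bad <- data.frame(empid = 6L, name = "Tony")
outcome <- tryCatch(rbind(emp.data, bad), error = function(e) "error")
print(outcome)   # "error"
```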

Arrays

Arrays refer to the type of data structure that is used to store multiple items of a similar type together. This leads to a collection of items that are stored at contiguous memory locations. This memory location is denoted by the array name. The position of an element can be calculated simply by adding an offset to its base value.

For example:

[Figure: Array]

Array Structure

An array consists of the following:

Array Index: The array index identifies the location of the element. In R, array indexes start at 1 (not 0, as in many other languages).

Array Element: Array elements are items that are stored in the array.

Array Length: The array length is determined by the number of elements that can be stored by the array. In the above-mentioned example, the array length is 12.

There are two types of arrays:

One-dimensional Arrays
Multi-dimensional Arrays

One-dimensional Arrays

One- or single-dimensional arrays are the types of arrays that have array elements stored in a sequence and can be accessed in the same order. The figure given above is an example of a one-dimensional array.

Multi-dimensional Arrays

Multi-dimensional arrays are arrays that have elements stored in more than one dimension. They can be two- or three-dimensional arrays and can consist of row and column indexes.
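In R, arrays of any dimension are typically built with the array() function, whose dim argument gives the size of each dimension. The following is a minimal sketch with arbitrary sizes; arr is an illustrative name.

```r
# A three-dimensional array: 2 rows, 3 columns, 4 layers.
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))        # 2 3 4
print(length(arr))     # 24 elements in total

# Values fill the first dimension fastest (column by column, then layer by layer).
print(arr[2, 3, 1])    # 6
print(arr[1, 1, 2])    # 7
```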

Accessing the Elements of an Array

The elements of an array can be accessed using the following syntax:

Syntax:

arrayName[index]
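For instance, the arrayName[index] form takes one index per dimension, and a blank index selects everything along that dimension. The arrays below are made-up examples.

```r
one_d <- array(c(10, 20, 30, 40))     # one-dimensional array
print(one_d[1])                       # 10, the first element is at index 1

two_d <- array(1:6, dim = c(2, 3))    # 2 rows, 3 columns
print(two_d[2, 3])                    # 6
print(two_d[, 2])                     # 3 4, the whole second column

three_d <- array(1:24, dim = c(2, 3, 4))
print(three_d[, , 1])                 # the first 2 x 3 layer
```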

In this blog, we have discussed the data structures in R programming, their different types, and how to perform simple data manipulation using data structures. In the next blog, we will discuss Control Flow statements in R. Learn the art of data analysis and problem-solving by enrolling in our Data Science course.
