Importing Data in R
Importing data in R means reading data from external files into the R environment; the reverse operation, exporting, writes data from R to external files. Formats such as CSV, XML, XLSX, and JSON, as well as tables on web pages, can be imported for data analysis, and data held in the R environment can be stored in external files.
Reading CSV Files
CSV (Comma Separated Values) is a text file in which the values in columns are separated by a comma.
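Before importing data, it is convenient to set the working directory with the setwd() function, so that files can be referred to by name alone. Alternatively, the full path of a file can be passed to the import function.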
For example:
setwd("C:/Users/intellipaat/Desktop/BLOG/files")
To read a csv file, we use the in-built function read.csv() that outputs the data from the file as a data frame.
For example:
read.data <- read.csv("file1.csv")
print(read.data)
Output:
Sl. No. | empid | empname | empdept | empsalary | empstart_date
1 | 1 | Sam | IT | 25000 | 03-09-2005
2 | 2 | Rob | HR | 30000 | 03-05-2005
3 | 3 | Max | Marketing | 29000 | 05-06-2007
4 | 4 | John | R&D | 35000 | 01-03-1999
5 | 5 | Gary | Finance | 32000 | 05-09-2000
6 | 6 | Alex | Tech | 20000 | 09-05-2005
7 | 7 | Ivar | Sales | 36000 | 04-04-1999
8 | 8 | Robert | Finance | 34000 | 06-08-2008
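The same kind of read can also be done without changing the working directory, by passing a full path built with file.path(). The snippet below is only a sketch: it writes three sample rows in the layout of the employee table to a temporary file, with tempdir() standing in as an assumption for the real data folder, and then reads the file back.

```r
# Assumption: tempdir() stands in for the folder that holds file1.csv.
csv_path <- file.path(tempdir(), "file1.csv")

# Write a small sample so that the example is self-contained.
writeLines(c(
  "empid,empname,empdept,empsalary,empstart_date",
  "1,Sam,IT,25000,03-09-2005",
  "3,Max,Marketing,29000,05-06-2007",
  "4,John,R&D,35000,01-03-1999"
), csv_path)

# Read the file by its full path; no setwd() call is needed.
sample.data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types that read.csv() inferred.
str(sample.data)
```

Calling str() right after an import is a quick way to confirm that numeric columns were read as numbers and that text columns were not converted unexpectedly.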

Analyzing a CSV File
#To print number of columns
print(ncol(read.data))
Output:
[1] 5
#To print number of rows
print(nrow(read.data))
Output:
[1] 8
#To print the range of salary packages
range.sal <- range(read.data$empsalary)
print(range.sal)
Output:
[1] 20000 36000
#To print the details of a person with the highest salary, we use the subset() function to extract variables and observations
max.sal <- subset(read.data, empsalary == max(empsalary))
print(max.sal)
Output:
Sl. No. | empid | empname | empdept | empsalary | empstart_date
7 | 7 | Ivar | Sales | 36000 | 04-04-1999
#To print the details of all people working in Finance department
fin.per <- subset(read.data, empdept == "Finance")
print(fin.per)
Output:
Sl. No. | empid | empname | empdept | empsalary | empstart_date
5 | 5 | Gary | Finance | 32000 | 05-09-2000
8 | 8 | Robert | Finance | 34000 | 06-08-2008
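The queries in this section can also be written with plain indexing instead of subset(). The following is a minimal sketch on a hypothetical three-row data frame named emp, whose values are copied from the employee table; the per-department average at the end is an extra illustration and not part of the original walkthrough.

```r
# Sample rows copied from the employee table; `emp` is a hypothetical name.
emp <- data.frame(
  empid     = c(5, 7, 8),
  empname   = c("Gary", "Ivar", "Robert"),
  empdept   = c("Finance", "Sales", "Finance"),
  empsalary = c(32000, 36000, 34000),
  stringsAsFactors = FALSE
)

# Row with the highest salary, using which.max() instead of subset().
top <- emp[which.max(emp$empsalary), ]
print(top)

# Rows for one department, using logical indexing.
fin <- emp[emp$empdept == "Finance", ]
print(fin)

# Mean salary per department.
avg <- aggregate(empsalary ~ empdept, data = emp, FUN = mean)
print(avg)
```

Note that which.max() returns only the first row when several employees share the highest salary, whereas the subset() call with empsalary == max(empsalary) returns all of them.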
Writing to a CSV File
To write data to a CSV file, we use the write.csv() function. When only a file name is given, the output file is stored in the current working directory of the R environment.
For example:
#To print the details of people having salary between 30000 and 40000 and store the results in a new file
per.sal <- subset(read.data, empsalary >= 30000 & empsalary <= 40000)
print(per.sal)
Output:
| empid | empname | empdept | empsalary | empstart_date
2 | 2 | Rob | HR | 30000 | 03-05-2002
4 | 4 | John | R&D | 35000 | 01-03-1999
5 | 5 | Gary | Finance | 32000 | 05-09-2000
7 | 7 | Ivar | Sales | 36000 | 04-04-1999
8 | 8 | Robert | Finance | 34000 | 06-08-2008
# Writing data into a new CSV file
write.csv(per.sal,"output.csv")
new.data <- read.csv("output.csv")
print(new.data)
Output:
| X | empid | empname | empdept | empsalary | empstart_date
1 | 2 | 2 | Rob | HR | 30000 | 03-05-2002
2 | 4 | 4 | John | R&D | 35000 | 01-03-1999
3 | 5 | 5 | Gary | Finance | 32000 | 05-09-2000
4 | 7 | 7 | Ivar | Sales | 36000 | 04-04-1999
5 | 8 | 8 | Robert | Finance | 34000 | 06-08-2008
# To exclude the extra column X from the above file
write.csv(per.sal,"output.csv", row.names = FALSE)
new.data <- read.csv("output.csv")
print(new.data)
Output:
| empid | empname | empdept | empsalary | empstart_date
1 | 2 | Rob | HR | 30000 | 03-05-2002
2 | 4 | John | R&D | 35000 | 01-03-1999
3 | 5 | Gary | Finance | 32000 | 05-09-2000
4 | 7 | Ivar | Sales | 36000 | 04-04-1999
5 | 8 | Robert | Finance | 34000 | 06-08-2008
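A quick way to check what row.names = FALSE does is to write a data frame and read it straight back. The sketch below assumes a hypothetical two-row stand-in for per.sal and uses tempfile() rather than the working directory, so it does not touch any real output.csv.

```r
# Hypothetical sample standing in for per.sal.
per.sal.demo <- data.frame(
  empid     = c(2L, 4L),
  empname   = c("Rob", "John"),
  empsalary = c(30000L, 35000L),
  stringsAsFactors = FALSE
)

# Assumption: a temporary file is used instead of the working directory.
out_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row index out of the file.
write.csv(per.sal.demo, out_path, row.names = FALSE)

# Read the file back and compare it with the original data frame.
round.trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(names(round.trip))
print(all.equal(per.sal.demo, round.trip))
```

If the row names were written as well, the file read back would contain one more column than the data frame that was saved, which is the extra X column seen earlier.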
Reading XML Files
An XML (Extensible Markup Language) file is a plain-text format for sharing structured data on the web and elsewhere. Like an HTML file, it contains markup tags, but the tags in an XML file describe the meaning of the data contained in the file rather than the structure of the page.
For importing data in R from XML files, we need to install the XML package, which can be done as follows:
install.packages("XML")
To read XML files, we use the xmlParse() function from the XML package.
For example:
#To load required xml package to read XML files
library("XML")
#To load other required packages
library("methods")
#To give the input file name to the function
newfile <- xmlParse(file = "file.xml")
print(newfile)
Output:
<?xml version="1.0"?>
<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Sam</NAME>
<SALARY>32000</SALARY>
<STARTDATE>1/1/2001</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>2</ID>
<NAME>Rob</NAME>
<SALARY>36000</SALARY>
<STARTDATE>9/3/2006</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>3</ID>
<NAME>Max</NAME>
<SALARY>42000</SALARY>
<STARTDATE>1/5/2011</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>4</ID>
<NAME>Ivar</NAME>
<SALARY>50000</SALARY>
<STARTDATE>25/1/2001</STARTDATE>
<DEPT>Tech</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>5</ID>
<NAME>Robert</NAME>
<SALARY>25000</SALARY>
<STARTDATE>13/7/2015</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>6</ID>
<NAME>Leon</NAME>
<SALARY>57000</SALARY>
<STARTDATE>5/1/2000</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>7</ID>
<NAME>Samuel</NAME>
<SALARY>45000</SALARY>
<STARTDATE>27/3/2003</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>8</ID>
<NAME>Jack</NAME>
<SALARY>24000</SALARY>
<STARTDATE>6/1/2016</STARTDATE>
<DEPT>Sales</DEPT>
</EMPLOYEE>
</RECORDS>
#To get the root node of xml file
rootnode <- xmlRoot(newfile)
#To get the number of nodes in the root
rootsize <- xmlSize(rootnode)
print(rootsize)
Output:
[1] 8
#To print a specific node
print(rootnode[1])
Output:
$EMPLOYEE
<EMPLOYEE>
<ID>1</ID>
<NAME>Sam</NAME>
<SALARY>32000</SALARY>
<STARTDATE>1/1/2001</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
#To print elements of a particular node
print(rootnode[[1]][[1]])
print(rootnode[[1]][[3]])
print(rootnode[[1]][[5]])
Output:
<ID>1</ID>
<SALARY>32000</SALARY>
<DEPT>HR</DEPT>
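Indexing a node returns the node itself, tags included. To get only the text inside a node, the XML package provides xmlValue(), and xpathSApply() can collect one field from every record. The following is a rough sketch on two inline records in the same layout as file.xml; the inline string and the variable names are assumptions made for illustration.

```r
library("XML")

# Two records in the same layout as file.xml, supplied inline.
xml_text <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Sam</NAME><SALARY>32000</SALARY></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Rob</NAME><SALARY>36000</SALARY></EMPLOYEE>
</RECORDS>"

# asText = TRUE tells xmlParse() that the input is XML content, not a file name.
doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# xmlValue() returns the text inside a node rather than the node itself.
first.name <- xmlValue(rootnode[[1]][["NAME"]])
print(first.name)

# An XPath query collects the same field from every record.
salaries <- as.numeric(xpathSApply(doc, "//EMPLOYEE/SALARY", xmlValue))
print(salaries)
```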

Converting an XML to a Data Frame
To perform data analysis effectively after importing data in R, we convert the data in an XML file to a Data Frame. After converting, we can perform data manipulation and other operations as performed in a data frame.
For example:
library("XML")
library("methods")
#To convert the data in xml file to a data frame
xmldataframe <- xmlToDataFrame("file.xml")
print(xmldataframe)
Output:
| ID | NAME | SALARY | STARTDATE | DEPT
1 | 1 | Sam | 32000 | 1/1/2001 | HR
2 | 2 | Rob | 36000 | 9/3/2006 | IT
3 | 3 | Max | 42000 | 1/5/2011 | Sales
4 | 4 | Ivar | 50000 | 25/1/2001 | Tech
5 | 5 | Robert | 25000 | 13/7/2015 | Sales
6 | 6 | Leon | 57000 | 5/1/2000 | IT
7 | 7 | Samuel | 45000 | 27/3/2003 | Operations
8 | 8 | Jack | 24000 | 6/1/2016 | Sales
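Values read from XML arrive as text, so numeric and date columns usually need converting before analysis. The sketch below shows one possible way to do that on two inline records; the day/month/year date format is an assumption inferred from values such as 25/1/2001 in the sample file.

```r
library("XML")

# Two records in the same layout as file.xml, supplied inline.
xml_text <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Sam</NAME><SALARY>32000</SALARY><STARTDATE>1/1/2001</STARTDATE></EMPLOYEE>
  <EMPLOYEE><ID>4</ID><NAME>Ivar</NAME><SALARY>50000</SALARY><STARTDATE>25/1/2001</STARTDATE></EMPLOYEE>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)

# xmlToDataFrame() also accepts an already parsed document.
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the text columns to the types needed for analysis.
df$SALARY <- as.numeric(df$SALARY)
# Assumption: dates are day/month/year.
df$STARTDATE <- as.Date(df$STARTDATE, format = "%d/%m/%Y")

str(df)
print(mean(df$SALARY))
```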
Reading JSON Files
A JSON (JavaScript Object Notation) file is commonly used to exchange data between a web application and a server. JSON files are text-based and human-readable, and can be edited in any text editor.
Importing data in R from a JSON file requires the rjson package that can be installed as follows:
install.packages("rjson")
To read JSON files, we use the fromJSON() function from the rjson package, which stores the data as a list.
For example:
#To load rjson package
library("rjson")
#To give the file name to the function
newfile <- fromJSON(file = "file1.json")
#To print the file
print(newfile)
Output:
$ID
[1] "1" "2" "3" "4" "5" "6" "7" "8"
$Name
[1] "Sam" "Rob" "Max" "Robert" "Ivar" "Leon" "Samuel" "Ivar"
$Salary
[1] "32000" "27000" "35000" "25000" "37000" "41000" "36000" "51000"
$StartDate
[1] "1/1/2001" "9/3/2003" "1/5/2004" "14/11/2007" "13/7/2015" "4/3/2007"
[7] "27/3/2013" "25/7/2000"
$Dept
[1] "IT" "HR" "Tech" "HR" "Sales" "HR"
[7] "Operations" "IT"
Converting a JSON File to a Data Frame
To convert the list read from a JSON file into a data frame, we use the as.data.frame() function.
For example:
library("rjson")
newfile <- fromJSON(file = "file1.json")
#To convert a JSON file to a data frame
jsondataframe <- as.data.frame(newfile)
print(jsondataframe)
Output:
| ID | Name | Salary | StartDate | Dept
1 | 1 | Sam | 32000 | 1/1/2001 | IT
2 | 2 | Rob | 27000 | 9/3/2003 | HR
3 | 3 | Max | 35000 | 1/5/2004 | Tech
4 | 4 | Robert | 25000 | 14/11/2007 | HR
5 | 5 | Ivar | 37000 | 13/7/2015 | Sales
6 | 6 | Leon | 41000 | 4/3/2007 | HR
7 | 7 | Samuel | 36000 | 27/3/2013 | Operations
8 | 8 | Ivar | 51000 | 25/7/2000 | IT
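Because every value in the sample JSON is a quoted string, the resulting data frame holds text even in the Salary column. The sketch below shows one way to convert it before filtering; the inline JSON string is a shortened, hypothetical version of file1.json.

```r
library("rjson")

# Shortened, hypothetical version of file1.json supplied as a string.
json_text <- '{
  "ID":     ["1", "2", "3"],
  "Name":   ["Sam", "Rob", "Max"],
  "Salary": ["32000", "27000", "35000"]
}'

# fromJSON() parses a JSON string when the first argument is used.
parsed <- fromJSON(json_text)

# Each list element becomes one column of the data frame.
jsondf <- as.data.frame(parsed, stringsAsFactors = FALSE)

# Convert the salary text to numbers so that comparisons are numeric.
jsondf$Salary <- as.numeric(jsondf$Salary)
print(jsondf[jsondf$Salary > 30000, ])
```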
Reading Excel Files
Microsoft Excel is a very popular spreadsheet program that stores data in the xls and xlsx formats. We can read data from Excel files using the readxl package in R; readxl only reads, so writing Excel files needs a separate package.
To install the readxl package, run the following command:
install.packages("readxl")
To import data from an Excel file, we load the package and use the read_excel() function, which returns the data as a tibble, a kind of data frame.
For example:
library("readxl")
newfile <- read_excel("sheet1.xlsx")
print(newfile)
Output:
| ID | NAME | DEPT | SALARY | AGE
1 | 1 | SAM | SALES | 32000 | 35
2 | 2 | ROB | HR | 36000 | 23
3 | 3 | MAC | IT | 37000 | 40
4 | 4 | IVAR | IT | 25000 | 37
5 | 5 | MAX | R&D | 30000 | 22
6 | 6 | ROBERT | HR | 27000 | 32
7 | 7 | SAMUEL | FINANCE | 50000 | 41
8 | 8 | RAGNAR | SALES | 45000 | 29
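A workbook often has several sheets, and read_excel() reads the first one unless told otherwise. As a hedged illustration, the snippet below uses datasets.xlsx, an example workbook that ships with readxl, in place of sheet1.xlsx; the sheet name "iris" belongs to that example file, not to the tutorial data.

```r
library("readxl")

# readxl_example() returns the path of a workbook shipped with the package.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets available in the workbook.
print(excel_sheets(xlsx_path))

# Read only the first five data rows of one named sheet.
iris.head <- read_excel(xlsx_path, sheet = "iris", n_max = 5)
print(iris.head)
```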
Reading HTML Tables
To read HTML tables from websites and retrieve data from them, we use the XML and RCurl packages in R programming.
To install XML and RCurl packages, run the following command:
install.packages("XML")
install.packages("RCurl")
To load the packages, run the following command:
library("XML")
library("RCurl")
For example, we will fetch the ‘Ease of Doing Business Index’ table from a URL using the readHTMLTable() function which stores it as a Data Frame.
#To fetch a table from a website, paste its URL
url <- "https://en.wikipedia.org/wiki/Ease_of_doing_business_index#Ranking"
tabs <- getURL(url)
#To fetch the first table when the web page has more than one, we use which = 1
tabs <- readHTMLTable(tabs, which = 1, stringsAsFactors = FALSE)
head(tabs)
Output:
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16
1 | Classification | Jurisdiction | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006
2 | Very Easy | New Zealand | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 1
3 | Very Easy | Singapore | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2
4 | Very Easy | Denmark | 3 | 3 | 3 | 3 | 4 | 5 | 5 | 5 | 6 | 6 | 5 | 5 | 7 | 8
5 | Very Easy | Hong Kong | 4 | 5 | 4 | 5 | 3 | 2 | 2 | 2 | 2 | 3 | 4 | 4 | 5 | 7
6 | Very Easy | South Korea | 5 | 4 | 5 | 4 | 5 | 7 | 8 | 8 | 16 | 19 | 23 | 30 | 23 | 27
We use the str() function to analyze the structure of the data frame.
For example:
str(tabs)
Output:
'data.frame': 191 obs. of 16 variables:
$ V1 : chr "Classification" "Very Easy" "Very Easy" "Very Easy" ...
$ V2 : chr "Jurisdiction" "New Zealand" "Singapore" "Denmark" ...
$ V3 : chr "2019" "1" "2" "3" ...
$ V4 : chr "2018" "1" "2" "3" ...
$ V5 : chr "2017" "1" "2" "3" ...
$ V6 : chr "2016" "2" "1" "3" ...
$ V7 : chr "2015" "2" "1" "4" ...
$ V8 : chr "2014" "3" "1" "5" ...
$ V9 : chr "2013" "3" "1" "5" ...
$ V10: chr "2012" "3" "1" "5" ...
$ V11: chr "2011" "3" "1" "6" ...
$ V12: chr "2010" "2" "1" "6" ...
$ V13: chr "2009" "2" "1" "5" ...
$ V14: chr "2008" "2" "1" "5" ...
$ V15: chr "2007" "2" "1" "7" ...
$ V16: chr "2006" "1" "2" "8" ...
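The structure shows two things worth cleaning up: the real column headers sit in the first row, and every column is stored as character. One possible clean-up, sketched here on a hypothetical four-row miniature of tabs so that it runs without a network connection, is to promote the first row to column names and convert the year columns to integers.

```r
# Hypothetical miniature of `tabs`: the header sits in row 1 and
# every column is character.
tabs.demo <- data.frame(
  V1 = c("Classification", "Very Easy", "Very Easy", "Very Easy"),
  V2 = c("Jurisdiction", "New Zealand", "Singapore", "Denmark"),
  V3 = c("2019", "1", "2", "3"),
  V4 = c("2018", "1", "2", "3"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop it.
names(tabs.demo) <- as.character(unlist(tabs.demo[1, ]))
clean <- tabs.demo[-1, ]
rownames(clean) <- NULL

# Convert the year columns from character to integer ranks.
year.cols <- c("2019", "2018")
clean[year.cols] <- lapply(clean[year.cols], as.integer)

str(clean)
```

With integer ranks, comparisons such as clean[["2019"]] <= 2 behave numerically instead of comparing strings.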
#To print rows from 5 to 10 and columns from 1 to 8
T1 <- tabs[5:10, 1:8]
head(T1)
Output:
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8
5 | Very Easy | Hong Kong | 4 | 5 | 4 | 5 | 3 | 2
6 | Very Easy | South Korea | 5 | 4 | 5 | 4 | 5 | 7
7 | Very Easy | Georgia | 6 | 9 | 16 | 24 | 15 | 8
8 | Very Easy | Norway | 7 | 8 | 6 | 9 | 6 | 9
9 | Very Easy | United States | 8 | 6 | 8 | 7 | 7 | 4
10 | Very Easy | United Kingdom | 9 | 7 | 7 | 6 | 8 | 10
#To find the position of India in the Table
T1 <- subset(tabs,tabs$V2 == "India")
head(T1)
Output:
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16
78 | Easy | India | 77 | 100 | 130 | 130 | 142 | 134 | 132 | 132 | 134 | 133 | 122 | 120 | 134 | 116
In this tutorial, we learned what importing data in R is, how to read files in different formats in R, and how to convert data from files to data frames for efficient data manipulation. In the next session, we are going to talk about data manipulation in R.