
I have a CSV file (24.1 MB) that I cannot fully read into my R session. When I open the file in a spreadsheet program I can see 112,544 rows. When I read it into R with read.csv I only get 56,952 rows and this warning:

cit <- read.csv("citations.CSV", row.names = NULL,
                comment.char = "", header = TRUE,
                stringsAsFactors = FALSE,
                colClasses = "character", encoding = "utf-8")

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string

I can read the whole file into R with readLines:

rl <- readLines(file("citations.CSV", encoding = "utf-8"))
length(rl)
[1] 112545

But I can't get this back into R as a table (via read.csv):

write.table(rl, "rl.txt", quote = FALSE, row.names = FALSE)
rl_in <- read.csv("rl.txt", skip = 1, row.names = NULL)

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string

How can I solve or work around this EOF message (which seems to be more of an error than a warning) to get the entire file into my R session?

I have similar problems with other methods of reading CSV files:

require(sqldf)
cit_sql <- read.csv.sql("citations.CSV", sql = "select * from file")

require(data.table)
cit_dt <- fread("citations.CSV")

require(ff)
cit_ff <- read.csv.ffdf(file = "citations.CSV")

1 Answer


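The warning means the parser met an opening double quote that was never closed, so it read everything up to the end of the file as a single field. To prevent this when reading CSV files that contain free text, disable quoting with quote = "".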

For example:

filedata <- read.csv("file_name.csv", quote = "", row.names = NULL, stringsAsFactors = FALSE)

In your case:

cit <- read.csv("citations.CSV", quote = "", row.names = NULL, stringsAsFactors = FALSE)
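
If you want to see the effect on a small scale first, here is a minimal sketch. It assumes the cause is a stray double quote somewhere in a text field, which I can't confirm without seeing your file. The temporary file, the sample rows and the names with_quotes, no_quotes and n_lines are made up for illustration only.

```r
# Hypothetical sample data: row 7 contains an unbalanced double quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,title",
  "1,First paper",
  "2,Second paper",
  "3,Third paper",
  "4,Fourth paper",
  "5,Fifth paper",
  "6,Sixth paper",
  "7,A 12\" record of field notes",
  "8,Eighth paper",
  "9,Ninth paper",
  "10,Tenth paper"
), tmp)

# Default quoting: the stray quote opens a quoted string that never closes,
# so the lines after it are swallowed into one field.
# The warning is suppressed here only to keep the output tidy.
with_quotes <- suppressWarnings(
  read.csv(tmp, stringsAsFactors = FALSE)
)

# Quoting disabled: every physical line becomes one row.
no_quotes <- read.csv(tmp, quote = "", stringsAsFactors = FALSE)

# Compare against the number of data lines reported by readLines().
n_lines <- length(readLines(tmp))

nrow(with_quotes)  # fewer than 10
nrow(no_quotes)    # 10
n_lines - 1        # 10 (minus the header line)
```

One thing to check afterwards: with quote = "" a comma inside a field that was properly quoted in the file is treated as a separator, so make sure the number of columns is what you expect. If it is not, repairing the stray quote in the source file may be the safer route.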

...