I'm trying to load this ugly-formatted data-set into my R session: http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for

  Weekly SST data starts week centered on 3Jan1990

                Nino1+2      Nino3        Nino34        Nino4
  Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA
  03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3
  10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3
  17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3

So far, I can read the lines with

  x = readLines(path)

But the file mixes 'white space' with '-' as separators, and I'm not a regex expert. I'd appreciate any help turning this into a nice, clean R data frame. Thanks!

1 Answer

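This is a fixed-width file: every field occupies the same character positions on each line, and the '-' is the sign of a negative value, not a separator. To read a fixed-width file, you can use either of the following packages: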

Using the readr package:

  library(readr)

  # skip the 4 header lines, then split each line by column width
  x <- read_fwf(
    file = "http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for",
    col_positions = fwf_widths(c(12, 7, 4, 9, 4, 9, 4, 9, 4)),
    skip = 4
  )

Output:

  # A tibble: 1,542 x 9
     X1           X2    X3    X4    X5    X6    X7    X8    X9
     <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
   1 03JAN1990  23.4  -0.4  25.1  -0.3  26.6   0    28.6   0.3
   2 10JAN1990  23.4  -0.8  25.2  -0.3  26.6   0.1  28.6   0.3
   3 17JAN1990  24.2  -0.3  25.3  -0.3  26.5  -0.1  28.6   0.3
   4 24JAN1990  24.4  -0.5  25.5  -0.4  26.5  -0.1  28.4   0.2
   5 31JAN1990  25.1  -0.2  25.8  -0.2  26.7   0.1  28.4   0.2
   6 07FEB1990  25.8   0.2  26.1  -0.1  26.8   0.1  28.4   0.3
   7 14FEB1990  25.9  -0.1  26.4   0    26.9   0.2  28.5   0.4
   8 21FEB1990  26.1  -0.1  26.7   0.2  27.1   0.3  28.9   0.8
   9 28FEB1990  26.1  -0.2  26.7  -0.1  27.2   0.3  29     0.8
  10 07MAR1990  26.7   0.3  26.7  -0.2  27.3   0.2  28.9   0.7
  # ... with 1,532 more rows
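
The default names X1 to X9 say nothing about which region or measure a column holds. As a rough sketch, fwf_widths() also takes a col_names argument, so the columns can be named while reading. The column names below are made up for illustration. The three sample rows are copied from the question with one leading blank added, which is an assumption about the raw file, so that the example runs without network access. I() marks the string as literal data and needs readr 2.0 or later.

```r
library(readr)

# Three rows from the question; the single leading blank is an assumption.
sample_text <- paste(
  " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
  " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3",
  " 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3",
  sep = "\n"
)

# Hypothetical column names: one SST/SSTA pair per Nino region.
sst_names <- c("week",
               "nino12_sst", "nino12_ssta", "nino3_sst", "nino3_ssta",
               "nino34_sst", "nino34_ssta", "nino4_sst", "nino4_ssta")

x <- read_fwf(
  I(sample_text),
  col_positions = fwf_widths(c(12, 7, 4, 9, 4, 9, 4, 9, 4),
                             col_names = sst_names),
  col_types = "cdddddddd"  # one character column, then eight doubles
)
```

For the real file, the same col_positions should work with the URL and skip = 4 from the call above.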

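You can also use read.fwf from the utils package (loaded by default) as follows:

  # negative widths are skipped, positive widths become columns
  df <- read.fwf(
    file = url("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for"),
    widths = c(-1, 9, -5, 4, 4, -5, 4, 4, -5, 4, 4, -5, 4, 4),
    skip = 4
  )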

In the widths argument, a negative value means that many characters are skipped: the -1 says there is a one-character column that should be ignored, the -5 says there is a five-character column that should be ignored, likewise…
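
To see how the positive and negative widths line up against a row, here is a small sketch that runs read.fwf on three rows copied from the question instead of the URL. It assumes each raw line starts with one blank, which is what the -1 implies. The column names and the date conversion are illustrative additions, not part of the original answer.

```r
# Three rows from the question; the single leading blank is an assumption.
sample_lines <- c(
  " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
  " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3",
  " 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3"
)

# Hypothetical column names: one SST/SSTA pair per Nino region.
sst_names <- c("week",
               "nino12_sst", "nino12_ssta", "nino3_sst", "nino3_ssta",
               "nino34_sst", "nino34_ssta", "nino4_sst", "nino4_ssta")

con <- textConnection(sample_lines)
df <- read.fwf(
  con,
  # skip 1, keep 9 (date), then four times: skip 5, keep 4 (SST), keep 4 (SSTA)
  widths = c(-1, 9, -5, 4, 4, -5, 4, 4, -5, 4, 4, -5, 4, 4),
  col.names = sst_names,
  stringsAsFactors = FALSE
)
close(con)

# Convert "03JAN1990" to a Date. The month is looked up in month.abb,
# so the result does not depend on the session locale.
mon <- match(substr(df$week, 3, 5), toupper(month.abb))
df$date <- as.Date(sprintf("%s-%02d-%s",
                           substr(df$week, 6, 9), mon, substr(df$week, 1, 2)))
```

The 14 widths add up to 62 characters, the length of each sample row, which is a quick way to check a widths vector against a line of the file.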
