Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I have downloaded multiple .txt.gz files for Hadley Sea Surface Temperature observations. The data have been unzipped, resulting in mutiple .txt files in ASCII format.
I have the following files (the R script is the one I'm working on):

list.files()
 [1] "Get_SST_Data.R"                "HadISST1_SST_1931-1960.txt"    "HadISST1_SST_1931-1960.txt.gz"
 [4] "HadISST1_SST_1961-1990.txt"    "HadISST1_SST_1961-1990.txt.gz" "HadISST1_SST_1991-2003.txt"  
 [7] "HadISST1_SST_2004.txt"         "HadISST1_SST_2005.txt"         "HadISST1_SST_2006.txt"       
[10] "HadISST1_SST_2007.txt"         "HadISST1_SST_2008.txt"         "HadISST1_SST_2009.txt"       
[13] "HadISST1_SST_2010.txt"         "HadISST1_SST_2011.txt"         "HadISST1_SST_2012.txt"       
[16] "HadISST1_SST_2013.txt"

 would like to be able to utilize the temperature data to make a numeric vector for the Sea Surface Temperature for everyday since 1950, to eventually make a time series plot.

Which will look something like this

enter image description here

[p.s. this is just for reference...]

1 Answer

0 votes
by (32.3k points)

R is able to read NetCDF format

(http://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_sst.nc.gz).

In order to read such data you can use the "raster" package, after decompression, such as:

library(raster)

library(xts)

library(caTools)    

Some time definitions:

startYear <- 1950   # starting period

endYear <- 2011     # end of the period

subp <- '1951-01-01/1980-12-01'   # period for the climatology calculation

Open the file:

sst <- brick('HadISST_sst.nc')

Date <- substr(names(sst),2,11) 

Date <- gsub('\\.', '\\-', Date)

Date <- as.Date(Date)

dstart <- paste(startYear,'01','01',sep='-'); dstart <- grep(dstart, Date)

dend <- paste(endYear,'12','01',sep='-'); dend <- grep(dend, Date)

sst <- subset(sst, dstart:dend)

Date <- Date[dstart:dend]

Now, extract the time series for a specific point (lat=35, lon=120):

tserie <- as.vector(extract(sst, cbind(116, -35)))

tserie <- xts(tserie, order.by=Date)

Calculate the climatology for the subp period:

clim <- as.numeric()

for(ii in 1:12){

  clim[ii] <- mean(tserie[subp][(.indexmon(tserie[subp])+1) == ii])

}

clim <- xts(rep(clim, length(tserie)/12), order.by=Date)

Calculate anomalies:

tserie <- tserie - clim

Plot the result:

par(las=1)

plot(tserie, t='n', main='HadISST')

lines(tserie, col='grey')

lines(xts(runmean(tserie, 12), order.by=Date), col='red', lwd=2)

legend('bottomleft', c('Monthly anomaly','12-month moving avg'), lty=c(1,1), lwd=c(1,2), col=c('grey','red'))

Related questions

0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

31k questions

32.8k answers

501 comments

693 users

Browse Categories

...