All of these dates that I’ve manipulated in the Execute R module in Azure Machine Learning write out as blank in the output – that is, these date columns exist, but there is no value in those columns.
The source variables which contain date information that I’m reading into the data frame have two different date formats. They are as follows:
usage$Date1 = c('8/6/2015', '8/20/2015', '7/9/2015')
usage$Date2 = c('4/16/2015 0:00', '7/1/2015 0:00', '7/1/2015 0:00')
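For reference, here is a minimal sketch that reproduces the two formats with the sample values above and parses each one. The data frame construction and the names date1 and date2 are assumptions for illustration only; my real data is read in from the module's input.

```r
# Sample values copied from the two source columns; the data frame itself is
# constructed here only for illustration.
usage <- data.frame(
  Date1 = c("8/6/2015", "8/20/2015", "7/9/2015"),
  Date2 = c("4/16/2015 0:00", "7/1/2015 0:00", "7/1/2015 0:00"),
  stringsAsFactors = FALSE
)

# Date1 holds a date only, so as.Date with an explicit format is enough.
date1 <- as.Date(usage$Date1, format = "%m/%d/%Y")

# Date2 also carries a time of day; parse it with an explicit time zone so
# the result does not depend on the machine's local time zone setting.
date2 <- as.POSIXct(usage$Date2, format = "%m/%d/%Y %H:%M", tz = "GMT")

print(date1)
print(date2)
```

Passing tz explicitly means the parse should not need to look up the local time zone, which is what the warnings in the log complain about.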
I inspected the log file in AML and found that AML can't identify the local time zone. The warnings in the log file are:
[ModuleOutput] 1: In strptime(x, format, tz = tz) :
[ModuleOutput] unable to identify current timezone 'C':
[ModuleOutput] please set environment variable 'TZ'
[ModuleOutput]
[ModuleOutput] 2: In strptime(x, format, tz = tz) : unknown timezone 'localtime'
I referred to another answer about setting the default time zone for strptime: "unknown timezone name in R strptime/as.POSIXct".
I changed my code to set the TZ environment variable explicitly:
Sys.setenv(TZ = "GMT")
#### Data frame usage cleanup, format and labeling
usage <- as.data.frame(usage)
# Date1: character -> POSIXct -> formatted string -> Date
usage$Date1 <- as.character(usage$Date1)
usage$Date1 <- as.POSIXct(usage$Date1, format = "%m/%d/%Y", tz = "GMT")
usage$Date1 <- format(usage$Date1, "%m/%d/%Y")
usage$Date1 <- as.Date(usage$Date1, format = "%m/%d/%Y")
usage <- as.data.frame(usage)
# Date2: same steps; the trailing " 0:00" is not covered by the format string
usage$Date2 <- as.POSIXct(usage$Date2, format = "%m/%d/%Y", tz = "GMT")
usage$Date2 <- format(usage$Date2, "%m/%d/%Y")
usage$Date2 <- as.Date(usage$Date2, format = "%m/%d/%Y")
usage <- as.data.frame(usage)
The problem persists: Azure ML still does not write the values of these variables out, and the columns come through as blanks.
(This code works in RStudio, where I presume the local time zone is taken from the system.)
After reading two blog posts on this problem, it seems that Azure ML doesn't support some date time formats:
http://blogs.msdn.com/b/andreasderuiter/archive/2015/02/03/troubleshooting-error-1000-rpackage-library-exception-failed-to-convert-robject-to-dataset-when-running-r-scripts-in-azure-ml.aspx
http://www.mikelanzetta.com/2015/01/data-cleaning-with-azureml-and-r-dates/
So I tried converting the columns to POSIXct before sending the data frame to the output stream, as follows:
tenantusage$Date1 = as.POSIXct(tenantusage$Date1, format = "%m/%d/%Y", tz = "EST5EDT")
tenantusage$Date2 = as.POSIXct(tenantusage$Date2, format = "%m/%d/%Y", tz = "EST5EDT")
But I encounter the same problem. The information in these variables is not written to the output, and the Date1 and Date2 columns are blank.
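To narrow this down, a check like the following sketch can confirm what the columns actually contain just before the data frame is handed to the output port. The helper describe_columns is hypothetical (my own, not part of Azure ML or base R), and the sample data frame stands in for my real one. The last step, writing the dates as character strings, is only an untested assumption about what the output stream might accept, not a confirmed fix.

```r
# Hypothetical helper: report the class and the number of missing values
# for each column of a data frame.
describe_columns <- function(df) {
  data.frame(
    column = names(df),
    class = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stand-in for the real data frame, using the sample values from above.
tenantusage <- data.frame(
  Date1 = c("8/6/2015", "8/20/2015", "7/9/2015"),
  stringsAsFactors = FALSE
)
tenantusage$Date1 <- as.POSIXct(tenantusage$Date1, format = "%m/%d/%Y", tz = "GMT")

# If n_missing is 0 here, the values are lost after the R code has run,
# not during parsing.
report <- describe_columns(tenantusage)
print(report)

# Untested assumption: a plain character column might survive the output
# conversion where a date-time class does not.
tenantusage$Date1_chr <- format(tenantusage$Date1, "%Y-%m-%d")
print(tenantusage$Date1_chr)
```

If the report shows a date-time class with no missing values and the output is still blank, that would point at the conversion of the data frame to the module's output rather than at strptime.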