
My dataset stores dates as a factor, and it contains lagged values. I want to extract only the lagged rows, i.e. the rows for the second-latest date.

Example code:

library(data.table)

ID <- rep("A5", 15)
product <- rep(c("prod1","prod2","prod3", "prod55", "prod4", "prod9", "prod83"), 3)
start <- c(rep("01.01.2016", 3), rep("01.01.2015", 3), rep("01.01.2014", 3),
           rep("01.01.2013", 3), rep("01.01.2012", 3))
prodID <- c(3,1,2,3,1,2,3,1,2,3,2,1,3,1,2)

mydata <- cbind(ID, product[1:15], start, prodID)
mydata <- as.data.table(mydata)

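# nameCols is not defined in the original post; these column names are hypothetical
nameCols <- c("start_lead", "V2_lead")
# fill = NA (not the string "NA") pads missing values with a real NA
mydata[, (nameCols) := shift(.SD, 3, fill = NA, type = "lead"), .SDcols = c("start", "V2"), by = "prodID"]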

For now, I am using the following code:

mydata[start == "01.01.2015"]

The second-latest date is not always the same, and that is the problem: whenever the data changes, I have to edit the date by hand and rerun the code. How can I avoid this?

1 Answer


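You need to convert the dates into a proper date object and sort the distinct dates. Element [2] of the descending sort is then the second-latest date, whatever it happens to be, so nothing has to be hard-coded: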

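library(data.table)

# Parse the "dd.mm.yyyy" strings (or factor labels) into data.table's IDate class
mydata[, start := as.IDate(start, '%d.%m.%Y')]

# Keep only the rows whose date equals the second-latest distinct date
mydata[start == sort(unique(start), decreasing = TRUE)[2]]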

#   ID     V2      start prodID
#1: A5 prod55 2015-01-01      3
#2: A5  prod4 2015-01-01      1
#3: A5  prod9 2015-01-01      2
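If you need this repeatedly, or for a different position than the second-latest, the same sort-and-index idea can be wrapped in a function. The sketch below is an illustration, not part of the answer above: the helper name rows_for_nth_latest is hypothetical, and it swaps data.table for base R (as.Date instead of as.IDate) so it runs without extra packages. It assumes the dates are "dd.mm.yyyy" strings or a factor of such strings.

```r
# Hypothetical helper (not from the original answer): return the rows whose
# date equals the n-th most recent distinct date. Base R only.
rows_for_nth_latest <- function(df, col = "start", n = 2, format = "%d.%m.%Y") {
  # as.character() first so a factor is parsed by its labels, not its integer codes
  dates <- as.Date(as.character(df[[col]]), format = format)
  # sort() drops NA (unparseable) dates by default
  distinct <- sort(unique(dates), decreasing = TRUE)
  if (length(distinct) < n) {
    stop("fewer than ", n, " distinct dates in column '", col, "'")
  }
  # which() ignores NA comparisons instead of returning rows of NAs
  df[which(dates == distinct[n]), , drop = FALSE]
}

# Sample data mirroring the question, with the dates stored as a factor
df <- data.frame(
  ID = rep("A5", 15),
  product = rep(c("prod1", "prod2", "prod3", "prod55", "prod4", "prod9", "prod83"), 3)[1:15],
  start = factor(rep(c("01.01.2016", "01.01.2015", "01.01.2014",
                       "01.01.2013", "01.01.2012"), each = 3)),
  prodID = c(3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 1, 3, 1, 2)
)

second_latest <- rows_for_nth_latest(df)
print(second_latest)
```

With this sample data the call returns the three 01.01.2015 rows (prod55, prod4, prod9); passing n = 3 would select the 01.01.2014 rows instead.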
