
I am new to Data Science and am learning it from online resources. I have a dataset that looks like this:

df <- data.frame(id = c(1,1,1,2,2,2,3,3,3), time = c(1,2,3,1,2,3,1,2,3), y = rnorm(9), x1 = LETTERS[seq(from = 1, to = 9)], x2 = c(0,0,0,0,1,0,1,1,1), c2 = rnorm(9))

df

#   id time          y x1 x2           c2

# 1  1    1  0.6364831  A  0 -0.066480473

# 2  1    2  0.4476390  B  0  0.161372575

# 3  1    3  1.5113458  C  0  0.343956178

# 4  2    1  0.3532957  D  0  0.279987147

# 5  2    2  0.3401402  E  1 -0.462635393

# 6  2    3 -0.3160222  F  0  0.338454940

# 7  3    1 -1.3797158  G  1 -0.621169576

# 8  3    2  1.4026640  H  1 -0.005690801

# 9  3    3  0.2958363  I  1 -0.176488132

I am trying to write a function that takes two parameters: the first is the dataset and the second is the variable of interest.

The function is divided into several steps. However, it fails at the step where I filter the dataset with data.table, which looks like this:

testfun<- function(dataset,var){

intermediatedf<-unique(setDT(dataset)[var==1 & c2>0,.(y)])

return(intermediatedf)

}

When I run the code below, it breaks:

df2<-testfun(df,y)

Can anyone show me how to pass both the dataset and a column to the function, so that I can filter on that column?

by (36.8k points)

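You can use substitute() and eval() for this. substitute(var) captures the column name you pass as an unevaluated expression, and eval(var) inside the data.table brackets then evaluates that expression against the table's columns. In your version, var == 1 does not refer to a column, so R tries to evaluate the argument outside the table.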
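To see what substitute() and eval() do on their own, here is a minimal base R sketch. The helper show_capture and the toy data frame are made up for illustration and are not part of the original question or answer.

```r
# Hypothetical helper: shows what substitute() captures and what eval() returns.
show_capture <- function(data, var) {
  expr <- substitute(var)       # the unevaluated argument, e.g. the symbol x2
  list(
    name  = deparse(expr),      # the argument as text
    value = eval(expr, data)    # the same symbol looked up inside `data`
  )
}

# Illustrative data, not the dataset from the question.
toy <- data.frame(x2 = c(0, 1, 1), c2 = c(0.5, -0.2, 0.3))

res <- show_capture(toy, x2)
res$name   # "x2"
res$value  # 0 1 1
```

The column is passed as a bare name (x2, not "x2"); substitute() stops R from looking for an object called x2 in the calling environment.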

Here is the code:

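testfun <- function(dataset, var) {

  var <- substitute(var)  # capture the column name without evaluating it

  dt <- as.data.table(dataset)  # needs library(data.table); also accepts a plain data.frame

  intermediatedf <- unique(dt[eval(var) == 1 & c2 > 0, .(y)])

  return(intermediatedf)

}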
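The function above expects a bare column name, for example testfun(df, x2). If the column name is stored in a character variable instead, one common alternative is get(). The sketch below is an assumption-laden variant, not the answer's method: testfun_chr is a hypothetical name, and the data uses fixed values instead of rnorm() so the result is reproducible.

```r
library(data.table)

# Illustrative data with fixed values (the question uses rnorm()).
dt <- data.table(
  id   = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  y    = c(0.6, 0.4, 1.5, 0.3, 0.3, -0.3, -1.4, 1.4, 0.3),
  x2   = c(0, 0, 0, 0, 1, 0, 1, 1, 1),
  c2   = c(-0.1, 0.2, 0.3, 0.3, -0.5, 0.3, -0.6, 0.1, 0.2)
)

# Hypothetical variant: `var` is a string, and get() looks the column up by name.
testfun_chr <- function(dataset, var) {
  unique(dataset[get(var) == 1 & c2 > 0, .(y)])
}

res <- testfun_chr(dt, "x2")
res$y  # 1.4 0.3 (rows 8 and 9 have x2 == 1 and c2 > 0)
```

Note that the call in the question, testfun(df, y), filters on y == 1. Since y is drawn with rnorm(), that will almost never match a row; x2 is the only 0/1 column in the example data, so it is presumably the intended argument.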
