I am trying to select only the factor columns from my data frame. Here is an example:

bank[,sapply(bank[,names(bank)!="id"], is.factor)]

But the code behaves strangely. Step by step:

sapply(bank[,names(bank)!="id"], is.factor)

I get:

     age      sex   region   income  married children      car
   FALSE     TRUE     TRUE    FALSE     TRUE    FALSE     TRUE
save_act current_act mortgage  pep ageBin
    TRUE        TRUE     TRUE TRUE   TRUE

Looks OK. Now, I assume that I can just pass this logical vector of TRUE/FALSE values to the next step and get only the columns I need:

bank[,sapply(bank[,names(bank)!="id"], is.factor)]

But as a result I get all the same columns as in the original bank data frame. Nothing is filtered out. I have tried it several ways but can't find a solution. Any advice on what I am doing wrong?

1 Answer


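The logical vector in your code is computed on bank without the id column, so it is one element shorter than the number of columns in bank and no longer lines up with them. Compute it on the full data frame and exclude id by name instead, as in this example using mtcars: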

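# Use mtcars as a stand-in data frame and rename "mpg" to "id"
df <- mtcars
colnames(df) <- gsub("mpg", "id", colnames(df))

# Convert a few columns, including id, to factors
df$am <- as.factor(df$am)
df$gear <- as.factor(df$gear)
df$id <- as.factor(df$id)

# Test every column of df, then exclude "id" by name
df[, sapply(df, is.factor) & colnames(df) != "id", drop = FALSE]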
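
The same idea can be sketched against a data frame shaped like the one in the question. The `bank` data frame below is a hypothetical stand-in with invented values (the real data is not shown in the question), and it assumes `id` is itself a factor, as in the mtcars example above:

```r
# Hypothetical stand-in for the asker's `bank` data frame; all values are invented.
bank <- data.frame(
  id      = factor(c("ID1", "ID2", "ID3")),
  age     = c(48, 40, 51),
  sex     = factor(c("FEMALE", "MALE", "FEMALE")),
  income  = c(17546.0, 30085.1, 16575.4),
  married = factor(c("NO", "YES", "YES"))
)

# The mask from the question is built on a copy without "id",
# so it has one element fewer than bank has columns.
short_mask <- sapply(bank[, names(bank) != "id"], is.factor)
length(short_mask) == ncol(bank)   # FALSE: the mask does not line up with bank

# Build the mask on the full data frame, then exclude "id" by name.
mask <- sapply(bank, is.factor) & names(bank) != "id"

# drop = FALSE keeps the result a data frame even if only one column matches.
factors_only <- bank[, mask, drop = FALSE]
names(factors_only)
```

Because the mask is built from `bank` itself, its length always equals `ncol(bank)`, so each TRUE/FALSE value refers to the column at the same position.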
