
The following is my data frame:

       Domain         Phylum          Class          Order
ID_1 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_2 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_3 Bacteria  Bacteroidetes Unclassified_c Unclassified_o
ID_4 Bacteria Proteobacteria Unclassified_c Unclassified_o
ID_5 Bacteria  Bacteroidetes Unclassified_c Unclassified_o

I want to replace every occurrence of strings such as Unclassified_c, Unclassified_o, element_3, etc. with NA, so I ran the following:

df[df == "Unclassified_c" ] <- NA

This works well when I replace one value at a time, but sometimes there are too many values for that. In that case, I want to put the values in a list and use it:

Remove_list <- c("Unclassified_c", "Unclassified_o", "element_3", "element_4", "element_x")

and then use the list to replace the matches with NA:

df[ df == Remove_list ] <- NA 

This changes some of the values to NA, but not all of them. I don't want to use the stringr library, because it drops the row names (ID_1 .. ID_x) and I need them.

1 Answer

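You can use sapply() with %in%, which returns a logical matrix marking every cell whose value appears in Remove_list. Assigning NA to the TRUE positions then replaces all of them at once. The == attempt misses values because == compares element by element and recycles Remove_list along the data, whereas %in% tests each value against the whole vector. The row names are kept.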

df[sapply(df, `%in%`, Remove_list)] <- NA
df
#       Domain         Phylum Class Order
#ID_1 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_2 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_3 Bacteria  Bacteroidetes  <NA>  <NA>
#ID_4 Bacteria Proteobacteria  <NA>  <NA>
#ID_5 Bacteria  Bacteroidetes  <NA>  <NA>
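To try this end to end, here is a minimal, self-contained sketch. It rebuilds the data frame from the question (assuming the columns are plain character vectors, which the question does not state) and compares the == attempt with the %in% approach. The names partial, cleaned and hits are only illustrative and do not appear in the original post.

```r
# Rebuild the example data frame; row names are ID_1 .. ID_5.
# Assumption: all columns are character (not factor).
df <- data.frame(
  Domain = rep("Bacteria", 5),
  Phylum = c("Cyanobacteria", "Cyanobacteria", "Bacteroidetes",
             "Proteobacteria", "Bacteroidetes"),
  Class  = rep("Unclassified_c", 5),
  Order  = rep("Unclassified_o", 5),
  row.names = paste0("ID_", 1:5),
  stringsAsFactors = FALSE
)

Remove_list <- c("Unclassified_c", "Unclassified_o",
                 "element_3", "element_4", "element_x")

# The == attempt: Remove_list is recycled along the data, so a cell only
# matches when it happens to line up with the same string in Remove_list.
partial <- df
partial[partial == Remove_list] <- NA

# The %in% approach: every cell is tested against the whole vector.
cleaned <- df
hits <- sapply(cleaned, `%in%`, Remove_list)  # logical matrix, one column per variable
cleaned[hits] <- NA

print(cleaned)
print(rownames(cleaned))  # row names survive the replacement
```

Printing partial next to cleaned should show that the == version leaves most of the Unclassified_* cells in place, while the %in% version clears all of them.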
