
I am working with a data frame in RStudio. Here is the data frame:

  id cluster from   to
1  A      NA <NA> <NA>
2  B       2    B    D
3  C       5    C    A
4  D      NA <NA> <NA>
5  E       5    E    B
6  F      NA <NA> <NA>
7  G       3    G    F

Both the "cluster" and "to" columns contain NA values. When "cluster" is NA for a row, I want to look up that row's "id" in the "to" column, take the "cluster" value from the row where it is found, and store it in a new column named "New_Col". Rows that already have a "cluster" value should keep it. The expected result is shown below:

id  cluster  from  to  New_Col
A   NA       NA    NA  5
B   2        B     D   2
C   5        C     A   5
D   NA       NA    NA  2
E   5        E     B   5
F   NA       NA    NA  3
G   3        G     F   3

1 Answer


We can use the match function to solve this.
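To see what match does here, a small sketch may help: for each element of its first argument, it returns the position of the first match in its second argument. The vectors ids and to below are just the relevant values copied by hand from the question's data frame; their names are only for illustration.

```r
# ids of the rows whose cluster is NA in the question's data
ids <- c("A", "D", "F")

# the "to" column, in row order
to <- c(NA, "D", "A", NA, "B", NA, "F")

# position in `to` of the first occurrence of each id
pos <- match(ids, to)
pos
```

Here pos is 3, 2, 7, which are the rows for C, B and G, so their cluster values (5, 2, 3) are the ones that get copied.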

I tried it on the same data set and got the result you asked for. Here is the code:

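# Start from the existing cluster values
df$New_col <- df$cluster

# Logical index of the rows where cluster is NA
inds <- is.na(df$New_col)

# For each of those rows, find where its id appears in "to"
# and take the cluster value of that row
df$New_col[inds] <- with(df, cluster[match(id[inds], to)])

df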

I am getting this output:

  id cluster from   to New_col
1  A      NA <NA> <NA>       5
2  B       2    B    D       2
3  C       5    C    A       5
4  D      NA <NA> <NA>       2
5  E       5    E    B       5
6  F      NA <NA> <NA>       3
7  G       3    G    F       3
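If you want to reproduce this from scratch, the following is a minimal self-contained sketch. The data.frame() call is an assumption on my part: the question only shows the printed data frame, so the column types (character ids, numeric cluster) are guessed from that output.

```r
# Rebuild the example data frame (column types are assumed;
# the question only shows the printed output)
df <- data.frame(
  id      = c("A", "B", "C", "D", "E", "F", "G"),
  cluster = c(NA, 2, 5, NA, 5, NA, 3),
  from    = c(NA, "B", "C", NA, "E", NA, "G"),
  to      = c(NA, "D", "A", NA, "B", NA, "F"),
  stringsAsFactors = FALSE
)

# Start from the existing cluster values
df$New_col <- df$cluster

# Rows whose cluster is missing
inds <- is.na(df$New_col)

# For each such id, find the row where it appears in `to`
# and take that row's cluster
df$New_col[inds] <- df$cluster[match(df$id[inds], df$to)]

df$New_col
```

Two things to keep in mind: match only returns the first position where an id is found, and an id that never appears in "to" gives NA, so New_col stays NA for that row.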
