
Working with a data frame similar to this:

set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]
df

   cat        val
1  aaa 0.05638315
2  aaa 0.25767250
3  aaa 0.30776611
4  aaa 0.46854928
5  aaa 0.55232243
6  bbb 0.17026205
7  bbb 0.37032054
8  bbb 0.48377074
9  bbb 0.54655860
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552

I am trying to add a column that numbers the rows within each group. The loop below works, but it obviously doesn't take advantage of R's vectorized tools:

df$num <- 1
for (i in 2:(length(df[, 1]))) {
  if (df[i, "cat"] == df[(i - 1), "cat"]) {
    df[i, "num"] <- df[i - 1, "num"] + 1
  }
}
df

   cat        val num
1  aaa 0.05638315   1
2  aaa 0.25767250   2
3  aaa 0.30776611   3
4  aaa 0.46854928   4
5  aaa 0.55232243   5
6  bbb 0.17026205   1
7  bbb 0.37032054   2
8  bbb 0.48377074   3
9  bbb 0.54655860   4
10 bbb 0.81240262   5
11 ccc 0.28035384   1
12 ccc 0.39848790   2
13 ccc 0.62499648   3
14 ccc 0.76255108   4
15 ccc 0.88216552   5

What would be a good way to do this?

1 Answer


To number the rows within each group, you can use the group_by() and mutate() functions from the dplyr package together with row_number():

set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

Output:

# A tibble: 15 x 3
# Groups:   cat [3]
   cat      val    id
   <fct>  <dbl> <int>
 1 aaa   0.0564     1
 2 aaa   0.258      2
 3 aaa   0.308      3
 4 aaa   0.469      4
 5 aaa   0.552      5
 6 bbb   0.170      1
 7 bbb   0.370      2
 8 bbb   0.484      3
 9 bbb   0.547      4
10 bbb   0.812      5
11 ccc   0.280      1
12 ccc   0.398      2
13 ccc   0.625      3
14 ccc   0.763      4
15 ccc   0.882      5
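If you would rather not load a package, base R can produce the same numbering with ave(). This is a swapped-in alternative, not part of the original answer: a minimal sketch that applies seq_along() to the row positions of each cat group and returns the counts in the original row order.

```r
# Base-R sketch (no packages): number rows within each level of cat
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]

# ave() splits the row positions by cat, applies seq_along() to each piece,
# and writes the per-group counts back in the original row order
df$num <- ave(seq_along(df$cat), df$cat, FUN = seq_along)
df
```

Because ave() returns a vector aligned with the input rows, the result can be assigned straight into a new column without any reordering.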

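You can also use the data.table package. Either of the two := calls below adds the id column; := modifies the table by reference instead of copying it, which can save memory and is often faster than dplyr on large tables.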

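library(data.table)
dt <- data.table(df)
# Option 1: number rows 1..N within each group of cat
dt[, id := seq_len(.N), by = cat]
# Option 2 (equivalent): rowid() numbers repeated values of cat directly
dt[, id := rowid(cat)]
# := returns the table invisibly, so print it explicitly
dt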

Output:

   cat        val id
 1: aaa 0.05638315  1
 2: aaa 0.25767250  2
 3: aaa 0.30776611  3
 4: aaa 0.46854928  4
 5: aaa 0.55232243  5
 6: bbb 0.17026205  1
 7: bbb 0.37032054  2
 8: bbb 0.48377074  3
 9: bbb 0.54655860  4
10: bbb 0.81240262  5
11: ccc 0.28035384  1
12: ccc 0.39848790  2
13: ccc 0.62499648  3
14: ccc 0.76255108  4
15: ccc 0.88216552  5
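One caveat, not covered in the original answer: the loop in the question restarts its count whenever cat changes from one row to the next, while the grouped methods above count every row with the same label wherever it appears. They agree here only because the rows of each group are contiguous (df is sorted by cat). A minimal base-R sketch on a small hypothetical vector g (not from the question) shows the difference; rle() plus sequence() reproduces the loop's run-based logic.

```r
# Hypothetical grouping vector in which group "a" is NOT contiguous
g <- c("a", "b", "a", "a", "b")

# Group-based numbering (same idea as group_by() + row_number() or rowid()):
# counts every occurrence of a label, wherever it appears
by_group <- ave(seq_along(g), g, FUN = seq_along)
by_group  # 1 1 2 3 2

# Run-based numbering (what the loop in the question computes):
# restarts at 1 whenever the label differs from the previous row
runs <- rle(g)
by_run <- sequence(runs$lengths)
by_run    # 1 1 1 2 1
```

If your data may not be sorted, decide which of the two behaviours you actually want before picking a method.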
