
I have the following data.table code:

library(data.table)

dat <- structure(list(barcodes = c("scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", 
"scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", 
"scA22_CAACAGCAACAG", "scA22_CAACAGCAACAG", "scA22_TTTTTTTTTTTT"
), gene_name = c("A930037H05Rik", "A930037H05Rik", "A930037H05Rik", 
"A930037H05Rik", "Lgals8", "Lgals8", "Lgals8", "Lgals8"), tsse = c(0.152777777777778, 
0.152777777777778, 0.152777777777778, 0.00192307692307692, 0.055, 
0.0485294117647059, 0.033, 0.0294642857142857)), na.action = structure(integer(0), .Names = character(0)), row.names = c(NA, 
8L), class = "data.frame")

setDT(dat)

dat

The code above prints the following:

             barcodes     gene_name        tsse
1: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
2: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
3: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
4: scA22_CAACAGCAACAG A930037H05Rik 0.001923077
5: scA22_CAACAGCAACAG        Lgals8 0.055000000
6: scA22_CAACAGCAACAG        Lgals8 0.048529412
7: scA22_CAACAGCAACAG        Lgals8 0.033000000
8: scA22_TTTTTTTTTTTT        Lgals8 0.029464286

I want to group by c("barcodes", "gene_name") and then, within each group, keep only the row with the largest value in the tsse column.

Resulting in:

             barcodes     gene_name        tsse
1: scA22_CAACAGCAACAG A930037H05Rik 0.152777778
2: scA22_CAACAGCAACAG        Lgals8 0.055000000
3: scA22_TTTTTTTTTTTT        Lgals8 0.029464286

How can I do that?

1 Answer


In R, you can use which.max() inside .SD to keep, for each group, the row with the largest tsse. If several rows in a group tie for the maximum, which.max() returns the first of them:

library(data.table)

setDT(dat)[, .SD[which.max(tsse)], .(barcodes, gene_name)]
#             barcodes     gene_name   tsse
#1: scA22_CAACAGCAACAG A930037H05Rik 0.1528
#2: scA22_CAACAGCAACAG        Lgals8 0.0550
#3: scA22_TTTTTTTTTTTT        Lgals8 0.0295
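
A common variation, which is not part of the answer above, is to collect the winning row numbers with .I and subset the table once. This is often reported to be faster than .SD[...] on tables with many groups, although that has not been measured here. The sketch below rebuilds the question's data in a compact form (same values; the names idx and res are just illustrative):

```r
library(data.table)

# Same values as the data in the question, rebuilt compactly
dat <- data.table(
  barcodes  = c(rep("scA22_CAACAGCAACAG", 7), "scA22_TTTTTTTTTTTT"),
  gene_name = c(rep("A930037H05Rik", 4), rep("Lgals8", 4)),
  tsse      = c(0.152777777777778, 0.152777777777778, 0.152777777777778,
                0.00192307692307692, 0.055, 0.0485294117647059, 0.033,
                0.0294642857142857)
)

# .I holds the row numbers of dat; within each (barcodes, gene_name) group,
# which.max() picks the first row that has the largest tsse
idx <- dat[, .I[which.max(tsse)], by = .(barcodes, gene_name)]$V1

# Subset the original table once with the collected row numbers
res <- dat[idx]
print(res)
```

Because which.max() returns only the first maximum, both this version and the .SD version keep a single row per group even when several rows share the top tsse value, as in the first group here.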
