My question involves summing values across multiple columns of a data frame and storing the result in a new column, using dplyr. The entries in the columns are binary (0 or 1), with some missing values. I am looking for a row-wise analog of dplyr's summarise_each or mutate_each functions. Below is a minimal example of the data frame:

library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

I could use something like:

df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

but this would involve writing out the name of every column, and I have about 50 of them. In addition, the column names change between iterations of the loop in which I want to run this operation, so I would like to avoid specifying any column names at all.

How can I do that most efficiently? Any assistance would be greatly appreciated.

1 Answer

To sum down each column, replace the NA values with 0 and then sum every column:

df %>% replace(is.na(.), 0) %>% summarise_all(sum)

  x1 x2 x3 x4 x5
1  4  5  4  3  7
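As an alternative, the column totals can be computed without overwriting the missing values first. This is only a sketch: it assumes dplyr 1.0.0 or later (for across()), and the name col_totals is made up for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# Sum every column, skipping NA values instead of replacing them with 0.
# col_totals is a hypothetical name; across() assumes dplyr >= 1.0.0.
col_totals <- df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

print(col_totals)
```

For this data the totals match the ones shown above (4, 5, 4, 3, 7), since dropping an NA and counting it as 0 give the same sum.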

To sum across each row, use the following:

df %>% replace(is.na(.), 0) %>% mutate(sum = rowSums(.[1:5]))

   x1 x2 x3 x4 x5 sum
1   1  1  0  1  1   4
2   0  1  1  0  1   3
3   0  0  0  0  0   0
4   0  1  1  1  1   4
5   0  1  1  0  1   3
6   1  0  0  0  1   2
7   1  0  0  0  0   1
8   0  0  0  0  1   1
9   0  0  0  0  0   0
10  1  1  1  1  1   5
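Since the question asks to avoid naming or counting columns, here is a possible variant that selects columns by type rather than by position. It is a sketch under assumptions: dplyr 1.0.0 or later (for across() and where()), every numeric column should be included in the sum, and df_sum is a name made up for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# Row-wise sum over all numeric columns, whatever their names or count.
# na.rm = TRUE skips missing values, so the original NA entries are kept.
# df_sum is a hypothetical name; across()/where() assume dplyr >= 1.0.0.
df_sum <- df %>%
  mutate(sumrow = rowSums(across(where(is.numeric)), na.rm = TRUE))

print(df_sum$sumrow)
# 4 3 0 4 3 2 1 1 0 5
```

Two things to keep in mind with this approach: a row that is entirely NA gets a sum of 0 when na.rm = TRUE, and where(is.numeric) also picks up any numeric ID column, so the selection may need narrowing (for example with starts_with()) if the real data has such columns.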

...