0 votes
1 view
in R Programming by (5.3k points)

From a data frame, is there an easy way to aggregate (sum, mean, max, etc) multiple variables simultaneously?

Below are some sample data:

library(lubridate)

days = 365*2

date = seq(as.Date("2000-01-01"), length = days, by = "day")

year = year(date)

month = month(date)

x1 = cumsum(rnorm(days, 0.05)) 

x2 = cumsum(rnorm(days, 0.05))

df1 = data.frame(date, year, month, x1, x2)

I would like to simultaneously aggregate the x1 and x2 variables from the df2 data frame by year and month. The following code aggregates the x1 variable, but is it also possible to simultaneously aggregate the x2 variable?

### aggregate variables by year month

df2=aggregate(x1 ~ year+month, data=df1, sum, na.rm=TRUE)

head(df2)

Any suggestions would be greatly appreciated.

1 Answer

0 votes
by (25.3k points)

To aggregate or summarize multiple variables per group, you can use the melt function from the reshape2 package.

In your case:

library(lubridate)

days = 365*2

date = seq(as.Date("2000-01-01"), length = days, by = "day")

year = year(date)

month = month(date)

x1 = cumsum(rnorm(days, 0.05)) 

x2 = cumsum(rnorm(days, 0.05))

df1 = data.frame(date, year, month, x1, x2)

To reshape the data frame to summarize  by year and month:

require(reshape2)

df_melt <- melt(df1, id = c("date", "year", "month"))

dcast(df_melt, year + month ~ variable, sum)

Output:

   year month         x1         x2

1  2000     1 -103.60191   14.60998

2  2000     2  -12.73609   45.99378

3  2000     3   68.82441  116.47341

4  2000     4  190.00582  156.65607

5  2000     5  111.98758  326.98259

6  2000     6   55.97389  526.39677

7  2000     7   32.86611  640.82281

8  2000     8  -24.97887  818.12475

9  2000     9 -106.04458  842.83799

10 2000    10  105.50760  988.41725

Related questions

0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...