I have the following data frame:

State   Year   Gender  Age   Population
Ga      2001   B       20    5
Ga      2001   B       20    2
Ga      2002   B       10    1
Wa      2006   B       60    1
Wa      2006   B       60    1

I want to group by State and Year and sum Population, to get this output:

State   Year   Gender  Age   Population
Ga      2001   B       20    7
Ga      2002   B       10    1
Wa      2006   B       60    2

dput:

structure(list(State = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Ga",
"Wa"), class = "factor"), Year = c(2001L, 2001L, 2002L, 2006L,
2006L), Gender = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "B", class = "factor"),
Age = c(20L, 20L, 10L, 60L, 60L), Population = c(5L, 2L,
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L
))

This is what I have attempted. I need to group by State and Year and then sum Population:

library(plyr)

df1<-df(df, .(State, Year), Population(sum))


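You can use dplyr rather than plyr. Group by every column you want to keep in the output (here State, Year, Gender and Age), then sum Population within each group: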

library(dplyr)

df %>% group_by(State, Year, Gender, Age) %>% summarise(Population = sum(Population))

Alternatively, here is a data.table solution. Note that setDT() converts df to a data.table by reference, so df itself is modified:

library(data.table)

setDT(df)[,.(Population = sum(Population)), by = c("State", "Year","Gender", "Age")]
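If you would rather not load a package, the same grouping can probably be done in base R with aggregate(). The following is only a minimal sketch: it rebuilds the example data with data.frame() instead of pasting the dput output, and the name result is just an illustrative choice, not something from the question.

```r
# Rebuild the example data from the question (values copied from the dput output).
df <- data.frame(
  State      = factor(c("Ga", "Ga", "Ga", "Wa", "Wa")),
  Year       = c(2001L, 2001L, 2002L, 2006L, 2006L),
  Gender     = factor(rep("B", 5)),
  Age        = c(20L, 20L, 10L, 60L, 60L),
  Population = c(5L, 2L, 1L, 1L, 1L)
)

# Sum Population within each State/Year/Gender/Age combination.
# "result" is a hypothetical name used only for this example.
result <- aggregate(Population ~ State + Year + Gender + Age, data = df, FUN = sum)

# aggregate() does not return rows in the order shown in the question,
# so sort explicitly by State and Year.
result <- result[order(result$State, result$Year), ]
print(result)
```

As in the dplyr and data.table versions, Gender and Age are included in the grouping only so that they appear in the output. If they could vary within a State and Year, you would need to decide how to summarise them instead.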
