in R Programming by (50.2k points)

I just want to know what group_by returns. I think it returns a vector for each unique group_by combination that exists. Let's say, for instance:

data<-data.frame(Names = c("odyssey", "camry", "odyssey", "camry"), year = c(1990, 1990, 1992, 1994), sales = c(200, 400, 1000, 4000))

If I calculate the percent sales as shown below, I can see that for 1990, sales (on the left of the division) equals 200 in row one, while the sales captured in the sum must be c(200, 400)!

library(dplyr)

data %>% group_by(year) %>% mutate(percent_sales = 100*sales/sum(sales)) %>% select(percent_sales)

I know that sales is defined per year as follows:

1990: numeric (double), length 2
1992: numeric (double), length 1
1994: numeric (double), length 1

So I conclude that sales must be a vector. But then why does 100*sales seem to use just one value of sales per row, while sum(sales) uses the full vector?

1 Answer

by (108k points)

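In R programming, group_by() (from dplyr) does not change, drop, or reorder any rows; it only records which rows belong to which group, here one group per year. A following mutate() then evaluates its expression once per group, with each column bound to that group's whole vector of values. So in the 1990 group, sales is c(200, 400): 100*sales is element-wise and gives one value per row, c(20000, 40000), whereas sum(sales) collapses the vector to a single value, 600. Dividing the two gives c(33.33, 66.67), one result per row, so the rows remain the same.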

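So, in short, a summary value such as sum(sales) is computed once per group and recycled to every row of that group, while element-wise expressions such as 100*sales keep one value per row.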
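To see this without dplyr, here is a minimal base-R sketch, assuming the data frame from the question. The helper pct_within is hypothetical: it stands in for the expression that mutate() evaluates for one group, and base R's ave() does the split, apply, and reassemble step. It is an analogy for the grouped mutate() behaviour, not dplyr's actual implementation.

```r
# Data from the question
data <- data.frame(
  Names = c("odyssey", "camry", "odyssey", "camry"),
  year  = c(1990, 1990, 1992, 1994),
  sales = c(200, 400, 1000, 4000)
)

# Hypothetical helper: roughly the expression mutate() evaluates once per group.
# x is the whole vector of sales values that belong to one year.
pct_within <- function(x) {
  total <- sum(x)  # always length 1
  100 * x / total  # length(x): the single total is recycled over x
}

# The 1990 group on its own: sales is c(200, 400), not a single number
sales_1990 <- data$sales[data$year == 1990]
print(sales_1990)
print(pct_within(sales_1990))

# Apply the helper to every year and write the results back in the
# original row order (ave() splits by year, applies FUN, and reassembles)
data$percent_sales <- ave(data$sales, data$year, FUN = pct_within)
print(data)
```

Because pct_within returns a vector as long as its input, every row gets its own percentage, while the group total is shared by all rows of that year.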

Note: once you have finished the calculations on these groups, I suggest calling ungroup() so that no further calculations are done within groups.
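A short sketch of the question's pipeline with ungroup() added, assuming dplyr is installed. The extra column percent_of_all is hypothetical and only shows that, once the grouping is removed, sum(sales) is taken over all four rows (5600) instead of per year.

```r
library(dplyr)

data <- data.frame(
  Names = c("odyssey", "camry", "odyssey", "camry"),
  year  = c(1990, 1990, 1992, 1994),
  sales = c(200, 400, 1000, 4000)
)

result <- data %>%
  group_by(year) %>%
  mutate(percent_sales = 100 * sales / sum(sales)) %>%  # sum() within each year
  ungroup() %>%
  mutate(percent_of_all = 100 * sales / sum(sales))     # hypothetical column: sum() over all rows

print(group_vars(result))  # no grouping variables remain after ungroup()
print(result)
```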
