How can I simplify a correlation code in R?

Question

asked May 30, 2020 in R Programming by ashely (50.2k points)

This is my df:

  date                    z        x            y
  <dttm>              <dbl>    <dbl>        <dbl>
1 2019-01-01 00:00:00  1333 3339072. 456700000000
2 2019-02-01 00:00:00   915 4567582. 904600000000
3 2019-03-01 00:00:00  1433 7887962. 247900000000
4 2019-04-01 00:00:00  1444 3454559. 905700000000
5 2019-05-01 00:00:00  1231 9082390. 245600000000
6 2019-06-01 00:00:00   346  781224. 346700000000

How can I simplify this code to a for loop?

df %>%
  filter(year(df$date) == 2017) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2018) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2019) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2020) %>%
  mutate(correlation = cor(x, y))

This is what I have tried so far, but I get some NAs:

years <- c(2017, 2018, 2019, 2020)
for (y in years) {
  df %>%
    filter(date == y) %>%
    mutate(correlation = cor(x, y))
  print(df$correlation[y])
}

My desired output:

[1] 2017: 0.23
[1] 2018: -0.38
[1] 2019: 0.40
[1] 2020: 0.15

1 Answer

vinita · Answer 1 · 2020-05-30T10:34:16+0000

You do not need a loop for this. Group the data by year with group_by(), then use summarise() to calculate the correlation between x and y within each year. This returns one row per year.

library(dplyr)
library(lubridate)

df %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))
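
If you still want a for loop, the attempt in the question most likely fails for a few reasons. filter(date == y) does not compare years: inside dplyr verbs the name y resolves to the column y, not to the loop variable, so date is being compared with the y column. The result of the pipeline is also never assigned, so df has no correlation column to print. Below is a minimal sketch of a corrected loop. The toy data and the names yr, rows and correlations are assumptions made for illustration, not the asker's real values.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data (NOT the asker's real values): two years, three rows each
df <- tibble(
  date = as.POSIXct(c("2018-01-01", "2018-02-01", "2018-03-01",
                      "2019-01-01", "2019-02-01", "2019-03-01"), tz = "UTC"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 6, 4, 2)
)

# Loop over the years that actually occur in the data
years <- sort(unique(year(df$date)))

# Named vector to collect one correlation per year
correlations <- setNames(numeric(length(years)), years)

# Use a loop variable (yr) that does not clash with a column name
for (yr in years) {
  rows <- filter(df, year(date) == yr)             # keep only this year's rows
  correlations[as.character(yr)] <- cor(rows$x, rows$y)
  cat(yr, ": ", round(correlations[as.character(yr)], 2), "\n", sep = "")
}
```

Looping over the years found in the data, rather than a hard-coded vector, avoids calling cor() on an empty subset, which would give NA.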

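
If you would rather not load any packages, the same per-year calculation can be done in base R with split() and sapply(). This is an alternative technique, not the one used in the answer above, and the toy data is again made up for illustration.

```r
# Hypothetical toy data (NOT the asker's real values)
df <- data.frame(
  date = as.POSIXct(c("2018-01-01", "2018-02-01", "2018-03-01",
                      "2019-01-01", "2019-02-01", "2019-03-01"), tz = "UTC"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 6, 4, 2)
)

# Split the data frame into one piece per calendar year
by_year <- split(df, format(df$date, "%Y"))

# Compute cor(x, y) within each piece; the result is a vector named by year
correlations <- sapply(by_year, function(d) cor(d$x, d$y))

# Print one "year: value" line per year
cat(paste0(names(correlations), ": ", round(correlations, 2)), sep = "\n")
```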
