This is my df:

   date                    z        x            y
   <dttm>              <dbl>    <dbl>        <dbl>
 1 2019-01-01 00:00:00  1333 3339072. 456700000000
 2 2019-02-01 00:00:00   915 4567582. 904600000000
 3 2019-03-01 00:00:00  1433 7887962. 247900000000
 4 2019-04-01 00:00:00  1444 3454559. 905700000000
 5 2019-05-01 00:00:00  1231 9082390. 245600000000
 6 2019-06-01 00:00:00   346  781224. 346700000000

How can I simplify this code to a for loop?

df %>%
  filter(year(df$date) == 2017) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2018) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2019) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2020) %>%
  mutate(correlation = cor(x, y))

#That's what I tried so far, but I've got some NAs:

years <- c(2017, 2018, 2019, 2020)

for (y in years) {
  df %>%
    filter(date == y) %>%
    mutate(correlation = cor(x, y))
  print(df$correlation[y])
}

My desired output:

[1] 2017: 0.23
[1] 2018: -0.38
[1] 2019: 0.40
[1] 2020: 0.15

1 Answer

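You don't need a loop for this. Group the rows by year with group_by() and compute the correlation between x and y within each group with summarise(). Your loop fails for a few reasons: inside filter() and mutate(), the name y refers to the column y rather than the loop variable; date == y compares a full date-time with a number instead of comparing years; and the result of the pipeline is never assigned, so df$correlation does not exist.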

library(dplyr)
library(lubridate)

df %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))
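
To get output closer to the "year: value" lines you asked for, here is a minimal sketch. The data frame below is made-up toy data (not your real values), and the names by_year, yr, and sub are only illustrative. It shows the grouped summary first, then a corrected for loop in case you really do want one.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: three months in each of two years.
df <- tibble(
  date = as.POSIXct(c("2019-01-01", "2019-02-01", "2019-03-01",
                      "2020-01-01", "2020-02-01", "2020-03-01"), tz = "UTC"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 6, 4, 2)
)

# One row per year, holding the correlation of x and y within that year.
by_year <- df %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))

# Print one "year: correlation" line per row of the summary.
for (i in seq_len(nrow(by_year))) {
  cat(sprintf("%d: %.2f\n", as.integer(by_year$year[i]), by_year$correlation[i]))
}

# Loop-only alternative. The loop variable is called yr so it cannot be
# confused with the column y inside filter().
for (yr in c(2019, 2020)) {
  sub <- filter(df, year(date) == yr)
  cat(sprintf("%d: %.2f\n", as.integer(yr), cor(sub$x, sub$y)))
}
```

Both loops print the same lines. The grouped version is usually preferable because it picks up whatever years are present in the data, so you don't have to list them by hand.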
