in R Programming by (50.2k points)

I have created the following data frame:

df <- data.frame("name" = c("jack", "william", "david", "john"),
                 "01-Jan-19" = c(NA, "A", NA, "A"),
                 "01-Feb-19" = c("A", "A", NA, "A"),
                 "01-Mar-19" = c("S", "A", "A", "A"),
                 "01-Apr-19" = c("A", "A", "A", "S"),
                 "01-May-19" = c(NA, "A", "A", "A"),
                 "01-Jun-19" = c("A", "S", "A", "S"),
                 "01-Jul-19" = c("A", "S", "A", "S"),
                 "01-Aug-19" = c(NA, "S", "A", "A"),
                 "01-Sep-19" = c(NA, "S", "A", "S"),
                 "01-Oct-19" = c("S", "S", "A", "S"),
                 "01-Nov-19" = c("S", "S", NA, "S"),
                 "01-Dec-19" = c("S", "S", "S", NA),
                 "01-Jan-20" = c("S", "M", "A", "M"),
                 "01-Feb-20" = c("M", "M", "M", "M"))
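Because data.frame() defaults to check.names = TRUE, column names such as "01-Jan-19" are passed through make.names(), which is why the code in this thread strips an "X" before parsing the dates. A small sketch of that behaviour (df_small and df_raw are illustrative names only):

```r
# data.frame() applies make.names() to column names by default
# (check.names = TRUE): a leading digit gets an "X" prefix and
# "-" is replaced by ".".
df_small <- data.frame("name" = c("jack", "william"),
                       "01-Jan-19" = c(NA, "A"),
                       "01-Feb-19" = c("A", "A"))
names(df_small)
# "name" "X01.Jan.19" "X01.Feb.19"

# check.names = FALSE keeps the names exactly as written
df_raw <- data.frame("name" = c("jack", "william"),
                     "01-Jan-19" = c(NA, "A"),
                     "01-Feb-19" = c("A", "A"),
                     check.names = FALSE)
names(df_raw)
# "name" "01-Jan-19" "01-Feb-19"
```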

To estimate the duration for each person from the first A to the last A, I used the following code:

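library(dplyr)      # %>%, group_by()
library(lubridate)  # dmy()
duration <- df %>%
  tidyr::pivot_longer(cols = -name, names_to = 'person', values_drop_na = TRUE) %>%
  # column names such as "X01.Jan.19" become Dates
  dplyr::mutate(person = dmy(sub('X', '', person))) %>%
  group_by(name) %>%
  # time from the first A to the last A for each name
  dplyr::summarise(avg_duration = person[max(which(value == 'A'))] - person[min(which(value == 'A'))])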
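For cross-checking the dplyr result, here is a base-R sketch of the same first-A-to-last-A calculation (note that despite its name, avg_duration is a span, not an average). It assumes the month columns look like "X01.Jan.19" and that every year is 20xx; parse_month_col is a hypothetical helper, not part of any package.

```r
# The question's data (check.names = TRUE turns "01-Jan-19" into "X01.Jan.19")
df <- data.frame("name" = c("jack", "william", "david", "john"),
                 "01-Jan-19" = c(NA, "A", NA, "A"), "01-Feb-19" = c("A", "A", NA, "A"),
                 "01-Mar-19" = c("S", "A", "A", "A"), "01-Apr-19" = c("A", "A", "A", "S"),
                 "01-May-19" = c(NA, "A", "A", "A"), "01-Jun-19" = c("A", "S", "A", "S"),
                 "01-Jul-19" = c("A", "S", "A", "S"), "01-Aug-19" = c(NA, "S", "A", "A"),
                 "01-Sep-19" = c(NA, "S", "A", "S"), "01-Oct-19" = c("S", "S", "A", "S"),
                 "01-Nov-19" = c("S", "S", NA, "S"), "01-Dec-19" = c("S", "S", "S", NA),
                 "01-Jan-20" = c("S", "M", "A", "M"), "01-Feb-20" = c("M", "M", "M", "M"))

# Hypothetical helper: "X01.Jan.19" -> as.Date("2019-01-01"). It uses the
# built-in English month.abb constant, so it does not depend on the locale,
# and it assumes every year is in the 2000s.
parse_month_col <- function(nm) {
  parts <- do.call(rbind, strsplit(sub("^X", "", nm), ".", fixed = TRUE))
  as.Date(sprintf("20%s-%02d-%s", parts[, 3], match(parts[, 2], month.abb), parts[, 1]))
}

dates <- parse_month_col(names(df)[-1])
vals <- as.matrix(df[, -1])  # one row per person, one column per month

# Days from each person's first A month to their last A month
span <- sapply(seq_len(nrow(vals)), function(i) {
  a <- which(vals[i, ] == "A")  # which() skips the NA comparisons
  as.numeric(dates[max(a)] - dates[min(a)])
})
names(span) <- df$name
span
# jack = 150, william = 120, david = 306, john = 212 (days)
```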

However, I also want to find the individual periods between the As, and to exclude any time with other values (anything that is not A, e.g. S or NA). How can I do that?

1 Answer

by (108k points)

I think you first need to define what counts as a period. For instance, if a person has the series A, S, A, do they have 0, 1 or 2 periods with A? And what about A, S, A, A, S, A?
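Once a definition is chosen, it can be coded directly. Purely as an assumption for illustration, the sketch below treats every uninterrupted run of A as one period, with any other value (including NA) ending the run; count_a_periods is a hypothetical helper built on base R's rle().

```r
# Hypothetical helper: count uninterrupted runs of "A" in a vector.
# Any value other than "A" (S, M or NA) ends the current run.
count_a_periods <- function(x) {
  is_a <- !is.na(x) & x == "A"  # NA becomes FALSE instead of NA
  runs <- rle(is_a)              # run-length encoding of TRUE/FALSE
  sum(runs$values)               # number of TRUE runs = number of A periods
}

count_a_periods(c("A", "S", "A"))                 # 2
count_a_periods(c("A", "S", "A", "A", "S", "A"))  # 3
```

Under this definition the two series have 2 and 3 periods. If an S between two As should not split a period, the definition (and the helper) would have to change.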

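As a starting point, this sums the time each person spends in each value:

library(dplyr)  # %>%, group_by()
duration <- df %>%
  tidyr::pivot_longer(cols = -name, names_to = 'date') %>%
  dplyr::mutate(date = lubridate::dmy(sub('X', '', date))) %>%
  group_by(name) %>%
  dplyr::arrange(name, date) %>%
  # each month lasts until the next observed month; the last month gets 0 days
  dplyr::mutate(duration = c(diff(date), 0)) %>%
  # total time per person and value (A, S, M or NA)
  dplyr::group_by(name, value) %>%
  dplyr::summarise(summed_duration = sum(duration))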

# A tibble: 15 x 3
# Groups:   name [4]
   name    value summed_duration
   <chr>   <chr> <drtn>
 1 david   A     276 days
 2 david   M       0 days
 3 david   S      31 days
 4 david   NA     89 days
 5 jack    A     119 days
 6 jack    M       0 days
 7 jack    S     154 days
 8 jack    NA    123 days
 9 john    A     152 days
10 john    M      31 days
11 john    S     182 days
12 john    NA     31 days
13 william A     151 days
14 william M      31 days
15 william S     214 days
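The totals above merge all A months of a person into one number. To keep each uninterrupted stretch of A as its own period (the same assumed definition as the run-counting idea: any S, M or NA ends a period), a base-R sketch with rle() could look like this. It reuses the answer's convention that each month lasts until the next observed month; parse_month_col is a hypothetical helper that assumes 20xx years and English month abbreviations.

```r
df <- data.frame("name" = c("jack", "william", "david", "john"),
                 "01-Jan-19" = c(NA, "A", NA, "A"), "01-Feb-19" = c("A", "A", NA, "A"),
                 "01-Mar-19" = c("S", "A", "A", "A"), "01-Apr-19" = c("A", "A", "A", "S"),
                 "01-May-19" = c(NA, "A", "A", "A"), "01-Jun-19" = c("A", "S", "A", "S"),
                 "01-Jul-19" = c("A", "S", "A", "S"), "01-Aug-19" = c(NA, "S", "A", "A"),
                 "01-Sep-19" = c(NA, "S", "A", "S"), "01-Oct-19" = c("S", "S", "A", "S"),
                 "01-Nov-19" = c("S", "S", NA, "S"), "01-Dec-19" = c("S", "S", "S", NA),
                 "01-Jan-20" = c("S", "M", "A", "M"), "01-Feb-20" = c("M", "M", "M", "M"))

# Hypothetical helper: "X01.Jan.19" -> as.Date("2019-01-01") (locale-independent)
parse_month_col <- function(nm) {
  parts <- do.call(rbind, strsplit(sub("^X", "", nm), ".", fixed = TRUE))
  as.Date(sprintf("20%s-%02d-%s", parts[, 3], match(parts[, 2], month.abb), parts[, 1]))
}

dates <- parse_month_col(names(df)[-1])
vals <- as.matrix(df[, -1])
# Same convention as the answer: each month lasts until the next observed
# month, and the final month contributes 0 days.
dur <- c(as.numeric(diff(dates)), 0)

# One row per uninterrupted run of "A"; any other value (S, M or NA) ends a run.
periods <- do.call(rbind, lapply(seq_len(nrow(vals)), function(i) {
  is_a <- !is.na(vals[i, ]) & vals[i, ] == "A"
  r <- rle(unname(is_a))
  ends <- cumsum(r$lengths)          # index of the last month in each run
  starts <- ends - r$lengths + 1     # index of the first month in each run
  keep <- r$values                   # TRUE for runs of A
  if (!any(keep)) return(NULL)
  data.frame(name = df$name[i],
             first_a = dates[starts[keep]],
             last_a = dates[ends[keep]],
             days = mapply(function(s, e) sum(dur[s:e]), starts[keep], ends[keep]))
}))
periods
```

Summing days per name gives 119 (jack), 151 (william), 276 (david) and 152 (john), matching the A rows above, but the periods are now separate: jack, for example, has three (Feb-19, Apr-19, and Jun-19 to Jul-19). If you prefer the question's "first A to last A" convention within each period, use last_a - first_a instead of days.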
