
I have built a data frame from a number of surveys. Each survey can be submitted several times with updated values, so it can appear in several rows. Each row has the date on which that submission was created (created). I want to collapse the rows for each survey into one, keeping the date from the first submission but all other values from the last one.

A simple example:

#>   survey    created var1 var2
#> 1     s1 2020-01-01   10   30
#> 2     s2 2020-01-02   10   90
#> 3     s2 2020-01-03   20   20
#> 4     s3 2020-01-01   45    5
#> 5     s3 2020-01-02   50   50
#> 6     s3 2020-01-03   30   10

Desired result:

#>   survey    created var1 var2
#> 1     s1 2020-01-01   10   30
#> 2     s2 2020-01-02   20   20
#> 3     s3 2020-01-01   30   10
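
To make the example reproducible, the two tables above can be rebuilt as data frames. This is only a sketch: it assumes created is stored as a Date and that var1 and var2 are numeric, and the name expected is introduced here purely for checking a result.

```r
# Example input; column types are assumptions (Date for created, numeric for var1/var2)
df <- data.frame(
  survey  = c("s1", "s2", "s2", "s3", "s3", "s3"),
  created = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-01-01", "2020-01-02", "2020-01-03")),
  var1    = c(10, 10, 20, 45, 50, 30),
  var2    = c(30, 90, 20, 5, 50, 10)
)

# Desired result: first created date, last var1/var2 for each survey
expected <- data.frame(
  survey  = c("s1", "s2", "s3"),
  created = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
  var1    = c(10, 20, 30),
  var2    = c(30, 20, 10)
)
```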

1 Answer


There are two ways to do this. With dplyr, group by 'survey', set 'created' to the first (or min) value of 'created' within each group, and then keep only the last row of each group with slice(n()):

library(dplyr)

df %>%
   group_by(survey) %>%
   mutate(created = as.Date(first(created))) %>%
   slice(n())

# A tibble: 3 x 4
# Groups:   survey [3]
#  survey created     var1  var2
#  <chr>  <date>     <dbl> <dbl>
#1 s1     2020-01-01    10    30
#2 s2     2020-01-02    20    20
#3 s3     2020-01-01    30    10
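
The pipeline above relies on the rows of each survey already being in date order, since first() and slice(n()) work by position. If that is not guaranteed, a possible variant sorts first. The sketch below assumes created is a Date column, shuffles the example rows on purpose to show the effect, and adds ungroup() so the result is a plain tibble; the name result is only illustrative.

```r
library(dplyr)

# Example data with rows deliberately out of order (column types are assumptions)
df <- data.frame(
  survey  = c("s3", "s1", "s2", "s3", "s2", "s3"),
  created = as.Date(c("2020-01-03", "2020-01-01", "2020-01-03",
                      "2020-01-02", "2020-01-02", "2020-01-01")),
  var1    = c(30, 10, 20, 50, 10, 45),
  var2    = c(10, 30, 20, 50, 90, 5)
)

result <- df %>%
  arrange(survey, created) %>%        # earliest submission first within each survey
  group_by(survey) %>%
  mutate(created = min(created)) %>%  # earliest date, copied to every row of the group
  slice(n()) %>%                      # keep the latest submission's row
  ungroup()
```

Using min(created) instead of first(created) makes the date independent of row order, but the arrange() step is still needed so that slice(n()) picks the latest submission.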

Or you can do the same in base R. Note that first() is a dplyr function, so function(x) x[1] is used here to keep the code free of packages:

transform(df, created = ave(created, survey, FUN = function(x) x[1])
         )[!duplicated(df$survey, fromLast = TRUE), ]
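
The base R one-liner does two things at once, which may be easier to follow when split into steps. This is a sketch under the assumption that created is a Date column and that the rows of each survey are already in date order; first_created, is_last and result are illustrative names, and function(x) x[1] is used for the group's first element so that it runs without dplyr loaded.

```r
# Example data (column types are assumptions)
df <- data.frame(
  survey  = c("s1", "s2", "s2", "s3", "s3", "s3"),
  created = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-01-01", "2020-01-02", "2020-01-03")),
  var1    = c(10, 10, 20, 45, 50, 30),
  var2    = c(30, 90, 20, 5, 50, 10)
)

# Step 1: for every row, the created date of the first row of its survey
first_created <- ave(df$created, df$survey, FUN = function(x) x[1])

# Step 2: TRUE only for the last row of each survey
is_last <- !duplicated(df$survey, fromLast = TRUE)

# Overwrite created, then keep only the last row per survey
result <- transform(df, created = first_created)[is_last, ]
```

The result keeps the original row names of the retained rows (1, 3 and 6 for this data); rownames(result) <- NULL resets them if that matters.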

...