Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in R Programming by (50.2k points)

I am having a data set that is having banking details in Excel:

A        B         C

Name     XYZ       trash

Date     20/05/31  trash

Amount   trash     0.01

Name     ABC       trash

Date     20/06/30  trash

Amount   trash     0.02

Name     KLM       trash

Date     20/07/29  trash

Amount   trash     -0.03

The result I want is:

Name  Date      Amount

XYZ   20/05/31  0.01

ABC   20/06/30  0.02

KLM   20/07/29  -0.03

To clean that df, I have implemented the below code:

sel_col <- c("Name" = 2, "Date" = 2, "Amount" = 3)

df <- df %>%

  mutate(D = sel_col[match(df$A, names(sel_col))]) %>% 

  mutate(E = recode(D, A, B, C)) %>% 

  select(A, E)

How to split and transpose that? And is it the best way to go?

1 Answer

0 votes
by (108k points)

You can execute the below code:

df %>%

  mutate_all(~ if_else(. == "trash", NA_character_, .)) %>%

  mutate(

    grp = cumsum(A == "Name"),

    B = coalesce(B, C)

  ) %>%

  select(-C) %>%

  pivot_wider(grp, names_from = A, values_from  = B) %>%

  mutate(

    Date = as.Date(Date, format = "%y/%m/%d"),

    Amount = as.numeric(Amount)

  ) %>%

  select(-grp)

# # A tibble: 3 x 3

#   Name  Date       Amount

#   <chr> <date>      <dbl>

# 1 XYZ   2020-05-31   0.01

# 2 ABC   2020-06-30   0.02

# 3 KLM   2020-07-29  -0.03

The following are the assumptions I have taken while executing the above code:

  1. Each 3-pack of rows always starts with "Name"
  2. There is usable data either in B or in C, but not in both.
  3. I assumed you would want Date to be an actual date class in R(You can omit that if you prefer to keep it a string).

 If you want to know more about R then do refer to the R programming online course.

Browse Categories

...