Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
in R Programming by (50.2k points)

I am having the below dataset: 


1        101   201601    100    Y               Y            201501          S

1        102   201603    100    N               Y            201501          S

1        103   201501    100    Y               Y            201501          S

1        104   201501    200    N               Y            201501          S

2        201   201601    100    N               N               -            M

2        202   201603    100    N               N               -            M

As you can see there are multiple customers in the data-frame and each CUSTOMER may have several loans. A CUSTOMER with at least 1 LOAN_DEFAULT is marked as CUSTOMER_DEFAULT, and the DEFAULT_DATE is the first time of default.

But the main question is how can I calculate the loan amount for a customer at the first default date.

I want to summarise the number of default customers by Classification, but by using summarise function it seems not working properly?

1 Answer

0 votes
by (108k points)

What you can do is perform the sum operation on the AMOUNT value where the DATE is equal to the first DEFAULT_DATE for each CUSTOMER.


df %>% 

  group_by(CUSTOMER) %>% 

  summarise(total_sum  = sum(AMOUNT[DATE == first(DEFAULT_DATE)]))

#   CUSTOMER total_sum

#     <int>     <int>

#1        1       300

#2        2         0

Now for getting the number of default customers for each CLASSIFICATION, refer to the code below :

df %>%

  group_by(CLASSIFICATION) %>%

  summarise(no_default_cust = n_distinct(CUSTOMER[CUSTOMER_DEFAULT == "Y"]))

The data is:

df <- structure(list(CUSTOMER = c(1L, 1L, 1L, 1L, 2L, 2L), LOAN = c(101L, 

102L, 103L, 104L, 201L, 202L), DATE = c(201601L, 201603L, 201501L, 

201501L, 201601L, 201603L), AMOUNT = c(100L, 100L, 100L, 200L, 

100L, 100L), LOAN_DEFAULT = structure(c(2L, 1L, 2L, 1L, 1L, 1L

), .Label = c("N", "Y"), class = "factor"), CUSTOMER_DEFAULT = structure(c(2L, 

2L, 2L, 2L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), 

DEFAULT_DATE = structure(c(2L, 2L, 2L, 2L, 1L, 1L), .Label = c("-", 

"201501"), class = "factor"), CLASSIFICATION = structure(c(2L, 

2L, 2L, 2L, 1L, 1L), .Label = c("M", "S"), class = "factor")), 

class = "data.frame", row.names = c(NA, -6L))

If you are beginner and wan tot know more about R then do check out the R programming course that will help you in learning R form scratch. 

Browse Categories