2 views

I am having the below dataset:

CUSTOMER LOAN   DATE    AMOUNT LOAN_DEFAULT CUSTOMER_DEFAULT DEFAULT_DATE CLASSIFICATION

1        101   201601    100    Y               Y            201501          S

1        102   201603    100    N               Y            201501          S

1        103   201501    100    Y               Y            201501          S

1        104   201501    200    N               Y            201501          S

2        201   201601    100    N               N               -            M

2        202   201603    100    N               N               -            M

As you can see there are multiple customers in the data-frame and each CUSTOMER may have several loans. A CUSTOMER with at least 1 LOAN_DEFAULT is marked as CUSTOMER_DEFAULT, and the DEFAULT_DATE is the first time of default.

But the main question is how can I calculate the loan amount for a customer at the first default date.

I want to summarise the number of default customers by Classification, but by using summarise function it seems not working properly?

by (108k points)

What you can do is perform the sum operation on the AMOUNT value where the DATE is equal to the first DEFAULT_DATE for each CUSTOMER.

library(dplyr)

df %>%

group_by(CUSTOMER) %>%

summarise(total_sum  = sum(AMOUNT[DATE == first(DEFAULT_DATE)]))

#   CUSTOMER total_sum

#     <int>     <int>

#1        1       300

#2        2         0

Now for getting the number of default customers for each CLASSIFICATION, refer to the code below :

df %>%

group_by(CLASSIFICATION) %>%

summarise(no_default_cust = n_distinct(CUSTOMER[CUSTOMER_DEFAULT == "Y"]))

The data is:

df <- structure(list(CUSTOMER = c(1L, 1L, 1L, 1L, 2L, 2L), LOAN = c(101L,

102L, 103L, 104L, 201L, 202L), DATE = c(201601L, 201603L, 201501L,

201501L, 201601L, 201603L), AMOUNT = c(100L, 100L, 100L, 200L,

100L, 100L), LOAN_DEFAULT = structure(c(2L, 1L, 2L, 1L, 1L, 1L

), .Label = c("N", "Y"), class = "factor"), CUSTOMER_DEFAULT = structure(c(2L,

2L, 2L, 2L, 1L, 1L), .Label = c("N", "Y"), class = "factor"),

DEFAULT_DATE = structure(c(2L, 2L, 2L, 2L, 1L, 1L), .Label = c("-",

"201501"), class = "factor"), CLASSIFICATION = structure(c(2L,

2L, 2L, 2L, 1L, 1L), .Label = c("M", "S"), class = "factor")),

class = "data.frame", row.names = c(NA, -6L))

If you are beginner and wan tot know more about R then do check out the R programming course that will help you in learning R form scratch.