Target variable in credit risk models

Risk teams and data scientists work hard to train models that better predict, at the application stage, whether a loan will be Good or Bad. To gain a competitive advantage in credit decisions, they often seek to improve model performance by using boosting or other machine learning methods as alternatives to logistic regression.

Less attention is usually given to the construction of the Good / Bad variable itself. In reality, constructing the target variable is a challenging and important task. It requires making many assumptions and can easily determine the level of success of the whole modelling process.

What makes a Good / Bad variable good

Long enough observation window

A 12-month performance observation window was selected. It is common to use shorter observation windows for some consumer portfolios. A shorter window allows more recent loans to be included in the model training sample, so it might be necessary for young or rapidly changing portfolios. Some lenders use models that take into account only 6, or even as few as 3, months of performance to assign a Good or Bad flag. This is risky, especially when loan duration is increasing or is significantly greater than the model observation window.
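
The trade-off between window length and sample size can be sketched as follows. This is a minimal illustration, not the selection logic used for the model: the origination dates, the snapshot date and the helper names (add_months, mature_loans) are all hypothetical. It assumes a loan can enter the training sample only once its full observation window has elapsed.

```python
from calendar import monthrange
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)


def mature_loans(origination_dates, snapshot, window_months):
    """Keep loans whose full observation window has elapsed by the snapshot date."""
    return [d for d in origination_dates
            if add_months(d, window_months) <= snapshot]


# Hypothetical origination dates and data cut-off, for illustration only
originations = [date(2022, 1, 10), date(2022, 6, 5),
                date(2022, 10, 20), date(2023, 2, 1)]
snapshot = date(2023, 6, 30)

n_12m = len(mature_loans(originations, snapshot, 12))
n_6m = len(mature_loans(originations, snapshot, 6))
n_3m = len(mature_loans(originations, snapshot, 3))
```

With these made-up dates, the 12-month window leaves the fewest loans available for training and the 3-month window the most, which is the effect described above.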

Of course, a longer observation window makes the prediction objectively more difficult. More random events can happen to a customer in 12 months than in 3 months. Still, from a business or regulatory perspective it is usually better to have a longer prediction horizon, even if it comes with a less impressive AUROC or Gini coefficient. These measures should not be compared across models without also comparing their Good / Bad definitions.
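
To see why the target definition matters for these measures, here is a small sketch. AUROC and the Gini coefficient are linked by Gini = 2 * AUROC - 1. The scores and flags below are invented toy values, and the function names are hypothetical; the point is only that the same scores yield different results under a 3-month and a 12-month Bad definition.

```python
def auroc(scores, is_bad):
    """Probability that a random Bad loan has a higher risk score
    than a random Good loan (ties count as half)."""
    bad = [s for s, b in zip(scores, is_bad) if b]
    good = [s for s, b in zip(scores, is_bad) if not b]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


def gini(scores, is_bad):
    """Gini coefficient derived from AUROC."""
    return 2 * auroc(scores, is_bad) - 1


# Toy risk scores (higher = riskier) for six loans, illustration only
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
bad_3m = [1, 1, 0, 0, 0, 0]    # Bad flag with a 3-month window
bad_12m = [1, 1, 0, 1, 0, 0]   # Bad flag with a 12-month window (one later default)

gini_3m = gini(scores, bad_3m)
gini_12m = gini(scores, bad_12m)
```

In this toy case the model looks better against the 3-month flag than against the 12-month flag, even though the scores are identical.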

Integrated customer-level perspective (across all loans)

The credit scoring target variable is often built only at the loan account level. This means a loan is marked as Good even if the customer took another loan and defaulted on it (hit 90 days-past-due status) within the performance observation window (in our case, 12 months from loan origination). As a result, the machine learning model will learn to treat as Good those customers who repay the first loan but quickly default on the next one (usually for a greater amount). Combined with the risk of mis-selling or aggressive sales incentives, this is a very significant risk for many consumer lenders.

To manage this risk, our Good / Bad flag is set at the customer level. This means that all loans of a given customer (identified by the CustID variable) are checked for 90 days-past-due status within 12 months of each loan being taken. We assume the decision to grant the loan was bad if the customer is in bad status 12 months later. The decision was good only if good repayment status is observed at the customer level a year later.
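
A customer-level flag of this kind could be sketched as below. This is one possible reading, not the exact production logic: it assumes a loan is Bad if any loan of the same customer first hit 90 days-past-due between that loan's origination and 12 months later. The Loan record, its field names and the sample data are hypothetical; only the CustID concept comes from the text.

```python
from calendar import monthrange
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Loan:
    cust_id: str                 # corresponds to the CustID variable
    loan_id: str
    originated: date
    first_dpd90: Optional[date]  # date the loan first hit 90 DPD, None if never


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)


def customer_level_flags(loans, window_months=12):
    """Return {loan_id: "Good" | "Bad"}, checking every loan of the same customer."""
    by_customer = {}
    for loan in loans:
        by_customer.setdefault(loan.cust_id, []).append(loan)

    flags = {}
    for loan in loans:
        window_end = add_months(loan.originated, window_months)
        # Assumption: any 90 DPD event on any of the customer's loans
        # inside this loan's window makes the decision Bad.
        bad = any(
            other.first_dpd90 is not None
            and loan.originated <= other.first_dpd90 <= window_end
            for other in by_customer[loan.cust_id]
        )
        flags[loan.loan_id] = "Bad" if bad else "Good"
    return flags


# Hypothetical loans: customer C1 repays the first loan but defaults on the second
loans = [
    Loan("C1", "L1", date(2020, 1, 15), None),
    Loan("C1", "L2", date(2020, 6, 1), date(2020, 11, 20)),
    Loan("C2", "L3", date(2020, 3, 10), None),
]
flags = customer_level_flags(loans)
```

Under a loan-level definition, loan L1 would be Good because it never went past due itself. Here it is flagged Bad, because the same customer defaulted on L2 within 12 months of L1 being granted.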
