What is WoE in credit risk?

In credit risk modeling, Weight of Evidence (WoE) is a widely used measure of how strongly a grouping of a variable separates "good" (non-defaulting) credits from "bad" (defaulting) credits. It is calculated separately for each category (or bin) of a variable, so it describes how each group behaves relative to the portfolio as a whole.

Understanding Weight of Evidence (WoE)

WoE replaces each category of a variable (or each bin of a binned continuous variable) with a number that can be used directly in models such as logistic regression. That number is derived from how the good and bad credit outcomes are distributed across the categories.

How WoE is Calculated

WoE is computed for each category i with the following formula, which is the log of the ratio between the category's good-to-bad odds and the overall good-to-bad odds:

WoE_i = ln(Distribution of Good_i / Distribution of Bad_i)

Where:

"ln" is the natural logarithm.
"Distribution of Good_i" is the share of all good credits that fall in category i: (good credits in category i) / (total good credits).
"Distribution of Bad_i" is the share of all bad credits that fall in category i: (bad credits in category i) / (total bad credits).
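The formula can be sketched in Python as a function from per-category counts to WoE values. The name `weight_of_evidence` and the `{category: (good, bad)}` input shape are assumptions made for illustration. The sketch refuses categories with zero goods or zero bads, because the logarithm is undefined there.

```python
import math


def weight_of_evidence(counts):
    """Return {category: WoE} from {category: (good_count, bad_count)}.

    Hypothetical helper: WoE_i = ln(DistGood_i / DistBad_i), where
    DistGood_i = good_i / total_good and DistBad_i = bad_i / total_bad.
    """
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    woe = {}
    for category, (good, bad) in counts.items():
        if good == 0 or bad == 0:
            # ln(0) or division by zero: WoE is undefined for this category
            raise ValueError(f"WoE is undefined for {category!r}: zero goods or bads")
        dist_good = good / total_good  # share of all goods in this category
        dist_bad = bad / total_bad     # share of all bads in this category
        woe[category] = math.log(dist_good / dist_bad)
    return woe
```

For example, `weight_of_evidence({"Employed": (800, 200), "Unemployed": (100, 400), "Self-Employed": (400, 100)})` returns roughly 0.767, -2.005 and 0.767 for the three Employment Status categories used in the example below.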

Interpretation of WoE Values

The WoE value provides insights into the relationship between the categorical variable and the target variable (credit risk):

Positive WoE: The category holds a larger share of all good credits than of all bad credits, which means its good-to-bad odds are better than the portfolio's overall odds. The larger the value, the stronger the effect.
Negative WoE: The category holds a larger share of all bad credits than of all good credits, which means its good-to-bad odds are worse than the overall odds. The more negative the value, the stronger the effect.
WoE close to zero: The category's odds are close to the overall odds, so membership in it says little about credit risk.
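The sign rule follows from rewriting the formula in terms of odds. In this short derivation, G_i and B_i (notation introduced here) are the good and bad counts in category i, and G and B are the totals across all categories:

```latex
\mathrm{WoE}_i
  = \ln\frac{G_i / G}{B_i / B}
  = \ln\frac{G_i / B_i}{G / B}
  = \ln\frac{G_i}{B_i} - \ln\frac{G}{B},
\qquad
\mathrm{WoE}_i > 0 \iff \frac{G_i}{B_i} > \frac{G}{B}
```

In words, WoE is the category's log-odds of being good minus the overall log-odds. In the Employment Status example below, the overall odds are 1300/700 ≈ 1.86, so a category needs more than about 1.86 goods per bad to get a positive WoE.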

Example

Consider a categorical variable "Employment Status" with three categories: "Employed," "Unemployed," and "Self-Employed." Assume the following distributions:

Employment Status	Good Credits	Bad Credits
Employed	800	200
Unemployed	100	400
Self-Employed	400	100

Calculations:

First, compute the totals: Total Good Credits = 800 + 100 + 400 = 1300; Total Bad Credits = 200 + 400 + 100 = 700. Then divide each category's counts by these totals to get the distributions, and take the natural log of their ratio:

Employment Status	Dist. Good (Good/Total Good)	Dist. Bad (Bad/Total Bad)	WoE
Employed	800/1300 = 0.615	200/700 = 0.286	ln(0.615/0.286) ≈ 0.767
Unemployed	100/1300 = 0.077	400/700 = 0.571	ln(0.077/0.571) ≈ -2.005
Self-Employed	400/1300 = 0.308	100/700 = 0.143	ln(0.308/0.143) ≈ 0.767

The WoE values are computed from the unrounded distributions.
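The table's arithmetic can be checked with a short script. This is a minimal sketch that hard-codes the counts from the example; the variable names are illustrative only.

```python
import math

# Good/bad counts per category, copied from the Employment Status table
counts = {
    "Employed": (800, 200),
    "Unemployed": (100, 400),
    "Self-Employed": (400, 100),
}

total_good = sum(good for good, _ in counts.values())  # 1300
total_bad = sum(bad for _, bad in counts.values())     # 700

table = {}
for status, (good, bad) in counts.items():
    dist_good = good / total_good
    dist_bad = bad / total_bad
    # WoE from the unrounded distributions; rounding them to 3 decimals
    # first can shift the third decimal of the result
    woe = math.log(dist_good / dist_bad)
    table[status] = (dist_good, dist_bad, woe)
    print(f"{status:<14} {dist_good:.3f}  {dist_bad:.3f}  {woe:+.3f}")
```

Employed and Self-Employed come out identical because both have the same 4:1 ratio of good to bad credits, and WoE depends only on a category's odds relative to the overall odds.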

Interpretation:

Employed: Positive WoE (0.767). Its odds of 800/200 = 4 goods per bad are better than the overall odds of 1300/700 ≈ 1.86.
Unemployed: Negative WoE (-2.005). Its odds of 100/400 = 0.25 goods per bad are far worse than the overall odds.
Self-Employed: Positive WoE (0.767), identical to Employed because its odds are also 4 goods per bad (400/100).

Benefits of Using WoE

Handles Missing Values: WoE can handle missing values by treating them as a separate category.
Transforms Non-Linear Relationships: Because each category's WoE is on the log-odds scale, a logistic regression that uses WoE inputs is linear in those inputs even when the raw variable relates to default risk in a non-linear or non-monotonic way.
Variable Selection: Large absolute WoE values mark categories that separate good from bad credits strongly. To rank whole variables, the per-category values are usually summarized into the Information Value, IV = Σ (Distribution of Good_i − Distribution of Bad_i) × WoE_i, and variables with higher IV are considered more predictive.
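A common way to turn per-category WoE values into one variable-level score is the Information Value, IV = Σ (DistGood_i − DistBad_i) × WoE_i. The sketch below is a minimal illustration; the function name `information_value` is hypothetical, and it assumes every category has at least one good and one bad credit.

```python
import math


def information_value(counts):
    """Hypothetical helper: IV = sum of (DistGood_i - DistBad_i) * WoE_i.

    counts maps each category to (good_count, bad_count); every count
    must be positive, otherwise the log term is undefined.
    """
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    iv = 0.0
    for good, bad in counts.values():
        dist_good, dist_bad = good / total_good, bad / total_bad
        # Each term is non-negative: the difference and the log share a sign
        iv += (dist_good - dist_bad) * math.log(dist_good / dist_bad)
    return iv


employment = {"Employed": (800, 200), "Unemployed": (100, 400), "Self-Employed": (400, 100)}
print(f"IV = {information_value(employment):.3f}")  # IV = 1.371
```

Most of the IV here comes from the Unemployed category, whose distributions differ the most.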

Using WoE in Credit Scoring

WoE is a common technique for developing credit scoring models. After calculating the WoE for each category of a variable, each observation's category is replaced by its WoE value, and these values are used directly as input features in a logistic regression model. This lets the model use the information in categorical variables when predicting credit risk.
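Fitting a logistic regression on a WoE input can be sketched without any library. The function `fit_logistic_on_woe` is hypothetical: it uses plain Newton-Raphson on grouped counts instead of a package such as scikit-learn or statsmodels, and it models the probability of a good credit to match the sign convention used here. The Employment Status example supplies the single WoE feature.

```python
import math


def fit_logistic_on_woe(groups, iterations=25):
    """Fit P(good | x) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson.

    groups: list of (woe, good_count, bad_count), one entry per category.
    Returns (a, b): the intercept and the WoE coefficient.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0     # gradient of the binomial log-likelihood
        s0 = s1 = s2 = 0.0  # entries of the negated Hessian
        for x, good, bad in groups:
            n = good + bad
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = good - n * p
            weight = n * p * (1.0 - p)
            g_a += resid
            g_b += resid * x
            s0 += weight
            s1 += weight * x
            s2 += weight * x * x
        det = s0 * s2 - s1 * s1  # needs at least two distinct WoE values
        a += (s2 * g_a - s1 * g_b) / det
        b += (s0 * g_b - s1 * g_a) / det
    return a, b


counts = {"Employed": (800, 200), "Unemployed": (100, 400), "Self-Employed": (400, 100)}
total_good = sum(good for good, _ in counts.values())
total_bad = sum(bad for _, bad in counts.values())
# Replace each category by its WoE value, keeping the grouped counts
groups = [
    (math.log((good / total_good) / (bad / total_bad)), good, bad)
    for good, bad in counts.values()
]
a, b = fit_logistic_on_woe(groups)
print(f"intercept = {a:.3f}, WoE coefficient = {b:.3f}")
```

With WoE as the only input, the fit reproduces each category's observed odds exactly, giving a WoE coefficient of 1 and an intercept of ln(1300/700) ≈ 0.619. When several WoE-transformed variables enter the same model, their coefficients need not equal 1.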