Improving Performance Of Memory Based Reasoning Model Using Weight of Evidence Coded Categorical Variables
Posted: 3 May 2016 | Source: SAS
Memory-Based Reasoning (MBR) is an empirical classification method that works by comparing the case at hand with similar examples from the past and then applying that information to the new case. MBR modeling assumes that the input variables are numeric, orthogonal to each other, and standardized. The latter two assumptions are taken care of by applying a principal components transformation to the raw variables and using the components, instead of the raw variables, as inputs to MBR. To satisfy the first assumption, categorical variables are often dummy coded. This raises issues such as increased dimensionality and overfitting of the training data, because dummy coding introduces discontinuities in the response surface that relates the inputs to the target variable.
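To make the "compare with similar past cases" idea concrete, here is a minimal sketch of a generic k-nearest-neighbour classifier, the usual mechanism behind MBR. It is not the SAS® Enterprise Miner MBR node. It z-score standardizes numeric inputs and shows how dummy coding expands one categorical variable into several columns. The function names (`standardize`, `dummy_code`, `knn_predict`) and the choice of Euclidean distance with a majority vote are assumptions for illustration. The principal components step is omitted for brevity.

```python
import math
from collections import Counter


def standardize(rows):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column would give sd == 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds


def dummy_code(values):
    """Expand one categorical column into one 0/1 indicator column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return rows, levels


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases (Euclidean)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


# Usage: two numeric inputs, binary target (hypothetical toy data).
train = [[20, 1.0], [22, 1.2], [60, 5.0], [65, 5.5]]
labels = [0, 0, 1, 1]
scaled, means, sds = standardize(train)
# A new case must be scaled with the *training* means and standard deviations.
query = [(v - m) / s for v, m, s in zip([21, 1.1], means, sds)]
print(knn_predict(scaled, labels, query, k=3))  # 0

# One categorical input with three levels becomes three indicator columns.
print(dummy_code(["rent", "own", "other", "own"]))
```

The dummy-coding call shows the dimensionality problem described above: each additional level adds another 0/1 axis to the distance computation.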
The Weight of Evidence (WOE) method overcomes this challenge. It measures the relative response of the target for each group level of a categorical variable and then replaces each level with the pattern of target response observed within that level. SAS® Enterprise Miner's Interactive Grouping Node is used to do this. In this way, the categorical variables are converted into numeric variables. This paper demonstrates the improvement in performance of an MBR model when categorical variables are WOE coded.
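The WOE replacement step can be sketched as follows. This uses one common convention, WOE = ln(share of all non-events falling in the level / share of all events falling in the level), so levels with relatively fewer events get positive values. The sign convention and any zero-count adjustment used by the Interactive Grouping Node may differ. The `eps` smoothing count and the function names are hypothetical choices for this sketch.

```python
import math
from collections import defaultdict


def woe_table(categories, targets, eps=0.5):
    """Return {level: WOE}, WOE = ln(non-event share of level / event share of level).

    `targets` holds 1 for an event (e.g. a bad credit outcome) and 0 otherwise.
    `eps` is a hypothetical smoothing count added to every cell so a level with
    zero events or zero non-events does not produce log(0) or a division by zero.
    """
    events = defaultdict(float)
    nonevents = defaultdict(float)
    for c, t in zip(categories, targets):
        if t == 1:
            events[c] += 1
        else:
            nonevents[c] += 1
    levels = sorted(set(categories))
    total_ev = sum(events[lvl] + eps for lvl in levels)
    total_non = sum(nonevents[lvl] + eps for lvl in levels)
    return {
        lvl: math.log(((nonevents[lvl] + eps) / total_non) / ((events[lvl] + eps) / total_ev))
        for lvl in levels
    }


def woe_encode(categories, table):
    """Replace each categorical level with its single numeric WOE value."""
    return [table[c] for c in categories]


# Usage (hypothetical data): level A has 1 event in 3 cases, level B has 2 in 3.
cats = ["A", "A", "A", "B", "B", "B"]
tgt = [1, 0, 0, 1, 1, 0]
table = woe_table(cats, tgt, eps=0.0)
print(table)  # A -> ln 2, B -> -ln 2
print(woe_encode(["B", "A"], table))
```

Unlike dummy coding, each categorical variable stays a single numeric column, ordered by how strongly its levels are associated with the target.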
A credit screening dataset obtained from SAS Education, comprising 25 attributes for 3,000 applicants, is used for this study. Three different types of MBR models were built using SAS® Enterprise Miner's MBR node to check the improvement in performance. The MBR model with WOE-coded categorical variables performed best in terms of misclassification rate. On this data, adopting WOE coding decreased the model's misclassification rate from 0.382 to 0.344 and increased its sensitivity from 0.552 to 0.572.
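For reference, the two reported metrics can be computed from actual and predicted classes as sketched below. Misclassification rate is the fraction of wrong predictions, and sensitivity is the fraction of actual events predicted as events. The drop from 0.382 to 0.344 therefore corresponds to roughly 3.8 fewer misclassified cases per 100. The function names and the event label `1` are assumptions for illustration. The abstract does not state which data partition the reported figures come from.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def sensitivity(actual, predicted, event=1):
    """True positive rate: share of actual events the model predicts as events."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == event and p == event)
    positives = sum(1 for a in actual if a == event)
    return tp / positives


# Usage with hypothetical predictions for five applicants.
actual = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 0, 1]
print(misclassification_rate(actual, predicted))  # 0.4
print(sensitivity(actual, predicted))
```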