Confusion Matrix
In its Logistic Regression Framework, GeneXproTools infers and shows two different Confusion Matrices: the Logistic Confusion Matrix and the ROC Confusion Matrix. Both of these matrices are excellent indicators of the accuracy of a model (both of the core model and of the final logistic regression model), but they can also be used to fine-tune the logistic regression model.
The Logistic Confusion Matrix is derived from the logistic regression model and infers the Most Likely Class from the predicted probabilities evaluated for each sample case. Thus, probabilities higher than or equal to 0.5 (the Logistic Cutoff Point) indicate a Positive response, and lower probabilities indicate a Negative response. The model output closest to the Logistic Cutoff Point is highlighted in light green in the Confusion Matrix Table. Note that the exact value of the Logistic Cutoff Point is shown in the companion Logistic Confusion Matrix Stats Report.
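A minimal Python sketch of the rule just described, assuming each sample case comes with a predicted probability for the positive class and an actual target coded as 1 (Positive) or 0 (Negative). The function names are hypothetical and not part of GeneXproTools; the code only illustrates how the Most Likely Class and the Type of each case follow from the 0.5 cutoff.

```python
from typing import List, Tuple

LOGISTIC_CUTOFF = 0.5  # probability threshold used by the Logistic Confusion Matrix


def classify_logistic(probabilities: List[float], targets: List[int],
                      cutoff: float = LOGISTIC_CUTOFF) -> List[Tuple[int, str]]:
    """Return (most_likely_class, type) for every sample case.

    probabilities: predicted probability of the positive class ("1").
    targets: actual classes, 1 = Positive, 0 = Negative.
    """
    results = []
    for p, y in zip(probabilities, targets):
        predicted = 1 if p >= cutoff else 0  # >= 0.5 means Positive
        if predicted == 1:
            kind = "TP" if y == 1 else "FP"
        else:
            kind = "TN" if y == 0 else "FN"
        results.append((predicted, kind))
    return results


def closest_to_cutoff(probabilities: List[float],
                      cutoff: float = LOGISTIC_CUTOFF) -> int:
    """Index of the case whose probability lies closest to the cutoff."""
    return min(range(len(probabilities)),
               key=lambda i: abs(probabilities[i] - cutoff))
```

For example, `classify_logistic([0.9, 0.45, 0.7, 0.2], [1, 1, 0, 0])` yields one TP, one FN, one FP, and one TN, and `closest_to_cutoff` picks the second case (probability 0.45). Assuming the probabilities come from a logistic function of a linear transform of the model output, the case closest to 0.5 in probability is also the case whose model output is closest to the Logistic Cutoff Point, because the logistic curve is monotonic and symmetric about 0.5.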
The Confusion Matrix Table gives you access not only to the predicted probabilities for each class but also to the Most Likely Class and to how these predictions compare with the actual target values. The table also shows the Type of each classification (true positive, true negative, false positive, or false negative) for all sample cases. Together, these are all the calculations you need to build the Confusion Matrix displayed in the graphic section.
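Once every case has a Type, building the confusion matrix is simply a matter of counting. The sketch below uses hypothetical helper names and one common 2x2 layout (actual classes in rows, predicted classes in columns); the layout of the GeneXproTools chart may differ. It tallies the four outcomes and derives the accuracy from them.

```python
from collections import Counter
from typing import Dict, Iterable

CELLS = ("TP", "FN", "FP", "TN")


def confusion_matrix(types: Iterable[str]) -> Dict[str, int]:
    """Count the classification Types of all sample cases."""
    counts = Counter(types)
    # Report all four cells, including those with a zero count.
    return {cell: counts.get(cell, 0) for cell in CELLS}


def accuracy(m: Dict[str, int]) -> float:
    """Fraction of cases classified correctly, (TP + TN) / total."""
    total = sum(m.values())
    return (m["TP"] + m["TN"]) / total if total else 0.0


def print_matrix(m: Dict[str, int]) -> None:
    """Print a 2x2 table: rows = actual class, columns = predicted class."""
    print(f"{'':>10}{'Pred +':>8}{'Pred -':>8}")
    print(f"{'Actual +':>10}{m['TP']:>8}{m['FN']:>8}")
    print(f"{'Actual -':>10}{m['FP']:>8}{m['TN']:>8}")
```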
The ROC Confusion Matrix, on the other hand, is inferred using the Optimal Cutoff Point, a parameter derived from the ROC Curve. This means that for model scores higher than or equal to the Optimal Model Threshold (the model score at the Optimal Cutoff Point), a Positive response is predicted, and a Negative response otherwise. Note that although the diagram of the ROC Confusion Matrix is displayed in this section, the confusion matrix data (Predicted Class, Match, and Type) are shown in the Cutoff Points Table.
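The same counting applies to the ROC Confusion Matrix, except that the threshold is applied to raw model scores rather than to probabilities. The text does not say how GeneXproTools derives the Optimal Cutoff Point from the ROC Curve, so `youden_threshold` below uses Youden's J statistic (maximizing TPR minus FPR) purely as an assumed, illustrative criterion; all names are hypothetical.

```python
from typing import Dict, List


def roc_confusion_matrix(scores: List[float], targets: List[int],
                         threshold: float) -> Dict[str, int]:
    """Scores >= threshold are predicted Positive (1); lower scores Negative (0)."""
    m = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for s, y in zip(scores, targets):
        if s >= threshold:
            m["TP" if y == 1 else "FP"] += 1
        else:
            m["FN" if y == 1 else "TN"] += 1
    return m


def youden_threshold(scores: List[float], targets: List[int]) -> float:
    """Assumed criterion: the candidate threshold maximizing TPR - FPR.

    Every distinct model score is tried as a threshold; ties keep the lowest one.
    """
    positives = sum(1 for y in targets if y == 1)
    negatives = len(targets) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        m = roc_confusion_matrix(scores, targets, t)
        j = m["TP"] / positives - m["FP"] / negatives  # TPR - FPR
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

For perfectly separated scores such as `[0.1, 0.2, 0.3, 0.7, 0.8, 0.9]` with targets `[0, 0, 0, 1, 1, 1]`, this criterion selects 0.7, and `roc_confusion_matrix` at that threshold returns 3 TP and 3 TN.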
Note, however, that the statistics evaluated at the Optimal Cutoff Point (OCP statistics, for short) may differ slightly from those derived from the ROC Confusion Matrix. Remember that OCP statistics are evaluated using the direct readings of all the parameters at the Optimal Cutoff Point (this point, which is highlighted in green in both the ROC Curve Table and the Cutoff Points Table, is also highlighted here in green for comparison with the Logistic Cutoff Point). For inverted models, for instance, the ROC Confusion Matrix is adjusted to match the default predictions of binomial logistic regression, which always predicts the probability of the “1” or positive class. The OCP statistics, however, are not adjusted for inversion and correspond to the actual values for the model. Also note that if you export an inverted model to the Classification Framework, the confusion matrix you get there using the Optimal Model Threshold will match the OCP statistics rather than the ROC Confusion Matrix.
Besides the canonical confusion matrix, GeneXproTools also shows a new kind of confusion matrix, the Distribution Confusion Matrix. This matrix plots the distribution of all the classification outcomes (TP, TN, FP, FN) across the different quantiles or buckets, showing clearly what each model is doing and where its strengths and weaknesses lie. And by comparing the two Distribution Confusion Matrices (logistic and ROC), you can also see how the two systems are operating. This is valuable information that you can use in different ways, but most importantly you can use it immediately to fine-tune the number of quantiles in your system so that you get the most out of the logistic fit (as a reminder, the ROC Confusion Matrix is quantile-independent and can therefore be used as a reference for fine-tuning the logistic regression model, which is quantile-dependent).
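A Distribution Confusion Matrix can be sketched by sorting the cases by model score, splitting them into equal-count buckets, and counting the Types within each bucket. This assumes equal-frequency quantiles on the model output, which may not match exactly how GeneXproTools builds its buckets; the function name is hypothetical. Calling it once with the logistic Types and once with the ROC Types gives the two distributions to compare.

```python
from typing import Dict, List


def distribution_confusion_matrix(scores: List[float], types: List[str],
                                  n_buckets: int) -> List[Dict[str, int]]:
    """Count TP, TN, FP, FN within each score quantile (bucket).

    Cases are sorted by model score and split into n_buckets groups of
    (nearly) equal size; bucket 0 holds the lowest scores.
    """
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    buckets = [{"TP": 0, "TN": 0, "FP": 0, "FN": 0} for _ in range(n_buckets)]
    for rank, i in enumerate(order):
        b = rank * n_buckets // len(order)  # equal-frequency bucket index
        buckets[b][types[i]] += 1
    return buckets
```

Changing `n_buckets` and re-running both distributions is one way to see how the bucketing affects where the logistic errors fall relative to the quantile-independent ROC classification.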