Logistic Fit Chart
The Logistic Fit Chart is a useful graph that gives a quick visual check of how good the
Logistic Fit is (the shape and steepness of the sigmoid curve are good indicators of the accuracy of
the model) and also shows how the model outputs are distributed.
The blue line (the sigmoid curve) on the graph is the logistic transformation of the model outputs x, using the
slope a and intercept b calculated in the Log Odds Chart. It is evaluated with the already familiar formula for the probability p:
$$p = \frac{1}{1 + e^{-(ax + b)}}$$
Since the proportion of Positive (1’s) responses and Negative
(0’s) responses must add up to 1, both probabilities can be read on
the vertical axis on the left. Thus, the probability of “1”
is read directly on the vertical axis; and the probability of “0”
is the distance from the line to the top of the graph, which is 1
minus the axis reading.
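As a minimal sketch of the logistic transformation and of the two complementary probabilities read off the chart, the Python snippet below may help. The function name and the slope and intercept values are hypothetical and chosen only for illustration; they are not taken from GeneXproTools.

```python
import math

def logistic_probability(x, a, b):
    """Probability of class 1 for model output x, given slope a and intercept b."""
    z = a * x + b
    # Evaluate 1 / (1 + exp(-z)) in a form that avoids overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical slope and intercept, for illustration only
a, b = 2.0, -1.0
p1 = logistic_probability(0.8, a, b)  # probability of "1", read on the left axis
p0 = 1.0 - p1                         # probability of "0", distance to the top of the graph
```

The two values always add up to 1, which is why a single vertical axis is enough to read both.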
But there’s more information on the Logistic Fit Chart. Firstly, the
vertical axis on the right shows the proportion of Positive and
Negative cases in the dataset. Then, by plotting the dummy data
points, which consist of up to 1000 randomly selected model
scores paired with dummy random ordinates, one can clearly visualize
how model scores are dispersed. Are they all clumped together or are
they finely distributed, which is the telltale sign of a good model?
This is valuable information, both for guiding the modeling process
(in choosing model architecture and composition, and in
exploring the different fitness
functions and class encodings that you can use to model your
data) and for sharpening one’s intuition and knowledge about the
workings of evolutionary learning systems.
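The dummy data points described above could be produced along the following lines. This is only a sketch under stated assumptions: the text does not say how GeneXproTools samples the scores or draws the ordinates, so sampling without replacement and uniform ordinates in [0, 1) are assumptions, and the function name is hypothetical.

```python
import random

def dummy_points(scores, max_points=1000, seed=None):
    """Pair up to max_points randomly chosen model scores with random ordinates."""
    rng = random.Random(seed)
    scores = list(scores)
    n = min(max_points, len(scores))
    # Assumption: scores are sampled without replacement
    sample = rng.sample(scores, n)
    # Assumption: ordinates are uniform in [0, 1); they carry no information
    # and only spread the points vertically so that clumping is visible
    return [(score, rng.random()) for score in sample]

# Hypothetical usage with made-up model scores
points = dummy_points([0.1 * i for i in range(5000)], seed=42)
```

Plotting the resulting pairs as a scatter plot over the sigmoid curve shows at a glance whether the scores are clumped or spread out.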
Indeed, browsing through the different models created in a run might
prove both insightful and great fun. And you can do that easily as
all the models in the Run History are accessible through the History
combo box in the Logistic Regression Window. Good models will
generally allow for a good distribution, resulting in a unique score
for each different case. Bad models, though, will usually
concentrate most of their responses around certain values and
consequently are unable to distinguish between most cases.
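One rough way to put a number on how well a model spreads its scores is the share of distinct score values among all cases. The helper below is a hypothetical diagnostic, not something GeneXproTools reports, and the rounding precision is an arbitrary assumption.

```python
def distinct_score_ratio(scores, decimals=6):
    """Number of distinct (rounded) score values divided by the number of cases."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    # Round first so that tiny floating-point differences do not count as distinct
    distinct = {round(s, decimals) for s in scores}
    return len(distinct) / len(scores)

# Made-up scores: a well-spread model versus one that concentrates its responses
spread_ratio = distinct_score_ratio([0.11, 0.27, 0.48, 0.93])
clumped_ratio = distinct_score_ratio([0.5, 0.5, 0.5, 0.7])
```

A ratio close to 1 means nearly every case gets its own score; a ratio close to 0 means most cases share a few values.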
The Gallery of Logistic Fit Charts below shows charts
typical of intermediate models generated during a GeneXproTools run.
It was generated using the same models used to create the
ROC Curve Gallery
presented in the ROC Analysis section.
The models were created for a risk assessment problem with a training dataset of
18,253 cases, using a small population of just 30 programs. The
R-square and the Area Under the ROC Curve (AUC) of each model, as well
as the generation at which it was discovered, are shown for
reference. From left to right and top to bottom, the charts correspond to the following:
- Generation 0, R-square = 0.002221, AUC = 0.544463
- Generation 3, R-square = 0.022389, AUC = 0.584494
- Generation 8, R-square = 0.050686, AUC = 0.635251
- Generation 12, R-square = 0.064736, AUC = 0.696237
- Generation 14, R-square = 0.163695, AUC = 0.746642
- Generation 344, R-square = 0.219212, AUC = 0.782164
Besides its main goal, which is to estimate the probability of a
response, the Logistic Regression Model can also be used to make
categorical predictions. From the logistic
regression equation introduced in the
previous section, we know that when a Positive event has the
same probability of happening as a Negative one, the log odds term
in the logistic regression equation becomes zero, giving:
$$x = -\frac{b}{a}$$
where x is the model output at the Logistic Cutoff
Point; and a and b are, respectively, the slope
and the intercept of the regression line.
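The cutoff follows from setting the log odds, ax + b, to zero and solving for x. A minimal sketch in Python, with a hypothetical function name and illustrative values for a and b:

```python
def logistic_cutoff_point(a, b):
    """Model output x at which p(1) = p(0) = 0.5, i.e. where a*x + b = 0."""
    if a == 0:
        # With a zero slope the probability does not depend on x,
        # so no cutoff point exists
        raise ValueError("slope a must be non-zero")
    return -b / a

# Hypothetical slope and intercept, for illustration only
cutoff = logistic_cutoff_point(2.0, -1.0)
```

With a = 2 and b = -1, the log odds 2x - 1 vanish at x = 0.5, which is the value the function returns.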
The Logistic Cutoff Point can be used to infer a
Confusion Matrix (in GeneXproTools it is called the Logistic
Confusion Matrix): model scores whose probability of “1” is
greater than or equal to 0.5 are classified as
Positive cases, and all others as Negative.
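The classification rule above can be sketched as follows. This is an illustration rather than the GeneXproTools implementation: the function name, the dictionary layout, and the sample data are all hypothetical. It relies on the fact that the probability of "1" is at least 0.5 exactly when ax + b is non-negative.

```python
def logistic_confusion_matrix(scores, targets, a, b):
    """Count TP, TN, FP and FN using the 0.5 probability threshold."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for x, actual in zip(scores, targets):
        # p(1) >= 0.5 is equivalent to a*x + b >= 0, so no exponential is needed
        predicted = 1 if a * x + b >= 0 else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts

# Made-up model scores and actual classes, for illustration only
matrix = logistic_confusion_matrix(
    scores=[0.0, 0.4, 0.6, 1.0],
    targets=[0, 1, 0, 1],
    a=2.0,
    b=-1.0,
)
```

Comparing ax + b with zero instead of computing the probability gives the same classes and avoids any overflow in the exponential for extreme scores.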
In the Logistic Fit Table, GeneXproTools shows the Most
Likely Class, the Match, and Type values used to build this Logistic
Confusion Matrix (you can see the graphical representation of the
Logistic Confusion Matrix in the Confusion
Matrix Tab). For easy visualization, the model output closest to
the Logistic Cutoff Point is highlighted in light green in the
Logistic Fit Table. Note that the exact value of the Logistic Cutoff
Point is shown in the companion Logistic Fit Stats Report.