 

Logistic Regression Framework

Logistic Fit Chart

 
 
 


The Logistic Fit Chart gives a quick view of how good the Logistic Fit is (the shape and steepness of the sigmoid curve are good indicators of the accuracy of the model) and also shows how the model outputs are distributed over the model range.


 

The blue line on the graph (the sigmoid curve) is the logistic transformation of the model outputs x, using the slope a and intercept b calculated in the Log Odds Chart. It is evaluated with the already familiar formula for the probability p:

$$p = \frac{1}{1 + e^{-(ax + b)}}$$



Since the proportions of Positive (1) and Negative (0) responses must add up to 1, both probabilities can be read on the vertical axis on the left. The probability of “1” is read directly on the vertical axis; the probability of “0” is the distance from the line to the top of the graph, which is 1 minus the axis reading.
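The reading of both probabilities from the sigmoid curve can be sketched in a few lines of Python. This is only an illustration, not GeneXproTools code: the function name `logistic_probability` and the slope and intercept values are assumptions chosen for the example.

```python
import math

def logistic_probability(x, a, b):
    """Probability of a Positive (1) response for model output x,
    given slope a and intercept b of the log odds line."""
    z = a * x + b
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical slope and intercept, used only for illustration.
a, b = 2.0, -1.0
for x in (-1.0, 0.5, 2.0):
    p1 = logistic_probability(x, a, b)  # read directly on the left axis
    p0 = 1.0 - p1                       # distance from the curve to the top
    print(f"x={x:+.1f}  P(1)={p1:.4f}  P(0)={p0:.4f}")
```

With these illustrative values the curve crosses a probability of 0.5 at x = 0.5, where both classes are equally likely.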

But there’s more information on the Logistic Fit Chart. Firstly, the vertical axis on the right shows the proportion of Positive and Negative cases in the dataset. Secondly, the chart plots dummy data points, which consist of up to 1000 randomly selected model scores paired with dummy random ordinates, so that one can clearly see how the model scores are dispersed. Are they all clumped together, or are they finely distributed, which is the telltale sign of a good model? This is valuable information for guiding the modeling process, both in choosing the model architecture and composition and in exploring the different fitness functions and class encodings that you can use to model your data. It also helps to sharpen one’s intuition and knowledge about the workings of evolutionary learning systems.
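The construction of the dummy data points can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `dummy_points` is hypothetical, sampling is assumed to be without replacement, and the random ordinates are assumed to lie between 0 and 1. The text above only specifies "up to 1000 randomly selected model scores paired with dummy random ordinates".

```python
import random

def dummy_points(scores, max_points=1000, seed=None):
    """Pair up to max_points randomly selected model scores with
    random ordinates, for plotting how the scores are dispersed."""
    rng = random.Random(seed)
    k = min(max_points, len(scores))
    # Assumption: scores are drawn without replacement.
    sample = rng.sample(list(scores), k)
    # Assumption: the dummy ordinates are uniform in [0, 1).
    return [(score, rng.random()) for score in sample]

# Hypothetical model scores, for illustration only.
scores = [i / 100.0 for i in range(5000)]
points = dummy_points(scores, seed=42)
print(len(points))  # 1000, because the dataset has more than 1000 scores
```

Only the horizontal positions carry information here; the ordinates merely spread the points out vertically so that clumps and gaps in the scores become visible.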

Indeed, browsing through the different models created in a run can prove both insightful and great fun. This is easy to do, as all the models in the Run History are accessible through the History combo box in the Logistic Regression Window. Good models will generally produce well-distributed outputs, resulting in a unique score for each different case. Bad models, though, will usually concentrate most of their responses around certain values and consequently are unable to distinguish between most cases.
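One rough way to put a number on this difference is to count how many distinct scores a model produces. The sketch below is not a statistic reported by GeneXproTools; the function name `unique_score_fraction` and the rounding tolerance are assumptions made for illustration.

```python
def unique_score_fraction(scores, decimals=6):
    """Fraction of cases that receive a distinct score
    (after rounding to the given number of decimals)."""
    if not scores:
        raise ValueError("scores must not be empty")
    distinct = {round(s, decimals) for s in scores}
    return len(distinct) / len(scores)

# Hypothetical outputs: a well-spread model and a clumped one.
spread = [0.1 * i for i in range(10)]
clumped = [0.0] * 8 + [1.0] * 2

print(unique_score_fraction(spread))   # 1.0: every case has its own score
print(unique_score_fraction(clumped))  # 0.2: only two distinct scores
```

A value near 1 corresponds to the finely distributed points of a good model, while a value near 0 means most cases share a handful of scores.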

The gallery below shows Logistic Fit Charts typical of the intermediate models generated during a GeneXproTools run. It was generated using the same models used to create the twin ROC Curve Gallery presented in the ROC Analysis section. The models were created for a risk assessment problem, using a training dataset with 18,253 cases and a small population of just 30 programs. The R-square and the Area Under the ROC Curve (AUC) of each model, as well as the generation at which it was discovered, are shown for reference. From left to right and top to bottom, the charts correspond to the following:

  • Generation 0, R-square = 0.002221, AUC = 0.544463
  • Generation 3, R-square = 0.022389, AUC = 0.584494
  • Generation 8, R-square = 0.050686, AUC = 0.635251
  • Generation 12, R-square = 0.064736, AUC = 0.696237
  • Generation 14, R-square = 0.163695, AUC = 0.746642
  • Generation 344, R-square = 0.219212, AUC = 0.782164

[Gallery of Logistic Fit Charts, from left to right and top to bottom: Generation 0, Generation 3, Generation 8, Generation 12, Generation 14, Generation 344]


Besides its main goal, which is to estimate the probability of a response, the Logistic Regression Model can also be used to make categorical predictions. From the logistic regression equation introduced in the previous section, we know that when a Positive event has the same probability of happening as a Negative one, the log odds term in the logistic regression equation becomes zero, that is, ax + b = 0, giving:

$$x = -\frac{b}{a}$$



where x is the model output at the Logistic Cutoff Point; and a and b are, respectively, the slope and the intercept of the regression line.

The Logistic Cutoff Point can be used to build a Confusion Matrix (in GeneXproTools it is called the Logistic Confusion Matrix): model scores that result in a probability of “1” higher than or equal to 0.5 are classified as Positive, and all others as Negative.
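The cutoff point and the resulting confusion matrix can be sketched as below. The function names, the sample outputs, and the slope and intercept are all hypothetical. The sketch relies on the fact that the probability of “1” is at least 0.5 exactly when the log odds ax + b are non-negative, which also keeps the classification correct when the slope is negative.

```python
def logistic_cutoff(a, b):
    """Model output at which P(1) = P(0) = 0.5, i.e. where a*x + b = 0."""
    if a == 0:
        raise ValueError("slope a must be non-zero")
    return -b / a

def logistic_confusion_matrix(outputs, targets, a, b):
    """Count TP, FP, TN and FN, classifying as Positive when P(1) >= 0.5."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for x, target in zip(outputs, targets):
        # P(1) >= 0.5 is equivalent to log odds a*x + b >= 0.
        predicted = 1 if a * x + b >= 0 else 0
        if predicted == 1:
            counts["TP" if target == 1 else "FP"] += 1
        else:
            counts["TN" if target == 0 else "FN"] += 1
    return counts

# Hypothetical slope, intercept, model outputs and actual classes.
a, b = 2.0, -1.0
outputs = [-1.0, 0.2, 0.5, 0.9, 1.5]
targets = [0, 1, 1, 0, 1]

print(logistic_cutoff(a, b))                                # 0.5
print(logistic_confusion_matrix(outputs, targets, a, b))
# {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```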

In the Logistic Fit Table, GeneXproTools shows the Most Likely Class, the Match, and Type values used to build this Logistic Confusion Matrix (you can see the graphical representation of the Logistic Confusion Matrix in the Confusion Matrix Tab). For easy visualization, the model output closest to the Logistic Cutoff Point is highlighted in light green in the Logistic Fit Table. Note that the exact value of the Logistic Cutoff Point is shown in the companion Logistic Fit Stats Report.
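The kind of per-case information described above can be sketched as follows. This is not the actual layout of the Logistic Fit Table: the function name `logistic_fit_rows`, the dictionary keys, and the use of TP/TN/FP/FN labels for the Type column are assumptions made for illustration.

```python
import math

def logistic_fit_rows(outputs, targets, a, b):
    """Build one row per case and return (rows, index of the row whose
    model output is closest to the Logistic Cutoff Point)."""
    cutoff = -b / a  # assumes a non-zero slope
    rows = []
    for x, target in zip(outputs, targets):
        z = a * x + b
        # Stable form of the sigmoid: 1 / (1 + exp(-z)).
        p1 = 0.5 * (1.0 + math.tanh(z / 2.0))
        predicted = 1 if z >= 0 else 0  # same as P(1) >= 0.5
        match = predicted == target
        # Assumed Type labels: TP, TN, FP, FN.
        kind = ("T" if match else "F") + ("P" if predicted == 1 else "N")
        rows.append({"output": x, "p1": p1, "most_likely_class": predicted,
                     "match": match, "type": kind})
    closest = min(range(len(rows)), key=lambda i: abs(outputs[i] - cutoff))
    return rows, closest

# Hypothetical slope, intercept, model outputs and actual classes.
rows, closest = logistic_fit_rows([-1.0, 0.2, 0.5, 0.9, 1.5],
                                  [0, 1, 1, 0, 1], a=2.0, b=-1.0)
for i, row in enumerate(rows):
    marker = "  <- closest to cutoff" if i == closest else ""
    print(f"{row['output']:+.1f}  P(1)={row['p1']:.3f}  "
          f"class={row['most_likely_class']}  type={row['type']}{marker}")
```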



 
 

 
 

Copyright (c) 2000-2008 Gepsoft Ltd. All rights reserved.