
 
 
 

Logistic Fit Chart

The Logistic Fit Chart gives a quick visual check of how good the Logistic Fit is (the shape and steepness of the sigmoid curve are good indicators of the accuracy of the model) and also shows how the model outputs are distributed across the model range.

[Figure: Logistic Fit Chart]
 

The blue line (the sigmoid curve) on the graph is the logistic transformation of the model outputs x, using the slope a and intercept b calculated in the Log Odds Chart. It is evaluated with the already familiar formula for the probability p:

$$p = \frac{1}{1 + e^{-(a x + b)}}$$

Since the proportions of Positive (1’s) and Negative (0’s) responses must add up to 1, both probabilities can be read on the vertical axis on the left. The probability of “1” is read directly on the vertical axis, and the probability of “0” is the distance from the line to the top of the graph, which is 1 minus the axis reading.
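As a minimal sketch of how both probabilities follow from a single model output, assume the log odds are modeled as a·x + b, with a the slope and b the intercept. The function name and the sample values below are illustrative only and are not taken from GeneXproTools.

```python
import math

def logistic_probability(x, a, b):
    """Probability of class "1" for model output x (slope a, intercept b)."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

# Illustrative values only (not from any GeneXproTools run).
a, b = 2.0, -1.0
x = 0.75

p1 = logistic_probability(x, a, b)  # read directly on the left vertical axis
p0 = 1.0 - p1                       # distance from the curve to the top of the chart
print(p1, p0)
```

The two values always sum to 1, which is why a single axis is enough to read both.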

The Logistic Fit Chart carries more information, though. First, the vertical axis on the right shows the proportion of Positive and Negative cases in the dataset. Second, the chart plots dummy data points, which consist of up to 1000 randomly selected model scores paired with dummy random ordinates, so that one can clearly see how the model scores are dispersed. Are they all clumped together, or are they finely distributed, which is the telltale sign of a good model? This information is valuable for guiding the modeling process (in choosing the model architecture and composition, and in exploring the different fitness functions and class encodings that you can use to model your data), and also for sharpening one’s intuition and knowledge about the workings of learning evolutionary systems.
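The dummy data points described above could be generated along the following lines. This is only a sketch: the function name, the fixed seed, and the choice of uniform ordinates in [0, 1) are assumptions, since the text does not say how GeneXproTools draws the ordinates.

```python
import random

def dummy_points(scores, max_points=1000, seed=0):
    """Pair up to max_points randomly chosen scores with random ordinates.

    The ordinates carry no meaning; they only spread the points vertically
    so that clumps of identical or near-identical scores become visible.
    """
    rng = random.Random(seed)
    if len(scores) > max_points:
        sample = rng.sample(scores, max_points)
    else:
        sample = list(scores)
    # Assumption: ordinates are drawn uniformly from [0, 1).
    return [(score, rng.random()) for score in sample]

# Hypothetical scores standing in for the outputs of a model.
points = dummy_points([i / 5000 for i in range(5000)])
print(len(points))
```

Plotting the resulting pairs as a scatter over the sigmoid curve shows at a glance whether the scores cover the model range or collapse onto a few values.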

Indeed, browsing through the different models created in a run can be both insightful and great fun. This is easy to do, as all the models in the Run History are accessible through the History combo box in the Logistic Regression Window. Good models will generally produce a good distribution, with a unique score for each different case. Bad models, though, will usually concentrate most of their responses around certain values and are consequently unable to distinguish between most cases.
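One rough way to put a number on this difference is the fraction of distinct scores a model produces. The helper below is a hypothetical heuristic for illustration, not a statistic reported by GeneXproTools, and the rounding precision is an arbitrary assumption.

```python
def distinct_score_ratio(scores, decimals=6):
    """Fraction of cases that receive a distinct score (after rounding)."""
    if not scores:
        return 0.0
    distinct = {round(s, decimals) for s in scores}
    return len(distinct) / len(scores)

# Made-up score sets: one clumped on two values, one spread over the range.
clumped = [0.0] * 90 + [1.0] * 10
spread = [i / 100 for i in range(100)]

print(distinct_score_ratio(clumped))  # 0.02
print(distinct_score_ratio(spread))   # 1.0
```

A ratio near 1 corresponds to the finely distributed scores of a good model, while a ratio near 0 signals that most cases share the same few scores.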

The gallery below shows Logistic Fit Charts typical of intermediate models generated during a GeneXproTools run. It uses the same models as the twin ROC Curve Gallery presented in the ROC Analysis section. The models were created for a risk assessment problem with a training dataset of 18,253 cases, using a small population of just 30 programs. The R-square and the Area Under the ROC Curve (AUC) of each model, as well as the generation at which it was discovered, are shown for reference. From left to right and top to bottom, the charts correspond to the following:

  • Generation 0, R-square = 0.002221, AUC = 0.544463
  • Generation 3, R-square = 0.022389, AUC = 0.584494
  • Generation 8, R-square = 0.050686, AUC = 0.635251
  • Generation 12, R-square = 0.064736, AUC = 0.696237
  • Generation 14, R-square = 0.163695, AUC = 0.746642
  • Generation 344, R-square = 0.219212, AUC = 0.782164

[Figures: Logistic Fit Charts for Generation 0, Generation 3, Generation 8, Generation 12, Generation 14, and Generation 344]


Besides its main goal, which is to estimate the probability of a response, the Logistic Regression Model can also be used to make categorical predictions. From the logistic regression equation introduced in the previous section, we know that when a Positive event has the same probability of happening as a Negative one (p = 0.5), the log odds term in the logistic regression equation becomes zero, that is, a x + b = 0, giving:

$$x = -\frac{b}{a}$$

where x is the model output at the Logistic Cutoff Point; and a and b are, respectively, the slope and the intercept of the regression line.

The Logistic Cutoff Point can be used to infer a Confusion Matrix (called the Logistic Confusion Matrix in GeneXproTools): model scores resulting in a probability of “1” higher than or equal to 0.5 are converted into Positive cases, and all others into Negative cases.

In the Logistic Fit Table, GeneXproTools shows the Most Likely Class, the Match, and Type values used to build this Logistic Confusion Matrix (you can see the graphical representation of the Logistic Confusion Matrix in the Confusion Matrix Tab). For easy visualization, the model output closest to the Logistic Cutoff Point is highlighted in light green in the Logistic Fit Table. Note that the exact value of the Logistic Cutoff Point is shown in the companion Logistic Fit Stats Report.
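The classification rule can be sketched as follows, assuming the log odds are a·x + b so that the cutoff falls where a·x + b = 0. The function names, the TP/TN/FP/FN labels, and the sample data are illustrative assumptions and do not reproduce the GeneXproTools implementation or its table layout.

```python
import math

def logistic_probability(x, a, b):
    """Probability of class "1" for model output x (slope a, intercept b)."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

def logistic_cutoff(a, b):
    """Model output at which p = 0.5, i.e. where a*x + b = 0."""
    return -b / a

def logistic_confusion_matrix(scores, targets, a, b):
    """Count outcomes using the rule: probability of "1" >= 0.5 means Positive."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for x, actual in zip(scores, targets):
        predicted = 1 if logistic_probability(x, a, b) >= 0.5 else 0
        if predicted == 1:
            outcome = "TP" if actual == 1 else "FP"
        else:
            outcome = "TN" if actual == 0 else "FN"
        counts[outcome] += 1
    return counts

# Made-up slope, intercept, scores, and targets for illustration.
a, b = 2.0, -1.0
scores = [0.1, 0.4, 0.5, 0.9, 1.2]
targets = [0, 1, 1, 1, 0]

cutoff = logistic_cutoff(a, b)
matrix = logistic_confusion_matrix(scores, targets, a, b)
print(cutoff, matrix)
```

Applying the rule to the probability rather than to the raw score keeps it valid even when the slope is negative, in which case Positive cases lie below the cutoff rather than above it.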

[Figure: Logistic Fit Chart & Analysis]

 
 
 
 

Copyright (c) 2000-2013 Gepsoft Ltd. All rights reserved.