ROC Analysis

Receiver Operating Characteristic (ROC) Curves are visualization tools that allow a quick assessment of the quality of a model. They are usually plotted against a Baseline or Random Model, and the Area Under the ROC Curve (AUC for short) is a popular indicator of the quality of a model.

For the Random Model, the area under the ROC curve is equal to 0.5, so the further a model's AUC lies above 0.5 (or below it, for inverted models), the better the model. For perfect models on either side of the random line, what is called ROC heaven takes place when AUC = 1 (for normal models) or AUC = 0 (for inverted models).

Shown below is a typical ROC curve obtained for a risk assessment model using a training dataset with 18,253 cases. This model has an R-square of 0.245889 and an AUC of 0.800028. (R-square values might seem unusually low, but in risk assessment applications R-square values around 0.23 are considered excellent and indicative of a good model.)
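The reference values quoted above (0.5 for a random model, 1 for a perfect model, 0 for a perfectly inverted one) can be checked with a small sketch. It uses the pairwise definition of AUC, the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name `auc` and the toy data are illustrative assumptions, and this is not necessarily how GeneXproTools computes the value.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Assumption: labels are 1 (positive) or 0 (negative); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the risk assessment model above).
labels = [0, 0, 1, 1]
perfect = auc([0.1, 0.2, 0.8, 0.9], labels)    # positives always score higher
inverted = auc([0.9, 0.8, 0.2, 0.1], labels)   # positives always score lower
constant = auc([0.5, 0.5, 0.5, 0.5], labels)   # model cannot separate classes
print(perfect, inverted, constant)
```

The double loop is quadratic in the number of cases, which is fine for illustration but would be slow on a dataset of 18,253 cases; a rank-based formulation would be the usual choice there.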

[Figure: ROC Curve & AUC, Logistic Regression Framework]
 

Below is a Gallery of ROC Curves typical of the intermediate models generated during a GeneXproTools run. These curves were created for a risk assessment problem with a training dataset of 18,253 cases, using a small population of just 30 programs. The R-square of each model and the generation at which it was discovered are also shown for illustration. From left to right and top to bottom, they are as follows (see also the twin Gallery of Logistic Fit Charts in the Logistic Fit section):

  • Generation 0, R-square = 0.002221, AUC = 0.544463
  • Generation 3, R-square = 0.022389, AUC = 0.584494
  • Generation 8, R-square = 0.050686, AUC = 0.635251
  • Generation 12, R-square = 0.064736, AUC = 0.696237
  • Generation 14, R-square = 0.163695, AUC = 0.746642
  • Generation 344, R-square = 0.219212, AUC = 0.782164

[Figure gallery: ROC Curve & AUC for Generations 0, 3, 8, 12, 14, and 344]


ROC Curves and Tables are also useful for finding what is called the Optimal Cutoff Point, which is the point where the Youden index is attained. The Youden index J is the maximum value of the following expression (for inverted models, it is the minimum):

J = max[SE(t) + SP(t) - 1]


where SE(t) and SP(t) are, respectively, the sensitivity and specificity at threshold t, and the maximum is taken over all possible threshold values t of the model. Thus, the Optimal Model Threshold corresponds to the model output at the Optimal Cutoff Point.
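As a minimal sketch of this search, the function below evaluates SE(t) + SP(t) - 1 at every distinct model output and keeps the threshold with the largest value. The name `youden_optimal_cutoff`, the convention that a case is predicted positive when its score is greater than or equal to t, and the toy data are all assumptions made for illustration; GeneXproTools may enumerate thresholds and break ties differently.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (J, t): the Youden index and the threshold where it is reached.

    Assumptions: labels are 1/0, a case is predicted positive when
    score >= t, and the first maximum found (lowest t) wins ties.
    """
    if 1 not in labels or 0 not in labels:
        raise ValueError("need at least one positive and one negative case")
    best_j, best_t = float("-inf"), None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        se = tp / (tp + fn)   # sensitivity SE(t)
        sp = tn / (tn + fp)   # specificity SP(t)
        j = se + sp - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t


# Toy data (hypothetical): four cases, two of each class.
j, threshold = youden_optimal_cutoff([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(j, threshold)
```

For an inverted model the same loop would track the minimum of the expression instead of the maximum, as noted above.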

In the ROC Table, GeneXproTools also shows all “SE + SP - 1” values and highlights in light green the row with the Optimal Cutoff Point and the corresponding Optimal Model Threshold. These parameters are also shown in the Quantiles Statistics Report.

[Figure: ROC Curve & Analysis, Logistic Regression Framework]
 

The Optimal Model Threshold can be used to derive a Confusion Matrix (called the ROC Confusion Matrix in GeneXproTools). In the Cutoff Points Table, you have access to the Predicted Class, Match, and Type values used to build this ROC Confusion Matrix (its graphical representation is shown in the Confusion Matrix section).
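The step from a threshold to a confusion matrix can be sketched as follows. Each row carries a predicted class, a match flag, and a type (TP, TN, FP, or FN), loosely mirroring the columns named above; the tuple layout, the function names, and the "score >= threshold means positive" rule are illustrative assumptions rather than the actual layout of the Cutoff Points Table.

```python
from collections import Counter


def cutoff_rows(scores, labels, threshold):
    """Build one (score, actual, predicted, match, type) row per case.

    Assumption: a case is predicted positive (1) when score >= threshold.
    """
    rows = []
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        match = predicted == y
        if predicted == 1:
            kind = "TP" if match else "FP"
        else:
            kind = "TN" if match else "FN"
        rows.append((s, y, predicted, match, kind))
    return rows


def confusion_matrix(rows):
    """Count the four outcome types across all rows."""
    counts = Counter(row[-1] for row in rows)
    return {k: counts.get(k, 0) for k in ("TP", "TN", "FP", "FN")}


# Toy data (hypothetical), classified at a threshold of 0.35.
rows = cutoff_rows([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], threshold=0.35)
matrix = confusion_matrix(rows)
print(matrix)
```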

The visualization of the ROC Confusion Matrix is a valuable tool and can in fact be used to determine the right number of buckets to achieve a good fit with the Logistic Regression Model. But GeneXproTools allows you to do more with the ROC Confusion Matrix and associated Optimal Model Threshold. By allowing the conversion of Logistic Regression runs to the Classification Framework, you can use this model, with its finely adapted Optimal Model Threshold, straightaway to make categorical classifications using the Classification Scoring Engine of GeneXproTools.

The Youden index is also used to evaluate a wide set of useful statistics at the Optimal Cutoff Point (OCP statistics for short). They include:

  • TP (True Positives)
  • TN (True Negatives)
  • FP (False Positives)
  • FN (False Negatives)
  • TPR (True Positives Rate or Sensitivity)
  • TNR (True Negatives Rate or Specificity)
  • FPR (False Positives Rate, also known as 1-Specificity)
  • FNR (False Negatives Rate)
  • PPV (Positive Predictive Value)
  • NPV (Negative Predictive Value)
  • Classification Accuracy (Correct Classifications)
  • Classification Error (Wrong Classifications)

How they are calculated is shown in the table below ("TC" represents the number of Total Cases):

Statistic                  Formula
TPR (Sensitivity)          TP / (TP + FN)
TNR (Specificity)          TN / (TN + FP)
FPR (1-Specificity)        FP / (FP + TN)
FNR                        FN / (FN + TP)
PPV                        TP / (TP + FP), with TP + FP ≠ 0
NPV                        TN / (TN + FN), with TN + FN ≠ 0
Classification Accuracy    (TP + TN) / TC
Classification Error       (FP + FN) / TC
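The formulas in the table translate directly into code. The sketch below is one possible rendering: the function name `ocp_statistics` and the dictionary keys are illustrative, and returning `None` for PPV or NPV when the denominator is zero is an assumption about how to honour the "≠ 0" conditions. It also assumes the data contain at least one positive and one negative case, so the rate denominators are non-zero.

```python
def ocp_statistics(tp, tn, fp, fn):
    """Compute the rate statistics from the four confusion matrix counts.

    Assumptions: tp + fn > 0 and tn + fp > 0; PPV and NPV are None when
    their denominators are zero.
    """
    tc = tp + tn + fp + fn  # TC, the number of Total Cases
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "PPV": tp / (tp + fp) if tp + fp else None,
        "NPV": tn / (tn + fn) if tn + fn else None,
        "Accuracy": (tp + tn) / tc,
        "Error": (fp + fn) / tc,
    }


# Toy counts (hypothetical): 2 true positives, 1 true negative, 1 false positive.
stats = ocp_statistics(tp=2, tn=1, fp=1, fn=0)
for name, value in stats.items():
    print(name, value)
```

Note that TPR + FNR = 1, TNR + FPR = 1, and Classification Accuracy + Classification Error = 1, which gives a quick sanity check on any set of counts.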

 

It is worth pointing out that OCP statistics are quantile-independent and therefore are a good indicator of what could be achieved with a model in terms of logistic fit and accuracy.
 

 
Copyright (c) 2000-2013 Gepsoft Ltd. All rights reserved.