Logistic Regression Framework

Class Encodings and Evolutionary Strategies

 
 
 


The Logistic Regression Analytics Platform was added to GeneXproTools 4.0 as a client-driven release, in response to specific user requests and to our analysis of how GeneXproTools is being used in the wild. This first release of the platform leverages the strengths of the Function Finding infrastructure and adds significant analytics capabilities to GeneXproTools. Future versions of GeneXproTools will build on these new analytics tools, further improving the user experience and expanding the scope of GeneXproTools.

To help you reap the benefits of combining the GeneXproTools learning algorithms with the new statistical methods and analyses, we suggest that you explore the following in your runs:

  1. Create your core models in the Function Finding Framework
  2. Use the R-square or the Correlation Coefficient fitness functions to drive evolution
  3. Explore different Class Encodings to search the solution space

This implementation of the Logistic Regression Analytics Platform uses the Function Finding Framework, tweaked in some places, for the model creation phase. Our tests indicate that the R-square and the Correlation Coefficient are the most appropriate fitness functions for this type of analysis. Therefore, when you create a new Function Finding run with a binary dependent variable, we infer that you are creating a run for the Logistic Regression Framework and set the fitness function to the Correlation Coefficient instead of RRSE.


 

The Curve Fitting Chart of the Run Panel, despite not being the most appropriate for a binary target, is still useful for getting an idea of the kind of range the evolving models are exploring. Different fitness functions work on different ranges and therefore explore the solution space differently. Indeed, the reason why both the R-square and Correlation Coefficient fitness functions work so well with the standard 0/1 class encoding is that they can break free of the restrictive 0/1 target range of that encoding. By contrast, a fitness function such as the one based on the Mean Squared Error (MSE) can only drive evolution towards local optima around the boundaries of the standard 0/1 class encoding, as in the example below.


 

But if you use a different Class Encoding, say -1000/1000, you'll get a very different behavior. Although still confined to the target range, a fitness function such as the MSE now has much more room to explore and come up with good ranges for the model scores. A good range for the model scores is, of course, the most important prerequisite for designing a good model. You can observe this change in behavior straightaway in the Curve Fitting Chart.
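
The difference between the two kinds of fitness measure can be illustrated with a small sketch. It is not GeneXproTools code: the helper names (`pearson`, `mse`, `encode`) and the sample scores are hypothetical, and it only demonstrates the general property that the Pearson correlation is unaffected by the class encoding, whereas the MSE is tied directly to the target range.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def mse(targets, scores):
    """Mean squared error between the encoded targets and the model scores."""
    return sum((t - s) ** 2 for t, s in zip(targets, scores)) / len(targets)

def encode(labels, low, high):
    """Map 0/1 labels to an arbitrary low/high class encoding."""
    return [high if label == 1 else low for label in labels]

labels = [0, 0, 1, 0, 1, 1]
scores = [-3.2, -0.5, 4.1, 1.0, 7.5, 2.2]  # hypothetical raw model outputs

targets_01 = encode(labels, 0, 1)
targets_1000 = encode(labels, -1000, 1000)

# The correlation is the same under both encodings...
r_01 = pearson(targets_01, scores)
r_1000 = pearson(targets_1000, scores)

# ...but the MSE depends on how close the scores are to the encoded targets.
mse_01 = mse(targets_01, scores)
mse_1000 = mse(targets_1000, scores)

print(r_01, r_1000)
print(mse_01, mse_1000)
```

Because the correlation does not change when the targets (or the scores) are linearly rescaled, a correlation-based fitness function can reward a model whose scores range far outside 0/1, while an MSE-based one can only reward scores that sit near the encoded class values.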


 

For this new Release 2 of GeneXproTools 4.0, the implementation of the R-square fitness function was improved and strengthened, and the Correlation Coefficient fitness function was redesigned to use the absolute value of the Pearson Correlation Coefficient instead of its square. These two fitness functions appear to produce the best models for the Logistic Regression Framework when using the standard 0/1 class encoding. But as mentioned above, with a different class encoding you'll see that other fitness functions besides the R-square and Correlation Coefficient start to work, giving excellent results too.
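
As a rough illustration of the two variants, the sketch below computes a fitness from the absolute value and from the square of the Pearson correlation. The function names are hypothetical, and any scaling or additional terms that GeneXproTools applies to its fitness values are not described here, so treat this only as a comparison of the two underlying quantities.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def fitness_abs(targets, scores):
    """Fitness based on |r| (assumed form, unscaled)."""
    return abs(pearson(targets, scores))

def fitness_square(targets, scores):
    """Fitness based on r squared (assumed form, unscaled)."""
    return pearson(targets, scores) ** 2

targets = [0, 0, 1, 0, 1, 1]
model_a = [0.1, 0.4, 0.6, 0.3, 0.9, 0.5]  # hypothetical model scores
model_b = [-s for s in model_a]           # same scores with the sign flipped

for name, scores in (("a", model_a), ("b", model_b)):
    print(name, fitness_abs(targets, scores), fitness_square(targets, scores))
```

Both quantities lie between 0 and 1 and ignore the sign of the correlation, so a model and its negation score the same. Since |r| is never smaller than its square on that interval, the absolute-value form assigns a higher value to weakly correlated models than the squared form does.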

GeneXproTools allows you to Change the Class Representation easily, so you can experiment with different class encodings without much trouble. You can just as easily revert to the standard 0/1 encoding if you feel more comfortable with it. The choice of encoding has no bearing on the real meaning of the binary representation or on how everything is processed and shown in the Logistic Regression Window: the lowest value always represents the standard "0" or Negative cases, and the highest the standard "1" or Positive cases.

To change your Class Encoding within GeneXproTools, choose Change Class Encoding from the Logistic Regression menu. This opens the Class Encoding Window, where you can choose from several default values. You can also experiment with all kinds of binary encodings, including systems with rational numbers, by entering any two different numbers in the Change To box in the Other Encodings option.


 

Also notice that you can invert your class representation by ticking the Invert Class Representation check box. This means that what you had originally represented as “0” will become “1” and vice versa. This might prove useful in certain modeling situations, but please keep in mind that GeneXproTools will be handling what you originally had as negative cases as 1’s. And this means that within the Logistic Regression Framework all the predictions and analyses will be made for these new 1’s because the Logistic Regression Technique is by default designed to always predict the 1’s. Remember, however, that you can always revert to the original encoding by inverting the representation once more.
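
The effect of changing and inverting a class encoding can be sketched as follows. This is an illustrative helper with a hypothetical name, not part of GeneXproTools; it simply follows the rules described above: the lowest value stands for the Negative cases, the highest for the Positive cases, and inversion swaps the two.

```python
def change_class_encoding(values, first, second, invert=False):
    """Re-encode a binary column using two different numbers.

    The lower original value is mapped to the lower new value and the
    higher to the higher, unless invert is True, in which case the
    original negatives become the new positives and vice versa.
    """
    if first == second:
        raise ValueError("the two encoding values must be different")
    distinct = sorted(set(values))
    if len(distinct) != 2:
        raise ValueError("the dependent variable must be binary")
    old_low, _old_high = distinct
    new_low, new_high = sorted((first, second))
    if invert:
        new_low, new_high = new_high, new_low
    return [new_low if v == old_low else new_high for v in values]

original = [0, 1, 1, 0]
wide = change_class_encoding(original, -1000, 1000)
inverted = change_class_encoding(original, 0, 1, invert=True)
restored = change_class_encoding(inverted, 0, 1, invert=True)

print(wide)      # [-1000, 1000, 1000, -1000]
print(inverted)  # [1, 0, 0, 1]
print(restored)  # [0, 1, 1, 0]
```

Inverting twice restores the original column, which mirrors the point that you can always revert by inverting the representation once more.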

Also worth mentioning in this section about evolutionary strategies is that, in this Release 2 of GeneXproTools 4.0, we are for the first time allowing the conversion of Classification runs to the Function Finding Framework and vice versa (as long as the dependent variable in the Function Finding runs is binary, of course). This means that you can explore all the fitness functions available for Classification to evolve your models; there are 15 different ones in total, a nice complement to the basic R-square and Correlation Coefficient fitness functions of the Function Finding Framework. Then, from the Logistic Regression Framework, you can access your model scores (they are not accessible in Classification mode, as they are all rounded to either “0” or “1” according to pre-established 0/1 rounding thresholds). Once you have access to your model scores, you are ready to proceed with the complete Logistic Regression Analytics, including the construction of Quantile Tables, the analysis of Gains and Lift Charts, the construction of ROC Curves and Cutoff Points Charts and, of course, the evaluation of probabilities with Logistic Regression Models and the construction of Logistic and ROC Confusion Matrices.

Of course, modelers interested in just Discrete Classifications can also benefit from this bridge between the Logistic Regression Framework and the Classification Framework. Perhaps the most important and obvious consequence of this interconnection is that, in the Logistic Regression Framework, the equivalent of the 0/1 Rounding Threshold of Classification models is evolved totally unsupervised, and therefore a whole new range of possibilities can be explored. Furthermore, in the Logistic Regression Framework, GeneXproTools also computes and shows the Confusion Matrix derived from the ROC Curve and Optimal Cutoff Point, so you can see immediately how good a model created within the Logistic Regression Framework is at making crisp classifications. And you can make good use of that finding immediately by converting that run to Classification.

When a Logistic Regression run is converted to Classification, the Optimal Model Threshold evaluated for the training dataset with the active model is automatically set up as the 0/1 Rounding Threshold to be used in the new Classification run. Note, however, that in Logistic Regression virtually every single model has its own finely adapted Optimal Model Threshold, whereas after the conversion only the active model keeps its own. This means that only the ROC Confusion Matrix you obtained for the active model on the training dataset can be expected to match exactly the Confusion Matrix you get on the Classification side. For the testing dataset, results may be slightly different. This shouldn’t be a problem, though, because a modeler is usually interested in the best-of-run model and because the difference is within the expected variation one gets when a model is checked against unseen data. Note also that if you decide to use this imported model as a seed to create even better models for discrete classifications, the whole population will adapt to the now shared and fixed 0/1 Rounding Threshold.
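
The relationship between a model's own threshold and the resulting confusion matrix can be sketched as below. How GeneXproTools derives its Optimal Model Threshold is not specified here, so this sketch makes two loudly labeled assumptions: the optimal cutoff is taken as the score that maximizes the true positive rate minus the false positive rate (Youden's J), and a score equal to the threshold is rounded to 1. The function names and data are hypothetical.

```python
def confusion_matrix(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are rounded to 1 (assumption)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["TP"] += 1
        elif predicted == 1 and label == 0:
            counts["FP"] += 1
        elif predicted == 0 and label == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

def optimal_threshold(labels, scores):
    """Pick the observed score that maximizes TPR - FPR (assumed criterion)."""
    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        cm = confusion_matrix(labels, scores, candidate)
        tpr = cm["TP"] / (cm["TP"] + cm["FN"])
        fpr = cm["FP"] / (cm["FP"] + cm["TN"])
        if tpr - fpr > best_j:
            best_threshold, best_j = candidate, tpr - fpr
    return best_threshold

# Hypothetical training data: true classes and raw scores of the active model.
train_labels = [0, 0, 1, 0, 1, 1]
train_scores = [-3.2, -0.5, 4.1, 1.0, 7.5, 2.2]

threshold = optimal_threshold(train_labels, train_scores)
train_cm = confusion_matrix(train_labels, train_scores, threshold)
print(threshold, train_cm)

# After conversion, the same fixed threshold would be applied to every model.
other_model_scores = [0.3, 2.5, 1.9, -1.0, 3.3, 2.0]
print(confusion_matrix(train_labels, other_model_scores, threshold))
```

The second call shows why only the active model is guaranteed to reproduce its matrix: a threshold tuned to one model's score range is generally not the best cutoff for another model's scores.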

It is also worth pointing out that, when you convert a Logistic Regression run to Classification, you can also use the Logistic Cutoff Point as your 0/1 Rounding Threshold. In this case, however, you'll have to set up the 0/1 Rounding Threshold manually in the Fitness Function Tab of the Settings Panel. The confusion matrix you'll get on the Classification side will then obviously match the Logistic Confusion Matrix.
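
To see how a probability cutoff translates into a threshold on the model scores, consider the sketch below. It assumes a standard one-variable logistic model, p = 1 / (1 + exp(-(intercept + slope * score))), and assumes that the cutoff corresponds to a probability of 0.5; the exact definition of the Logistic Cutoff Point in GeneXproTools is not given here, and the coefficient values are hypothetical.

```python
import math

def logistic_probability(score, intercept, slope):
    """Probability of the positive class for a given model score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

def cutoff_score(intercept, slope, probability=0.5):
    """Model score at which the logistic model yields the given probability."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    logit = math.log(probability / (1.0 - probability))
    return (logit - intercept) / slope

# Hypothetical fitted coefficients for the active model.
intercept, slope = -1.5, 0.5

threshold = cutoff_score(intercept, slope)
print(threshold)
print(logistic_probability(threshold, intercept, slope))
```

With a positive slope, scores above this threshold map to probabilities above 0.5, so entering the threshold as the 0/1 Rounding Threshold would reproduce the same crisp classifications under these assumptions.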

 

 

 
 

Copyright (c) 2000-2008 Gepsoft Ltd. All rights reserved.