Ensemble Models in Classification
  • I am most interested in classification, and in particular in using ensembles for it. I have two questions about this.

    1. Does the software indicate which variables are important (i.e., identify the subset of variables that should be used)?

    2. Your website says: “Ensemble Deployment without Models Embedded - New & faster data-only deployment mode to Excel of large model random forests”. Are these random forests based on Leo Breiman’s algorithm, or does the term refer here to GEP-based algorithms?

  • Hi,

    The first question – important variables:
    You can evaluate the correlations between all the variables in your datasets in the Data Panel, for example by choosing the Scatter Plot and then selecting any pair of variables. Furthermore, GeneXproTools collects and plots the most important correlations, namely those between each predictor variable and the response, in Statistics Charts -> Correl Chart or R-square Chart. You can then use that information to build datasets from different subsets of variables and create your models from them (we will be adding new functionality for more controlled subset selection in the future).
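    For anyone who wants to reproduce the predictor-versus-response ranking outside the tool, here is a minimal sketch in Python, assuming numeric columns held in plain lists. It computes the Pearson correlation r of each predictor with the response (for a 0/1 response this is the point-biserial correlation) and its square, R-square, then ranks the predictors by R-square. The function names and the toy data are hypothetical; this is not GeneXproTools' own code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(columns, response):
    """Return (name, r, r_squared) tuples sorted by r_squared, highest first."""
    stats = []
    for name, values in columns.items():
        r = pearson_r(values, response)
        stats.append((name, r, r * r))
    stats.sort(key=lambda t: t[2], reverse=True)
    return stats


if __name__ == "__main__":
    # Hypothetical toy dataset: binary response, three candidate predictors.
    columns = {
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "x2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
        "x3": [1.0, 3.0, 2.0, 3.0, 2.0, 1.0],
    }
    response = [0, 0, 0, 1, 1, 1]
    for name, r, r2 in rank_predictors(columns, response):
        print(f"{name}: r = {r:+.3f}, R^2 = {r2:.3f}")
```

    Ranking by R-square rather than by r treats strongly negative correlations (x2 above) as just as informative as strongly positive ones.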

    That said, GeneXproTools already provides a wide range of tools for subset selection. First, the GeneXproTools algorithms and fitness functions (both with Parsimony Pressure and with Variable Pressure) automatically incorporate and blend different subsets of variables according to the importance they confer on the evolving models. Second, the Subset Selection modeling strategy, available through the Genetic Operators Tab, is another advanced tool for creating models from different subsets of variables, which in this case are chosen randomly by the algorithms. Since you are particularly interested in model ensembles, note that this functionality provides one of the basic techniques for generating good ensembles.
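    The idea of giving each ensemble member its own randomly chosen subset of variables (often called the random subspace method) can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not the GeneXproTools implementation: the function names are hypothetical, and rows are assumed to be dicts keyed by variable name.

```python
import random


def random_variable_subsets(variables, n_models, subset_size, seed=None):
    """Draw one random subset of variable names per ensemble member.

    Each subset is sampled without replacement from `variables`, so a member
    never sees the same variable twice; different members usually differ.
    """
    if not 0 < subset_size <= len(variables):
        raise ValueError("subset_size must be between 1 and len(variables)")
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return [sorted(rng.sample(variables, subset_size)) for _ in range(n_models)]


def project(rows, subset):
    """Keep only the columns in `subset` (rows are dicts keyed by variable name)."""
    return [{name: row[name] for name in subset} for row in rows]


if __name__ == "__main__":
    variables = ["x1", "x2", "x3", "x4", "x5"]
    subsets = random_variable_subsets(variables, n_models=3, subset_size=3, seed=42)
    for i, subset in enumerate(subsets):
        # Each hypothetical ensemble member would be trained on project(rows, subset).
        print(f"model {i}: trained on {subset}")
```

    Because each member sees a different view of the data, the members tend to make different errors, which is what makes combining them worthwhile.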

    Related to this is the importance of each variable in the models you end up creating. For each model, GeneXproTools evaluates the importance of all its variables and shows the results graphically in the Data Panel under Statistics Charts -> Variable Importance.
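    The post does not say how GeneXproTools computes variable importance, so the following is only a sketch of one common, model-agnostic alternative, permutation importance: shuffle one variable's column, re-score the model, and record how much accuracy drops. The model here is any Python callable taking a row dict; the names and toy data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where model(row) equals the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, variables, n_repeats=5, seed=0):
    """Mean drop in accuracy when one variable's column is shuffled.

    A variable the model ignores scores exactly 0; larger drops mean the
    model leans more heavily on that variable.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for name in variables:
        drops = []
        for _ in range(n_repeats):
            column = [row[name] for row in rows]
            rng.shuffle(column)  # break the link between this variable and the labels
            shuffled = [{**row, name: value} for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances[name] = sum(drops) / n_repeats
    return importances


if __name__ == "__main__":
    # Hypothetical model that only looks at x1.
    def model(row):
        return 1 if row["x1"] > 3 else 0

    pairs = [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4)]
    rows = [{"x1": v, "x2": w} for v, w in pairs]
    labels = [model(r) for r in rows]
    print(permutation_importance(model, rows, labels, ["x1", "x2"]))
```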


    The second question – Model Ensembles:
    You can create and manage multiple models in GeneXproTools, combine them in different ways, and then deploy them automatically to Excel. The models GeneXproTools generates are GEP models, so the random forests mentioned on the website are ensembles of GEP models rather than Breiman's Random Forests. GEP models are, as you know, also trees and are also randomly generated, so the name random forests seemed appropriate; but perhaps we should stick to GEP Forests, or just plain Ensembles, to avoid confusion.
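    The specific ways GeneXproTools combines models are not listed here, so the sketch below simply shows two common combiners for classification ensembles, majority voting over class labels and averaging of predicted probabilities. The models are assumed to be plain Python callables standing in for GEP models, and all names are hypothetical.

```python
from collections import Counter


def majority_vote(models, row):
    """Class predicted by the most models; ties go to the smallest label.

    Labels must be mutually comparable (e.g. all ints or all strings).
    """
    votes = Counter(model(row) for model in models)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def average_probability(prob_models, row, threshold=0.5):
    """Average each model's P(class = 1), then threshold it.

    Returns (predicted_class, mean_probability).
    """
    mean_p = sum(m(row) for m in prob_models) / len(prob_models)
    return (1 if mean_p >= threshold else 0), mean_p


if __name__ == "__main__":
    # Three hypothetical classifiers voting on one record.
    voters = [lambda r: 1, lambda r: 1, lambda r: 0]
    print(majority_vote(voters, {"x1": 2.0}))
    # Three hypothetical probability models for the same record.
    scorers = [lambda r: 0.9, lambda r: 0.4, lambda r: 0.5]
    print(average_probability(scorers, {"x1": 2.0}))
```

    Voting needs only class labels, while averaging uses the models' confidence and lets you move the decision threshold afterwards.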

    Candida Ferreira
