Posted
Comments 1

Today we are releasing GeneXproTools 5.0 MR2 with all the new features that we scheduled for the mini-release "New Project: Cross-Validation, Variable Importance & More".

Please try it out and enjoy!

Download GeneXproTools 5.0 MR2
 

Blog Posts of this Mini-Release

Here's the list of all the blog articles of this project:


The importance of each variable in a model (what we call Variable Importance) is evaluated and shown in the Data Panel, both in the Statistics Report and in the Variable Importance Chart:



Because of its centrality in model assessment and analysis, we are making the Variable Importance Chart more accessible through both an icon and menu shortcuts. So now, instead of having to go to the Data Panel, select "Model Variables" at the top, and then select "Statistics" in the charts at the bottom in order to access the Variable Importance Chart, you can just click the new icon (the Gold Bars icon) to go directly to the Variable Importance Chart of the active model.

We've implemented this new feature for all modeling categories (Regression, Classification, Logistic Regression, Time Series Prediction, and Logic Synthesis), with the new icon and with menu shortcuts in both the Data Menu and the Results/Predictions Menu. This new feature is part of the new mini-release "New Project: Cross-Validation, Var Importance & More" and will be launched shortly with GeneXproTools 5.0 MR2.


With the new mini-release "New Project: Cross-Validation, Var Importance & More" we are re-implementing and improving the favorite statistics Hits and Outliers in order to make them available irrespective of the fitness function that you're using (previously, these stats were available only if you were using fitness functions based on the relative or absolute error).

As we saw with the previous new project "New Project: Multi-class Classification & Trading Strategies", these stats are extremely useful, especially if you're doing multi-class classification in the Regression Framework (see the post "New Classification Algorithms").

The new implementation of these stats in GeneXproTools 5.0 MR2 allows you to choose the error type (relative or absolute error) and the precision. And, as with all the other favorite statistics, you can now also evaluate the Cross-Validation Hits and Cross-Validation Outliers (see the post "Bootstrap Cross-Validation").
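
As a rough sketch of what these two statistics measure, the R code below counts a record as a hit when its error falls within the chosen precision and as an outlier when it does not. This is an assumption made for illustration: the exact definitions used by GeneXproTools may differ, and the helper name hitsAndOutliers, its parameters, and the sample data are all hypothetical.

```r
# Hypothetical helper, not part of GeneXproTools.
# Assumption: hit = error within the precision, outlier = error beyond it.
hitsAndOutliers <- function(target, predicted,
                            errorType = c("absolute", "relative"),
                            precision = 0.01)
{
    errorType <- match.arg(errorType)
    err <- abs(predicted - target)
    if (errorType == "relative") {
        # Relative error as a fraction of the target value;
        # targets equal to zero are not handled specially here
        err <- err / abs(target)
    }
    hits <- sum(err <= precision, na.rm = TRUE)
    outliers <- sum(err > precision, na.rm = TRUE)
    return (list(hits = hits, outliers = outliers))
}

# Made-up sample values, for illustration only
target    <- c(10, 20, 30, 40)
predicted <- c(10.5, 19, 36, 40)

absStats <- hitsAndOutliers(target, predicted, "absolute", precision = 1)
relStats <- hitsAndOutliers(target, predicted, "relative", precision = 0.1)
```

With these made-up values, only the third record (error of 6, or 20% of the target) falls outside either precision.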



These new stats are available in the Regression Framework and in Time Series Prediction and can also be used for model selection during Ensemble Deployment to Excel:



And by the way, these Hits and Outliers statistics are the same ones that you can conveniently visualize in the new multi-functional Data Panel using different charts (Sequential Distribution Chart, Bivariate Line Chart, and Scatter Plot):




Some of the most important new features that we introduced in GeneXproTools 5.0 include different methods for dataset partitioning and subsampling. Now with Mini-Release 2 "New Project: Cross-Validation, Var Importance & More" we are building on these methods to implement what I call Bootstrap Cross-Validation.

The Bootstrap Cross-Validation technique consists of evaluating a particular measure of fit, for example, the classification accuracy or the R-square of a model, across k different random samples of a specific dataset (training or validation/test dataset) and then averaging the results for the k folds. For each dataset, the random sampling is done with replacement using the number of records chosen by the user in the Settings Panel for the training and validation/test datasets.
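
The procedure described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the GeneXproTools implementation: the function bootstrapCV, its parameters, the rSquare helper, the synthetic data, and the stand-in model are all hypothetical.

```r
# Hypothetical sketch of Bootstrap Cross-Validation; not GeneXproTools code.
# model:   function taking a data frame and returning predictions
# measure: function(target, predicted) returning a single number
bootstrapCV <- function(model, data, targetCol, measure,
                        k = 30, sampleSize = nrow(data))
{
    foldResults <- numeric(k)
    for (i in seq_len(k)) {
        # Draw record indices at random, with replacement
        idx <- sample(nrow(data), size = sampleSize, replace = TRUE)
        fold <- data[idx, , drop = FALSE]
        foldResults[i] <- measure(fold[[targetCol]], model(fold))
    }
    # The cross-validation result is the average over the k folds
    return (mean(foldResults))
}

# R-square computed as the squared correlation between target and prediction
rSquare <- function(target, predicted) cor(target, predicted)^2

# Synthetic data and a stand-in for an evolved model (illustrative only)
set.seed(1)
x <- runif(20000)
data <- data.frame(x = x, y = 2 * x + rnorm(20000, sd = 0.1))
model <- function(d) 2 * d$x

cvRSquare <- bootstrapCV(model, data, "y", rSquare, k = 30, sampleSize = 2000)
```

Here each of the 30 folds draws 2,000 records with replacement from a 20,000-record dataset, and any other measure of fit with the same two-argument form could be passed in place of rSquare.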

The Bootstrap Cross-Validation technique is implemented through the Favorite Statistics Window, allowing you to apply this powerful technique to a wide range of measures of fit. The Bootstrap Cross-Validation technique is available for Regression, Classification and Logistic Regression.



Furthermore, we've also extended the Bootstrap Cross-Validation technique to the fitness evaluation, which means that you also have access to cross-validation results for a wide range of fitness function measures, including custom fitness functions.

So, in conclusion, the Bootstrap Cross-Validation technique is a powerful tool for model selection as it allows you to cross-validate model performance across a wide range of performance metrics, including all the Favorite Statistics and Fitness Functions available for each modeling category and also User Defined Statistics through Custom Fitness Functions.

So, whether you have a big dataset or a small one, you can make the most of it to help you in the selection of the very best model using cross-validation. For example, if you have a big dataset, say, 20k records, and want both to speed up testing and to get a more accurate measure of the generalization error, you can instead use just 2k records in a 30-fold cross-validation. If, on the other hand, you have a small dataset and are afraid that your validation/test dataset is not representative of the sample population, by using Bootstrap Cross-Validation on the validation/test or on the entire dataset you can increase the odds of selecting the very best model.

The classical cross-validation technique was developed to deal with model overfitting that plagues different algorithms, from Decision Trees to Linear Regression. Model overfitting is not much of an issue in Gene Expression Programming, but still I hope you’ll find this adaptation of cross-validation to the evolutionary context of model building and selection in GeneXproTools a valuable and powerful tool.


The support for the R language in Logic Synthesis requires adding the same set of Boolean Grammars that we generated for the Go language (All Gates Grammar, NOT-AND-OR Grammar, NAND Grammar, NOR Grammar, MUX Grammar, and Reed-Muller System Grammar).

Now repeating the process for the R language (or any other language for that matter) is very similar to what we did for the Go language, so I won't repeat it here. Instead I recommend you take a look at the posts I wrote for the Go language as they cover all you need to know about generating all the Boolean Grammars for any programming language:

But now back to the R language and our new Boolean Grammars, more specifically my choice of template for the R Grammars.

I used the Matlab Boolean Grammars as the template for the new R Grammars because both languages implement the XOR function as a function call rather than as an operator (most programming languages implement XOR as an operator and, indeed, of all the programming languages supported by GeneXproTools, only Matlab, Octave, and R implement XOR as a function call). And as we saw for the Go language, the way XOR is implemented is crucial for the way we map each of the 258 built-in logical functions of GeneXproTools in terms of NOT-AND-XOR, which, as you know by now, are the building blocks of the Reed-Muller System.
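
To make the point about XOR concrete: R's base xor() is an ordinary function, so a gate expressed in NOT-AND-XOR terms is written with a function call rather than an infix operator. The small check below uses the standard Boolean identity a OR b = XOR(a AND NOT b, b); the helper name orViaXor is just for illustration and is not something GeneXproTools generates.

```r
# Illustrative helper: an OR gate built from NOT, AND, and XOR only
orViaXor <- function(a, b)
{
    return (xor((a && (!(b))), b))
}

# Check the identity on all four input combinations
for (a in c(FALSE, TRUE)) {
    for (b in c(FALSE, TRUE)) {
        stopifnot(orViaXor(a, b) == (a || b))
    }
}
```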

Let's now take a look at some R code generated with the new Boolean Grammars.

For example, the code below is a minimal logic circuit for the 6-Multiplexer and was designed using just NOT, AND, and OR gates:

gepModel <- function(d)
{
    y <- FALSE

    y <- ((d[6] && d[1]) && d[2])
    y <- y || ((d[5] && d[1]) && (!(d[2])))
    y <- y || (d[2] && (d[4] && (!(d[1]))))
    y <- y || (d[3] && (!((d[2] || d[1]))))

    return (y)
}

Now, thanks to the different grammars we have for different Universal Logical Systems, we can automatically convert the logic circuit above to just NAND gates or NOR gates or MUX gates or NOT-AND-XOR gates (the NOT-AND-OR System would obviously give us the same output as above).

As an example, here's the corresponding MUX circuit for the code above, generated with our new MUX Grammar for the R language:

gepModel <- function(d)
{
    y <- FALSE

    y <- gepMux(gepMux(d[6],d[6],d[1]),gepMux(d[6],d[6],d[1]),d[2])
    y <- gepMux(y,gepMux(gepMux(d[5],d[5],d[1]),
         gepMux(d[5],d[5],d[1]),(!(d[2]))),y)
    y <- gepMux(y,gepMux(d[2],d[2],gepMux(d[4],d[4],(!(d[1])))),y)
    y <- gepMux(y,gepMux(d[3],d[3],(!(gepMux(d[2],d[1],d[2])))),y)

    return (y)
}

gepMux <- function(x, y, z)
{
    return (((!(x)) && y) || (x && z))
}
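
Reading the MUX circuit is easier with two patterns in mind, both of which follow from the definition of gepMux: gepMux(a, a, b) behaves as a AND b, and gepMux(a, b, a) behaves as a OR b. A brute-force check of both, with gepMux repeated so the snippet runs on its own:

```r
gepMux <- function(x, y, z)
{
    return (((!(x)) && y) || (x && z))
}

# Check both patterns on all four input combinations
for (a in c(FALSE, TRUE)) {
    for (b in c(FALSE, TRUE)) {
        stopifnot(gepMux(a, a, b) == (a && b))   # AND pattern
        stopifnot(gepMux(a, b, a) == (a || b))   # OR pattern
    }
}
```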

And finally, let's also show off the compactness of the Reed-Muller System by converting the NOT-AND-OR circuit above to NOT-AND-XOR:

gepModel <- function(d)
{
    y <- FALSE

    y <- ((d[6] && d[1]) && d[2])
    y <- xor((y && (!(((d[5] && d[1]) && (!(d[2])))))),((d[5] &&
         d[1]) && (!(d[2]))))
    y <- xor((y && (!((d[2] && (d[4] && (!(d[1]))))))),(d[2] &&
         (d[4] && (!(d[1])))))
    y <- xor((y && (!((d[3] && (!(xor((d[2] && (!(d[1]))),d[1]))))))),(d[3] &&
         (!(xor((d[2] && (!(d[1]))),d[1])))))

    return (y)
}
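
If you want to convince yourself that a converted circuit still computes the same function, one simple approach is to compare it against a reference on all 64 input combinations. The sketch below does this for the NOT-AND-OR and NOT-AND-XOR versions. The function names mux6, notAndOr, and reedMuller are mine, and the reference assumes that d[1] and d[2] are the address bits and d[3] to d[6] the data bits, an ordering inferred from the circuits themselves.

```r
# Reference 6-multiplexer (assumed bit ordering: d[1], d[2] address bits)
mux6 <- function(d)
{
    return (d[3 + 2 * d[1] + d[2]])
}

# NOT-AND-OR circuit, copied from the generated code
notAndOr <- function(d)
{
    y <- ((d[6] && d[1]) && d[2])
    y <- y || ((d[5] && d[1]) && (!(d[2])))
    y <- y || (d[2] && (d[4] && (!(d[1]))))
    y <- y || (d[3] && (!((d[2] || d[1]))))
    return (y)
}

# NOT-AND-XOR (Reed-Muller) circuit, copied from the generated code
reedMuller <- function(d)
{
    y <- ((d[6] && d[1]) && d[2])
    y <- xor((y && (!(((d[5] && d[1]) && (!(d[2])))))),((d[5] &&
         d[1]) && (!(d[2]))))
    y <- xor((y && (!((d[2] && (d[4] && (!(d[1]))))))),(d[2] &&
         (d[4] && (!(d[1])))))
    y <- xor((y && (!((d[3] && (!(xor((d[2] && (!(d[1]))),d[1]))))))),(d[3] &&
         (!(xor((d[2] && (!(d[1]))),d[1])))))
    return (y)
}

# Enumerate all 2^6 input combinations and compare against the reference
inputs <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 6)))
for (i in seq_len(nrow(inputs))) {
    d <- as.logical(inputs[i, ])
    stopifnot(notAndOr(d) == mux6(d), reedMuller(d) == mux6(d))
}
```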

We could have also generated NAND or NOR circuits for this circuit with the NAND or NOR Grammars, but they are both too huge and ungainly to show here. As we saw in the previous posts about the NAND System and NOR System, if we are concerned about performance and our goal is to design NAND or NOR circuits (or any other kind of circuit, for that matter), it's best to design the original circuit with building blocks that map compactly to the gates we are interested in. But if performance is not a concern and you just need to convert whatever circuit you have to NAND gates or NOR gates or MUX gates or what have you, you can use any of the Universal Logical Systems that GeneXproTools implements for automatic circuit conversion. (I for one love to enter a Zen state where I feed really huge NAND or NOR circuits to a compiler and marvel, each time it spits out the correct answer, at how perfect and reliable computers really are. By the way, R does not handle these long lines of code very well, which I must say interfered tremendously with my Zen states; on the other hand, the Go compiler worked like a charm…)

In the next post I'll move away from Boolean Grammars and Universal Logical Systems and talk about a new way of cross-validating your models in GeneXproTools.
