- Bootstrap Cross-Validation
- Hits & Outliers Favorite Statistics
- Shortcuts to the Variable Importance Chart
- Support for the Go Language
- Support for the R Language in Logic Synthesis

Please try it out and enjoy!

Download GeneXproTools 5.0 MR2

**Blog Posts of this Mini-Release**

Here's the list of all the blog articles of this project:

- New Project: Cross-Validation, Var Importance & More
- Support for the Go Language in GeneXproTools
- New Math Functions in Go
- Go Language: Boolean Grammars
- Go Language: Boolean Xor Operator
- Go Language: All Gates Boolean Grammar
- Go Language: NOT-AND-OR Grammar
- Go Language: NAND Grammar
- Go Language: NOR Grammar
- Go Language: MUX System
- Go Language: Reed-Muller System
- R Language: Boolean Grammars
- Bootstrap Cross-Validation
- Hits/Outliers Favorite Statistics
- Variable Importance Chart

Because of its centrality in model assessment and analysis, we are making the **Variable Importance Chart** more accessible through both an icon and menu shortcuts. Instead of going to the Data Panel, selecting "Model Variables" at the top, and then selecting "Statistics" in the charts at the bottom, you can now just click the new icon (the Gold Bars icon) to go directly to the Variable Importance Chart of the active model.

We've implemented this new feature for all modeling categories (Regression, Classification, Logistic Regression, Time Series Prediction, and Logic Synthesis), with the new icon and with menu shortcuts in both the Data Menu and the Results/Predictions Menu. This new feature is part of the new mini-release "New Project: Cross-Validation, Var Importance & More" and will be launched shortly with GeneXproTools 5.0 MR2.

As we saw with the previous new project "New Project: Multi-class Classification & Trading Strategies", the Hits and Outliers statistics are extremely useful, especially if you're doing **multi-class classification** in the Regression Framework (see the post "New Classification Algorithms").

The new implementation of these stats in GeneXproTools 5.0 MR2 allows you to choose the **error type** (relative or absolute error) and the **precision**. And as with all the other favorite statistics, you can now also evaluate the **Cross-Validation Hits** and **Cross-Validation Outliers** (see the post "Bootstrap Cross-Validation").
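
To make the two settings concrete, here's a minimal sketch in R of how such counts could be computed. It is not the GeneXproTools implementation: the helper `count_hits_outliers` is hypothetical, and it assumes that a hit is a record whose error is within the chosen precision, that an outlier is any record that is not a hit, and that the relative error is the absolute error divided by the absolute target value.

```r
# Hypothetical helper, not part of GeneXproTools.
# Assumptions: a "hit" is a record whose error is within `precision`;
# an "outlier" is any record that is not a hit.
count_hits_outliers <- function(target, model, precision,
                                error_type = c("absolute", "relative")) {
  error_type <- match.arg(error_type)
  err <- abs(target - model)
  if (error_type == "relative") {
    # Assumed definition of relative error; not meaningful where target is 0
    err <- err / abs(target)
  }
  hits <- sum(err <= precision)
  list(hits = hits, outliers = length(target) - hits)
}

target <- c(10, 20, 30, 40)
model  <- c(10.5, 19, 33, 40)

# Absolute errors are 0.5, 1, 3 and 0
res_abs <- count_hits_outliers(target, model, precision = 1.5,
                               error_type = "absolute")

# Relative errors are 0.05, 0.05, 0.1 and 0
res_rel <- count_hits_outliers(target, model, precision = 0.06,
                               error_type = "relative")
```

With these toy values both calls count three hits and one outlier (the third record), but a tighter precision or a different error type can move records from one group to the other.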

These new stats are available in the **Regression Framework** and in **Time Series Prediction** and can also be used for model selection during Ensemble Deployment to Excel.

And by the way, these Hits and Outliers statistics are the same ones that you can conveniently visualize in the new multi-functional Data Panel using different charts (Sequential Distribution Chart, Bivariate Line Chart, and Scatter Plot).

The **Bootstrap Cross-Validation technique** consists of evaluating a particular measure of fit, for example, the classification accuracy or the R-square of a model, across *k* different random samples of a specific dataset (training or validation/test dataset) and then averaging the results for the *k* folds. For each dataset, the random sampling is done with replacement using the number of records chosen by the user in the Settings Panel for the training and validation/test datasets.
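
The procedure just described can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code GeneXproTools runs: the helpers `r_square` and `bootstrap_cv` are hypothetical names, the model is represented simply by its predictions, and R-square is assumed here to be the squared Pearson correlation.

```r
# Hypothetical helpers, not part of GeneXproTools.

# Assumed measure of fit: squared Pearson correlation
r_square <- function(target, model) cor(target, model)^2

# Evaluate `measure` on k random samples of n records each,
# drawn with replacement, and average the k results.
bootstrap_cv <- function(target, model, measure, k = 30, n = length(target)) {
  scores <- vapply(seq_len(k), function(i) {
    idx <- sample.int(length(target), size = n, replace = TRUE)
    measure(target[idx], model[idx])
  }, numeric(1))
  mean(scores)
}

# Toy data: 200 records and a model that tracks the target with some noise
set.seed(1)
target <- runif(200)
model  <- target + rnorm(200, sd = 0.1)

cv_r2 <- bootstrap_cv(target, model, r_square, k = 30, n = 100)
```

Any function of the form `measure(target, model)` can be passed in, which mirrors the idea of applying the same resampling to different measures of fit.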

The Bootstrap Cross-Validation technique is implemented through the **Favorite Statistics Window**, allowing you to apply this powerful technique to a wide range of **measures of fit**. The Bootstrap Cross-Validation technique is available for Regression, Classification and Logistic Regression.

Furthermore, we've also extended the Bootstrap Cross-Validation technique to the **fitness evaluation**, which means that you also have access to cross-validation results for a wide range of **fitness function measures**, including **custom fitness functions**.

So, in conclusion, the Bootstrap Cross-Validation technique is a powerful tool for **model selection** as it allows you to cross-validate model performance across a wide range of performance metrics, including all the **Favorite Statistics** and **Fitness Functions** available for each modeling category and also **User Defined Statistics** through **Custom Fitness Functions**.

So, whether you have a big dataset or a small one, you can make the most of it to help you in the selection of the very best model using cross-validation. For example, if you have a big dataset, say, 20k records, and want both to speed up testing and to get a more accurate measure of the generalization error, you can instead use just 2k records in a 30-fold cross-validation. If, on the other hand, you have a small dataset and are afraid that your validation/test dataset is not representative of the sample population, by using Bootstrap Cross-Validation on the validation/test or on the entire dataset you can increase the odds of selecting the very best model.
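
As a rough illustration of the big-dataset scenario, the sketch below ranks two candidate models on 30 bootstrap samples of 2k records drawn from 20k. Everything in it is made up for the example: the data are synthetic, `modelA` and `modelB` are hypothetical models represented only by their predictions, RMSE is just one possible measure, and reusing the same 30 samples for both models is a design choice of this sketch rather than something GeneXproTools is known to do.

```r
set.seed(42)
n_total <- 20000   # size of the full dataset
n       <- 2000    # records per bootstrap sample
k       <- 30      # number of folds

# Synthetic target and two hypothetical candidate models (predictions only)
target <- rnorm(n_total)
candidates <- list(
  modelA = target + rnorm(n_total, sd = 0.2),
  modelB = target + rnorm(n_total, sd = 0.5)
)

rmse <- function(target, model) sqrt(mean((target - model)^2))

# Draw the k samples once (with replacement) so that every candidate
# is scored on exactly the same records
folds <- replicate(k, sample.int(n_total, size = n, replace = TRUE),
                   simplify = FALSE)

cv_rmse <- sapply(candidates, function(pred) {
  mean(vapply(folds, function(idx) rmse(target[idx], pred[idx]), numeric(1)))
})

# The candidate with the lowest cross-validated error
best <- names(which.min(cv_rmse))
```

Each fold touches only 2k records, so scoring is cheaper than a pass over all 20k, while the average over 30 folds smooths out the luck of any single sample.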

The classical cross-validation technique was developed to deal with **model overfitting** that plagues different algorithms, from Decision Trees to Linear Regression. Model overfitting is not much of an issue in Gene Expression Programming, but still I hope you’ll find this adaptation of cross-validation to the evolutionary context of model building and selection in GeneXproTools a valuable and powerful tool.

Now repeating the process for the R language (or any other language for that matter) is very similar to what we did for the **Go language**, so I won't repeat it here. Instead I recommend you take a look at the posts I wrote for the Go language as they cover all you need to know about generating all the **Boolean Grammars** for any programming language:

- Go Language: Boolean Grammars
- Go Language: Boolean Xor Operator
- Go Language: All Gates Boolean Grammar
- Go Language: NOT-AND-OR Grammar
- Go Language: NAND Grammar
- Go Language: NOR Grammar
- Go Language: MUX System
- Go Language: Reed-Muller System

But now back to the R language and our new Boolean Grammars, more specifically my choice of template for the R Grammars.

I used the Matlab Boolean Grammars as the template for the new R Grammars because both languages implement XOR as a function call rather than as an operator (most programming languages implement XOR as an operator and, indeed, of all the programming languages supported by GeneXproTools, only Matlab, Octave, and R implement XOR as a function call). And as we saw for the Go language, the way XOR is implemented is crucial for the way we map each of the 258 built-in logical functions of GeneXproTools in terms of NOT-AND-XOR, which, as you know by now, are the building blocks of the Reed-Muller System.

Let's now take a look at some **R code** generated with the new Boolean Grammars.

For example, the code below is a minimal logic circuit for the 6-Multiplexer and was designed using just NOT, AND, and OR gates:

    gepModel <- function(d)
    {
        y <- FALSE
        y <- ((d[6] && d[1]) && d[2])
        y <- y || ((d[5] && d[1]) && (!(d[2])))
        y <- y || (d[2] && (d[4] && (!(d[1]))))
        y <- y || (d[3] && (!((d[2] || d[1]))))
        return (y)
    }
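
If you want to convince yourself that this circuit really is a 6-Multiplexer, a quick exhaustive check over all 64 input combinations does the job. The sketch below repeats `gepModel` so that it runs on its own; `reference_mux6` is a hypothetical reference written for this check, and it assumes the convention read off the circuit itself, namely that `d[1]` and `d[2]` are the address bits (with `d[1]` the high-order bit) and `d[3]` to `d[6]` are the data bits.

```r
# The generated NOT-AND-OR circuit, repeated here so the check is self-contained
gepModel <- function(d) {
  y <- FALSE
  y <- ((d[6] && d[1]) && d[2])
  y <- y || ((d[5] && d[1]) && (!(d[2])))
  y <- y || (d[2] && (d[4] && (!(d[1]))))
  y <- y || (d[3] && (!((d[2] || d[1]))))
  return(y)
}

# Hypothetical reference: the two address bits select one of the four data bits
# (assumed convention: d[1] is the high-order address bit)
reference_mux6 <- function(d) {
  d[[3 + 2 * as.integer(d[[1]]) + as.integer(d[[2]])]]
}

# All 2^6 = 64 combinations of the six inputs, one per row
inputs <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 6)))

agree <- apply(inputs, 1, function(d) gepModel(d) == reference_mux6(d))
stopifnot(all(agree))
```

Because the input space is so small, this kind of brute-force comparison is a cheap way to validate any generated circuit against a known truth table.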

Now, thanks to the different grammars we have for different **Universal Logical Systems**, we can automatically convert the logic circuit above to just NAND gates or NOR gates or MUX gates or NOT-AND-XOR gates (the NOT-AND-OR System would obviously give us the same output as above).

As an example, here's the corresponding **MUX circuit** for the code above generated with our new MUX Grammar for the R language:

    gepModel <- function(d)
    {
        y <- FALSE
        y <- gepMux(gepMux(d[6],d[6],d[1]),gepMux(d[6],d[6],d[1]),d[2])
        y <- gepMux(y,gepMux(gepMux(d[5],d[5],d[1]),
             gepMux(d[5],d[5],d[1]),(!(d[2]))),y)
        y <- gepMux(y,gepMux(d[2],d[2],gepMux(d[4],d[4],(!(d[1])))),y)
        y <- gepMux(y,gepMux(d[3],d[3],(!(gepMux(d[2],d[1],d[2])))),y)
        return (y)
    }

    gepMux <- function(x, y, z)
    {
        return (((!(x)) && y) || (x && z))
    }
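
The MUX circuit is easier to read once you spot the two patterns it keeps repeating: `gepMux(a,a,b)` behaves like AND and `gepMux(a,b,a)` behaves like OR. Here's a small truth-table check of both, using the same `gepMux` definition as in the generated code; the loop itself is just an illustrative sketch.

```r
# Same definition as in the generated code above
gepMux <- function(x, y, z) {
  return(((!(x)) && y) || (x && z))
}

bits <- c(FALSE, TRUE)
for (a in bits) {
  for (b in bits) {
    # gepMux(a, a, b) = (!a && a) || (a && b) = a && b
    stopifnot(gepMux(a, a, b) == (a && b))
    # gepMux(a, b, a) = (!a && b) || (a && a) = a || b
    stopifnot(gepMux(a, b, a) == (a || b))
  }
}
```

So a call such as `gepMux(y, ..., y)` in the circuit is simply the MUX spelling of `y || ...`, and `gepMux(d[6],d[6],d[1])` is `d[6] && d[1]`.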

And finally, let's also show off the compactness of the Reed-Muller System by converting the NOT-AND-OR circuit above to NOT-AND-XOR:

    gepModel <- function(d)
    {
        y <- FALSE
        y <- ((d[6] && d[1]) && d[2])
        y <- xor((y && (!(((d[5] && d[1]) && (!(d[2])))))),((d[5] &&
             d[1]) && (!(d[2]))))
        y <- xor((y && (!((d[2] && (d[4] && (!(d[1]))))))),(d[2] &&
             (d[4] && (!(d[1])))))
        y <- xor((y && (!((d[3] && (!(xor((d[2] && (!(d[1]))),d[1]))))))),(d[3] &&
             (!(xor((d[2] && (!(d[1]))),d[1])))))
        return (y)
    }
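
The pattern doing all the work in this listing is the NOT-AND-XOR spelling of OR: every `a || b` of the original circuit shows up as `xor((a && (!(b))), b)`. The short sketch below checks that identity over the full truth table with R's built-in `xor()` function; the helper name `or_as_xor` is made up for the illustration.

```r
# Hypothetical helper: OR written with only NOT, AND and XOR
or_as_xor <- function(a, b) xor((a && (!(b))), b)

bits <- c(FALSE, TRUE)
for (a in bits) {
  for (b in bits) {
    # (a && !b) and b are never both TRUE, so XOR acts as OR here
    stopifnot(or_as_xor(a, b) == (a || b))
  }
}
```

Note that `xor()` is called like any other function, which is exactly why the R Grammars follow the Matlab template rather than one built around an XOR operator.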

We could also have generated NAND or NOR circuits for this circuit with the NAND or NOR Grammars, but both are too huge and ungainly to show here. As we saw in the previous posts about the NAND System and NOR System, if we are concerned about performance and our goal is to design NAND or NOR circuits (or any other kind of circuit, for that matter), it's best to design the original circuit with building blocks that map compactly to the gates we are interested in. But if performance is not a concern and you just need to convert whatever circuit you have to NAND gates or NOR gates or MUX gates or what have you, you can use any of the Universal Logical Systems that GeneXproTools implements for automatic circuit conversion. (I for one love to enter a Zen state where I feed really huge NAND or NOR circuits to a compiler and marvel, each time it spits out the correct answer, at how perfect and reliable computers really are. By the way, R does not handle these long lines of code very well, which I must say interfered tremendously with my Zen states; the Go compiler, on the other hand, worked like a charm.)

In the next post I'll move away from Boolean Grammars and Universal Logical Systems and talk about a new way of cross-validating your models in GeneXproTools.
