New Classification Algorithms


Both the new classifier functions and the new linking functions introduced with this mini-release give rise to new ways of doing classification in GeneXproTools.

For example, the new classifier functions of 2-6 discrete outputs greatly simplify the creation of classification models in the Regression Framework when we are interested in crisp outcomes for the model outputs.

Furthermore, these new classifier functions can be used in both unigenic and multigenic systems, not only for binary classification but also for multi-class classification. It's worth noting, however, that for multi-class classification better results are usually obtained by decomposing the problem into k binary sub-tasks (where k is the number of classes). But as usual in these situations, it all depends on the problem and on the data. For example, all the new classifier functions of 3-6 outputs perform very well on the Iris dataset (a 3-class classification problem), but on the Balance Scale dataset (also with 3 classes and the same number of input variables) they perform poorly, mainly due to the unbalanced distribution of classes in the original data. This is a problem for which solutions exist, such as creating a balanced dataset, but the recommended course of action for multi-class classification is still to divide the problem into k binary sub-tasks and solve each of them in the Logistic Regression Framework or the Classification Framework, where we have access to the probabilities for each class.
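The decomposition into k binary sub-tasks can be sketched outside GeneXproTools in a few lines of Python. This is only an illustration of the general one-vs-rest idea: the function names are hypothetical, and the probabilities passed to `predict_class` are made-up values standing in for the output of k separately trained binary models.

```python
from typing import Dict, List, Sequence


def one_vs_rest_targets(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Build one binary (0/1) target vector per class: 1 for the class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def predict_class(probabilities: Dict[str, float]) -> str:
    """Pick the class whose binary model reports the highest probability."""
    return max(probabilities, key=probabilities.get)


# Four example records from a 3-class problem such as Iris.
labels = ["setosa", "versicolor", "virginica", "setosa"]
targets = one_vs_rest_targets(labels)  # k = 3 binary sub-tasks

# Illustrative (invented) probabilities from the three binary models for one record.
winner = predict_class({"setosa": 0.10, "versicolor": 0.75, "virginica": 0.40})
```

Each entry of `targets` is the dependent variable for one binary sub-task, and the final class is simply the one with the highest probability.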

That said, for multi-class classification problems where these new 3-6 output classifier functions work well, you can use them in combination with other math functions either in systems with just one tree or in systems with multiple trees, linking the sub-trees with the most appropriate linking function. The Min and Max linking functions, for example, work particularly well in those cases.
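As a rough picture of what Min or Max linking does with discrete sub-tree outputs, here is a small Python sketch. The `three_output` function and its thresholds are hypothetical stand-ins, not the definition of any built-in GeneXproTools classifier function; only the linking step (taking the minimum or maximum of the sub-tree outputs) follows directly from the text.

```python
from typing import Sequence


def three_output(x: float, low: float = -1.0, high: float = 1.0) -> int:
    """Hypothetical 3-output classifier: map a real value to class 0, 1 or 2."""
    if x < low:
        return 0
    if x <= high:
        return 1
    return 2


def link(outputs: Sequence[int], how: str = "max") -> int:
    """Combine the sub-tree outputs with a Min or Max linking function."""
    return max(outputs) if how == "max" else min(outputs)


# Three sub-trees, each ending in the hypothetical 3-output function.
subtrees = [three_output(-2.3), three_output(0.4), three_output(1.8)]

model_max = link(subtrees, "max")
model_min = link(subtrees, "min")
```

Because every sub-tree already returns a valid class, the linked model output is also a valid class, so no extra rounding step is needed.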

The other alternative is to use the new discrete output linking functions. In this case, you can either add some of the new discrete output functions to your function set or use just a discrete output function for the linking. Particularly interesting is the setup with just 2 trees linked by a function like the buy-sell-wait function described in the post "Function Design: The BUY-SELL-WAIT Function". But this obviously extends to all kinds of 3-class classification problems as long as the data is amenable to this kind of approach.
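To make the two-tree setup concrete, here is a minimal Python sketch of a 3-state linker. The rule used here (both sub-trees positive gives 1, both negative gives -1, anything else gives 0) is an assumption chosen for illustration; the function described in the referenced post may be defined differently, and the name `three_state_link` is hypothetical.

```python
def three_state_link(a: float, b: float) -> int:
    """Assumed 3-state linking rule for two sub-tree outputs.

    Returns 1 when both outputs are positive, -1 when both are negative,
    and 0 when the sub-trees disagree or either output is zero.
    """
    if a > 0 and b > 0:
        return 1
    if a < 0 and b < 0:
        return -1
    return 0


# Outputs of the two sub-trees for three example records.
agree_positive = three_state_link(0.5, 2.0)
agree_negative = three_state_link(-1.0, -3.0)
disagree = three_state_link(1.0, -1.0)
```

The three return values map naturally onto the three classes of any 3-class problem, which is why the setup is not limited to trading signals.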

Finally, a quick note on the use of these new discrete output linkers in the Logistic Regression Framework and the Classification Framework.

GeneXproTools allows you to use the new linking functions in both Logistic Regression and Classification, but be aware that the resulting models are much simpler (which might be exactly what you want) than the ones created with non-discrete linkers (addition, min, max, Avg2 and others). As a result, these simpler models won't give you much differentiation in terms of probabilities. But again, this might prove useful in certain problems or as a way to gain insight into your data.
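The loss of differentiation is easy to see in a small Python sketch. Everything here is assumed for illustration: the sub-tree outputs are invented, the discrete linker is a hypothetical 3-state rule, and a plain logistic function with default slope and intercept stands in for whatever probability model the framework actually fits.

```python
import math
from typing import List, Tuple


def logistic(x: float, slope: float = 1.0, intercept: float = 0.0) -> float:
    """Standard logistic function turning a model output into a probability."""
    return 1.0 / (1.0 + math.exp(-(slope * x + intercept)))


def discrete_link(a: float, b: float) -> int:
    """Hypothetical 3-state linker: 1 if both positive, -1 if both negative, else 0."""
    if a > 0 and b > 0:
        return 1
    if a < 0 and b < 0:
        return -1
    return 0


# Invented outputs of two sub-trees for five records.
subtree_outputs: List[Tuple[float, float]] = [
    (0.2, 1.3), (-0.7, -0.1), (2.4, 0.9), (-1.5, 0.4), (0.6, 0.1),
]

# Linking by addition keeps a continuous model output.
added = [round(logistic(a + b), 4) for a, b in subtree_outputs]
# Linking with the discrete function leaves at most three possible outputs.
linked = [round(logistic(discrete_link(a, b)), 4) for a, b in subtree_outputs]

distinct_added = len(set(added))
distinct_linked = len(set(linked))
```

With addition every record gets its own probability, whereas the discrete linker can never produce more distinct probabilities than it has output values.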

So you're free to explore all these new classification algorithms by trying different combinations of fitness functions (the fitness functions of Regression, Logistic Regression and Classification were all fine-tuned for different purposes), visualization tools (the model fitting charts of Regression and Logistic Regression/Classification are all different), and analytics (the measures of fit used in Regression and Logistic Regression/Classification are also different).

Have fun!





