Hi Rebecca,
The first argument is correct. I explain how to do that in the comment above and I also include an example complete with the .gep file with all the models for each individual class and an Excel spreadsheet with all the calculations for evaluating the class using the argmax function.
Regarding your last question, the GEP models are indeed called from Excel using embedded macros that GeneXproTools automatically generates during Model/Ensemble Deployment to Excel. You can access the VBA code of the gepModels in the Excel worksheet by enabling the Developer tools and then selecting View Code in the Developer tab.
Hope this helps.
Best,
Candida
Dear Candida,
I have gone through the classification as you described. I have 3 classes, and the data for each class are saved in three separate text files. The problem is that when I try to import the data into GeneXproTools, I get the following error, since I included only one class for training:
“Unable to proceed: The response Variable has only 1 kind of class”
So how do I train a model separately for each class and its corresponding data? Is it feasible to add a sample from another class (e.g. just one sample) to solve the problem?
Thanks
Rebecca
Hi Rebecca,
If you're solving a 3-class problem, your dataset should include records from all 3 classes. GeneXproTools then generates all the necessary datasets for you automatically, so you don’t have to create them yourself. These datasets are binary because GeneXproTools models each class separately, both in the Classification Framework and in the Logistic Regression Framework (see the Knowledge Base article on Class Merging & Discretization). You can then combine the models as described above.
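To illustrate what "binary datasets, one per class" means, here is a minimal Python sketch of the one-vs-rest idea. It is not GeneXproTools code, and the function name and class labels are made up for the example; it only shows how a single 3-class response column can be turned into three 0/1 response columns.

```python
def one_vs_rest_targets(labels):
    """Build one binary (0/1) target list per class from a multi-class label list.

    Hypothetical helper for illustration only; GeneXproTools does this
    step internally when it models each class separately.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Made-up labels for a 3-class problem
labels = ["A", "B", "C", "A", "C"]
targets = one_vs_rest_targets(labels)

for cls, column in targets.items():
    print(cls, column)
```

Each binary column contains both 1s and 0s only because the original data held records from all classes, which is why a file containing a single class cannot be used on its own.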
Hope this helps.
Candida Ferreira
Hi Rebecca,
Your first question:
1. After ensembling the model to Excel, there is a parameter called "Training Data Constant" that is the same for all five trained models! I know it is used when the model output is not a numeric value. But what is this parameter, how is it obtained, and why is it the same for all of them?
The "Training Data Constants" (either Prior Probability[1] or Majority Class, depending on the Model Output you chose for your models) are constants derived from the Training Data, so they will have the same value whenever the same training data was used to create the models. The Prior Probability[1] is the proportion of positive cases in the training data, expressed as a ratio between 0 and 1; the Majority Class equals 1 if there are more positive cases than negative cases in the training data, and 0 otherwise.
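As a small worked example of these two definitions, the Python sketch below computes both constants from a binary response column. The function name and the sample data are assumptions made for illustration; this is not the code GeneXproTools generates.

```python
def training_data_constants(targets):
    """Return (prior_probability, majority_class) for a 0/1 response column.

    prior_probability: fraction of positive (1) cases in the training data.
    majority_class:    1 if positives outnumber negatives, otherwise 0.
    Hypothetical helper following the definitions given above.
    """
    n_pos = sum(1 for t in targets if t == 1)
    n_neg = len(targets) - n_pos
    prior_probability = n_pos / len(targets)
    majority_class = 1 if n_pos > n_neg else 0
    return prior_probability, majority_class


# Made-up training response: 2 positive cases out of 5
prior, majority = training_data_constants([1, 0, 0, 1, 0])
print(prior, majority)
```

Because both values depend only on the response column, any models trained on the same data and the same class will share them.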
Your second question:
2. Using the logistic regression analysis, we obtain the following equation for each of the classes:
probabilityOne = 1.0 / (1.0 + exp(-(SLOPE * y + INTERCEPT))) and, for a given dataset, the probabilities for each class are compared to find the most probable class. I have gone through the manual to understand the philosophy behind this concept, but some technical parts were not easy to grasp, such as:
- How are SLOPE and INTERCEPT calculated for each class?
All the details are given in the documentation for the Logistic Regression Framework:
Logistic Regression Analytics Platform
- The general formula for the logistic probability p is: p = 1 / (1 + exp(-(a*x + b))). So here y (which is a nonlinear GEP-based equation in terms of the inputs) takes the place of x in the original equation?
That's correct (this is also explained in the above mentioned tutorial).
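For readers who want to see the whole calculation in one place, here is a minimal Python sketch of the formula above followed by the argmax step mentioned earlier in the thread. The model outputs, slopes, and intercepts are made-up numbers, and the function names are hypothetical; in practice the y values come from the GEP models and the SLOPE and INTERCEPT values from the Logistic Regression Framework.

```python
import math


def logistic_probability(y, slope, intercept):
    """probabilityOne = 1 / (1 + exp(-(SLOPE * y + INTERCEPT)))"""
    return 1.0 / (1.0 + math.exp(-(slope * y + intercept)))


def predict_class(model_outputs, params):
    """Pick the class whose model gives the highest probability (argmax).

    model_outputs: {class_name: y}, the raw output of that class's model.
    params:        {class_name: (slope, intercept)} for that class.
    """
    probs = {
        c: logistic_probability(y, *params[c])
        for c, y in model_outputs.items()
    }
    best = max(probs, key=probs.get)
    return best, probs


# Made-up values for one record of a 3-class problem
outputs = {"A": 2.0, "B": -1.0, "C": 0.0}
params = {"A": (1.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 0.0)}
best, probs = predict_class(outputs, params)
print(best, probs)
```

Note that the per-class probabilities come from separate binary models, so they need not sum to 1; the argmax only uses their relative order.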
I recently found out that a good paper on multi-class classification with GEP has been published by a group of researchers at MSU. It seems to be a sound reference for those interested in multi-class classification. This is the link: http://www.sciencedirect.com/science/article/pii/S026322411500679X