Testing a Model
The predictive accuracy of logistic regression models (or, more precisely, of the core model of the logistic regression procedure) is evaluated like that of any other model in GeneXproTools: as soon as evolution stops, and if a testing set is available, both the fitness and the R-square are evaluated for the testing dataset and shown immediately on the Run Panel. An additional set of statistics, including the correlation coefficient, is evaluated and shown in the Results Panel, where you can also test the predictive accuracy of all the models created in a run.
When the fitness and R-square obtained for the testing set are about the same as the values obtained for the training set, this is a good indicator that the model generalizes well and can therefore be used to make predictions, which in this case means using it to create the final Logistic Regression Model in the Logistic Regression Window. When the data are partitioned correctly, the two sets of values usually do agree, for GEP models rarely, if ever, overfit the data.
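The comparison described above can be sketched outside the tool as follows. This is only an illustrative sketch: it assumes that R-square is the squared Pearson correlation between the target values and the model outputs, and the function names and the 0.05 tolerance are hypothetical choices, not GeneXproTools settings.

```python
def r_square(targets, outputs):
    """Squared Pearson correlation between targets and model outputs.

    Assumption: this is the R-square definition being compared; the
    exact formula used by the tool is not stated in this section.
    """
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    if var_t == 0 or var_o == 0:
        return 0.0  # correlation is undefined for constant data; report 0
    return cov * cov / (var_t * var_o)


def generalizes(train_r2, test_r2, tolerance=0.05):
    """Return True when training and testing R-square are 'about the same'.

    The tolerance is a hypothetical threshold chosen for illustration.
    """
    return abs(train_r2 - test_r2) <= tolerance
```

A call such as `generalizes(r_square(train_targets, train_outputs), r_square(test_targets, test_outputs))` then gives a quick yes/no reading of the same check you would otherwise make by eye on the Run Panel.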
Additionally, within the Logistic Regression Framework, GeneXproTools allows you to run the whole set of analytics tools on the testing dataset, including construction and analysis of
Quantile Tables, ROC Curves,
Cutoff Points, Gains and
Lift Charts, Logistic Regression and
Logistic Fit, and
ROC and Logistic Confusion Matrices. To do so, just select Testing Set in the Dataset combo box in the Logistic Regression Window.
Note, however, that this additional testing procedure builds its own Quantile Table and also evaluates and uses its own slope and intercept for the Logistic Regression Model. This obviously means that the logistic regression parameters evaluated for the training dataset are not used in this testing procedure, even though applying them might be of interest in some cases as a further test of the model.
If such rigorous testing is desired, though, you can always perform a blind scoring on this testing dataset (you will obviously have to remove the target output from the scoring dataset).
Indeed, the logistic regression model that GeneXproTools deploys during scoring uses the slope and intercept evaluated for the training dataset. This means that you can easily perform this precise test within GeneXproTools using its new Logistic Regression Scoring Engine. The Scoring Engine was updated so that you can evaluate not only the predicted probabilities but also the most likely class. You can therefore use it to compute the predictive accuracy quickly and rigorously by comparing the scoring results with your blind target values.
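The final comparison between scoring results and blind target values can be sketched as follows. This is a minimal illustration, not the Scoring Engine's implementation: it assumes the standard logistic form p = 1 / (1 + exp(-(slope * y + intercept))), where y is the raw output of the core model, and a probability cutoff of 0.5 for the most likely class. The function names and the example slope, intercept, and data values are all hypothetical.

```python
import math


def logistic_probability(model_output, slope, intercept):
    """Map a raw model output to a probability with the standard logistic function.

    The slope and intercept are meant to be the values fitted on the
    TRAINING dataset, so the test data play no part in the fit.
    """
    z = slope * model_output + intercept
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def most_likely_class(probability, cutoff=0.5):
    """Return 1 when the probability reaches the cutoff, otherwise 0 (assumed cutoff)."""
    return 1 if probability >= cutoff else 0


def blind_accuracy(model_outputs, blind_targets, slope, intercept, cutoff=0.5):
    """Fraction of records whose predicted class matches the held-back target."""
    if len(model_outputs) != len(blind_targets) or not blind_targets:
        raise ValueError("inputs must be non-empty and equal in length")
    hits = 0
    for y, target in zip(model_outputs, blind_targets):
        p = logistic_probability(y, slope, intercept)
        if most_likely_class(p, cutoff) == target:
            hits += 1
    return hits / len(blind_targets)


# Hypothetical values for illustration only.
outputs = [-2.0, -0.5, 0.3, 1.8]   # raw core-model outputs on the blind set
targets = [0, 1, 1, 1]             # target values kept out of the scoring file
accuracy = blind_accuracy(outputs, targets, slope=1.5, intercept=0.1)
```

Because the slope and intercept are fixed before the blind records are scored, the resulting accuracy reflects only what the model learned from the training data.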