Getting Started
In order to access the Logistic Regression Framework of GeneXproTools 4.0 you need to:
- Create a statistical model that explains a binary dependent variable, using either the Function Finding / Logistic Regression Framework or the Classification Framework of GeneXproTools.
In the Function Finding / Logistic Regression Framework, all datasets with binary dependent variables use the Correlation Coefficient Fitness Function by default, as this kind of function gives the best results with the standard 0/1 class encoding.
In the Classification Framework you have access to a wide variety of fitness functions that, despite using pre-established rounding thresholds, offer interesting alternatives for exploring the solution space.
- In the Function Finding / Logistic Regression Framework, click the Logistic Regression menu and then choose one of the available analytics tools: Quantile Analysis, ROC Curve, Cutoff Points, Gains Chart, Lift Chart, Log Odds, Logistic Fit, or Confusion Matrix.
This activates the number-crunching process of the Logistic Regression Analytics Platform that starts with the construction of the Quantile Table and finishes with the creation of the Logistic Regression Model and the inference of very clear Confusion Matrixes. After the processing has stopped, you just have to navigate from tab to tab to evaluate the quality and performance of your modeling system.
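As a rough illustration of the pipeline just described, the Python sketch below sorts the model outputs into quantile buckets, converts the proportion of positive cases in each bucket into log odds, and fits a straight line through those points to obtain a slope and an intercept. It is a sketch under stated assumptions and not the GeneXproTools implementation: the function names quantile_table and fit_log_odds, the near-equal bucket sizes, the clamping constant eps, and the use of ordinary least squares are all hypothetical choices made for illustration.

```python
import math


def quantile_table(scores, targets, n_buckets):
    """Sort cases by model output and split them into n_buckets near-equal buckets.

    Returns one (mean_score, positive_rate) pair per non-empty bucket.
    The bucketing rule is an assumption made for this sketch.
    """
    pairs = sorted(zip(scores, targets))
    n = len(pairs)
    table = []
    for b in range(n_buckets):
        bucket = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not bucket:
            continue  # more buckets than cases
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        positive_rate = sum(t for _, t in bucket) / len(bucket)
        table.append((mean_score, positive_rate))
    return table


def fit_log_odds(table, eps=1e-6):
    """Least-squares line through (mean_score, log odds); returns (slope, intercept)."""
    xs, ys = [], []
    for mean_score, rate in table:
        p = min(max(rate, eps), 1.0 - eps)  # clamp to avoid log(0); an assumption
        xs.append(mean_score)
        ys.append(math.log(p / (1.0 - p)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two buckets with different mean scores")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Calling quantile_table with a different n_buckets value and refitting shows directly how the number of buckets changes the slope and intercept.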
In the Logistic Regression Window of GeneXproTools you can:
- Analyze and create Quantile Tables and Charts; perform Quantile Regression; analyze the ROC Curve of your models; visualize the Optimal Cutoff Point for your test scores; study the Gains and Lift Charts of your models; access the Log Odds Chart used to evaluate the slope and intercept of the Logistic Regression Model; visualize how well your logistic model fits the data in the Logistic Fit Chart; and compare and analyze Logistic and ROC Confusion Matrixes using both 2 x 2 Contingency Tables and quantile-based Distribution Charts.
-
Copy all the Tables and Charts to the clipboard.
All the Tables and Charts generated within the Logistic Regression
Window can be copied to the clipboard through the mouse right-click
menu. Tables can be copied in their entirety or you can copy just
selected rows or individual columns.
-
Print all the Charts.
All the Charts in the Logistic Regression Window (Quantile Charts,
ROC Curve, Cutoff Points,
Gains Chart, Lift Chart,
Log Odds, Logistic Fit, and
Confusion Matrixes) can be printed through the mouse right-click menu.
-
Copy and print the Statistics Report.
The Stats Report summarizes all the relevant parameters and
statistics derived from all the analyses (Quantile
Regression,
ROC Curve, Cutoff Points,
Gains Chart, Lift Chart,
Log Odds, Logistic Fit, and
Confusion Matrixes) performed for the
active model and active dataset. It also contains relevant
information about the training and testing data, such as class
distribution and number of records. Finally, the Stats Report
also summarizes some basic information about the model, such as its
fitness and R-square, and whether any calculation errors occurred during
the computation of the model scores. Such errors can happen when
processing the unseen data of the testing set, and also with the
"training dataset" if it was replaced by another one or if the model
itself was modified by the user. Within the Logistic Regression
Window, all such calculation errors return zero so that the
calculations can resume. Note, however, that GeneXproTools flags these
errors clearly, highlighting them in light red in all the tables where
the model outputs are shown (ROC Table, Cutoff Points Table,
Logistic Fit Table, and Confusion Matrix Table).
-
Choose a different number of buckets for your Quantile Table and then see immediately how it affects the
Logistic Regression Model through the
Logistic Fit Chart.
The number of buckets is an essential parameter for most of the analyses
performed in the Logistic Regression Window (Quantile
Regression,
Gains Chart, Lift Chart,
Log Odds and Logistic Regression, Logistic Fit, and
Logistic Confusion Matrix) and therefore
it is saved for each model and for each dataset.
By using the ROC-derived accuracy as your gold standard (it is quantile-independent and remains unchanged for a particular model), you can fine-tune the number of buckets to get the most out of your models. Note, however, that it is not uncommon to get better performance on the Logistic Confusion Matrix, which of course is indicative of a very good Logistic Fit.
-
Access the testing dataset so that you can not only further test the predictive accuracy of your model
but also build logistic regression models with it.
The testing dataset was never brought into contact with the model during the training process and therefore constitutes an excellent blind test for checking the predictive accuracy of your model
on unseen data.
You access the testing dataset by choosing Testing Set in the Dataset combo box. GeneXproTools then creates a specific
Quantile Table for the testing dataset and
also performs the complete logistic regression analysis for this dataset. Note, however, that if you want to use this logistic regression model
(that is, the slope and intercept evaluated for the testing set) for scoring new cases using the
Scoring Engine of GeneXproTools, you’ll have to replace the original training dataset with this one and then recalculate the logistic parameters (the slope and intercept of the
Log Odds Chart) with this new operational dataset.
-
Analyze all the intermediate models created in a run by selecting any model in the History combo box.
Each model in the Run History is identified by its ID and training R-square for easy access in the History combo box (although controversial, the small R-square values typical of
risk assessment and response models are useful indicators of the quality of a model and are in fact widely used by real-world modelers;
for instance, R-square values around 0.23 are considered excellent for typical
risk assessment models and indicative of a good fit). Note that when you close the Logistic Regression Window, the last observed model will remain your active model.
Although modelers are understandably interested in the best-of-run model, it’s great fun to get a glimpse of how evolution works by seeing how intermediate models behave and how their performance improves with time. This process is also important for developing good intuition and for learning tips that might help you guide the evolutionary process better.
-
Choose to browse all the available tables and charts in synchrony or asynchronously by
ticking the Synchronize Tables & Charts check-box.
By default, the Tables & Charts of the Logistic Regression Framework of GeneXproTools move in synchrony. But you can have them move independently so that you can look at any of the tables while analyzing a certain chart
and vice versa. Another advantage of having
Tables & Charts move independently is that it’s much quicker to move from chart to chart when using very large datasets.
-
Access the Logistic Regression Help File.
Clicking the Help button opens a new window with the Logistic Regression Help File,
where you can access any section of the Logistic Regression
Documentation to answer any question you might have regarding any of the
analyses of the Logistic Regression Platform. For help on general evolutionary
model design, you'll have to consult the main GeneXproTools Help File.
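The note above about using the ROC-derived accuracy as a quantile-independent yardstick can be made concrete with a small Python sketch. It tries every distinct model output as a cutoff and keeps the one that classifies the most cases correctly, so the result depends only on the model outputs and the 0/1 targets and never on the number of buckets. The function name best_cutoff is hypothetical, and maximizing plain accuracy is an assumption: the criterion GeneXproTools uses for its Optimal Cutoff Point is not stated in this section.

```python
def best_cutoff(scores, targets):
    """Return (cutoff, accuracy) for the cutoff with the highest accuracy.

    A case is predicted as class 1 when its score is >= cutoff.
    Ties are resolved in favor of the lowest cutoff (an arbitrary choice).
    """
    n = len(scores)
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        # Count cases where the prediction matches the 0/1 target.
        correct = sum((s >= cutoff) == (t == 1) for s, t in zip(scores, targets))
        accuracy = correct / n
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best
```

Comparing this figure with the accuracy of a logistic model built from a given number of buckets is one way to judge whether that bucket count is a good choice.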
In order to make predictions or rank new cases within GeneXproTools, you need to:
-
Choose and evaluate all the key parameters (number of buckets plus the slope and intercept for the logistic
regression model) in the Logistic Regression Window.
You start by selecting the model you are interested in (usually the best-of-run model) and then enter the Logistic Regression Window by choosing one
of the entry points listed under the Logistic Regression menu (Quantile
Analysis, ROC Curve, Cutoff Points,
Gains Chart, Lift Chart,
Log Odds, Logistic Fit, and
Confusion Matrix). GeneXproTools executes all the required calculations automatically, although you may want to fine-tune the number of quantiles to achieve the best possible performance
with the
Logistic Regression Model.
-
Close the Logistic Regression Window and then go to the Scoring Panel.
To score a database or Excel file, on the Scoring menu select Databases or go to the Scoring Panel and select the Databases Tab. For scoring data kept in text files, on the Scoring
menu select Text Files or go to the Scoring Panel and select the Text Files Tab.
-
In the Scoring Panel tick the Logistic Regression check-box and then enter the path for both the source data and output file.
The Scoring Engine of GeneXproTools uses the JavaScript code of your model to perform the computations, as it already contains the code for the UDFs and DDFs.
-
Then press the Start button to begin the scoring process.
GeneXproTools saves the scoring results to a file which contains the predictions of your model for all the new cases in the source file. For small datasets (up to 16 variables and 2000 cases) GeneXproTools also shows the scoring results in the table of the Scoring Panel. Besides the model output, GeneXproTools also computes the probabilities of being either a “1” or “0” and also infers the most likely class for all the cases.
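To show how a raw model output can be turned into the two probabilities and a most likely class, here is a minimal Python sketch. It assumes the standard logistic function applied to slope * output + intercept, with the slope and intercept taken from the Log Odds Chart, and it assumes a 0.5 probability threshold for choosing the class. The function names logistic and score_case are hypothetical, and the exact computation performed by the Scoring Engine may differ.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def score_case(model_output, slope, intercept):
    """Return (p1, p0, most_likely_class) for one new case.

    The 0.5 threshold for choosing the class is an assumption of this sketch.
    """
    p1 = logistic(slope * model_output + intercept)
    p0 = 1.0 - p1
    return p1, p0, 1 if p1 >= 0.5 else 0
```

For example, score_case(0.0, 2.0, 0.0) gives equal probabilities for both classes, because the log odds are zero.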