
Logistic Regression Framework

Getting Started

In order to access the Logistic Regression Framework of GeneXproTools 4.0 you need to:
  1. Create a statistical model that explains a binary dependent variable, using either the Function Finding / Logistic Regression Framework or the Classification Framework of GeneXproTools.
    In the Function Finding / Logistic Regression Framework, all datasets with binary dependent variables use the Correlation Coefficient Fitness Function by default, as this kind of function gives the best results with the standard 0/1 class encoding.
    In the Classification Framework you have access to a wide variety of fitness functions that, despite using pre-established rounding thresholds, offer interesting alternatives for exploring the solution space.
  2. In the Function Finding / Logistic Regression Framework, click the Logistic Regression menu and then choose one of the available analytics tools: Quantile Analysis, ROC Curve, Cutoff Points, Gains Chart, Lift Chart, Log Odds, Logistic Fit, or Confusion Matrix.
    This starts the number-crunching process of the Logistic Regression Analytics Platform, which begins with the construction of the Quantile Table and ends with the creation of the Logistic Regression Model and the inference of the Confusion Matrixes. Once the processing has finished, navigate from tab to tab to evaluate the quality and performance of your model.
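As a rough illustration of the path from Quantile Table to Logistic Regression Model, the Python sketch below sorts the records by model score, groups them into equal-sized buckets, and fits a straight line to the log odds of each bucket. It is not the GeneXproTools implementation: the function names, the equal-count bucketing, the clipping constant eps, and the least-squares fit are all assumptions made for this example.

```python
import math

def quantile_table(scores, targets, n_buckets):
    """Sort records by model score and split them into n_buckets groups
    of near-equal size. Returns one (mean_score, positive_rate) per bucket."""
    pairs = sorted(zip(scores, targets))
    n = len(pairs)
    table = []
    for b in range(n_buckets):
        bucket = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not bucket:
            continue  # more buckets than records: skip the empty ones
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        positive_rate = sum(t for _, t in bucket) / len(bucket)
        table.append((mean_score, positive_rate))
    return table

def fit_log_odds(table, eps=1e-6):
    """Least-squares line through (mean_score, log odds) for each bucket.
    Rates of exactly 0 or 1 are clipped by eps (an assumption of this
    sketch) so that the logarithm stays finite."""
    xs = [mean_score for mean_score, _ in table]
    ys = []
    for _, p in table:
        p = min(max(p, eps), 1.0 - eps)
        ys.append(math.log(p / (1.0 - p)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all buckets have the same mean score")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: 20 records whose share of positives rises with the score.
scores = [i / 20 for i in range(20)]
targets = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
table = quantile_table(scores, targets, n_buckets=4)
slope, intercept = fit_log_odds(table)
print(table)
print(slope, intercept)
```

Changing n_buckets changes the table and therefore the fitted slope and intercept, which is why the number of buckets matters for the analyses that depend on the Quantile Table.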

 

In the Logistic Regression Window of GeneXproTools you can:

  1. Analyze and create Quantile Tables and Charts; perform Quantile Regression; analyze the ROC Curve of your models; visualize the Optimal Cutoff Point for your test scores; study the Gains and Lift Charts of your models; access the Log Odds Chart used to evaluate the slope and intercept of the Logistic Regression Model; visualize how well your logistic model fits the data in the Logistic Fit Chart; and compare and analyze Logistic and ROC Confusion Matrixes using both 2 x 2 Contingency Tables and quantile-based Distribution Charts.
  2. Copy all the Tables and Charts to the clipboard.
    All the Tables and Charts generated within the Logistic Regression Window can be copied to the clipboard through the mouse right-click menu. Tables can be copied in their entirety or you can copy just selected rows or individual columns.
  3. Print all the Charts.
    All the Charts in the Logistic Regression Window (Quantile Charts, ROC Curve, Cutoff Points, Gains Chart, Lift Chart, Log Odds, Logistic Fit, and Confusion Matrixes) can be printed through the mouse right-click menu.
  4. Copy and print the Statistics Report.
    The Stats Report summarizes all the relevant parameters and statistics derived from the analyses (Quantile Regression, ROC Curve, Cutoff Points, Gains Chart, Lift Chart, Log Odds, Logistic Fit, and Confusion Matrixes) performed for the active model and active dataset. It also contains relevant information about the training and testing data, such as class distribution and number of records. Finally, it summarizes some basic information about the model, such as its fitness and R-square, and reports whether any calculation errors occurred during the computation of the model scores.
    Within the Logistic Regression Window, all such calculation errors return zero so that the calculations can resume. These errors can occur when processing the unseen data of the testing set, and also when processing the training dataset if it was replaced by another one or if the model itself was modified by the user. Note, however, that GeneXproTools flags these errors clearly, highlighting them in light red in all the tables where the model outputs are shown (ROC Table, Cutoff Points Table, Logistic Fit Table, and Confusion Matrix Table).
  5. Choose a different number of buckets for your Quantile Table and then see immediately how it affects the Logistic Regression Model through the Logistic Fit Chart.
    The number of buckets is an essential parameter for most of the analyses performed in the Logistic Regression Window (Quantile Regression, Gains Chart, Lift Chart, Log Odds and Logistic Regression, Logistic Fit, and Logistic Confusion Matrix) and therefore it is saved for each model and for each dataset.
    By using the ROC-derived accuracy as your gold standard (it is quantile-independent and remains unchanged for a particular model), you can fine-tune the number of buckets to get the most out of your models. Note, however, that it is not uncommon for the Logistic Confusion Matrix to show better performance than the ROC-derived accuracy, which of course is indicative of a very good Logistic Fit.
  6. Access the testing dataset so that you can not only further test the predictive accuracy of your model but also build logistic regression models with it.
    The testing dataset is never brought into contact with the model during the training process and therefore constitutes an excellent blind test for checking the predictive accuracy of your model on unseen data. You access the testing dataset by choosing Testing Set in the Dataset combo box. GeneXproTools then creates a specific Quantile Table for the testing dataset and also performs the complete logistic regression analysis for this dataset. Note, however, that if you want to use this logistic regression model (that is, the slope and intercept evaluated for the testing set) for scoring new cases using the Scoring Engine of GeneXproTools, you’ll have to replace the original training dataset with this one and then recalculate the logistic parameters (the slope and intercept of the Log Odds Chart) with this new operational dataset.
  7. Analyze all the intermediate models created in a run by selecting any model in the History combo box.
    Each model in the Run History is identified by its ID and training R-square for easy access in the History combo box. Although controversial, the small R-square values typical of risk assessment and response models are useful indicators of the quality of a model and are in fact widely used by real-world modelers; for instance, R-square values around 0.23 are considered excellent for typical risk assessment models and indicative of a good fit. Note that when you close the Logistic Regression Window, the last observed model will remain your active model.
    Although modelers are understandably interested in the best-of-run model, it is also instructive to see how intermediate models behave and how their performance improves over the course of a run. This is a good way to develop intuition and to learn techniques that can help you guide the evolutionary process.
  8. Choose to browse all the available tables and charts in synchrony or independently by ticking or clearing the Synchronize Tables & Charts check-box.
    By default, the Tables & Charts of the Logistic Regression Framework of GeneXproTools move in synchrony. But you can have them move independently so that you can look at any of the tables while analyzing a certain chart and vice versa. Another advantage of having Tables & Charts move independently is that it’s much quicker to move from chart to chart when using very large datasets.
  9. Access the Logistic Regression Help File.
    Clicking the Help button opens a new window with the Logistic Regression Help File, where you can consult any section of the Logistic Regression Documentation for answers about any of the analyses of the Logistic Regression Platform. For help on general evolutionary model design, consult the main GeneXproTools Help File.
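Item 5 above uses the ROC-derived accuracy as a reference because it depends only on the model scores and the actual classes, not on the number of buckets. The Python sketch below shows one way such a figure could be computed: it tries every distinct score as a cutoff, builds the 2 x 2 confusion matrix for each, and keeps the cutoff with the highest accuracy. The function names are hypothetical, and the maximum-accuracy criterion is an assumption; GeneXproTools may define its Optimal Cutoff Point differently.

```python
def confusion_matrix(scores, targets, cutoff):
    """Return (tp, fp, tn, fn), predicting class 1 when score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, targets):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def best_cutoff(scores, targets):
    """Scan every distinct score as a candidate cutoff and return
    (cutoff, accuracy) for the first one with the highest accuracy."""
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_matrix(scores, targets, cutoff)
        accuracy = (tp + tn) / len(scores)
        if best is None or accuracy > best[1]:
            best = (cutoff, accuracy)
    return best

# Toy data: four model scores with their actual classes.
scores = [0.1, 0.4, 0.35, 0.8]
targets = [0, 0, 1, 1]
print(best_cutoff(scores, targets))
print(confusion_matrix(scores, targets, 0.35))
```

Because no bucketing is involved, the result stays the same however the Quantile Table is configured.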


In order to make predictions or rank new cases within GeneXproTools, you need to:

  1. Choose and evaluate all the key parameters (number of buckets plus the slope and intercept for the logistic regression model) in the Logistic Regression Window.
    You start by selecting the model you are interested in (usually the best-of-run model) and then enter the Logistic Regression Window by choosing one of the entry points listed under the Logistic Regression menu (Quantile Analysis, ROC Curve, Cutoff Points, Gains Chart, Lift Chart, Log Odds, Logistic Fit, and Confusion Matrix). GeneXproTools executes all the required calculations automatically, although you may want to fine-tune the number of quantiles to achieve the best possible performance with the Logistic Regression Model.
  2. Close the Logistic Regression Window and then go to the Scoring Panel.
    To score a database or Excel file, on the Scoring menu select Databases or go to the Scoring Panel and select the Databases Tab. For scoring data kept in text files, on the Scoring menu select Text Files or go to the Scoring Panel and select the Text Files Tab.
  3. In the Scoring Panel tick the Logistic Regression check-box and then enter the path for both the source data and output file.
    The Scoring Engine of GeneXproTools uses the JavaScript code of your model to perform the computations, as it already contains the code for the UDFs and DDFs.
  4. Then press the Start button to begin the scoring process.
    GeneXproTools saves the scoring results to a file that contains the predictions of your model for all the new cases in the source file. For small datasets (up to 16 variables and 2000 cases) GeneXproTools also shows the scoring results in the table of the Scoring Panel. Besides the model output, GeneXproTools computes the probabilities of being either a “1” or a “0” and infers the most likely class for each case.
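The conversion from model output to probabilities and most likely class can be pictured with the Python sketch below. It assumes the standard logistic function applied to slope * output + intercept and a 0.5 threshold for the class; the exact formula and threshold used by the Scoring Engine are not given in this section, and the names score_case, slope, and intercept are illustrative only.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score_case(model_output, slope, intercept):
    """Return (p1, p0, most_likely_class) for one new case.
    Assumes the log odds are a linear function of the model output."""
    p1 = logistic(slope * model_output + intercept)
    p0 = 1.0 - p1
    most_likely = 1 if p1 >= 0.5 else 0  # 0.5 threshold is an assumption
    return p1, p0, most_likely

# Hypothetical slope and intercept; in practice these would be the values
# evaluated in the Logistic Regression Window.
for output in (-0.5, 0.0, 0.9):
    print(output, score_case(output, slope=3.0, intercept=-1.0))
```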


 

 
 

Copyright (c) 2000-2011 Gepsoft Ltd. All rights reserved.