Log Odds and Logistic Regression
The Log Odds Chart is central to the Logistic Regression Model: the
slope and intercept of the Logistic Regression Model are calculated
from it. The procedure is quite simple. As mentioned previously, it is
quantile-based and, in fact, just a few additional calculations are
required to draw the regression line.
So, based on the Quantile Table, one first evaluates the odds
ratio for all the buckets (you can check all the values on the
Odds Table under Odds Ratio). Then the natural logarithm of this
ratio is evaluated, resulting in what goes by the name of Log Odds
(the log odds values are also shown on the Log Odds Table).
Note, however, that the log odds cannot be evaluated for buckets with
zero positive cases, because the natural logarithm of a zero odds
ratio is undefined. Although rare for large datasets, it can sometimes
happen that some of the buckets end up with no positive cases in them.
GeneXproTools handles this with a slight modification to the Laplace
estimator to get what is called a complete Bayesian formulation with
prior probabilities. In essence, this means that if the quantile
table we are using has buckets with no positive cases in them, then
we do the equivalent of priming all the buckets with a very small
number of positive cases.
The formula GeneXproTools uses in the evaluation of the Positives
Rate values p_i for all the quantiles is the following:
$$p_i = \frac{Q_i + \mu P}{T_i + \mu}$$
where μ is the Laplace estimator, which in GeneXproTools has a
value of 0.01; Q_i and T_i are, respectively, the number of Positive
Cases and the number of Total Cases in bucket i; and P is the
Average Positive Rate of the whole dataset.
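As a rough illustration of the adjustment described above, the following Python sketch computes prior-adjusted positive rates and the corresponding log odds for a small quantile table. It assumes the adjusted rate takes the form (Q_i + μP) / (T_i + μ); the function names and the bucket counts are hypothetical and are not taken from GeneXproTools.

```python
import math

MU = 0.01  # value of the Laplace estimator quoted in the text


def adjusted_positive_rates(positives, totals, mu=MU):
    """Return the prior-adjusted positives rate for each bucket.

    Assumption: the rate has the form (Q_i + mu * P) / (T_i + mu),
    where P is the average positive rate of the whole dataset.
    """
    overall_rate = sum(positives) / sum(totals)  # P
    return [(q + mu * overall_rate) / (t + mu)
            for q, t in zip(positives, totals)]


def log_odds(rates):
    """Natural logarithm of the odds p / (1 - p) for each rate."""
    return [math.log(p / (1.0 - p)) for p in rates]


# Hypothetical quantile table: the first bucket has no positive cases,
# so its unadjusted odds would be zero and the logarithm undefined.
positives = [0, 3, 12, 20]
totals = [25, 25, 25, 25]

rates = adjusted_positive_rates(positives, totals)
bucket_log_odds = log_odds(rates)
```

Because every bucket is primed with a small fraction of the overall positive rate, the first bucket ends up with a tiny but nonzero rate, and its log odds become a large negative but finite number.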
So, in the Log Odds Chart, the Log Odds values (adjusted or not with the Laplace strategy) are plotted on the Y-axis against the Model
Output on the X-axis. As with Quantile Regression, there are special rules to follow here, depending on whether the
predominant class is “1” or “0” and whether the model is normal or inverted. To be precise, the Log Odds are plotted against the
Model Upper Boundaries if the predominant class is “1” and the model is normal, or the
predominant class is “0” and the model is inverted; and against the
Model Lower Boundaries if the predominant class is “1” and the model is inverted, or the
predominant class is “0” and the model is normal.
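The four cases above reduce to a single comparison, which the following Python sketch makes explicit. The function name and the "upper"/"lower" return values are hypothetical; only the rule itself comes from the text.

```python
def boundary_for_log_odds(predominant_class, inverted):
    """Return which bucket boundary the log odds are plotted against.

    predominant_class: 1 or 0
    inverted: True if the model is inverted, False if it is normal
    """
    if predominant_class not in (0, 1):
        raise ValueError("predominant_class must be 0 or 1")
    # Upper boundaries: class 1 with a normal model, or class 0 with an
    # inverted model. Lower boundaries: the two remaining combinations.
    use_upper = (predominant_class == 1) != inverted
    return "upper" if use_upper else "lower"
```

For example, `boundary_for_log_odds(1, inverted=False)` selects the upper boundaries, while `boundary_for_log_odds(0, inverted=False)` selects the lower ones.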
Then a weighted linear regression is performed and the slope and
intercept of the regression line are evaluated. These are the parameters used in the
Logistic Regression Equation to evaluate the probabilities.
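A weighted linear regression of this kind can be sketched in a few lines of Python using the closed-form weighted least-squares solution. The text does not say which weights GeneXproTools uses, so weighting each bucket by its number of cases is an assumption made only for this example, as are the function name and the data.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares line y = slope * x + intercept.

    Minimizes sum(w * (y - slope * x - intercept) ** 2) and returns
    the pair (slope, intercept).
    """
    w_sum = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    s_xy = sum(w * (x - x_mean) * (y - y_mean)
               for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: bucket boundaries (x), log odds (y), and
# bucket sizes used as weights (an assumption, see above).
boundaries = [-1.2, -0.3, 0.4, 1.5]
bucket_log_odds = [-3.1, -1.4, 0.2, 2.6]
bucket_sizes = [25, 25, 25, 25]

slope, intercept = weighted_linear_fit(boundaries, bucket_log_odds, bucket_sizes)
```

With equal weights, as in this example, the result coincides with an ordinary least-squares fit; unequal weights pull the line toward the heavier buckets.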
The regression line can be written as:
$$\ln\left(\frac{p}{1 - p}\right) = ax + b$$
where p is the probability of being “1”; x is the Model Output; and
a and b are, respectively, the slope and intercept of the regression line. GeneXproTools draws the regression line and shows both the equation and the R-square in the
Log Odds Chart.
Solving the logistic equation above for p gives:
$$p = \frac{1}{1 + e^{-(ax + b)}}$$
which is the formula for evaluating the probabilities with the
Logistic Regression Model. The probabilities estimated for each case are
shown in the Logistic Fit Table.
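To show how the slope and intercept turn a model output into a probability, here is a minimal Python sketch of the logistic formula p = 1 / (1 + exp(-(ax + b))). The function name and the parameter values are hypothetical.

```python
import math


def logistic_probability(x, slope, intercept):
    """Probability of class "1" for model output x."""
    z = slope * x + intercept
    # Branch on the sign of z so that exp() is only ever called
    # with a non-positive argument and cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical slope and intercept read from a Log Odds Chart.
slope, intercept = 2.0, -0.5
p = logistic_probability(0.8, slope, intercept)
```

Taking the log odds of the returned probability recovers the regression line value `slope * x + intercept`, which is a quick way to check the two formulas against each other.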
Besides the slope and intercept of the Logistic Regression Model, another useful and popular parameter is the exponential of the slope, usually represented by
Exp(slope). It is the factor by which the predicted odds ratio is multiplied for each unit increase in
x. GeneXproTools also shows this parameter both in the Log Odds Chart and in the
Log Odds Stats Report.
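The meaning of Exp(slope) can be checked numerically: under the logistic model the predicted odds are exp(ax + b), so moving x up by one unit multiplies them by exp(a). The short Python sketch below uses hypothetical slope and intercept values to confirm this.

```python
import math


def predicted_odds(x, slope, intercept):
    """Predicted odds p / (1 - p) under the logistic model."""
    return math.exp(slope * x + intercept)


# Hypothetical parameters, chosen only for illustration.
slope, intercept = 0.8, -1.5

exp_slope = math.exp(slope)
# Ratio of the predicted odds at x = 3 to those at x = 2.
ratio = predicted_odds(3.0, slope, intercept) / predicted_odds(2.0, slope, intercept)
```

The ratio equals `exp_slope` up to floating-point error, whichever pair of x values one unit apart is chosen.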