
  • In statistics, logistic regression, or logit regression, is a type of probabilistic statistical

  • classification model. It is used to predict a binary response, and more generally the outcome

  • of a categorical dependent variable, based on one or more predictor

  • variables. In other words, it is used to estimate the parameters of a qualitative response model.

  • The probabilities describing the possible outcomes of a single trial are modeled, as

  • a function of the explanatory variables, using a logistic function. Frequently "logistic

  • regression" is used to refer specifically to the problem in which the dependent variable

  • is binary—that is, the number of available categories is two—while problems with more

  • than two categories are referred to as multinomial logistic regression or, if the multiple categories

  • are ordered, as ordered logistic regression. Logistic regression measures the relationship

  • between a categorical dependent variable and one or more independent variables, which are

  • usually continuous, by using probability scores as the predicted values of the dependent variable.

  • As such, it treats the same set of problems as probit regression does, using similar techniques.

  • Fields and examples of applications

  • Logistic regression was put forth in the 1940s

  • as an alternative to Fisher's 1936 classification method, linear discriminant analysis. It is

  • used extensively in numerous disciplines, including the medical and social science fields.

  • For example, the Trauma and Injury Severity Score, which is widely used to predict mortality

  • in injured patients, was originally developed by Boyd et al. using logistic regression.

  • Logistic regression might be used to predict whether a patient has a given disease, based

  • on observed characteristics of the patient. Another example might be to predict whether

  • an American voter will vote Democratic or Republican, based on age, income, gender,

  • race, state of residence, votes in previous elections, etc. The technique can also be

  • used in engineering, especially for predicting the probability of failure of a given process,

  • system or product. It is also used in marketing applications such as prediction of a customer's

  • propensity to purchase a product or cease a subscription, etc. In economics it can be

  • used to predict the likelihood of a person's choosing to be in the labor force, and a business

  • application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional

  • random fields, an extension of logistic regression to sequential data, are used in natural language

  • processing.

  • Basics

  • Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals

  • with situations in which the observed outcome for a dependent variable can have only two

  • possible types. Multinomial logistic regression deals with situations where the outcome can

  • have three or more possible types. In binary logistic regression, the outcome is usually

  • coded as "0" or "1", as this leads to the most straightforward interpretation. If a

  • particular observed outcome for the dependent variable is the noteworthy possible outcome,

  • it is usually coded as "1" and the contrary outcome as "0". Logistic regression is used

  • to predict the odds of being a case based on the values of the independent variables.

  • The odds are defined as the probability that a particular outcome is a case divided by

  • the probability that it is a noncase. Like other forms of regression analysis, logistic

  • regression makes use of one or more predictor variables that may be either continuous or

  • categorical. Unlike ordinary linear regression, however, logistic regression is used for predicting

  • binary outcomes of the dependent variable rather than continuous outcomes. Given this

  • difference, it is necessary that logistic regression take the natural logarithm of the

  • odds of the dependent variable being a case to create a continuous criterion as a transformed

  • version of the dependent variable. Thus the logit transformation is referred to as the

  • link function in logistic regression: although the dependent variable in logistic regression

  • is binomial, the logit is the continuous criterion that is modeled as a linear function of the predictors.

  • The logit of success is then fit to the predictors as a linear regression expression, with the coefficients usually estimated by maximum likelihood rather than by ordinary least squares. The predicted

  • value of the logit is converted back into predicted odds via the inverse of the natural

  • logarithm, namely the exponential function. Therefore, although the observed dependent

  • variable in logistic regression is a zero-or-one variable, the logistic regression estimates

  • the odds, as a continuous variable, that the dependent variable is a success. In some applications

  • the odds are all that is needed. In others, a specific yes-or-no prediction is needed

  • for whether the dependent variable is or is not a case; this categorical prediction can

  • be based on the computed odds of a success, with predicted odds above some chosen cut-off

  • value being translated into a prediction of a success.
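The conversions in this paragraph can be sketched in Python. They are probability to odds, odds to logit, logit back to odds, and a cut-off rule for a yes-or-no prediction. This is a minimal illustration that assumes a single predicted probability. The helper names and the default cut-off of 1.0 on the odds scale (equivalent to a probability of 0.5) are hypothetical choices, not taken from the text.

```python
import math


def odds(p: float) -> float:
    """Odds of a case: P(case) divided by P(noncase)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def logit(p: float) -> float:
    """The logit: natural logarithm of the odds."""
    return math.log(odds(p))


def odds_from_logit(g: float) -> float:
    """Convert a predicted logit back into odds with the exponential function."""
    return math.exp(g)


def predict_case(predicted_odds: float, cutoff: float = 1.0) -> int:
    """Categorical prediction: 1 (case) if the odds exceed the cut-off, else 0.

    The default cut-off of 1.0 is a hypothetical choice for illustration.
    """
    return 1 if predicted_odds > cutoff else 0


g = logit(0.75)            # logit for one hypothetical observation with p = 0.75
o = odds_from_logit(g)     # back to the odds scale
print(g, o, predict_case(o))
```

Other cut-offs trade false positives against false negatives. The choice depends on the application.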

  • Logistic function, odds ratio, and logit

  • An explanation of logistic regression begins with an explanation of the logistic function,

  • which always takes on values between zero and one:

  • $$\sigma(t) = \frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}}$$

  • and viewing $t$ as a linear function of an explanatory variable $x$, that is, $t = \beta_0 + \beta_1 x$, the logistic function can be written as:

  • $$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

  • This will be interpreted as the probability of the dependent variable equalling a "success"

  • or "case" rather than a failure or non-case. We also define the inverse of the logistic

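  • function, the logit:

  • $$g(x) = \ln\left(\frac{F(x)}{1 - F(x)}\right) = \beta_0 + \beta_1 x$$

  • and equivalently:

  • $$\frac{F(x)}{1 - F(x)} = e^{\beta_0 + \beta_1 x}$$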

  • A graph of the logistic function is shown in Figure 1. The input is the value of $\beta_0 + \beta_1 x$ and

  • the output is $F(x)$. The logistic function is useful because it can take an input with any value

  • from negative infinity to positive infinity, whereas the output is confined to values between

  • 0 and 1 and hence is interpretable as a probability. In the above equations, $g(x)$ refers to the logit

  • function of some given linear combination of the predictors, $\ln$ denotes the natural logarithm,

  • $F(x)$ is the probability that the dependent variable equals a case, $\beta_0$ is the intercept from the linear

  • regression equation, $\beta_1 x$ is the regression coefficient multiplied by some value of the predictor,

  • and base $e$ denotes the exponential function. The formula for $F(x)$ illustrates that the probability

  • of the dependent variable equaling a case is equal to the value of the logistic function

  • of the linear regression expression. This is important in that it shows that the value

  • of the linear regression expression can vary from negative to positive infinity and yet,

  • after transformation, the resulting expression for the probability ranges between 0 and 1.

  • The equation for $g(x)$ illustrates that the logit is equivalent to the linear regression expression.

  • Likewise, the next equation illustrates that the odds of the dependent variable equaling

  • a case are equal to the exponential function of the linear regression expression. This

  • illustrates how the logit serves as a link function between the probability and the linear

  • regression expression. Given that the logit ranges between negative infinity and positive

  • infinity, it provides an adequate criterion upon which to conduct linear regression and

  • the logit is easily converted back into the odds.
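The following minimal Python sketch covers the three equations in this section. It assumes a single predictor and the hypothetical coefficients `b0 = -1.5` and `b1 = 0.8`. It checks numerically that the logit of F(x) returns the linear expression and that the odds equal e raised to that expression. The branch inside `logistic` is one common way to avoid overflow in `math.exp` for large negative inputs. The text does not prescribe it.

```python
import math


def logistic(t: float) -> float:
    """sigma(t) = 1 / (1 + e^{-t}), arranged so math.exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def F(x: float, b0: float, b1: float) -> float:
    """Probability of a case: the logistic function of b0 + b1 * x."""
    return logistic(b0 + b1 * x)


def g(p: float) -> float:
    """The logit, ln(p / (1 - p)): the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


b0, b1 = -1.5, 0.8  # hypothetical coefficients, for illustration only
for x in (-10.0, 0.0, 2.5, 10.0):
    p = F(x, b0, b1)
    # Columns: x, F(x), logit of F(x), odds, and e^(b0 + b1*x) for comparison.
    print(x, round(p, 4), round(g(p), 4), round(p / (1 - p), 4), round(math.exp(b0 + b1 * x), 4))
```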

  • Multiple explanatory variables

  • If there are multiple explanatory variables,

  • then the above expression $\beta_0 + \beta_1 x$ can be revised to $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$. Then when this is used in the equation relating

  • the logged odds of a success to the values of the predictors, the linear regression will

  • be a multiple regression with $m$ explanators; the parameters $\beta_j$ for all $j = 0, 1, 2, \ldots, m$

  • are all estimated.

  • Model fitting

  • Estimation

  • Maximum likelihood estimation

  • The regression coefficients are usually estimated using maximum likelihood estimation. Unlike

  • linear regression with normally distributed residuals, it is not possible to find a closed-form

  • expression for the coefficient values that maximize the likelihood function, so an iterative

  • process must be used instead, for example Newton's method. This process begins with

  • a tentative solution, revises it slightly to see if it can be improved, and repeats

  • this revision until improvement is minute, at which point the process is said to have

  • converged. In some instances the model may not reach

  • convergence. When a model does not converge, this indicates that the coefficients are not

  • meaningful because the iterative process was unable to find appropriate solutions. A failure

  • to converge may occur for a number of reasons: a large ratio of predictors to

  • cases, multicollinearity, sparseness, or complete separation.
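The iterative process can be sketched with Newton's method for a model with one predictor. This is a minimal standard-library illustration, not a production fitter. The name `fit_newton`, the starting values of zero, the step-size tolerance and the iteration cap are all assumptions. The second usage example is completely separated, so the slope keeps growing and the loop stops without converging. That is the failure mode described above.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_newton(x, y, tol=1e-8, max_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton's method on the log-likelihood.

    Returns (b0, b1, converged, iterations). Hypothetical helper for illustration.
    """
    b0 = b1 = 0.0  # tentative starting solution
    for it in range(1, max_iter + 1):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient (score) of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X^T W X with weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            return b0, b1, False, it  # singular information matrix: no usable step
        # Newton step: solve the 2x2 system (information) * step = gradient.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:  # improvement is minute: converged
            return b0, b1, True, it
    return b0, b1, False, max_iter


overlap_x = [0, 1, 2, 3, 4, 5, 6, 7]
overlap_y = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_newton(overlap_x, overlap_y))

separated_x = [-2, -1, 1, 2]
separated_y = [0, 0, 1, 1]  # x perfectly predicts y: complete separation
print(fit_newton(separated_x, separated_y))
```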

  • A large ratio of variables to cases results in an overly conservative Wald

  • statistic and can lead to nonconvergence. Multicollinearity refers to unacceptably high

  • correlations between predictors. As multicollinearity increases, coefficients remain unbiased but

  • standard errors increase and the likelihood of model convergence decreases. To detect

  • multicollinearity amongst the predictors, one can conduct a linear regression analysis

  • with the predictors of interest for the sole purpose of examining the tolerance statistic

  • used to assess whether multicollinearity is unacceptably high.
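The tolerance statistic for a predictor is 1 − R². Here R² comes from regressing that predictor on the other predictors. The sketch below assumes the simplest case of only two predictors, where R² is just the squared Pearson correlation. The function name and the data are hypothetical.

```python
import math


def tolerance_two_predictors(x1, x2):
    """Tolerance of x1 when the only other predictor is x2: 1 - R^2 of x1 regressed on x2.

    With a single other predictor, R^2 equals the squared Pearson correlation.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)
    return 1.0 - r * r


age = [25, 32, 47, 51, 62]
years_employed = [3, 10, 24, 29, 41]  # hypothetical, nearly collinear with age
shoe_size = [42, 38, 44, 39, 41]      # hypothetical, unrelated to age
print(tolerance_two_predictors(age, years_employed))  # near 0: high multicollinearity
print(tolerance_two_predictors(age, shoe_size))
```

With more predictors, the same idea applies: regress each predictor on all the others and take 1 − R².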

  • Sparseness in the data refers to having a large proportion of empty cells. Zero cell

  • counts are particularly problematic with categorical predictors. With continuous predictors, the

  • model can infer values for the zero cell counts, but this is not the case with categorical

  • predictors. The model will not converge with zero cell counts for categorical

  • predictors because the natural logarithm of zero is undefined, so final solutions

  • to the model cannot be reached. To remedy this problem, researchers may collapse categories

  • in a theoretically meaningful way or may consider adding a constant to all cells.
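Take a single binary predictor. The maximum likelihood slope of a logistic regression equals the sample log odds ratio of its 2x2 table, so a zero cell puts a logarithm of zero into the estimate. The sketch below uses a hypothetical table to show the failure and the effect of adding a constant to every cell. The value 0.5 is one common choice and is an assumption here, since the text does not specify the constant.

```python
import math


def log_odds_ratio(a, b, c, d, add=0.0):
    """Log odds ratio ln(a*d / (b*c)) for a 2x2 table, optionally adding a constant to every cell.

    Layout assumed: a = (x=1, y=1), b = (x=1, y=0), c = (x=0, y=1), d = (x=0, y=0).
    math.log raises ValueError if any adjusted cell is zero, since ln(0) is undefined.
    """
    a, b, c, d = (cell + add for cell in (a, b, c, d))
    return math.log(a) + math.log(d) - math.log(b) - math.log(c)


table = (12, 0, 5, 9)  # hypothetical counts with one empty cell
try:
    log_odds_ratio(*table)
except ValueError as err:
    print("undefined without adjustment:", err)
print(log_odds_ratio(*table, add=0.5))  # finite once a constant is added to all cells
```

Adding a constant moves the estimate toward zero, so collapsing categories in a meaningful way is often preferable when it is possible.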

  • Another numerical problem that may lead to a lack of convergence is complete separation,

  • which refers to the instance in which the predictors perfectly predict the criterion

  • – all cases are accurately classified. In such instances, one should reexamine the data,

  • as there is likely some kind of error. Although not a precise number, as a general

  • rule of thumb, logistic regression models require a minimum of 10 events per explanatory

  • variable.

  • Minimum chi-squared estimator for grouped data

  • While individual data will have a dependent

  • variable with a value of zero or one for every observation, with grouped data one observation

  • is on a group of people who all share the same characteristics; in this case the researcher

  • observes the proportion of people in the group for whom the response variable falls into

  • one category or the other. If this proportion is neither zero nor one for any group, the

  • minimum chi-squared estimator involves using weighted least squares to estimate a linear

  • model in which the dependent variable is the logit of the proportion: that is, the log

  • of the ratio of the fraction of the group in one category to the fraction in the other category.
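The following is a minimal sketch of the minimum chi-squared estimator for one predictor. It assumes the usual weights n·p·(1 − p), which are the reciprocals of the approximate variances of the empirical logits. The weighted least squares line is computed in closed form. The function name and the dose-response numbers are hypothetical.

```python
import math


def min_chi2_logit(x, n, prop):
    """Weighted least squares fit of logit(prop) = b0 + b1 * x for grouped data.

    x: predictor value shared by each group; n: group sizes;
    prop: observed proportion of cases in each group (must be strictly between 0 and 1).
    """
    if any(not 0.0 < p < 1.0 for p in prop):
        raise ValueError("every group proportion must be strictly between 0 and 1")
    z = [math.log(p / (1.0 - p)) for p in prop]        # empirical logits
    w = [ni * p * (1.0 - p) for ni, p in zip(n, prop)]  # assumed weights n*p*(1-p)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxz / sxx
    b0 = zbar - b1 * xbar
    return b0, b1


doses = [1.0, 2.0, 3.0, 4.0]       # hypothetical dose levels
sizes = [50, 50, 50, 50]           # hypothetical group sizes
props = [0.10, 0.26, 0.50, 0.74]   # hypothetical observed proportions responding
print(min_chi2_logit(doses, sizes, props))
```

The guard on the proportions mirrors the condition in the text: a group proportion of exactly zero or one has no finite logit.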

  • Evaluating goodness of fit

  • Goodness of fit in linear regression models is generally measured using $R^2$. Since this has no direct analog in logistic regression,

  • various methods including the following can be used instead.

  • Deviance and likelihood ratio tests

  • In linear regression analysis, one is concerned

  • with partitioning variance via the sum of squares calculations – variance in the criterion

  • is essentially divided into variance accounted for by the predictors and residual variance.

  • In logistic regression analysis, deviance is used in lieu of sum of squares calculations.

  • Deviance is analogous to the sum of squares calculations in linear regression and is a

  • measure of the lack of fit to the data in a logistic regression model. Deviance is calculated

  • by comparing a given model with the saturated model – a model with a theoretically perfect

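  • fit. This computation is called the likelihood-ratio test:

  • $$D = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$$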

  • In the above equation D represents the deviance and ln represents the natural logarithm. The

  • log of the likelihood ratio will produce a negative value, because the fitted model cannot have a higher likelihood than the saturated model, so it is multiplied

  • by negative two to produce a positive value with an approximate chi-squared

  • distribution. Smaller values indicate better fit as the fitted model deviates less from

  • the saturated model. When assessed upon a chi-square distribution, nonsignificant chi-square

  • values indicate very little unexplained variance and thus, good model fit. Conversely, a significant

  • chi-square value indicates that a significant amount of the variance is unexplained.
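For individual (ungrouped) binary data, the saturated model reproduces each observed 0 or 1 exactly. Its likelihood is therefore 1, and the deviance reduces to −2 times the fitted model's log-likelihood. Here is a minimal sketch with hypothetical fitted probabilities. The function name is illustrative.

```python
import math


def binary_deviance(y, p):
    """Deviance D = -2 ln(L_fitted / L_saturated) for individual 0/1 outcomes.

    The saturated model's likelihood is 1 here, so D = -2 * (fitted log-likelihood).
    """
    loglik = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return -2.0 * loglik


y = [0, 0, 1, 0, 1, 1]
print(binary_deviance(y, [0.5] * 6))                       # equals 12 * ln 2
print(binary_deviance(y, [0.1, 0.2, 0.7, 0.4, 0.8, 0.9]))  # smaller: closer to the saturated model
```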

  • Two measures of deviance are particularly important in logistic regression: null deviance

  • and model deviance. The null deviance represents the difference between a model with only the

  • intercept and the saturated model. The model deviance represents the difference between

  • a model with at least one predictor and the saturated model. In this respect, the null

  • model provides a baseline upon which to compare predictor models. Given that deviance is a

  • measure of the difference between a given model and the saturated model, smaller values

  • indicate better fit. Therefore, to assess the contribution of a predictor or set of

  • predictors, one can subtract the model deviance from the null deviance and assess the difference

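  • on a chi-square distribution with degrees of freedom equal to the difference in the number

  • of parameters estimated. Let

  • $$D_{\text{null}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the saturated model}}$$

  • $$D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$$

  • Then

  • $$D_{\text{null}} - D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the fitted model}}$$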
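The difference D_null − D_fitted can be sketched for a single added predictor, which gives 1 degree of freedom. In that case the chi-squared upper-tail probability has the closed form erfc(√(G/2)). The null model here is the intercept-only fit, which predicts the sample proportion of cases for every observation. The fitted deviance of 4.5 is a hypothetical number, and both helper names are illustrative.

```python
import math


def null_deviance(y):
    """Deviance of the intercept-only (null) model for individual 0/1 outcomes.

    The intercept-only maximum likelihood fit predicts the sample proportion of cases.
    """
    p = sum(y) / len(y)
    if not 0.0 < p < 1.0:
        raise ValueError("need at least one case and one noncase")
    return -2.0 * sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)


def lr_test_one_df(d_null, d_fitted):
    """Refer G = D_null - D_fitted to a chi-squared distribution on 1 degree of freedom.

    Assumes the fitted model has exactly one more parameter than the null model.
    For 1 df the upper-tail probability is erfc(sqrt(G / 2)).
    """
    g = d_null - d_fitted
    if g < 0:
        raise ValueError("the fitted deviance should not exceed the null deviance")
    return g, math.erfc(math.sqrt(g / 2.0))


y = [0, 0, 1, 0, 1, 1]
d_null = null_deviance(y)  # half the outcomes are cases, so this equals 12 * ln 2
d_fitted = 4.5             # hypothetical deviance of a one-predictor model
print(lr_test_one_df(d_null, d_fitted))
```

For more than one added parameter, the tail probability needs the general chi-squared distribution, for example from a statistics library.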

  • If the model deviance is significantly smaller than the null deviance then one can conclude

  • that the predictor or set of predictors significantly improved model fit. This is analogous to the