
This is an introduction to modeling in event history analysis.

The first part deals with the famous Cox model. The brilliant idea of David R. Cox in 1972 was to combine two types of analysis: regression and life tables. The Cox model can be seen either as a way of controlling for the effect of explanatory variables in survival analysis through regression, or as a way of introducing the time dimension into regression. The strengths of each technique fill the gaps of the other. In the logit model, the odds of belonging to a category are computed at a given point in the individual's life, regardless of when the status changed. Duration, that is, the elapsed time, is therefore an important dimension that is missing from the logit model.

In particular, censoring by the date of the survey or by emigration is not taken into account. A large part of the sample, the individuals whose observations are censored, is left out of the analysis if time is not considered explicitly. On the other hand, if we simply describe the event with the life table technique, it is difficult to control for the influence of explanatory variables.
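To make the life table technique concrete, here is a minimal actuarial life table in Python. It is a sketch under stated assumptions: fixed-width intervals (one year by default), non-negative durations, and the usual actuarial convention that individuals censored during an interval count as half exposed. The function name `life_table` and the sample data are hypothetical. Note that censored individuals still contribute exposure instead of being dropped.

```python
def life_table(durations, events, width=1.0):
    """Actuarial life table (illustrative sketch).

    durations -- non-negative time observed for each individual
    events    -- 1 if the event occurred at that time, 0 if censored
    Returns rows of (start, end, at_risk, events, censored, q, survival).
    """
    at_risk = len(durations)
    surv = 1.0
    rows = []
    k = 0
    while at_risk > 0:
        start, end = k * width, (k + 1) * width
        # Outcomes of the individuals whose observation ends in this interval
        in_interval = [e for t, e in zip(durations, events) if start <= t < end]
        d = sum(1 for e in in_interval if e == 1)  # events
        c = len(in_interval) - d                   # censored cases
        exposed = at_risk - c / 2  # censored cases count as half exposed
        q = d / exposed            # probability of the event during the interval
        surv *= 1 - q              # survival to the end of the interval
        rows.append((start, end, at_risk, d, c, q, surv))
        at_risk -= d + c
        k += 1
    return rows


durations = [0.5, 1.5, 1.2, 2.5, 0.8]  # hypothetical observation times (years)
events = [1, 0, 1, 1, 0]               # 1 = event, 0 = censored
for row in life_table(durations, events):
    print(row)
```

Running the same function separately by sex or by generation illustrates the limitation described above: every split thins the counts in each row of the table.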

Splitting the sample into categories according to generation, rural origin, and so on leads to sub-samples too small for analysis, especially when measuring the combined influence of several explanatory factors. To solve both the problem of duration and that of explanatory factors, David Cox's idea was to combine survival analysis with regression analysis. First, Cox proposed a regression based not on the characteristics acquired by the individual at the end of life or at the time of observation, but on the characteristics acquired in each year of life. In a way, each year lived by each member of the sample constitutes an observation.
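The idea that each year lived is an observation can be sketched as a person-year expansion. This is a minimal illustration, not the Cox estimation procedure itself, and it assumes whole-year durations, one covariate value per year (the characteristic held in that year), and a hypothetical record layout with keys `id`, `years`, `event` and `x`.

```python
def to_person_years(records):
    """Expand one record per individual into one row per year lived.

    Each record is a dict with hypothetical keys:
      id    -- identifier
      years -- number of whole years observed
      event -- True if the event occurred in the last year, False if censored
      x     -- covariate value for each year (characteristic held that year)
    """
    rows = []
    for rec in records:
        for year in range(1, rec["years"] + 1):
            # The event indicator is 1 only in the year the event occurs
            occurred = rec["event"] and year == rec["years"]
            rows.append({"id": rec["id"], "year": year,
                         "x": rec["x"][year - 1], "event": int(occurred)})
    return rows


sample = [
    {"id": "A", "years": 3, "event": True, "x": [0, 0, 1]},  # event in year 3
    {"id": "B", "years": 2, "event": False, "x": [1, 1]},    # censored after year 2
]
for row in to_person_years(sample):
    print(row)
```

The censored individual B still contributes two person-years at risk, and individual A's characteristic can change from one year to the next.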

The reference category of the regression is not a single one for the whole sample; it is specific to each observation period. This series of period-specific probabilities makes it possible to establish a reference survival curve, also called the baseline survival function.

This is the nonparametric part of the model. The Cox regression model then estimates the effect of the explanatory variables on the annual risk of experiencing the event. Each variable is associated with a regression coefficient that measures the average effect of that variable on the annual risk.

This is the parametric part of the model. In this model, h0(t) is the hazard function for the reference category, and the Bi are coefficients associated with the indicator variables Xij. The model therefore has a nonparametric component, the baseline hazard function formed from the series of hazards h0(t), and a parametric component, the coefficients applied to the vector of independent variables. Because of these two components, the model is also called a semi-parametric model.
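Written out from the definitions in the text (h0(t), the coefficients Bi, the indicator variables Xij), with j indexing individuals and i indexing variables, the model takes the following form. This is a reconstruction from those definitions; the right-hand expression is the equivalent additive form on the log scale.

```latex
h_j(t) = h_0(t)\,\exp\Big(\sum_i B_i X_{ij}\Big)
\qquad\Longleftrightarrow\qquad
\ln h_j(t) = \ln h_0(t) + \sum_i B_i X_{ij}
```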

In fact, for computational reasons, it is the logarithm of the hazard, not the hazard itself, that is modeled as an additive function of the variables. The model thus belongs to the family of log-linear models. When interpreting results, however, it is usually the exponentials of the coefficients that are read, as multiplicative effects on the hazard, because the coefficients themselves do not have an easy, immediate interpretation.
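The reading of coefficients as multiplicative effects can be shown in a few lines of Python. The helper names and the coefficient values are hypothetical; the point is that exp(B) is the factor applied to the hazard, and that adding coefficients on the log scale multiplies their effects on the hazard scale.

```python
import math


def hazard_ratio(coef):
    """Multiplicative effect of a one-unit change in X on the hazard: exp(B)."""
    return math.exp(coef)


def percent_change(coef):
    """The same effect expressed as a percentage change in the hazard."""
    return 100.0 * (math.exp(coef) - 1.0)


b_sex, b_rural = 0.4, -0.2  # hypothetical coefficients
# Additive on the log scale is multiplicative on the hazard scale
combined = hazard_ratio(b_sex + b_rural)
product = hazard_ratio(b_sex) * hazard_ratio(b_rural)
print(round(hazard_ratio(b_sex), 3), round(percent_change(b_rural), 1))  # prints 1.492 -18.1
```

A coefficient of 0.4 thus means a hazard about 1.49 times higher, and a coefficient of -0.2 a hazard about 18% lower.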

From the point of view of causal relations, the only explanatory element in this minimal model is the individual's entry, with given characteristics, into the population exposed to the risk. The relation in the diagram reads: entry into observation O at time (t - 1), with X as a possible cause of the occurrence of event E in the interval (t - 1, t). This representation follows the principle that the cause X precedes the effect E.

The probability of occurrence of the event varies depending on whether or not the individual has characteristic X. The observation time interval is assumed to be small enough for the risk to be constant within it; here again, the smaller the interval, the less restrictive this assumption. The calculation is repeated for every time interval until the end of observation (OBE).
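The effect of interval length on the constant-risk assumption can be checked numerically. The sketch below assumes a hypothetical hazard that rises linearly with time, h(t) = a + b t, approximates it by a value held constant within each interval (taken at the start of the interval, time t - 1), and compares the resulting survival with the exact value. Function names and parameter values are illustrative only.

```python
import math


def exact_survival(t_end, a, b):
    """Survival to t_end for the hypothetical hazard h(t) = a + b*t."""
    cumulative_hazard = a * t_end + b * t_end ** 2 / 2
    return math.exp(-cumulative_hazard)


def piecewise_survival(t_end, a, b, n_intervals):
    """Same survival, assuming the hazard is constant within each interval."""
    width = t_end / n_intervals
    surv = 1.0
    for k in range(n_intervals):
        rate = a + b * (k * width)       # hazard frozen at the start of the interval
        surv *= math.exp(-rate * width)  # probability of surviving the interval
    return surv


a, b, horizon = 0.01, 0.002, 10.0
for n in (5, 20, 100):
    error = piecewise_survival(horizon, a, b, n) - exact_survival(horizon, a, b)
    print(n, round(error, 5))
```

The error shrinks as the intervals get shorter, which is what a less restrictive assumption means in practice.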

Although X is not an event, we can treat it as one on the interval (t - 1, t). Indeed, if X is defined at the beginning of each time interval, and if the calculated risk is assumed to be constant over the interval, we approach the causal relationship in which O, entry into observation at the beginning of the interval, is taken as an explanatory event, since one must be present at time (t - 1) to be exposed to the risk in the interval (t - 1, t). We are very close to the basic causal relationship, but not quite: the effect of X is not estimated separately for each time interval but averaged over all time intervals.

Each variable X is therefore not associated with a particular unit of time, which distinguishes it from a cause that is an event precisely located in time. We say that the effect of the variable is proportional: it multiplies the annual risk of experiencing the event by the same factor at every time. This is why the Cox model is called a proportional hazards model. Let's take a very simple example with a single explanatory variable, for example sex.

The variable X is called X1 and the corresponding coefficient B1, so the model is h(t) = h0(t) * exp(B1 * X1). Let's look at the two possible cases: either the individual is exposed or not, for example either a man or not. If the individual is exposed, X1 equals 1 and the model is written h0(t) * exp(B1). If the individual is not exposed, X1 is 0 and the expression reduces to h0(t).
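The two cases can be checked with a short Python sketch: whatever the baseline values h0(t), the ratio of the hazards of exposed and unexposed individuals stays equal to exp(B1). The baseline values and B1 = 0.4 are made up for illustration.

```python
import math


def cox_hazard(h0_t, b1, x1):
    """Single-covariate Cox model: h(t) = h0(t) * exp(B1 * X1)."""
    return h0_t * math.exp(b1 * x1)


baseline = [0.02, 0.05, 0.03, 0.08]  # hypothetical h0(t) for t = 1..4
b1 = 0.4                             # hypothetical coefficient for "man"
men = [cox_hazard(h, b1, 1) for h in baseline]      # X1 = 1
not_men = [cox_hazard(h, b1, 0) for h in baseline]  # X1 = 0, reduces to h0(t)
ratios = [m / o for m, o in zip(men, not_men)]
print([round(r, 4) for r in ratios])  # prints [1.4918, 1.4918, 1.4918, 1.4918]
```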

We can see that exp(B1) does not depend on t and therefore applies multiplicatively to all values of h0(t). It is therefore assumed that the explanatory variables act on the entire hazard function in the same way, whatever t.
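In continuous time, a constant multiplier on the hazard implies a power relation between survival curves: with S0(t) = exp(-H0(t)), where H0 is the cumulative baseline hazard, S(t | X1 = 1) = S0(t)^exp(B1). The sketch below assumes hypothetical cumulative hazard values and coefficient; the function name is illustrative.

```python
import math


def survival_from_cumhaz(cum_hazards, b1=0.0, x1=0):
    """Continuous-time proportional hazards: S(t | x) = exp(-H0(t) * exp(B1 * x))."""
    factor = math.exp(b1 * x1)
    return [math.exp(-h * factor) for h in cum_hazards]


H0 = [0.02, 0.07, 0.10, 0.18]  # hypothetical cumulative baseline hazard, t = 1..4
b1 = 0.4                       # hypothetical coefficient
s0 = survival_from_cumhaz(H0)          # reference category
s1 = survival_from_cumhaz(H0, b1, 1)   # exposed category
# The same exponent exp(B1) links the two curves at every t
for t, (a, b) in enumerate(zip(s1, s0), start=1):
    print(t, round(a, 4), round(b ** math.exp(b1), 4))
```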

This proportionality assumption is quite strong and must be tested for each variable in the model. If it does not hold, the model is inconsistent, and it is then necessary to consider stratifying the sample according to the offending variable. Graphical and statistical methods for testing this assumption will be covered in the next screencast.
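One common graphical check plots log(-log S(t)) for each group. Since proportional hazards imply S1(t) = S0(t)^exp(B1), the transformed curves are parallel, separated by B1. The sketch below computes the gaps between the transformed curves instead of plotting them; the function names and the tolerance are hypothetical, and with estimated curves the gaps will also fluctuate from sampling noise.

```python
import math


def log_minus_log(surv):
    """Transform survival values into log(-log S); requires 0 < S < 1."""
    return [math.log(-math.log(s)) for s in surv]


def lml_gaps(surv_exposed, surv_reference):
    """Gaps between the transformed curves; roughly constant (= B1) under PH."""
    return [a - b for a, b in zip(log_minus_log(surv_exposed),
                                  log_minus_log(surv_reference))]


def roughly_constant(gaps, tol=0.1):
    """Hypothetical rule of thumb: the gaps vary by at most tol."""
    return max(gaps) - min(gaps) <= tol


s_ref = [0.95, 0.85, 0.70]                      # hypothetical reference curve
s_ph = [s ** math.exp(0.5) for s in s_ref]      # proportional, B1 = 0.5
s_crossing = [0.90, 0.85, 0.50]                 # hypothetical non-proportional curve
print(roughly_constant(lml_gaps(s_ph, s_ref)))        # True
print(roughly_constant(lml_gaps(s_crossing, s_ref)))  # False
```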

Thank you for your attention, and good luck with your work!
