In our last presentation, we considered four ways in which abstract concepts can be turned into concrete measures. We considered, in other words, the "how to" part of measurement. In this presentation, we shift our attention to judging how well or poorly we did when assigning numbers to phenomena according to rules. We will specifically consider two ways of gauging the accuracy and error of our measures: reliability and validity.

Reliability is the extent to which a measurement technique, applied repeatedly to the same unit of analysis, yields the same result. There are a variety of ways of assessing reliability discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this presentation.

The first is the test-retest method. The test-retest method entails measuring a phenomenon at two points in time for the same set of units, using the same technique at both points. Reliability increases as the difference between the values generated by the two measurements decreases. If I step onto a scale, measure my weight, step off, and then step back on and get the same weight, the scale is a reliable way of measuring weight.
If I want to know the party identification of each student in the course, I might ask each one of you to place yourself on a standard seven-point scale: strong Republican, weak Republican, independent Republican, independent Independent, independent Democrat, weak Democrat, strong Democrat. If I want to assess the reliability of the seven-point scale as a way of measuring party identification, I could ask you all the same question again at a later point in time. To the extent that your answers in the second round match the answers provided in the first round, the seven-point scale would be said to be a reliable way of measuring party identification.
The test-retest method, it is important to note, can overestimate reliability. For example, you may remember the first answer you gave to the party identification question and answer the same way during the second round just to appear consistent, even if your actual party identification changed in the interim. The test-retest method may also underestimate reliability, as when your party identification genuinely changes between the first and second rounds and your answers at both stages accurately reflect your party identification at that moment; the measure then looks unstable even though it was accurate each time.
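The test-retest logic above can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: the two waves of answers on the 1-7 party identification scale are invented, and the correlation between waves serves as a simple stand-in for "the difference between the values decreases."

```python
# Test-retest reliability sketch: correlate wave-1 and wave-2 answers on a
# hypothetical 1-7 party identification scale (1 = strong Republican,
# 7 = strong Democrat). All data below are invented for illustration.

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

wave1 = [1, 2, 4, 6, 7, 3, 5, 4]  # each student's answer at time 1
wave2 = [1, 2, 4, 7, 7, 3, 5, 4]  # the same students at time 2

r = pearson(wave1, wave2)
print(f"test-retest correlation: {r:.2f}")  # closer to 1.0 = more reliable
```

Because only one student's answer shifts between waves, the correlation comes out near 1, which is what a reliable measure should produce.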
A second way of assessing reliability is the split-halves method. In the split-halves method, the phenomenon is measured at a single point in time, which eliminates the possibility that the actual answer changes between rounds of measurement. To use the split-halves method, the concept must be operationalized via a multi-item measure. For example, a person's political ideology might be measured through a series of four questions asking their attitudes toward universal health insurance, the minimum wage, tax cuts, and defense spending. If individuals who respond in a liberal manner to the universal health insurance and minimum wage questions also respond liberally to the tax cut and defense spending questions, then the four questions together can be said to constitute a reliable multi-item measure of political ideology. Note that the split-halves method hinges on the assumption that both halves tap the underlying concept equally well; if this assumption is not met, then the whole method falls apart.
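The split-halves check can likewise be sketched computationally. Everything here is hypothetical: the four ideology items, the 1-5 liberal-conservative scoring, and the respondents are invented. The half-scale scores are correlated, and the Spearman-Brown correction, a standard adjustment the lecture does not cover, estimates the reliability of the full four-item scale from that correlation.

```python
# Split-halves reliability sketch. Four hypothetical ideology items are
# scored 1 (most conservative) to 5 (most liberal); all data are invented.
# Half A = health insurance + minimum wage; half B = tax cuts + defense.

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Each row: one respondent's answers (health, min_wage, tax_cuts, defense).
respondents = [
    (5, 5, 4, 5),
    (1, 2, 1, 1),
    (3, 3, 4, 3),
    (4, 5, 4, 4),
    (2, 1, 2, 2),
]

half_a = [h + w for h, w, _, _ in respondents]  # score on first two items
half_b = [t + d for _, _, t, d in respondents]  # score on last two items

r = pearson(half_a, half_b)
# Spearman-Brown correction: estimated reliability of the full four-item
# scale, given the correlation between its two halves.
full_scale = 2 * r / (1 + r)
print(f"half correlation: {r:.2f}, corrected reliability: {full_scale:.2f}")
```

A high correlation between the halves is the code's version of "liberal answers on one pair go with liberal answers on the other."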
The second way of gauging the accuracy and error of our measures is validity. Validity is the extent to which a measurement technique actually measures the phenomenon it is supposed to measure. Once again, there are a variety of ways of assessing validity discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this presentation.

The first method is face validity. Face validity is the extent to which a measurement technique seems to measure the phenomenon that it is supposed to, "on the face of it." Note that face validity entails a subjective assessment on the part of the researcher; it is not an empirical assessment. This subjective assessment is considered a stronger claim to validity if a consensus exists among experts in the field. For example, there is widespread agreement among pollsters and political scientists that the seven-point scale is a valid way of measuring party identification.
A second way of assessing validity is construct validity. Construct validity is the extent to which a measure is related to measures of other variables in the ways that are hypothesized. Let's say that we hypothesize that human rights abuses occur less frequently as the level of democracy in a country increases. Our measures of human rights abuses and level of democracy would be said to have construct validity if it indeed turned out that countries with higher levels of democracy experienced fewer human rights abuses. Note that construct validity assumes that we have correctly specified the nature of the association between the variables in question. In other words, one reason why our measures of democracy and human rights might not be associated with one another is that democracies are simply not better than other types of political systems at protecting human rights. Our measures of democracy and human rights might be accurate, but our incorrect hypothesis would lead us to conclude, wrongly, that the measures do not possess construct validity.
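The construct-validity check amounts to testing the hypothesized association. In this sketch, every number is invented: six hypothetical countries with made-up democracy scores and abuse counts. A strongly negative correlation is consistent with construct validity; as the lecture notes, a null result is ambiguous, because either the measures or the hypothesis could be at fault.

```python
# Construct validity sketch: if our democracy and human-rights measures are
# valid AND the hypothesis is right, democracy scores should correlate
# negatively with counts of abuses. All numbers below are invented.

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

democracy = [9, 8, 7, 5, 3, 2]   # hypothetical 0-10 democracy scores
abuses    = [1, 2, 3, 6, 9, 12]  # hypothetical abuse counts per country

r = pearson(democracy, abuses)
print(f"democracy vs. abuses correlation: {r:.2f}")
# A strongly negative r supports construct validity; an r near zero could
# mean an invalid measure OR a wrong hypothesis -- the data cannot say which.
```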
Before wrapping up this presentation, I want to say one more thing about reliability and validity, specifically regarding the relationship between these two standards for assessing the accuracy and error of measures. Note that reliable measures can be invalid. For example, I can step onto a scale multiple times and always get the same weight. But what if this weight is incorrect? Such a scale would be reliable, but not valid. Also note that valid measures are necessarily reliable. That is, if a measure accurately taps the underlying phenomenon, then it must by definition do so across repeated measurements.
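The reliable-but-invalid scale can be made concrete with a toy simulation. The biased scale here is entirely hypothetical: it always reads five pounds heavy, so its readings are perfectly consistent (reliable) yet systematically wrong (invalid).

```python
# Reliable-but-invalid sketch: a hypothetical scale with a constant bias.
# Identical readings every time => reliable; wrong value => not valid.

TRUE_WEIGHT = 150

def biased_scale(true_weight, bias=5):
    """A scale that always reads `bias` pounds heavy."""
    return true_weight + bias

readings = [biased_scale(TRUE_WEIGHT) for _ in range(3)]

reliable = len(set(readings)) == 1   # all readings agree
valid = readings[0] == TRUE_WEIGHT   # reading matches the true value
print(f"readings={readings}, reliable={reliable}, valid={valid}")
```

The simulation makes the asymmetry easy to see: consistency alone says nothing about whether the number is the right one.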