• In our last presentation, we considered four ways in which abstract concepts can be turned into concrete measures. We considered, in

• other words, the "how to" part of measurement. In this presentation, we shift our attention to judging how well or poorly we did when

• assigning numbers to phenomena according to rules. We will specifically consider two ways of gauging the accuracy and error of our

• measures: reliability and validity. Reliability is the extent to which a measurement

• technique, applied repeatedly to the same unit of analysis, yields the same result. There are a variety of ways of assessing reliability that are

• discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this

• presentation. The first is the test-retest method, which entails

• measuring a phenomenon at two points in time for the same set of units, using the same technique at both points. Reliability

• increases as the difference between the values generated by the two measurements decreases. For example, if I step onto a scale, measure my

• weight, step off the scale, and then step back on and get the same weight, then the scale is a reliable way of measuring weight.

• If I want to know the party identification of each student in the course, I might ask each of you to place yourself on a standard seven-point

• scale: strong Republican, weak Republican, independent Republican, pure Independent, independent Democrat, weak

• Democrat, strong Democrat. If I want to assess the reliability of the seven-point scale as a way of measuring party identification, I could ask you

• all the same question again at a later point in time. To the extent that your answers in the second round match the answers provided in

• the first round, the seven-point scale would be said to be a reliable way of measuring party identification.

• The test-retest method, it is important to note, can overestimate reliability. For example, you may remember the first answer you gave to the

• party identification question, and answer the same way during the second round just to appear consistent, even if your actual party

• identification changed in the interim period. The test-retest method may also underestimate reliability, as when your party identification

• changes between the first and second rounds and your answers at both stages reflect your actual party identification at that moment.

• A second way of assessing reliability is the split-halves method. In the split-halves method, the phenomenon is measured at a single point in

• time. This eliminates the possibility that the true value changes between rounds of measurement. To utilize the split-halves

• method, the concept must be operationalized via a multi-item measure. For example, a person’s political ideology might be measured

• through a series of four questions asking about their attitudes toward universal health insurance, the minimum wage, tax cuts, and defense

• spending. If individuals who respond in a liberal manner to the universal health insurance and minimum wage questions also respond

• liberally to the tax cut and defense spending questions, then the four questions together can be said to constitute a reliable multi-item

• measure of political ideology. Note that the split-halves method hinges on the assumption that both halves tap the underlying concept

• equally well; if this assumption is not met, the whole method falls apart.

• The second way of gauging the accuracy and error of our measures is validity. Validity is the extent to which a measurement technique

• actually measures the phenomenon it is supposed to measure. Once again, there are a variety of ways of assessing validity that are

• discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this

• presentation. The first method is face validity. Face validity is

• the extent to which a measurement technique seems to measure the phenomenon that it is supposed to, “on the face of it.” Note that face

• validity entails a subjective assessment on the part of the researcher; it is not an empirical assessment. A claim of face validity is

• considered stronger if a consensus exists among experts in the field. For example, there is widespread agreement

• among pollsters and political scientists that the seven-point scale is a valid way of measuring party identification.

• A second way of assessing validity is construct validity. Construct validity is the extent to which a measure is related to measures of other

• variables in ways that are hypothesized. Let’s say that we hypothesize that human rights abuses occur with less frequency as the level of

• democracy in countries increases. Our measures of human rights abuses and level of democracy would be said to have construct

• validity if indeed it turned out that countries with higher levels of democracy did experience fewer human rights abuses. Note that construct

• validity assumes that we have correctly specified the nature of the association between the variables in question. For example, one

• reason why our measures of democracy and human rights might not be associated with one another is that democracies are not better than

• other types of political systems in protecting human rights. In that case, our measures of democracy and human rights might be accurate, but our

• incorrect hypothesis would lead us to conclude that the measures do not possess construct validity.

• Before wrapping up this presentation, I want to say one more thing about reliability and validity, specifically regarding the relationship between

• these two standards for assessing the accuracy and error of measures. Note that reliable measures can be invalid. For example, I can

• step onto a scale multiple times and always get the same weight. But what if this weight is incorrect? Such a scale would be reliable, but

• not valid. Also note that valid measures are necessarily reliable. That is, if a measure accurately taps the underlying phenomenon,

• then it must by definition do so across repeat measurements.
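The miscalibrated scale example can be expressed numerically by separating two quantities: how much repeated readings vary (a rough stand-in for reliability) and how far their average sits from the true value (a rough stand-in for validity). This Python sketch assumes a hypothetical true weight of 170 pounds and invented readings from two hypothetical scales; `summarize` is an illustrative helper, not an established procedure.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 170.0  # hypothetical true weight, in pounds

# Hypothetical readings from stepping on and off each scale five times.
miscalibrated = [178.1, 177.9, 178.0, 178.1, 177.9]  # consistent, about 8 lb heavy
calibrated = [170.1, 169.9, 170.0, 170.0, 170.0]     # consistent and accurate


def summarize(readings, true_value):
    """Return (spread across repeats, average distance from the true value)."""
    spread = pstdev(readings)            # small spread: reliable
    bias = mean(readings) - true_value   # small bias: valid
    return spread, bias


for name, readings in [("miscalibrated", miscalibrated), ("calibrated", calibrated)]:
    spread, bias = summarize(readings, TRUE_WEIGHT)
    print(f"{name}: spread = {spread:.2f} lb, bias = {bias:+.2f} lb")
```

Both scales show a small spread, so both are reliable, but only the calibrated one also has a bias near zero. A scale whose readings scattered widely could not hit the true weight consistently, which is why validity requires reliability.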

