## Subtitles

In our last presentation, we considered four ways in which abstract concepts can be turned into concrete measures. We considered, in other words, the "how to" part of measurement. In this presentation, we shift our attention to judging how well or poorly we did when assigning numbers to phenomena according to rules. We will specifically consider two ways of gauging the accuracy and error of our measures: reliability and validity.

Reliability is the extent to which a measurement technique, applied repeatedly to the same unit of analysis, yields the same result. A variety of ways of assessing reliability are discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this presentation.

The first is the test-retest method. This entails measuring a phenomenon at two points in time for the same set of units, using the same technique at both points. Reliability increases as the difference between the values generated by the two measurements decreases. If I step onto a scale, measure my weight, step off, and then step back on and get the same weight, the scale is a reliable way of measuring weight.

If I want to know the party identification of each student in the course, I might ask each one of you to place yourself on a standard seven-point scale: strong Republican, weak Republican, independent Republican, independent Independent, independent Democrat, weak Democrat, strong Democrat. If I want to assess the reliability of the seven-point scale as a way of measuring party identification, I could ask you all the same question again at a later point in time. To the extent that your answers in the second round match the answers provided in the first round, the seven-point scale would be said to be a reliable way of measuring party identification.

The test-retest method, it is important to note, can overestimate reliability. For example, you may remember the first answer you gave to the party identification question and answer the same way during the second round just to appear consistent, even if your actual party identification changed in the interim. The test-retest method may also underestimate reliability, as when your party identification genuinely changes between the first and second rounds and your answers at both stages accurately reflect your party identification at that moment; the difference between the two answers would then be mistaken for measurement error.

A second way of assessing reliability is the split-halves method. In the split-halves method, the phenomenon is measured at a single point in time. This eliminates the possibility that the actual answer changes between rounds of measurement. To use the split-halves method, the concept must be operationalized via a multi-item measure. For example, a person's political ideology might be measured through a series of four questions asking about their attitudes toward universal health insurance, the minimum wage, tax cuts, and defense spending. If individuals who respond in a liberal manner to the universal health insurance and minimum wage questions also respond liberally to the tax cut and defense spending questions, then the four questions together can be said to constitute a reliable multi-item measure of political ideology. Note that the split-halves method hinges on the assumption that both halves tap the underlying concept equally well; if this assumption is not met, the whole method falls apart.

The second way of gauging the accuracy and error of our measures is validity. Validity is the extent to which a measurement technique actually measures the phenomenon it is supposed to measure. Once again, a variety of ways of assessing validity are discussed in the Johnson and Reynolds textbook. Rather than lay out all of these methods, we will focus on two in this presentation.

The first method is face validity. Face validity is the extent to which a measurement technique seems to measure the phenomenon it is supposed to, "on the face of it." Note that face validity entails a subjective assessment on the part of the researcher; it is not an empirical assessment. This subjective assessment is considered a stronger claim to validity if a consensus exists among experts in the field. For example, there is widespread agreement among pollsters and political scientists that the seven-point scale is a valid way of measuring party identification.

A second way of assessing validity is construct validity. Construct validity is the extent to which a measure is related to measures of other variables in the ways that are hypothesized. Let's say we hypothesize that human rights abuses occur less frequently as the level of democracy in a country increases. Our measures of human rights abuses and level of democracy would be said to have construct validity if it indeed turned out that countries with higher levels of democracy experienced fewer human rights abuses. Note that construct validity assumes we have correctly specified the nature of the association between the variables in question. In other words, one reason why our measures of democracy and human rights might not be associated with one another is that democracies are not, in fact, better than other types of political systems at protecting human rights. Our measures of democracy and human rights might be accurate, but our incorrect hypothesis would lead us to conclude that the measures do not possess construct validity.

Before wrapping up this presentation, I want to say one more thing about reliability and validity, specifically regarding the relationship between these two standards for assessing the accuracy and error of measures. Note that reliable measures can be invalid. For example, I can step onto a scale multiple times and always get the same weight. But what if that weight is incorrect? Such a scale would be reliable, but not valid. Also note that valid measures are necessarily reliable. That is, if a measure accurately taps the underlying phenomenon, then it must by definition do so across repeated measurements.

