  • In 2011, a group of researchers conducted a scientific study to find an impossible result: that listening to certain songs can make you younger.

  • Their study involved real people, truthfully reported data, and commonplace statistical analyses.

  • So how did they do it?

  • The answer lies in a statistical method scientists often use to try to figure out whether their results mean something or if they're random noise.

  • In fact, the whole point of the music study was to point out ways this method can be misused.

  • A famous thought experiment explains the method: there are eight cups of tea, four with the milk added first, and four with the tea added first. A participant must determine which are which according to taste.

  • There are 70 different ways the cups can be sorted into two groups of four, and only one is correct. So, can she taste the difference? That's our research question.

  • To analyze her choices, we define what's called a null hypothesis: that she can't distinguish the teas.

  • If she can't distinguish the teas, she'll still get the right answer 1 in 70 times by chance. 1 in 70 is roughly .014. That single number is called a p-value.

  • In many fields, a p-value of .05 or below is considered statistically significant, meaning there's enough evidence to reject the null hypothesis.

  • Based on a p-value of .014, we'd rule out the null hypothesis that she can't distinguish the teas.

  • Though p-values are commonly used by both researchers and journals to evaluate scientific results, they're really confusing, even for many scientists.

  • That's partly because all a p-value actually tells us is the probability of getting a certain result, assuming the null hypothesis is true.

  • So if she correctly sorts the teas, the p-value is the probability of her doing so assuming she can't tell the difference.

  • But the reverse isn't true: the p-value doesn't tell us the probability that she can taste the difference, which is what we're trying to find out.

  • So if a p-value doesn't answer the research question, why does the scientific community use it?

  • Well, because even though a p-value doesn't directly state the probability that the results are due to random chance, it usually gives a pretty reliable indication.

  • At least, it does when used correctly. And that's where many researchers, and even whole fields, have run into trouble.

  • Most real studies are more complex than the tea experiment. Scientists can test their research question in multiple ways, and some of these tests might produce a statistically significant result, while others don't.

  • It might seem like a good idea to test every possibility. But it's not, because with each additional test, the chance of a false positive increases.

  • Searching for a low p-value, and then presenting only that analysis, is often called p-hacking.

  • It's like throwing darts until you hit a bullseye and then saying you only threw the dart that hit the bullseye. This is exactly what the music researchers did.

  • They played three groups of participants each a different song and collected lots of information about them.

  • The analysis they published included only two out of the three groups.

  • Of all the information they collected, their analysis only used participants' fathers' age, to "control for variation in baseline age across participants."

  • They also paused their experiment after every ten participants, and continued if the p-value was above .05, but stopped when it dipped below .05.

  • They found that participants who heard one song were 1.5 years younger than those who heard the other song, with a p-value of .04.

  • Usually it's much tougher to spot p-hacking, because we don't know the results are impossible: the whole point of doing experiments is to learn something new.

  • Fortunately, there's a simple way to make p-values more reliable: pre-registering a detailed plan for the experiment and analysis beforehand that others can check, so researchers can't keep trying different analyses until they find a significant result.

  • And, in the true spirit of scientific inquiry, there's even a new field that's basically science doing science on itself: studying scientific practices in order to improve them.

  • This new field has emerged in response to a crisis in science, and p-hacking is just one part of that crisis. So, what's going on? And can we fix it? Learn more with this video.
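The arithmetic behind two points in the transcript can be checked with a short sketch: the 1-in-70 chance in the tea experiment, and the way the chance of a false positive grows with each additional test. The function name `chance_of_false_positive` is hypothetical, and the calculation assumes the tests are independent and that every null hypothesis is true, which the transcript does not state and which real analyses often violate.

```python
from math import comb

# Tea experiment: 8 cups, 4 of them milk-first.
# Count the ways to choose which 4 of the 8 cups are labelled milk-first.
n_sortings = comb(8, 4)

# If she cannot taste the difference, every sorting is equally likely,
# so the chance of a perfect sort is 1 divided by the number of sortings.
p_all_correct = 1 / n_sortings

print(n_sortings)               # 70
print(round(p_all_correct, 3))  # 0.014


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests comes out 'significant'
    when no real effect exists.

    Assumption (not from the transcript): the tests are independent,
    each with false-positive rate alpha.
    """
    return 1 - (1 - alpha) ** n_tests


# The more analyses are tried, the likelier a spurious "significant" result.
for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions, a single test at the .05 threshold gives a false positive 5% of the time, while trying twenty analyses and reporting only the best one gives a false positive more often than not.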

B1 US TED-Ed null hypothesis probability hypothesis method scientific

The method that can "prove" almost anything - James A. Smith

  • Minjane posted on 2021/09/18