
  • In 2012, a researcher named Glenn Begley published a commentary in the journal Nature.

  • He said that during his decade as the head of cancer research for Amgen -- an American

  • pharmaceutical company -- he’d tried to reproduce the results of 53 so-called landmark

  • cancer studies.

  • But his team wasn’t able to replicate 47 out of those 53 studies. That means that /88

  • percent/ of these really important cancer studies couldn’t be reproduced.

  • Then, in August 2015, a psychologist named Brian Nosek published /another/ paper, this

  • time in the journal Science.

  • Over the course of the previous three years, he said, he’d organized repeats of 100 psychological

  • studies.

  • 97 of the original experiments had reported statistically significant results -- that

  • is, results that were most likely caused by the variables being tested, and not some

  • coincidence.

  • But when his team tried to reproduce those results, they only got /36/ significant results

  • -- just over a third of what the original studies had found.

  • It seems like every few months, there’s some kind of news about problems with the

  • scientific publishing industry.

  • A lot of people -- both scientists and science enthusiasts -- are concerned.

  • Why does this keep happening?

  • And what can be done to fix the system?

  • [intro]

  • Right now, the scientific publishing industry is going through what’s being called a Replication,

  • or Reproducibility Crisis.

  • Researchers are repeating earlier studies, trying to reproduce the experiments as closely

  • as possible.

  • One group might publish findings that look promising, and other groups might use those

  • results to develop their own experiments. But if the original study was wrong, that’s

  • a whole lot of time and money right down the drain.

  • In theory, these repeat studies should be finding the same results as the original experiments.

  • If a cancer drug worked in one study, and a separate group repeats that study under

  • the same conditions, the cancer drug should still work.

  • But that’s not what happened with Begley’s cancer studies. And researchers in other fields

  • have been having the same problem.

  • They're repeating earlier studies, and they aren’t getting the same results.

  • So why are these inaccurate results getting published in the first place?

  • Well, sometimes people really are just making things up, but that’s relatively rare.

  • Usually, it has to do with misleading research tools, the way a study is designed, or the

  • way data are interpreted.

  • Take an example from the world of biomedicine.

  • Researchers who work with proteins will often use antibodies to help them with their research.

  • You might know antibodies as a part of your immune system that helps target outside invaders,

  • but in scientific research, antibodies can be used to target specific proteins.

  • And lately, there’s been a lot of evidence that these antibodies aren’t as reliable

  • as scientists have been led to believe.

  • Companies produce these antibodies so researchers can buy them, and they'll say in their catalog

  • which antibodies go with which proteins.

  • The problem is, those labels aren’t always right. And if researchers don’t check to

  • make sure that their antibody works the way it should, they can misinterpret their results.

  • One analysis, published in 2011 in a journal called Nature Structural & Molecular Biology,

  • tested 246 of these antibodies, each of which was said to only bind with one particular

  • protein.

  • But it turned out that about a quarter of them actually targeted more than one protein.

  • And four of /those/ … actually targeted the wrong kind of protein.

  • Researchers were using this stuff to detect proteins in their experiments, but the antibodies

  • could have been binding with completely different materials -- creating false positives

  • and therefore, flawed results.

  • That’s exactly what happened to researchers at Mount Sinai Hospital in Toronto.

  • They wasted two years and half a million dollars using an antibody to look for a specific protein

  • that they thought might be connected to pancreatic cancer.

  • Then they figured out that the whole time, the antibody had actually been binding to

  • a /different/ cancer protein, and didn’t even target the protein they were looking

  • for.

  • So the antibody-production industry is having some quality control problems, and it’s

  • affecting a lot of biomedical research.

  • Some companies have already taken steps to try and ensure quality -- one reviewed its

  • entire catalogue in 2014 and cut about a third of the antibodies it had been offering.

  • Now, researchers /could/ try testing the antibodies themselves, to make sure they only bind to

  • the protein they're supposed to. But that’s like conducting a whole separate study before

  • they even get to start on the main project.

  • Most research groups don’t have the time or money to do that.

  • But now that scientists are aware of the issue, they can at least be more careful about where

  • they get their antibodies.

  • Having accurate tools for research isn’t enough, though. Part of the reproducibility

  • crisis also has to do with how experiments are designed.

  • This can be a problem in all kinds of different fields, but it’s especially an issue for

  • psychology, where results often depend on human experience, which can be very subjective.

  • Experiments are supposed to be designed to control for as many external factors as possible,

  • so that you can tell if your experiment is actually what’s leading to the effect.

  • But in psychology, you can’t really control for all possible external factors. Too many

  • of them just have to do with the fact that humans are human.

  • One classic experiment, for example, showed that when people had been exposed to

  • words related to aging, they walked more slowly.

  • Another research group tried to replicate that study and failed, but that doesn’t

  • necessarily prove or disprove the effect.

  • It’s possible that the replication study exposed the subjects to too /many/ aging-related

  • words, which might have ruined the subconscious effect.

  • Factors that weren’t directly related to the study could have also affected the results

  • -- like what color the room was, or the day of the week.

  • When such tiny differences can change the results of a study, it’s not too surprising

  • that when they reviewed those 100 psychology papers, Nosek’s research group was only

  • able to replicate 36 out of 97 successful studies.

  • But it also makes the results of the original studies pretty weak.

  • At the very least, being able to replicate a study can show the strength of its results,

  • so some scientists have been calling for more replication to be done -- in lots of fields,

  • but especially in psychology.

  • Because it can just be hard to determine the strength of the results of psychological experiments.

  • The fact that journals are so selective about what they publish is another reason the results

  • of a study might turn out to be false.

  • A lot of the time, researchers are pressured to make their findings look as strong as possible.

  • When you publish papers, you get funding to do more research, and the more grant money

  • you bring in, the more likely it is that the academic institution sponsoring you will want

  • to keep you.

  • The problem is, journals are MUCH more likely to publish positive results than negative

  • ones.

  • Say you're a biologist, and you spend three months working on a potential cancer drug.

  • If after three months, you get positive results -- suggesting the drug worked -- then a journal

  • will probably want to publish those results.

  • But if you end up with negative results -- the drug didn’t work -- that’s just not as

  • interesting or exciting.

  • So negative results almost always go unpublished -- which means that there’s a lot of pressure

  • on researchers to conduct experiments that /do/ have positive results.

  • For example, Begley -- the biologist who led the cancer replication studies at Amgen -- tried

  • an experiment 50 times without ever being able to reproduce the original results.

  • The lead researcher on the original study told Begley that /his/ team had tried the

  • experiment 6 times, and gotten positive results once.

  • So that’s what they published -- that one positive result.
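That one-in-six result isn't as surprising as it sounds. If a drug actually has no effect, each run of an experiment still carries the usual 5% false-positive risk, and those risks compound across repeated tries. A quick sketch of the arithmetic (the numbers are illustrative, not Amgen's actual data):

```python
# If a treatment has NO real effect, each run still has an
# alpha = .05 chance of a false positive. Across several tries,
# the odds of at least one "significant" result add up fast.
def p_at_least_one_false_positive(n_runs, alpha=0.05):
    return 1 - (1 - alpha) ** n_runs

print(p_at_least_one_false_positive(6))  # ~0.26 -- about 1 in 4
```

So even with a completely ineffective drug, six attempts give roughly a one-in-four chance of producing at least one publishable-looking positive.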

  • There’s so much pressure to publish significant findings that researchers might not always

  • include all of their data in their analysis.

  • To get around this problem, some experts have suggested creating a new standard, where researchers

  • include a line in their papers saying that they've reported all excluded data and all

  • aspects of their analysis.

  • If including that statement became standard practice, then if a paper /didn’t/ have

  • that line, that would be a red flag.

  • But even if researchers /do/ include all their data, they might just be doing the analysis

  • wrong.

  • Because: data analysis involves math.

  • Sometimes a lot of math. But researchers in a lot of fields -- like psychology and biology

  • -- aren’t necessarily trained in all that math.

  • You don’t always need to take courses in advanced statistical methods to get a degree

  • in biology.

  • So, sometimes, the data analysis that researchers do is just wrong.

  • And the peer reviewers don’t always catch it, because they haven’t been trained in

  • those methods, either.

  • Then there are p-values.

  • The term p-value is short for probability value, and it’s often used as a kind of

  • shorthand for the significance of scientific findings.

  • To calculate a p-value, you first assume the /opposite/ of what you want to prove.

  • Say you were testing a cancer drug, for instance, and you found that it killed cancer

  • cells.

  • To calculate the p-value for your study, you’d start by assuming that the drug /doesn’t/

  • kill cancer cells.

  • Then, you’d calculate the odds that the cancer cells would die anyway. That would

  • be your p-value.

  • In other words, a p-value tells you the probability that the results of an experiment were a total

  • coincidence.

  • So for that cancer drug you're testing, a p-value of less than .01 would mean that

  • there’s a less than 1% chance the cancer cells would die even if the drug didn’t

  • kill cancer cells.
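The calculation described above can be sketched concretely. Here's one minimal version in Python, using a binomial model: assume the drug does nothing (the null hypothesis), then ask how likely the observed number of cell deaths would be anyway. All the numbers below are made up for illustration:

```python
from math import comb

def binomial_p_value(n, k, p_null):
    """P(at least k deaths out of n cells) assuming the null
    hypothesis -- i.e. the drug does nothing and cells die at
    the background rate p_null on their own."""
    return sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i)
               for i in range(k, n + 1))

# Hypothetical experiment: 100 cells, 40 died after treatment,
# and untreated cells die at a 25% rate. How likely is 40+
# deaths by pure chance?
p = binomial_p_value(100, 40, 0.25)
print(p)  # small enough to clear the usual p < .05 bar
```

A small p here means the observed deaths would be very unlikely if the drug were doing nothing, which is exactly what "statistically significant" is shorthand for.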

  • Usually, the standard for whether results are worth publishing is a p-value of less

  • than .05 -- which would translate to less than a 5% chance of the cancer cells dying

  • by coincidence.

  • 5% is a 1 in 20 chance, which is pretty low.

  • And there are /lots/ of studies that get published with p-values just under .05.

  • Odds are that for at least a few of them, the results will be a coincidence -- and the

  • findings will be wrong.
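That "odds are" claim can be checked with a simulation. A standard statistical fact is that when the null hypothesis is true, p-values are uniformly distributed between 0 and 1, so about 5% of no-effect studies will clear the p < .05 bar by chance alone. A quick sketch:

```python
import random

random.seed(1)

# Under a true null hypothesis, p-values are uniform on [0, 1],
# so roughly 5% of no-effect "studies" still look significant
# at the p < .05 threshold.
trials = 10_000
lucky = sum(random.random() < 0.05 for _ in range(trials))
print(lucky, "of", trials, "no-effect studies looked significant")
```

If journals publish mostly the "significant" results, those several hundred lucky studies are the ones that end up in the literature, and none of them describe a real effect.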

  • That’s why a lot of people argue that p-values aren’t a good metric for whether results

  • are significant.

  • Instead, they suggest placing more emphasis on things like effect size, which tells you

  • more than just whether an experiment produced some kind of change. It tells you how /big/

  • the change was.
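One common effect-size measure is Cohen's d: the difference between two group means, scaled by their pooled standard deviation. A minimal sketch with made-up numbers (the groups and values here are hypothetical):

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled
    (sample) standard deviation -- it reports how BIG a change
    is, not just whether one exists."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical measurements from a treated and a control group:
treated = [5.1, 5.4, 4.9, 5.6, 5.2]
control = [4.2, 4.5, 4.1, 4.4, 4.3]
print(cohens_d(treated, control))  # well above 1: a large effect
```

Two studies can both be "significant" at p < .05 while reporting wildly different effect sizes, which is why many statisticians want the size reported alongside the p-value.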

  • They also suggest more sharing of data -- including unpublished data -- something that’s gradually

  • becoming more popular and accepted.

  • So, yes -- there is a replication crisis, and it’s been highlighting a lot of problems

  • with the scientific research and publication process.

  • But scientists are also doing their best to solve it.