Hi. It's Mr. Andersen and welcome to my podcast on the Chi-squared test. When students look at the Chi-squared equation, lots of them get scared right away. It's really simple once you figure it out, so don't be scared away. The Chi-squared test is very important in AP Biology, and in science in general. It gives you a way to ask, when you collect data: is the variation in your data just due to chance, or is it due to one of the variables that you're actually testing? So the first thing you should figure out is what all of these variables mean.
So the first symbol, this one right here, stands for Chi-squared. The test was developed around 1900 by Karl Pearson, which is why it's called Pearson's Chi-squared test. What is this symbol, then? It's a sum, so we're going to add up a number of values in a Chi-squared test. What does the O stand for? That's the data you actually collect, which we call the observed data. The E values are the expected values. Whenever you're doing an experiment, you can figure out your expected values before you start, and then you simply compare them to your observed values. Put together, the equation says: for each outcome, take O minus E, square it, divide by E, and then add all of those values up.
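The equation described above can be sketched in Python as a small function. This is a minimal illustration, not code from the video; the function name `chi_squared` is our own hypothetical choice, and it assumes the observed and expected counts are listed in the same outcome order.

```python
def chi_squared(observed, expected):
    """Return the Chi-squared statistic: the sum of (O - E)**2 / E.

    `observed` and `expected` are sequences of counts for the same
    outcomes, listed in the same order (hypothetical helper).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        # One term per outcome: the squared difference, scaled by the expected count.
        total += (o - e) ** 2 / e
    return total
```

For example, `chi_squared([28, 22], [25, 25])` gives about 0.72, which matches the coin-flipping example worked out by hand later in the video.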
Let me give you an example of that with these coins over here.
Let's say I flip a coin 100 times and I get 62 heads and 38 tails. Is that just due to chance? Or is there something wrong with the coin, or with the way that I'm flipping it? The Chi-squared test allows us to answer that. What I'm thinking of in my head is something called a null hypothesis. If we flip a coin 100 times and get 62 heads and 38 tails, those are the observed values from the experiment. But there are also expected values, because we know it should be 50 heads and 50 tails. In this case the null hypothesis says that there is no statistically significant difference between the values we expect to get and the values we actually observe.
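The video poses the 62-heads, 38-tails question but doesn't work it out, so here is the arithmetic as a short sketch. It assumes the null hypothesis of a fair coin (50 heads and 50 tails expected); the number it prints only means something once it is compared with a critical value, which the video introduces next.

```python
def chi_squared(observed, expected):
    # Sum of (O - E)**2 / E over all outcomes.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [62, 38]   # heads, tails from 100 flips
expected = [50, 50]   # null hypothesis: a fair coin gives half heads, half tails

# Heads: (62 - 50)**2 / 50 = 144 / 50 = 2.88, and the tails term is also 2.88.
print(round(chi_squared(observed, expected), 2))  # 5.76
```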
And that's what's cool about Chi-squared: we can look at our data and see whether there is a statistically significant difference between those two. The best way to get good at Chi-squared is to actually do some problems. Before we get to that, there are two terms that I have to define. One is degrees of freedom and the other is critical values. The whole point of a Chi-squared test is to either accept or reject our null hypothesis, and we decide that by checking whether our Chi-squared value exceeds the critical value. But first we have to figure out where that number is in this big chart right here.
The first thing is something called degrees of freedom. Since we're comparing outcomes, you have to have at least two outcomes in your experiment. In this case we have heads and tails, so there are two outcomes that we could get. We simply subtract 1 from the number of outcomes to get the degrees of freedom, so two outcomes minus 1 gives us 1 degree of freedom. Now you might ask yourself why there isn't a zero on this chart. Well, if you just have one outcome, you have nothing to compare it to. That's an easy way to think about it. So we've figured out that there is one degree of freedom in this case. The next thing we look for is the critical value. The probability level that we'll always use in this class is 0.05, which is this column right here. So the first thing you do is find the 0.05 column, and you don't worry about all of the other numbers. I always look for 3.841, because seeing it tells me I'm in the right chart and the right column.
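The chart lookup can be sketched as a dictionary. Only the 0.05 column is included here, for 1 to 5 degrees of freedom; the values are the standard Chi-squared critical values at that level, while the names `CRITICAL_0_05`, `degrees_of_freedom`, and `critical_value` are hypothetical.

```python
# Chi-squared critical values at the 0.05 probability level, keyed by
# degrees of freedom (the 0.05 column of a standard Chi-squared table).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def degrees_of_freedom(num_outcomes):
    # At least two outcomes are needed: with one outcome there is nothing to compare.
    if num_outcomes < 2:
        raise ValueError("need at least two outcomes")
    return num_outcomes - 1

def critical_value(num_outcomes):
    # Find the row for this many degrees of freedom, then read the 0.05 column.
    return CRITICAL_0_05[degrees_of_freedom(num_outcomes)]

print(degrees_of_freedom(2), critical_value(2))  # 1 3.841  (heads or tails)
print(degrees_of_freedom(6), critical_value(6))  # 5 11.07  (the six faces of a die)
```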
A way that I explain this to kids is that you can think of it as being 95% sure that you're either accepting or rejecting your null hypothesis. You can see that the critical values get higher as we move this way across the chart, so if we really want to be sure, we'd have to exceed a higher critical value. So what's our null hypothesis again? The null hypothesis is that there's no statistically significant difference between observed and expected, and we either accept or reject it. In this case our critical value would be 3.841. When you calculate Chi-squared, if you get a number higher than 3.841, then you reject the null hypothesis. That means something other than chance is causing you to get more heads than tails. If you don't exceed the critical value, then you accept the null hypothesis. That's usually what ends up happening, unless you have a variable that's impacting your results. Let's apply this in a couple of different cases.
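The decision rule is a single comparison. As a cross-check that is our addition rather than something from the video: for 1 degree of freedom the critical value can be computed instead of looked up, because a Chi-squared variable with 1 degree of freedom is the square of a standard normal variable. The sketch uses Python's `statistics.NormalDist` for that; the function names are hypothetical.

```python
from statistics import NormalDist

def critical_value_df1(p):
    # For 1 degree of freedom, the critical value at probability level p is
    # the square of the standard normal value that leaves p/2 in each tail.
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z

def reject_null(chi2_value, critical):
    # Reject the null hypothesis only if Chi-squared exceeds the critical value.
    return chi2_value > critical

print(round(critical_value_df1(0.05), 3))  # 3.841
print(round(critical_value_df1(0.01), 3))  # 6.635: being "more sure" needs a higher value

# The 100-flip example: 62 heads and 38 tails give Chi-squared = 5.76.
print(reject_null(5.76, 3.841))  # True
```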
This is my wife here. I asked her to flip some coins. I also asked our statistics teacher how much data you have to collect before you can actually apply the Chi-squared test, and Mr. Humberger said something magic about 30. So I want to exceed that number in each of these experiments. This is my wife's hand down here, and she's going to flip 50 coins. You can see she's really fast: she flips the 50 coins and then sorts them out. Even before we collect the data, we can look at the expected values. We've got heads or tails, so if you flip 50 coins, how many do we expect to come up heads? The right answer would be 25. And how many would we expect to come up tails? 25 as well. Now, your numbers won't always divide as evenly as that. If you're looking at fruit flies, it might be 134 or 133. Or say I flip 51 coins instead of 50; then my expected values would be 25.5 and 25.5. Since expected values come from probability, they don't have to be whole numbers.
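Here is a small sketch of where those expected values come from, assuming every outcome is equally likely (a fair coin or a fair die). The helper name `expected_counts` is hypothetical; note that it happily returns non-whole numbers like 25.5.

```python
def expected_counts(total, num_outcomes):
    # With equally likely outcomes, each outcome is expected
    # total / num_outcomes times. The result need not be a whole number.
    return [total / num_outcomes] * num_outcomes

print(expected_counts(50, 2))  # [25.0, 25.0]
print(expected_counts(51, 2))  # [25.5, 25.5]
print(expected_counts(36, 6))  # [6.0, 6.0, 6.0, 6.0, 6.0, 6.0]
```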
Now let's look at our observed values down here. How many heads did we get? 28 heads. And how many tails? That would be 22. So now we're going to apply Chi-squared and calculate a Chi-squared value. We take our equation, O minus E, squared, over E, and we do it for the heads column and then for the tails column. For heads, our observed value is 28, so it's 28 minus 25 (the expected value), squared, over 25. The sum means that we add these two values together, so I'll put a plus sign right here. Now we do the tails side. Our observed value is 22, so it's 22 minus 25, squared, over 25. You can do this in your head: 28 minus 25 is 3, and 3 squared is 9, so the first term is 9 over 25. 22 minus 25 is negative 3, and negative 3 squared is also 9, so the second term is 9 over 25. Our answer is 18 over 25, which equals 0.72.
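The same hand calculation can be reproduced term by term. This sketch uses Python's `fractions.Fraction` so the arithmetic stays exact, just like doing it in your head; the dictionary layout is our own choice.

```python
from fractions import Fraction

observed = {"heads": 28, "tails": 22}
expected = {"heads": 25, "tails": 25}

# One (O - E)**2 / E term per side, kept as an exact fraction.
terms = {side: Fraction((observed[side] - expected[side]) ** 2, expected[side])
         for side in observed}
chi2 = sum(terms.values())

for side, term in terms.items():
    print(side, term)        # heads 9/25, then tails 9/25
print(chi2, float(chi2))     # 18/25 0.72
```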
Okay. So that's our Chi-squared value for the data that we just collected. Now let's go over to our critical values. We said that we had 1 degree of freedom, because there are two outcomes, and 2 minus 1 is 1. So we're in this row right here, and here is our magical 0.05 column, so our critical value is 3.841. If we get a number higher than that, we reject our null hypothesis. We didn't: we got 0.72, which is lower than 3.841, so we accept our null hypothesis. That means my wife did a great job. There's nothing wrong with the coins, and there aren't way more heads than there should be. So we accept the null hypothesis that there's no statistically significant difference between what we observed and what we expected to see.
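The whole procedure for the coin data (statistic, degrees of freedom, table lookup, decision) fits in one function. This is a minimal sketch at the 0.05 level only; `chi_squared_test` is a hypothetical name, and the "accept" label follows the video's wording (statisticians often phrase the same outcome as "fail to reject").

```python
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_squared_test(observed, expected):
    """Return (statistic, df, critical value, decision) at the 0.05 level."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1               # number of outcomes minus 1
    critical = CRITICAL_0_05[df]         # read across to the 0.05 column
    # "accept" is the video's wording; often phrased "fail to reject".
    decision = "reject" if stat > critical else "accept"
    return stat, df, critical, decision

stat, df, critical, decision = chi_squared_test([28, 22], [25, 25])
print(round(stat, 2), df, critical, decision)  # 0.72 1 3.841 accept
```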
Now let's try a slightly more complex problem with dice. We've got 36 dice, and there are six things each die could show: a 1, 2, 3, 4, 5, or 6. Since I have 36 dice, we would expect each of those numbers to come up 6 times: 36 total dice divided by 6 outcomes gives 6. Now let's see what we get for observed values. It looks like we're getting a lot of sixes. For ones, we get 2. For twos, we get 4. For threes, it looks like 8. For fours, we get 9. For fives, we just get 3. And for sixes, look at all of them: we get 10. Now we have to figure out a Chi-squared value.
I'm going to stop talking, do the math, and speed up the video a little bit, so hopefully I don't mess any of this up. Adding all of the terms together gives 58 over 6, which is about 9.67. So that is our Chi-squared value in this case. Now we go over to our chart. First we have to figure out how many degrees of freedom we have. Since there are 6 different outcomes, we take 6 minus 1 and get 5. Reading across that row to the 0.05 column, our critical value is 11.070. Did our value go higher than that? No, it's only about 9.67, which is lower. So even though we had all of those sixes, we still accept our null hypothesis that there's no statistically significant difference between what we observed and what we expected.
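The dice calculation that was sped through in the video can be laid out step by step. This sketch uses the counts read off the 36 dice and assumes fair dice (6 of each face expected); it shows the squared differences that add up to 58, the statistic 58/6, and the comparison with the 0.05 critical value for 5 degrees of freedom.

```python
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

faces = [1, 2, 3, 4, 5, 6]
observed = [2, 4, 8, 9, 3, 10]                 # counts read off the 36 dice
expected = [36 / len(faces)] * len(faces)      # 6 of each face if the dice are fair

squared_diffs = [(o - e) ** 2 for o, e in zip(observed, expected)]
stat = sum(d / e for d, e in zip(squared_diffs, expected))
df = len(faces) - 1                            # 6 outcomes minus 1
critical = CRITICAL_0_05[df]

print(squared_diffs)                 # [16.0, 4.0, 4.0, 9.0, 9.0, 16.0], which sum to 58
print(round(stat, 2))                # 9.67, that is, 58 / 6
print(df, critical, stat > critical)  # 5 11.07 False, so accept the null hypothesis
```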
Now let me leave you with this question. In the animal behavior podcast, I talk about pill bugs and whether they spend more time in the wet or in the dry. The values right here record how much time they spend in the wet and how much time they spend in the dry. Since there are 10 pill bugs, we would expect 5 on each side. But it looks like they're spending more time on the wet side; you can even see them in the video here spending more time in the wet. So I took the average of the wet column and the average of the dry column, and that gives me my wet and dry values. I'm not going to show you how to do this one, but try to apply Chi-squared to figure out if there's a statistically significant difference between what we expected and what we observed. You can put your answer down in the comments. I hope that's helpful.
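If you want to check your work on the pill bug question, the same steps can be packaged into a small helper. This is a hypothetical sketch: the name `pill_bug_test` is ours, it assumes the wet and dry averages together account for all 10 pill bugs, and the averages themselves have to come from the data shown in the video, which is not reproduced here.

```python
def pill_bug_test(wet_avg, dry_avg, num_bugs=10, critical=3.841):
    """Chi-squared test of wet vs. dry against an even split (hypothetical helper).

    wet_avg and dry_avg are the average numbers of pill bugs on each side;
    critical is the 0.05 value for 1 degree of freedom (two outcomes).
    """
    observed = [wet_avg, dry_avg]
    expected = [num_bugs / 2, num_bugs / 2]   # 5 on each side for 10 bugs
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    decision = "reject" if stat > critical else "accept"
    return stat, decision
```

Call it with your own wet and dry averages, for example `pill_bug_test(wet_avg, dry_avg)`, and compare the returned decision with the one you worked out by hand.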