Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. Random variation can make it tricky to tell whether a difference is real or just chance. Like whether a sample difference of $20 a month represents a real difference between the average rates of two car insurance companies. Or whether a 1 point increase in your AP Stats grade for every hour you study represents a real relationship between the two. These situations seem pretty different, but when we get down to it, they share a similar pattern. There's actually one idea, which--with a few tweaks--can help us answer ALL of our “is it random... or is it real” questions.
That's what test statistics do. Test statistics let us quantify how close things are to our expectations or theories--something that's not always easy to do based on our gut feelings. And they add a little more mathematical rigor to the process, so that we can make decisions about these questions.
INTRO
In previous episodes, z-scores helped us understand the idea that differences are relative. A difference of 1 second is meaningful when you are looking at the differences in the average time it takes two groups of elite Olympic athletes to complete a 100 meter freestyle swim. It's less meaningful when you're looking at the differences in the average time it takes two groups of recreational swimmers.
The amount of variance in a group is really important in judging a difference. Elite Olympic athletes vary only a little. Their 100 meter times are relatively close together, and a 10th of a second can mean the difference between a gold and a bronze medal. Non-professionals, on the other hand, have more variation; the fastest swimmers could finish a whole minute before the slower swimmers.
A difference of 1 second isn't a big deal between two groups of recreational swimmers because the difference is small compared to the natural variation we'd expect to see. Two groups of casual swimmers may differ by 10 or more seconds, even if their true underlying times are the same, just because of random variation.
That's why test statistics look at the difference between data and what we'd expect to see if the null hypothesis is true. But they also include some very important context: a measure of “average” variation we'd expect to see, like how much novice or pro swimmers differ. Test statistics help us quantify whether data fits our null hypothesis well.
A z-score is a test statistic. Let's look at a simple example. Say your IQ is 130. You're so smart. And the population mean is 100. On average, we expect someone to be about 15 points from the mean. So the difference we observed, 30, is twice the amount we'd expect to see on average. Your z-score would be 2.
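Here's that arithmetic as a quick Python sketch (the numbers are from the example; 15 is the standard deviation of the IQ scale):

```python
# z-score: how many standard deviations an observation is from the mean
iq = 130     # your IQ
mu = 100     # population mean IQ
sigma = 15   # population standard deviation for IQ

z = (iq - mu) / sigma
print(z)  # 2.0 -- two standard deviations above the mean
```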
And you can z-score any normal distribution--like a population distribution, but also a sampling distribution, which is the distribution of all possible group means for a certain sample size. You might remember we first learned about sampling distributions in episode 19.
We often have questions about groups of people. Finding out that you're two standard deviations above the mean for IQ is pretty ego boosting, but it won't really help further science. We could look at whether children with more than 100 books in their home have higher than average IQs. Let's say we take a random sample of 25 children with over 100 books. Then we measure their IQs. The average IQ is 110.
We can calculate a z-score for our particular group mean. The steps are exactly the same; we're just now looking at the sampling distribution of sample means rather than the population distribution. Instead of taking an individual score and subtracting the population mean, we take a group mean and subtract the mean of our sampling distribution under the null hypothesis. Then we divide by the standard error, which is the standard deviation of the sampling distribution.
So, the z-score--also called the z-statistic--tells us how many standard errors away from the sampling distribution mean our group mean is.
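As a sketch, here's that z-statistic for the books-and-IQ sample, assuming the population standard deviation of 15 from the earlier example:

```python
import math

sample_mean = 110  # mean IQ of the 25 children with over 100 books
mu = 100           # mean under the null hypothesis
sigma = 15         # population standard deviation (assumed known here)
n = 25             # sample size

standard_error = sigma / math.sqrt(n)  # sd of the sampling distribution: 3.0
z = (sample_mean - mu) / standard_error
print(z)  # about 3.33 -- the group mean is over 3 standard errors above 100
```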
Z-statistics around 1 or -1 tell us that the sample mean is about as far from the null hypothesis mean as we'd expect a typical sample mean to be. Z-statistics that are a lot bigger in magnitude than 1 or -1 mean that this sample mean is more extreme.
Which matches the general form of a test statistic: the difference between the observed data and what we'd expect under the null hypothesis, divided by the average variation we'd expect by chance.
The p-value will tell us how rare or extreme our data is, so that we can figure out whether we think there's an effect--like whether children with more than 100 books in their home have a higher than average IQ. Historically we've done this with tables, but most statistical programs, even Excel, can calculate this.
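For instance, scipy can turn a z-statistic into a two-tailed p-value. A sketch, using the group-mean z-statistic of about 3.33 from the books example (an illustration; the video doesn't quote this p-value):

```python
from scipy import stats

z = 3.33
# two-tailed p-value: the probability of a z-statistic at least this extreme
# in either direction, if the null hypothesis were true
p = 2 * stats.norm.sf(abs(z))
print(p)  # about 0.00087
```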
We can use z-tests to do hypothesis tests about means, differences between means, proportions, or even differences between proportions.
A researcher may want to know whether people in a certain region who got this year's flu vaccine were less likely to get the flu. They randomly sample 1000 people and find that 600 people got the flu vaccine, and 400 didn't. Out of the 600 people who got the vaccine, 20% still got the flu. Out of the 400 people who did not get the vaccine, 26% got the flu.
It seems like you're more likely to get the flu if you didn't get a flu shot, but we're not sure whether this difference is small or large compared to random variation.
To calculate our z-statistic for this question, we first have to remember our general form: the observed difference, minus the difference we'd expect under the null, divided by the average variation.
There's a 6% difference between the flu rates of the vaccinated and unvaccinated groups, and we want to know how “different” 6% is from 0%. A difference of 0% would mean there's no difference in flu rates between the two groups. So our observed difference is 6 minus 0 percent, or 6%.
For this question, the “average variation” in what percent of people get the flu is the standard error from our sampling distribution. We calculate it using the pooled proportion of people who got the flu and didn't get the flu: the square root of the pooled proportion, times one minus that proportion, times the sum of one over each group's sample size.
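Here's that standard error and the resulting z-statistic as a Python sketch, using the flu numbers from above:

```python
import math

n1, n2 = 600, 400    # vaccinated and unvaccinated sample sizes
p1, p2 = 0.20, 0.26  # proportion who got the flu in each group

# pooled proportion of flu cases across both groups
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)  # 0.224

# standard error of the difference between two proportions
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = ((p2 - p1) - 0) / se  # observed difference minus the 0 we'd expect
print(z)  # about 2.2295
```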
If our observed difference of 6% is large compared to the standard error--which is the amount of variation we expect by chance--we consider the difference to be “statistically significant”. We've found evidence suggesting the null might not be accurate.
There are two main ways of telling whether this z-statistic, which is about 2.2295 in our case, represents a statistically significant result. The first way is to calculate a “critical” value. A critical value is a value of our test statistic that marks the limits of our “extreme” values. A test statistic that is more extreme than these critical values (that is, it's toward the tails) causes us to reject the null.
We calculate our critical value by finding out which test-statistic value corresponds to the top 0.5, 1, or 5% most extreme values. For a z-test with alpha = 0.05, the critical values are 1.96 and -1.96.
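Those critical values come from the inverse of the normal CDF; a quick sketch with scipy:

```python
from scipy import stats

alpha = 0.05
# two-tailed test: split alpha between the upper and lower tails
critical = stats.norm.ppf(1 - alpha / 2)
print(-critical, critical)  # about -1.96 and 1.96
```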
If your z-statistic is more extreme than the critical value, you call it “statistically significant”. Ours is: 2.2295 is more extreme than 1.96. So we found evidence... in this case... that the flu shot is working.
But sometimes, a z-test won't apply. And when that happens, we can use the t-distribution and corresponding t-statistic to conduct a hypothesis test.
The t-test is just like our z-test. It uses the same general formula for its t-statistic. But we use a t-test if we don't know the true population standard deviation. As you can see, it looks like our z-statistic, except that we're using our sample standard deviation instead of the population standard deviation in the denominator.
The t-distribution looks like the z-distribution, but with thicker tails. The tails are thicker because we're estimating the true population standard deviation. Estimation adds a little more uncertainty, which means thicker tails, since extreme values are a little more common. But as we get more and more data, the t-distribution converges to the z-distribution, so with really large samples, the z and t-tests should give us similar p-values.
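A small sketch of that convergence: as the degrees of freedom grow (the df values below are just examples), the t-distribution's two-tailed critical value shrinks toward the z critical value of 1.96:

```python
from scipy import stats

print(stats.norm.ppf(0.975))           # z critical value: about 1.96
for df in [5, 30, 100, 1000]:
    # t critical value for a two-tailed test at alpha = 0.05
    print(df, stats.t.ppf(0.975, df))  # about 2.57, 2.04, 1.98, 1.96
```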
If we're ever in a situation where we have the population standard deviation, a z-test is the way to go. But a t-test is useful when we don't have that information.
For example, we can use a t-test to ask whether the average wait time at a car repair shop across the street is different from the time you'll wait at a larger shop 10 minutes away.
We collect data from 50 customers who need to take their cars in for major repairs. 25 are randomly assigned to go to the smaller repair shop, and the other 25 are sent to the larger shop.
After measuring the amount of time it took for repairs to be completed, we find that people who went to the smaller shop had an average wait time of 14 days. People who went to the larger shop had an average wait time of 13.25 days, which means there was a difference of 0.75 days in wait time.
But we don't know whether it's likely that this 0.75 day difference is just due to random variation between customers... at least not until we conduct a t-test on the difference between the means of the two groups.
Before we do our test, we need to decide on an alpha level. We set our alpha at 0.01, because we want to be a bit more cautious about rejecting the null hypothesis than we would be if we used the standard of 0.05.
Now we can calculate the t-statistic for our two-sample t-test. If the null hypothesis were true, then there would be no real difference between the mean wait times of the two groups. And the alternative hypothesis is that the two means are not equal.
The two-sample t-statistic again follows the general form: we observed a 0.75 day difference in wait times between groups; we'd expect a difference of 0 if the null were true; and our measure of average variation is the standard error. The standard error is the typical distance that a sample mean will be from the population mean.
This time, we're looking at the sampling distribution of differences between means--all the possible differences between two group means--which is why the standard error formula may look a little different.
Putting it all together, we get a t-statistic of about 2.65.
If we plug that into our computer, we can see that this test statistic has a p-value of about 0.0108. Since we set our alpha at 0.01, the p-value needs to be smaller than 0.01 to reject the null hypothesis. Ours isn't. Barely, but it isn't.
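Here's that p-value check as a sketch (the 48 degrees of freedom come from 25 + 25 - 2):

```python
from scipy import stats

t_stat = 2.65
df = 25 + 25 - 2  # degrees of freedom for a two-sample t-test
alpha = 0.01

# two-tailed p-value for our t-statistic
p = 2 * stats.t.sf(t_stat, df)
print(p)          # about 0.0108
print(p < alpha)  # False -- we fail to reject the null
```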
So it might have seemed like the larger repair shop was definitely going to be faster, but it's actually not so clear. And this doesn't mean that there isn't a difference; we just couldn't find any evidence that there was one.
So if you're trying to decide which shop to take your car to, maybe consider something other than speed. And we could run similar tests for cost, or reliability, or friendliness.
You might notice that throughout the examples in this episode, we used two methods of deciding whether something was significant: critical values and p-values.
These two methods are equivalent. Large test statistics and small p-values both correspond to samples that are extreme. A test statistic that's bigger than our critical value allows us to reject the null hypothesis. And any test statistic that's more extreme than the alpha = 0.05 critical value will have a p-value less than 0.05. So, the two methods will lead us to the same conclusion.
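As a small sketch of that equivalence, using the flu example's z-statistic:

```python
from scipy import stats

z = 2.2295                        # z-statistic from the flu example
critical = stats.norm.ppf(0.975)  # critical value for alpha = 0.05: about 1.96
p = 2 * stats.norm.sf(z)          # two-tailed p-value: about 0.026

# both methods give the same decision
print(z > critical)  # True -- more extreme than the critical value
print(p < 0.05)      # True -- p-value below alpha
```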
If you have trouble remembering it, this rhyme may help: “Reject H-Oh if the p is too low.”
Still, we often use p-values instead of critical values. That's because each test statistic, like the z- or t-statistic, has different critical values, but a p-value of less than 0.05 means that your sample is in the top 5% of extreme samples no matter whether you use a z or t test statistic--or some of the other test statistics we haven't discussed, like F or chi-square.
Test statistics form the basis of how we can test whether things are actually different or whether what we're seeing is just normal variation. They help us know how likely it is that our results are just normal variation, or whether something interesting is going on.
Like whether drinking water upside down actually stops your hiccups faster than doing nothing. Then you can test drinking pickle juice to stop hiccups. Or really slowly eating a spoonful of creamy peanut butter. Let the testing commence! Thanks for watching. I'll see you next time.