Name: Test Statistics: Crash Course Statistics #26
Uploaded: 2019-11-01T01:22:32.000Z
Duration: 12 min 50 s

Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. Sometimes random


variation can make it tricky to tell when there are true differences or if it's just random.

Like whether a sample difference of $20 a

month represents a real difference between the average rates of two car insurance companies.

Or whether a 1 point increase in your AP Stats grade for every hour you study represents a real relationship between studying and grades.

These situations seem pretty different, but when we get down to it, they share a similar

pattern. There's actually one idea, which--with a few tweaks--can help us answer ALL of our

“is it random...or is it real” questions.

That's what test statistics do. Test statistics allow us to quantify how close things are

to our expectations or theories, something that's not always easy for us to do based

on our gut feelings. And test statistics allow us to add a little more mathematical rigor

to the process, so that we can make decisions about these questions.

In previous episodes, z-scores helped us understand the idea that differences are relative.

A difference of 1 second is meaningful when you are looking at the differences in

the average time it takes two groups of elite Olympic athletes to complete a 100 meter freestyle swim.

It's less meaningful when you're looking at the differences in the average

time it takes two groups of recreational swimmers.

The amount of variance in a group is really important in judging a difference. Elite Olympic

athletes vary only a little. Their 100 meter times are relatively close together, and a

tenth of a second can mean the difference between a gold and a bronze medal. Non-professionals, on the other hand,

have more variation; the fastest swimmers could finish a whole minute before the slower ones.

A difference of 1 second isn't a big deal between two groups of recreational swimmers

because the difference is small compared to the natural variation we'd expect to see.

Two groups of casual swimmers may differ by 10 or more seconds, even if their true underlying

times were the same, just because of random variation.

That's why test statistics look at the difference between data and what we'd expect to see

if the null hypothesis is true. But they also include some very important context: a measure

of “average” variation we'd expect to see, like how much novice or pro swimmers

differ. Test statistics help us quantify whether data fits our null hypothesis well.

A z-score is a test statistic. Let's look at a simple example. Say your IQ is 130. You're

so smart. And the population mean is 100.

On average we expect someone to be about 15 points from the mean. So the difference we

observed, 30, is twice the amount that we'd expect to see on average. Your z-score would be 2.
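The arithmetic in the IQ example can be sketched in a few lines of Python. This is a minimal illustration, not code from the episode; the helper name `z_score` is our own, and we assume the "about 15 points" figure is the population standard deviation.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviations `value` sits away from `mean`."""
    return (value - mean) / sd

# IQ example: score of 130, population mean 100, standard deviation 15
print(z_score(130, 100, 15))  # 2.0
```

A negative result works the same way: a score of 70 would give a z-score of -2.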

You can z-score any normal distribution, like a population distribution, but also a sampling

distribution, which is the distribution of all possible group means for a certain sample size.

You might remember we first learned about sampling distributions in episode 19.
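To make the idea of a sampling distribution concrete, here is a rough simulation sketch. It assumes, purely for illustration, a normal population with mean 100 and standard deviation 15 (the IQ numbers used in this episode) and samples of size 25; the function name `simulate_sample_means` is hypothetical. The spread of the simulated means should land near 15 / √25 = 3, which is the standard error of the mean.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, reps, seed=0):
    """Draw `reps` samples of size `n` from a normal population; return each sample's mean."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means(pop_mean=100, pop_sd=15, n=25, reps=10_000)
print(statistics.fmean(means))  # close to the population mean, 100
print(statistics.stdev(means))  # close to 15 / sqrt(25) = 3, the standard error
```

The individual scores vary by about 15 points, but group means of 25 people vary by only about 3, which is why the sampling distribution, not the population distribution, is the right yardstick for group means.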

We often have questions about groups of people. Finding out that you're two standard deviations

above the mean for IQ is pretty ego boosting, but it won't really help further science.

We could look at whether children with more than 100 books in their home have

higher-than-average IQs. Let's say we take a random sample of 25 children with over 100 books.

Then we measure their IQs. The average IQ is 110.

We can calculate a z-score for our particular group mean. The steps are exactly the same,

we're just now looking at the sampling distribution of sample means rather than the population distribution.

Instead of taking an individual score and subtracting the population mean, we take a

group mean and subtract the mean of our sampling distribution under the null hypothesis. Then

we divide by the standard error, which is the standard deviation of the sampling distribution.

So, the z-score--also called the z-statistic--tells us how many standard errors away from the

sampling distribution mean our group mean is.
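Applying those steps to the book example gives the following sketch. It assumes the population standard deviation of 15 is known, so the standard error is 15 / √25 = 3; `z_statistic_for_mean` is an illustrative name, not a library function.

```python
import math

def z_statistic_for_mean(sample_mean, null_mean, pop_sd, n):
    """How many standard errors the group mean sits from the null-hypothesis mean."""
    standard_error = pop_sd / math.sqrt(n)  # SD of the sampling distribution of means
    return (sample_mean - null_mean) / standard_error

# 25 children with 100+ books, average IQ 110, null mean 100
z = z_statistic_for_mean(sample_mean=110, null_mean=100, pop_sd=15, n=25)
print(round(z, 2))  # 3.33
```

The same 10-point gap that would be a z-score of only about 0.67 for one child becomes about 3.33 for a group mean, because group means vary much less than individuals.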

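Z-statistics around 1 or -1 tell us that the sample mean is about as far from the null hypothesis mean as we'd

expect a typical sample mean to be.

Z-statistics that are a lot bigger in magnitude than 1 or -1 mean that this sample mean is farther from the null hypothesis mean than we'd usually expect.

This matches the general form of a test statistic: the observed value minus the value we'd expect if the null hypothesis is true, divided by the average variation we'd expect to see.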

The p-value will tell us how rare or extreme our data is so that we can figure out whether

we think there's an effect. Like whether children with more than 100 books in their

home have a higher-than-average IQ. Historically, we've done this with tables, but most statistical

programs, even Excel, can calculate this.
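Python's standard library can play the role of those old lookup tables. This sketch uses `statistics.NormalDist` (Python 3.8+) to turn a z-statistic into one- and two-sided p-values; applying it to the z of about 3.33 from the book example is our own illustration, and the upper one-sided value fits a question phrased as "higher than average." The helper name `p_values_from_z` is hypothetical.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (upper one-sided, two-sided) p-values for a z-statistic."""
    std_normal = NormalDist(mu=0, sigma=1)
    upper = 1 - std_normal.cdf(z)                 # P(Z >= z)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # P(|Z| >= |z|)
    return upper, two_sided

upper, two_sided = p_values_from_z(10 / 3)  # z from the book example
print(f"{upper:.5f} {two_sided:.5f}")
```

Both p-values come out well under 0.05, so a group mean of 110 would be very rare if the null hypothesis were true.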

We can use z-tests to do hypothesis tests about means, differences between means, proportions, and differences between proportions.

A researcher may want to know whether people in a certain region who got this year's

flu vaccine were less likely to get the flu. They randomly sampled 1,000 people and found

that 600 people got the flu vaccine, and 400 didn't.

Out of the 600 people who got the vaccine, 20% still got the flu. Out of the 400 people

who did not get the vaccine, 26% got the flu.

It seems like you're more likely to get the flu if you didn't get a flu shot, but

we're not sure if this difference is pretty small compared to random variation, or pretty large.

To calculate our z-statistic for this question,

we first have to remember our general form: the observed difference minus the expected difference, divided by the average variation.

There's a 6% difference between the proportion of the vaccinated and unvaccinated groups,

and we want to know how “different” 6% is from 0%.

A difference of 0% would mean there's no difference in flu rates between the two groups.

So our observed difference is 6 minus 0 percent, or 6%.

For this question, the “average variation” of what percent of people get the flu is the

standard error from our sampling distribution. We calculate it using the average proportion

of people who got the flu and who didn't, pooled across both groups, along with the two group sizes.
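Here is one way to reproduce the flu calculation. We assume that the "average proportion" means the pooled flu rate across all 1,000 people (120 + 104 = 224 of them got the flu), which gives a standard error of about 0.0269 and the z-statistic of about 2.2295 quoted in this episode. The function name `two_proportion_z` and its argument order are our own.

```python
import math

def two_proportion_z(p_a, n_a, p_b, n_b):
    """z-statistic for (p_b - p_a) under the null of no difference, using a pooled standard error."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)  # overall flu rate, ignoring group
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    observed_difference = p_b - p_a
    expected_difference = 0.0  # null hypothesis: no difference in flu rates
    return (observed_difference - expected_difference) / standard_error, standard_error

# vaccinated: 20% of 600 got the flu; unvaccinated: 26% of 400 got the flu
z, se = two_proportion_z(p_a=0.20, n_a=600, p_b=0.26, n_b=400)
print(round(se, 4), round(z, 4))  # 0.0269 2.2295
```

Pooling is the usual choice here because, under the null hypothesis, both groups share one flu rate, so the best estimate of it uses everyone.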

If our observed difference of 6% is large compared to the standard error--which is the

amount of variation we expect by chance--we consider the difference to be “statistically

significant”. We've found evidence suggesting the null might not be accurate.

There are two main ways of telling whether this z-statistic, which is about 2.2295 in

our case, represents a statistically significant result.

The first way is to calculate a “critical” value. A critical value is a value of our

test statistic that marks the limits of our “extreme” values. A test statistic that

is more extreme than these critical values (that is, it's toward the tails) causes us to reject the null hypothesis.

We calculate our critical value by finding out which test-statistic value corresponds

to the top 0.5, 1, or 5% most extreme values. For a two-sided z-test with alpha = 0.05, the critical values are about 1.96 and -1.96.

If your z-statistic is more extreme than the critical value, you call it “statistically

significant”. So, we found evidence...in this case...that the flu shot is working.
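The critical value for a chosen alpha can be computed instead of looked up. This sketch assumes a two-sided test, so alpha is split evenly between both tails; `critical_z` is a hypothetical helper name built on the standard library's `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

def critical_z(alpha: float) -> float:
    """Two-sided critical value: alpha / 2 of the area sits in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha), 3))

z_flu = 2.2295  # z-statistic from the flu example
print(abs(z_flu) > critical_z(0.05))  # True: beyond the 5% critical value
```

The choice of alpha matters: at alpha = 0.01 the critical value is about 2.58, and the same flu z-statistic of 2.2295 would not clear it.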

But sometimes, a z-test won't apply. And when that happens, we can use the t-distribution

and corresponding t-statistic to conduct a hypothesis test.

The t-test is just like our z-test. It uses the same general formula for its t-statistic.

But we use a t-test if we don't know the true population standard deviation.

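The t-statistic looks like our z-statistic, except that we're using our sample standard

deviation instead of the population standard deviation in the denominator: t = (sample mean − null mean) / (s / √n), where s is the sample standard deviation and n is the sample size.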
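A minimal sketch of a one-sample t-statistic, mirroring the z-statistic for a mean but with the sample standard deviation plugged in. The function name `t_statistic` is illustrative, and the toy numbers are made up only to show the arithmetic.

```python
import math
import statistics

def t_statistic(sample, null_mean):
    """One-sample t-statistic: like z, but with the sample SD in the standard error."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    standard_error = s / math.sqrt(n)
    return (statistics.fmean(sample) - null_mean) / standard_error

# Made-up toy numbers, only to show the arithmetic
toy_sample = [1, 2, 3, 4, 5]
print(round(t_statistic(toy_sample, null_mean=0), 4))  # 4.2426
```

For a one-sample test like this, the resulting t-statistic is compared against a t-distribution with n - 1 degrees of freedom rather than against the standard normal.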

The t-distribution looks like the z-distribution, but with thicker tails. The tails are thicker

because we're estimating the true population standard deviation from our sample, which adds extra uncertainty.
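One way to see the thicker tails is to compare densities far from the center. This sketch writes out the standard Student's t density formula by hand (to stay within the standard library) and compares it with the standard normal density at x = 3; the function name `t_pdf` and the chosen degrees of freedom are illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal_at_3 = NormalDist().pdf(3)
for df in (2, 5, 30):
    # the t-distribution puts more density out at x = 3 than the standard normal does
    print(df, round(t_pdf(3, df), 5), round(normal_at_3, 5))
```

As the degrees of freedom grow, the t density at x = 3 shrinks toward the normal value, which is why the t-distribution looks more and more like the z-distribution for large samples.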
