Name: Variance, Standard Deviation, Coefficient of Variation
Uploaded: 2020-03-09T00:41:12.000Z
Duration: 10 min 8 s

There are many ways to quantify variability; however, we will focus on the most common ones: variance, standard deviation, and coefficient of variation.

In the field of statistics, we will typically use different formulas when working with population data and with sample data.

When you have the whole population, each data point is known, so you are 100% sure of the measures you are calculating.

When you take a sample of this population and you compute a sample statistic, it is

interpreted as an approximation of the population parameter.

Moreover, if you extract 10 different samples from the same population, you will get 10 different measures.

Statisticians have solved the problem by adjusting the algebraic formulas for many statistics to reflect this issue.

Therefore, we will explore both population and sample formulas, as they are both used.

You may be asking yourself why the mean, median and mode each have a single formula, rather than separate population and sample versions.

Well, actually, the sample mean is the average of the sample data points, while the population

mean is the average of the population data points.

Technically, there are two different formulas, but they are computed in the same way.

After this short clarification, it’s time to get onto variance.

Variance measures the dispersion of a set of data points around their mean value.

Population variance, denoted by sigma squared, is equal to the sum of squared differences

between the observed values and the population mean, divided by the total number of observations.

Sample variance, on the other hand, is denoted by s squared and is equal to the sum of squared differences between the observed sample values and the sample mean, divided by the number of sample observations minus one.
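Written as formulas, the two definitions look like this. The symbols N (population size), n (sample size), μ (population mean) and x̄ (sample mean) are notation assumed here, because the lecture states the formulas only in words. The n − 1 in the sample denominator matches the divisor of 4 that this lecture uses for five sample observations.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```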

When you are getting acquainted with statistics, it is hard to grasp everything right away.

Therefore, let’s stop for a second to examine the formula for the population variance and try to make sense of it.

The main part of the formula is its numerator, so that’s what we want to comprehend.

It is the sum of the squared differences between the observations and the mean.

Hmm… so, the closer a number is to the mean, the lower the result we will obtain, right?

And the further away from the mean it lies, the larger this difference.

But why do we raise the differences to the second power?

Squaring the differences has two main purposes.

First, by squaring the differences, we always get non-negative values.

Without going too deep into the mathematics of it, it is intuitive that dispersion cannot be negative.

Dispersion is about distance and distance cannot be negative.

If, on the other hand, we calculated the differences without squaring them, we would obtain both positive and negative values. When summed, these would cancel out, because deviations from the mean always add up to exactly zero. That would leave us with no information about the dispersion.
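Here is a quick numerical check of this point, written as a minimal Python sketch. It uses the observations 1, 2, 3, 4 and 5 from the worked example in this lecture, and the variable names are illustrative. The raw deviations sum to zero, while the squared deviations do not.

```python
# The five observations from the lecture's worked example.
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)                  # 3.0

deviations = [x - mean for x in data]         # [-2.0, -1.0, 0.0, 1.0, 2.0]
squared = [d ** 2 for d in deviations]        # [4.0, 1.0, 0.0, 1.0, 4.0]

print(sum(deviations))  # 0.0: positives and negatives cancel out
print(sum(squared))     # 10.0: every term is non-negative, so nothing cancels
```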

Second, squaring amplifies the effect of large differences.

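For example, if the mean is 0 and you have an observation of 100, the squared spread is 10,000, while an observation of 1 contributes only 1.

Now let’s calculate the variance of a small data set. We have a population of five observations: 1, 2, 3, 4 and 5.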

We start by calculating the mean: 1 + 2 + 3 + 4 + 5 equals 15, and 15 divided by 5 equals 3.

Then we apply the formula we just saw: (1 - 3)² + (2 - 3)² + (3 - 3)² + (4 - 3)² + (5 - 3)² = 4 + 1 + 0 + 1 + 4 = 10.

This sum has to be divided by 5, the number of observations.

So, the population variance of the data set is 2.
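The same calculation can be written as a short Python function. This is a minimal sketch; `population_variance` is a hypothetical helper name, and `statistics.pvariance` from the standard library is shown as a cross-check.

```python
import statistics

def population_variance(values):
    """Population variance: mean of squared deviations from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n                              # population mean
    return sum((x - mu) ** 2 for x in values) / n     # divide by N, not N - 1

data = [1, 2, 3, 4, 5]
print(population_variance(data))    # 2.0
print(statistics.pvariance(data))   # standard-library equivalent
```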

What about the sample variance? It would only be suitable if we were told that these five observations were a sample drawn from a larger population.

The numerator is the same, but the denominator is going to be 4 instead of 5, giving us a sample variance of 10 / 4 = 2.5.
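Here is a matching sketch for the sample formula, assuming the same five observations. The helper name `sample_variance` is hypothetical. The only change from the population version is the n − 1 divisor, and `statistics.variance` in the standard library uses that same divisor.

```python
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n                                # sample mean
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [1, 2, 3, 4, 5]
print(sample_variance(data))       # 2.5
print(statistics.variance(data))   # 2.5, standard-library equivalent
```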

To conclude the variance topic, we should interpret the result.

Why is the sample variance bigger than the population variance?

In the first case, we knew the population, that is, we had all the data, and we calculated the variance exactly.

In the second case, we were told that 1, 2, 3, 4 and 5 was a sample drawn from a bigger population.

Imagine the population of this sample were these 9 numbers: 1, 1, 1, 2, 3, 4, 5, 5 and 5.

Clearly, the numbers are the same, but there is a concentration around the two extremes of the data set.

So, our sample variance has rightfully been corrected upwards in order to reflect the higher potential variability of the population.
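To check this argument numerically, the Python sketch below compares both formulas against the nine-number population. It assumes the ninth value is 5 (three 1s and three 5s, matching the concentration around the extremes), and the helper names are illustrative. The sample variance of 1, 2, 3, 4, 5 lands closer to the population's true variance than the population formula applied to the sample does.

```python
def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)          # divide by N

def sample_variance(values):
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)  # divide by n - 1

sample = [1, 2, 3, 4, 5]
# Hypothetical parent population from the lecture; the ninth value (5) is assumed.
population = [1, 1, 1, 2, 3, 4, 5, 5, 5]

print(round(population_variance(population), 3))  # 2.889: the population's true variance
print(population_variance(sample))                 # 2.0: treats the sample as the whole population
print(sample_variance(sample))                     # 2.5: closer to 2.889
```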

This is the reason why there are different formulas for sample and population data.
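Another way to see why the sample formula divides by n − 1 is to enumerate every possible sample. This is an illustration added here, not the lecture's own method. It assumes samples of size 2 drawn with replacement from the population 1, 2, 3, 4, 5, so that the draws are independent, and averages both versions of the variance over all 25 equally likely samples. On average, dividing by n underestimates the population variance of 2, while dividing by n − 1 recovers it exactly.

```python
from itertools import product

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the values' own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

population = [1, 2, 3, 4, 5]
sigma2 = sum_sq_dev(population) / len(population)   # population variance: 2.0

# Every ordered sample of size 2 drawn with replacement (25 equally likely samples).
samples = list(product(population, repeat=2))
n = 2

avg_divide_by_n = mean([sum_sq_dev(s) / n for s in samples])
avg_divide_by_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print(sigma2)                   # 2.0
print(avg_divide_by_n)          # 1.0: systematically too small
print(avg_divide_by_n_minus_1)  # 2.0: matches the population variance on average
```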

While variance is a common measure of data dispersion, in most cases the figure you will obtain is pretty large and hard to compare, because it is expressed in the squared units of the original data.

The easy fix is to calculate its square root and obtain a statistic known as standard deviation.

In most analyses you perform, standard deviation will be much more meaningful than variance.

As we saw in the previous lecture, there are different measures for the population variance and the sample variance.

Consequently, there are also population and sample standard deviations.

The formulas are simply the square root of the population variance and the square root of the sample variance, respectively.

I believe there is no need for an example of the calculation, right?

If you have a calculator in your hands, you’ll be able to do the job.
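For completeness, here is a minimal Python sketch of that calculation for the data set 1, 2, 3, 4, 5 from the variance example. It takes square roots of the two variances and cross-checks them against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

pop_sd = math.sqrt(statistics.pvariance(data))   # square root of 2
samp_sd = math.sqrt(statistics.variance(data))   # square root of 2.5

# The standard library also computes these directly.
print(round(pop_sd, 3), round(statistics.pstdev(data), 3))   # 1.414 1.414
print(round(samp_sd, 3), round(statistics.stdev(data), 3))   # 1.581 1.581
```

Unlike the variance, both results are in the same units as the original data.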

The other measure we still have to introduce is the coefficient of variation.

It is equal to the standard deviation divided by the mean.

Another name for the term is relative standard deviation.

This is an easy way to remember its formula: it is simply the standard deviation relative to the mean.

As you probably guessed, there are population and sample formulas once again.
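Here is a minimal Python sketch of both versions. The function name `coefficient_of_variation` and its `sample` flag are hypothetical choices made for this illustration. The flag selects the sample (n − 1) or population (N) standard deviation as the numerator, and a zero mean is rejected because the ratio is undefined there.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean (relative standard deviation).

    sample=True uses the sample standard deviation (n - 1 divisor);
    sample=False uses the population standard deviation (N divisor).
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean

data = [1, 2, 3, 4, 5]
print(round(coefficient_of_variation(data, sample=False), 3))  # 0.471
print(round(coefficient_of_variation(data, sample=True), 3))   # 0.527
```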

So, standard deviation is the most common measure of variability for a single data set.

But why do we need yet another measure such as the coefficient of variation?

Well, comparing the standard deviations of two data sets measured in different units, or on very different scales, is meaningless, but comparing their coefficients of variation is not.
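Here is a small illustration of why the coefficient of variation is comparable across scales. The Python sketch below assumes hypothetical prices of 1 to 5 dollars and the same prices expressed in cents. The standard deviation grows a hundredfold with the change of units, while the coefficient of variation stays the same.

```python
import math
import statistics

prices_usd = [1, 2, 3, 4, 5]                    # hypothetical prices in dollars
prices_cents = [100 * p for p in prices_usd]    # the same prices in cents

def cv(values):
    """Population coefficient of variation: standard deviation / mean."""
    return statistics.pstdev(values) / statistics.mean(values)

print(round(statistics.pstdev(prices_usd), 3))          # 1.414
print(round(statistics.pstdev(prices_cents), 3))        # 141.421
print(math.isclose(cv(prices_usd), cv(prices_cents)))   # True
```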
