  • This lecture is going to serve as an overview of what a probability distribution is and

  • what main characteristics it has.

  • Simply put, a distribution shows the possible values a variable can take and how frequently

  • they occur.

  • Before we start, let us introduce some important notation we will use for the remainder of

  • the course.

  • Assume that “upper-case Y” represents the actual outcome of an event and “lowercase

  • y” represents one of the possible outcomes.

  • One way to denote the likelihood of reaching a particular outcome “y” is “P of Y equals

  • y”.

  • We can also express it as “p of y”.

  • For example, uppercase “Y” could represent the number of red marbles we draw out of a

  • bag and lowercase “y” would be a specific number, like 3 or 5.

  • Then, we express the probability of getting exactly 5 red marbles as “P of Y equals

  • 5”, or “p of 5”.

  • Since “p of y” expresses the probability for each distinct outcome, we call this the

  • probability function.

  • Good job, folks!

  • So, probability distributions, or simply probabilities, measure the likelihood of an outcome depending

  • on how often it features in the sample space.

  • Recall that we constructed the probability frequency distribution of an event in the

  • introductory section of the course.

  • We recorded the frequency for each unique value and divided it by the total number of

  • elements in the sample space.

  • Usually, that is the way we construct these probabilities when we have a finite number

  • of possible outcomes.
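
As a rough sketch of that procedure, here is how such a probability function could be built in Python from a small, made-up set of recorded outcomes:

```python
from collections import Counter

# Hypothetical recorded outcomes from a finite sample space
outcomes = [1, 2, 2, 3, 3, 3, 4]

counts = Counter(outcomes)      # frequency of each unique value
total = len(outcomes)           # total number of elements in the sample space

# p(y) = frequency of y divided by the total number of elements
p = {y: freq / total for y, freq in counts.items()}

print(p)                        # {1: 0.142..., 2: 0.285..., 3: 0.428..., 4: 0.142...}
print(sum(p.values()))          # the probabilities add up to 1.0
```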

  • If we had an infinite number of possibilities, then recording the frequency for each one

  • becomes impossible, because there are infinitely many of them!

  • For instance, imagine you are a data scientist and want to analyse the time it takes for

  • your code to run.

  • Any single compilation could take anywhere from a few milliseconds to several days.

  • Often the result will be between a few milliseconds and a few minutes.

  • If we record time in seconds, we lose precision, which we want to avoid.

  • To do so we need to use the smallest possible measurement of time.

  • Since every milli-, micro-, or even nanosecond could be split in half for greater accuracy,

  • no such thing exists.

  • Less than an hour from now we will talk in more detail about continuous distributions

  • and how to deal with them.

  • Let’s introduce some key definitions.

  • Now, regardless of whether we have a finite or infinite number of possibilities, we define

  • distributions using only two characteristics: mean and variance.

  • Simply put, the mean of the distribution is its average value.

  • Variance, on the other hand, is essentially how spread out the data is.

  • We measure this “spread” by how far away from the mean all the values are.

  • We denote the mean of a distribution as the Greek letter “mu” and its variance as

  • sigma squared”.

  • Okay.

  • When analysing distributions, it is important to understand what kind of data we have - population

  • data or sample data.

  • Population data is the formal way of referring to “all” the data, while sample data is

  • just a part of it.

  • For example, if an employer surveys an entire department about how they travel to work,

  • the data would represent the population of the department.

  • However, this same data would also be just a sample of the employees in the whole company.

  • Something to remember when using sample data is that we adopt different notation for the

  • mean and variance.

  • We denote sample mean as “x bar” and sample variance as “s squared”.

  • One flaw of variance is that it is measured in squared units.

  • For example, if you are measuring time in seconds, the variance would be measured in

  • seconds squared.

  • Usually, there is no direct interpretation of that value.

  • To make more sense of variance, we introduce a third characteristic of the distribution,

  • called standard deviation.

  • Standard deviation is simply the positive square root of variance.

  • As you can suspect, we denote it as “sigma” when dealing with a population, and as “s”

  • when dealing with a sample.

  • Unlike variance, standard deviation is measured in the same units as the mean.

  • Thus, we can interpret it directly, which is why it is often preferable.
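
As a small illustration on a made-up dataset, all of these measures can be computed with NumPy; note that NumPy uses the population formulas by default, and the ddof=1 argument switches to the sample versions:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)   # made-up values

# Population measures: mu, sigma squared, sigma
mu = data.mean()
sigma_squared = data.var()          # divides by n
sigma = data.std()

# Sample measures: x bar, s squared, s
x_bar = data.mean()
s_squared = data.var(ddof=1)        # divides by n - 1
s = data.std(ddof=1)

print(mu, sigma_squared, sigma)     # 5.0 4.0 2.0
print(x_bar, s_squared, s)          # 5.0 4.57... 2.13...
```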

  • One idea which we will use a lot is that any value between “mu minus sigma” and “mu

  • plus sigma” falls within one standard deviation of the mean.

  • The more congested the middle of the distribution, the more data falls within that interval.

  • Similarly, the less data falls within the interval, the more dispersed the data is.
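
Continuing the same made-up dataset, the share of values falling within one standard deviation of the mean can be checked directly:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)   # same made-up values
mu, sigma = data.mean(), data.std()

# Values between "mu minus sigma" and "mu plus sigma"
within_one_sigma = (data >= mu - sigma) & (data <= mu + sigma)
print(within_one_sigma.mean())      # 0.75, i.e. 75% of the values fall in that interval
```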

  • Fantastic!

  • It is important to know there exists a constant relationship between mean and variance for

  • any distribution.

  • By definition, the variance equals the expected value of the squared difference from the mean

  • for any value.

  • We denote this as “sigma squared equals the expected value of Y minus mu, squared”.

  • After some simplification, this is equal to the expected value of “Y squared” minus

  • “mu” squared.

  • As we will see in the coming lectures, if we are dealing with a specific distribution,

  • we can find a much more precise formula.
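
Written out in symbols, the general relationship described above is:

```latex
\sigma^2 = E\left[(Y - \mu)^2\right]
         = E\left[Y^2 - 2\mu Y + \mu^2\right]
         = E\left[Y^2\right] - 2\mu E[Y] + \mu^2
         = E\left[Y^2\right] - \mu^2 ,
\quad \text{since } E[Y] = \mu .
```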

  • Okay, when we are getting acquainted with a certain dataset we want to analyse or make

  • predictions with, we are most interested in the mean, variance and type of the distribution.

  • In our next video we will introduce several distributions and the characteristics they

  • possess.

  • Thanks for watching!

  • 4.2 Types of Distributions

  • Hello, again!

  • In this lecture we are going to talk about various types of probability distributions

  • and what kind of events they can be used to describe.

  • Certain distributions share features, so we group them into types.

  • Some, like rolling a die or picking a card, have a finite number of outcomes.

  • They follow discrete distributions and we use the formulas we already introduced to

  • calculate their probabilities and expected values.

  • Others, like recording time and distance in track & field, have infinitely many outcomes.

  • They follow continuous distributions and we use different formulas from the ones we mentioned

  • so far.

  • Throughout the course of this video we are going to examine the characteristics of some

  • of the most common distributions.

  • For each one we will focus on an important aspect of it or when it is used.

  • Before we get into the specifics, you need to know the proper notation we implement when

  • defining distributions.

  • We start off by writing down the variable name for our set of values, followed by the

  • “tilde” sign.

  • This is followed by a capital letter depicting the type of the distribution and some characteristics

  • of the dataset in parentheses.

  • The characteristics are usually the mean and variance, but they may vary depending on the

  • type of the distribution.
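
For example, the statements below follow that pattern (the Normal case is shown only as an illustration of the general notation):

```latex
X \sim U(3, 7)             % X follows a discrete uniform distribution ranging from 3 to 7
Y \sim N(\mu, \sigma^2)    % Y follows a Normal distribution with mean mu and variance sigma squared
```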

  • Alright!

  • Let us start by talking about the discrete ones.

  • We will get an overview of them and then we will devote a separate lecture to each one.

  • So, we looked at problems relating to drawing cards from a deck or flipping a coin.

  • Both examples show events where all outcomes are equally likely.

  • Such outcomes are called equiprobable and these sorts of events follow a Uniform Distribution.

  • Then there are events with only two possible outcomestrue or false.

  • They follow a Bernoulli Distribution, regardless of whether one outcome is more likely to occur.

  • Any event with two outcomes can be transformed into a Bernoulli event.

  • We simply assign one of them to betrueand the other one to befalse”.

  • Imagine we are required to elect a captain for our college sports team.

  • The team consists of 7 native students and 3 international students.

  • We assign the captain being domestic to be “true” and the captain being an international

  • student as “false”.

  • Since the outcome can now only be “true” or “false”, we have a Bernoulli distribution.
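
A minimal sketch of the captain example in Python, assuming the captain is picked at random from the 10 team members:

```python
from scipy.stats import bernoulli

p = 7 / 10                      # 7 of the 10 members are domestic, so "true" has probability 0.7

print(bernoulli.pmf(1, p))      # probability the captain is domestic ("true"): 0.7
print(bernoulli.pmf(0, p))      # probability the captain is international ("false"): 0.3
```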

  • Now, if we want to carry out a similar experiment several times in a row, we are dealing with

  • a Binomial Distribution.

  • Just like with the Bernoulli Distribution, each iteration has only two possible outcomes,

  • but now we have many iterations.

  • For example, we could be flipping the coin we mentioned earlier 3 times and trying to

  • calculate the likelihood of getting heads twice.
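
That coin example can be checked directly with the binomial probability function:

```python
from scipy.stats import binom

# Probability of getting exactly 2 heads in 3 flips of a fair coin
print(binom.pmf(2, 3, 0.5))     # 0.375
```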

  • Lastly, we should mention the Poisson Distribution.

  • We use it when we want to test out how unusual an event frequency is for a given interval.

  • For example, imagine we know that so far Lebron James averages 35 points per game during the

  • regular season.

  • We want to know how likely it is that he will score 12 points in the first quarter of his

  • next game.

  • Since the time interval is shorter, our expectations for the outcome should change accordingly.

  • Using the Poisson distribution, we are able to determine the chance of Lebron scoring

  • exactly 12 points for the adjusted time interval.
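
A rough sketch of that calculation, assuming points arrive at a constant average rate throughout the game, so one quarter carries a rate of 35 / 4 points:

```python
from scipy.stats import poisson

rate_per_quarter = 35 / 4       # 8.75 points expected in one quarter

# Probability of scoring exactly 12 points in the first quarter
print(poisson.pmf(12, rate_per_quarter))    # roughly 0.067
```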

  • Great, now on to the continuous distributions!

  • One thing to remember is that since we are dealing with continuous outcomes, the probability

  • distribution would be a curve as opposed to unconnected individual bars.

  • The first one we will talk about is the Normal Distribution.

  • The outcomes of many events in nature closely resemble this distribution, hence the name

  • Normal”.

  • For instance, according to numerous reports throughout the last few decades, the weight

  • of an adult male polar bear is usually around 500 kilograms.

  • However, there have been records of individual bears weighing anywhere between 350kg and

  • 700kg.

  • Extreme values, like 350 and 700, are called outliers and do not feature very frequently

  • in Normal Distributions.
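
As an illustrative sketch only: the 500 kilogram mean comes from the example above, while the 50 kilogram standard deviation is an assumed value chosen so that the 350 kg and 700 kg records sit far out in the tails:

```python
from scipy.stats import norm

weights = norm(loc=500, scale=50)           # mean 500 kg (from the example), std. dev. 50 kg (assumed)

# Probability a randomly observed adult male weighs between 350 kg and 700 kg
print(weights.cdf(700) - weights.cdf(350))  # about 0.9986 -- values near 350 or 700 are rare outliers
```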

  • Sometimes, we have limited data for events that resemble a Normal distribution.

  • In those cases, we observe the Student’s-T distribution.

  • It serves as a small sample approximation of a Normal distribution.

  • Another difference is that the Student’s-T accommodates extreme values significantly

  • better.

  • Graphically, that is represented by the curve having fatter “tails”.

  • Now imagine only looking at the recorded weights of the last 10 sightings across Alaska and

  • Canada.

  • The lower number of elements would make the occurrence of any extreme value represent

  • a much bigger part of the population than it should.

  • Overall, this results in more values extremely far away from the mean, so the curve would

  • probably more closely resemble a Student’s-T distribution than a Normal distribution.

  • Good job, everyone!

  • Another continuous distribution we would like to introduce is the Chi-Squared distribution.

  • It is the first asymmetric continuous distribution we are dealing with as it only consists of

  • non-negative values.

  • Graphically, that means that the Chi-Squared distribution always starts from 0 on the left.

  • Depending on the values within the set, most of the Chi-Squared curve is concentrated on

  • the left, with a long tail stretching to the right, so the graph is skewed to the right.

  • Unlike the previous two distributions, the Chi-Squared does not often mirror real life

  • events.

  • However, it is often used in Hypothesis Testing to help determine goodness of fit.
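
As a brief, made-up illustration of such a goodness-of-fit check (do the observed counts plausibly come from a fair six-sided die?):

```python
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 19, 20]     # made-up counts for faces 1-6 over 120 rolls
expected = [20, 20, 20, 20, 20, 20]     # a fair die would give 20 of each

statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(statistic, p_value)               # the test statistic follows a Chi-Squared distribution
```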

  • The next distribution on our list is the Exponential distribution.

  • The Exponential distribution is usually present when we are dealing with events that are rapidly

  • changing early on.

  • An easy-to-understand example is how online news articles generate hits.

  • They get most of their clicks when the topic is still fresh.

  • The more time passes, the less relevant an article becomes as interest dies off.

  • The last continuous distribution we will mention is the Logistic distribution.

  • We often find it useful in forecast analysis when we try to determine a cut-off point for

  • a successful outcome.

  • For instance, take a competitive e-sport like Dota 2. We can use a Logistic distribution

  • to determine how much of an in-game advantage at the 10-minute mark is necessary to confidently

  • predict victory for either team.

  • Just like with other types of forecasting, our predictions would never reach true certainty

  • but more on that later!

  • Woah!

  • Good job, folks!

  • In the next video we are going to focus on discrete distributions.

  • We will introduce formulas for computing Expected Values and Standard Deviations before looking

  • into each distribution individually.

  • Thanks for watching!

  • 4.3 Discrete Distributions

  • Welcome back!

  • In this video we will talk about discrete distributions and their characteristics.

  • Let’s get started!

  • Earlier in the course we mentioned that events with discrete distributions have finitely

  • many distinct outcomes.

  • Therefore, we can express the entire probability distribution with either a table, a graph

  • or a formula.

  • To do so we need to ensure that every unique outcome has a probability assigned to it.

  • Imagine you are playing darts.

  • Each distinct outcome has some probability assigned to it based on how big its associated

  • interval is.

  • Since we have finitely many possible outcomes, we are dealing with a discrete distribution.
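
A tiny sketch of such a “table” in Python, with made-up probabilities for the darts example; the only requirement is that every unique outcome gets a probability and that the probabilities add up to 1:

```python
# Each unique outcome and its probability, based on how big its interval is (values are made up)
probability_table = {
    "bullseye":   0.05,
    "inner ring": 0.20,
    "outer ring": 0.45,
    "miss":       0.30,
}

print(sum(probability_table.values()))   # 1.0 (up to floating-point rounding)
print(probability_table["bullseye"])     # p("bullseye") = 0.05
```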

  • Great!

  • In probability, we are often more interested in the likelihood of an interval than of an

  • individual value.

  • With discrete distributions, we can simply add up the probabilities for all the values

  • that fall within that range.

  • Recall the example where we drew a card 20 times.

  • Suppose we want to know the probability of drawing 3 spades or fewer.

  • We would first calculate the probability of getting 0, 1, 2 or 3 spades and then add them

  • up to find the probability of drawing 3 spades or fewer.

  • One peculiarity of discrete events is that “the probability of Y being less than

  • or equal to y equals the probability of Y being less than y plus 1”.

  • In our last example, that would mean getting 3 spades or fewer is the same as getting fewer

  • than 4 spades.
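
A quick sketch of that calculation, assuming each of the 20 cards is drawn with replacement so that the number of spades follows a Binomial distribution with 20 trials and probability 0.25:

```python
from scipy.stats import binom

n, p = 20, 0.25

# Add up the probabilities of drawing exactly 0, 1, 2 or 3 spades...
p_three_or_fewer = sum(binom.pmf(k, n, p) for k in range(4))

# ...which matches the cumulative probability P(Y <= 3), i.e. P(Y < 4)
print(p_three_or_fewer)        # about 0.225
print(binom.cdf(3, n, p))      # same value
```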

  • Alright!

  • Now that you have an idea about discrete distributions, we can start exploring each type in more detail.

  • In the next video we are going to examine the Uniform Distribution.

  • Thanks for watching!

  • 4.4 Uniform Distribution

  • Hey, there!

  • In this lecture we are going to discuss the uniform distribution.

  • For starters, we use the letter U to define a uniform distribution, followed by the range

  • of the values in the dataset.

  • Therefore, we read the following statement as “Variable X follows a discrete

  • uniform distribution ranging from 3 to 7”.

  • Events which follow the uniform distribution are ones where all outcomes have equal probability.

  • One such event is rolling a single standard six-sided die.

  • When we roll a standard 6-sided die, we have equal chance of getting any value from 1 to

  • 6.

  • The graph of the probability distribution would have 6 equally tall bars, all reaching

  • up to one sixth.
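
The die example can also be written down with SciPy's discrete uniform distribution (the upper bound is exclusive, so 7 covers the outcomes 1 through 6):

```python
from scipy.stats import randint

die = randint(1, 7)                 # discrete uniform over the integers 1, 2, ..., 6

for k in range(1, 7):
    print(k, die.pmf(k))            # every outcome has probability 1/6, about 0.1667
```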

  • Many events in gambling provide such odds, where each individual outcome is equally likely.

  • Not only that, but many everyday situations follow the Uniform distribution.