## Subtitles section

• This lecture is going to serve as an overview of what a probability distribution is and

• what main characteristics it has.

• Simply put, a distribution shows the possible values a variable can take and how frequently

• they occur.

• Before we start, let us introduce some important notation we will use for the remainder of

• the course.

• Assume that upper-case “Y” represents the actual outcome of an event and lowercase

• “y” represents one of the possible outcomes.

• One way to denote the likelihood of reaching a particular outcome “y” is “P of Y equals

• y”.

• We can also express it as “p of y”.

• For example, uppercase “Y” could represent the number of red marbles we draw out of a

• bag and lowercase “y” would be a specific number, like 3 or 5.

• Then, we express the probability of getting exactly 5 red marbles as “P of Y equals

• 5”, or “p of 5”.

• Since “p of y” expresses the probability for each distinct outcome, we call this the

• probability function.

• Good job, folks!

• So, probability distributions, or simply probabilities, measure the likelihood of an outcome depending

• on how often it features in the sample space.

• Recall that we constructed the probability frequency distribution of an event in the

• introductory section of the course.

• We recorded the frequency for each unique value and divided it by the total number of

• elements in the sample space.

• Usually, that is the way we construct these probabilities when we have a finite number

• of possible outcomes.
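
The frequency-based construction described above can be sketched in a few lines of Python. This is only an illustration: the list of outcomes is made-up data, not taken from the lecture, and `probability_function` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction


def probability_function(sample_space):
    """Map each distinct outcome y to p(y) = frequency of y / total count."""
    counts = Counter(sample_space)
    total = len(sample_space)
    return {y: Fraction(n, total) for y, n in counts.items()}


# Hypothetical sample space: number of red marbles recorded in 8 trials.
outcomes = [3, 5, 3, 4, 5, 3, 4, 3]
p = probability_function(outcomes)

print(p[3])             # share of recorded values equal to 3
print(sum(p.values()))  # probabilities over all distinct outcomes add up to 1
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1.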

• If we had an infinite number of possibilities, then recording the frequency for each one

• becomes impossible, because there are infinitely many of them!

• For instance, imagine you are a data scientist and want to analyse the time it takes for

• your code to run.

• Any single compilation could take anywhere from a few milliseconds to several days.

• Often the result will be between a few milliseconds and a few minutes.

• If we record time in seconds, we lose precision which we want to avoid.

• To do so we need to use the smallest possible measurement of time.

• Since every milli-, micro-, or even nanosecond could be split in half for greater accuracy,

• no such thing exists.

• Less than an hour from now we will talk in more detail about continuous distributions

• and how to deal with them.

• Let’s introduce some key definitions.

• Now, regardless of whether we have a finite or infinite number of possibilities, we define

• distributions using only two characteristics: mean and variance.

• Simply put, the mean of the distribution is its average value.

• Variance, on the other hand, is essentially how spread out the data is.

• We measure this “spread” by how far away from the mean all the values are.

• We denote the mean of a distribution as the Greek letter “mu” and its variance as

• “sigma squared”.

• Okay.

• When analysing distributions, it is important to understand what kind of data we have - population

• data or sample data.

• Population data is the formal way of referring to “all” the data, while sample data is

• just a part of it.

• For example, if an employer surveys an entire department about how they travel to work,

• the data would represent the population of the department.

• However, this same data would also be just a sample of the employees in the whole company.

• Something to remember when using sample data is that we adopt different notation for the

• mean and variance.

• We denote sample mean as “x bar” and sample variance as “s squared”.

• One flaw of variance is that it is measured in squared units.

• For example, if you are measuring time in seconds, the variance would be measured in

• seconds squared.

• Usually, there is no direct interpretation of that value.

• To make more sense of variance, we introduce a third characteristic of the distribution,

• called standard deviation.

• Standard deviation is simply the positive square root of variance.

• As you may suspect, we denote it as “sigma” when dealing with a population, and as “s”

• when dealing with a sample.

• Unlike variance, standard deviation is measured in the same units as the mean.

• Thus, we can interpret it directly, and it is often preferable.
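
As a rough sketch of the three characteristics, the snippet below computes mean, variance and standard deviation with Python's standard `statistics` module. The run times are made-up illustration data. One detail the lecture does not cover here: the sample functions (`variance`, `stdev`) divide by n minus 1, while the population functions (`pvariance`, `pstdev`) divide by n.

```python
import statistics

# Hypothetical run times in seconds (illustration data only).
run_times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treating the values as the whole population: mu, sigma squared, sigma.
mu = statistics.mean(run_times)
sigma_sq = statistics.pvariance(run_times)  # measured in seconds squared
sigma = statistics.pstdev(run_times)        # measured in seconds

# Treating the same values as a sample: x bar, s squared, s.
x_bar = statistics.mean(run_times)
s_sq = statistics.variance(run_times)
s = statistics.stdev(run_times)

print(mu, sigma_sq, sigma)
print(x_bar, s_sq, s)
```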

• One idea which we will use a lot is that any value between “mu minus sigma” and “mu

• plus sigma” falls within one standard deviation away from the mean.

• The more congested the middle of the distribution, the more data falls within that interval.

• Similarly, the less data falls within the interval, the more dispersed the data is.
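
To make the "within one standard deviation" idea concrete, here is a small sketch that counts how much of a dataset lies between mu minus sigma and mu plus sigma. The values are hypothetical and are treated as a full population.

```python
import statistics

# Hypothetical population values (illustration data only).
values = [1, 4, 4, 5, 5, 5, 6, 6, 9, 5]

mu = statistics.mean(values)
sigma = statistics.pstdev(values)

# Keep only the values inside the interval [mu - sigma, mu + sigma].
inside = [v for v in values if mu - sigma <= v <= mu + sigma]
share = len(inside) / len(values)

print(f"interval: [{mu - sigma:.3f}, {mu + sigma:.3f}]")
print(f"share within one standard deviation: {share}")
```

A dataset with a more congested middle would produce a larger share, a more dispersed one a smaller share.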

• Fantastic!

• It is important to know there exists a constant relationship between mean and variance for

• any distribution.

• By definition, the variance equals the expected value of the squared difference from the mean

• for any value.

• We denote this as “sigma squared equals the expected value of Y minus mu, squared”.

• After some simplification, this is equal to the expected value of “Y squared” minus

• “mu squared”.
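
The simplification mentioned above can be written out step by step. It relies only on the linearity of expected values and on the fact that the mean is the expected value of Y, that is $E(Y) = \mu$:

```latex
\begin{aligned}
\sigma^2 &= E\left[(Y - \mu)^2\right] \\
         &= E\left[Y^2 - 2\mu Y + \mu^2\right] \\
         &= E(Y^2) - 2\mu E(Y) + \mu^2 \\
         &= E(Y^2) - 2\mu^2 + \mu^2 \\
         &= E(Y^2) - \mu^2
\end{aligned}
```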

• As we will see in the coming lectures, if we are dealing with a specific distribution,

• we can find a much more precise formula.

• Okay, when we are getting acquainted with a certain dataset we want to analyse or make

• predictions with, we are most interested in the mean, variance and type of the distribution.

• In our next video we will introduce several distributions and the characteristics they

• possess.

• Thanks for watching!

• 4.2 Types of distributions

• Hello, again!

• In this lecture we are going to talk about various types of probability distributions

• and what kind of events they can be used to describe.

• Certain distributions share features, so we group them into types.

• Some, like rolling a die or picking a card, have a finite number of outcomes.

• They follow discrete distributions and we use the formulas we already introduced to

• calculate their probabilities and expected values.

• Others, like recording time and distance in track & field, have infinitely many outcomes.

• They follow continuous distributions and we use different formulas from the ones we mentioned

• so far.

• Throughout the course of this video we are going to examine the characteristics of some

• of the most common distributions.

• For each one we will focus on an important aspect of it or when it is used.

• Before we get into the specifics, you need to know the proper notation we implement when

• defining distributions.

• We start off by writing down the variable name for our set of values, followed by the

• “tilde” sign.

• This is followed by a capital letter depicting the type of the distribution and some characteristics

• of the dataset in parentheses.

• The characteristics are usually mean and variance, but they may vary depending on the

• type of the distribution.

• Alright!

• Let us start by talking about the discrete ones.

• We will get an overview of them and then we will devote a separate lecture to each one.

• So, we looked at problems relating to drawing cards from a deck or flipping a coin.

• Both examples show events where all outcomes are equally likely.

• Such outcomes are called equiprobable and these sorts of events follow a Uniform Distribution.

• Then there are events with only two possible outcomes: true or false.

• They follow a Bernoulli Distribution, regardless of whether one outcome is more likely to occur.

• Any event with two outcomes can be transformed into a Bernoulli event.

• We simply assign one of them to be “true” and the other one to be “false”.

• Imagine we are required to elect a captain for our college sports team.

• The team consists of 7 native students and 3 international students.

• We assign the captain being domestic to be “true” and the captain being international

• as “false”.

• Since the outcome can now only be “true” or “false”, we have a Bernoulli distribution.

• Now, if we want to carry out a similar experiment several times in a row, we are dealing with

• a Binomial Distribution.

• Just like the Bernoulli Distribution, the outcomes for each iteration are two, but we

• have many iterations.

• For example, we could be flipping the coin we mentioned earlier 3 times and trying to

• calculate the likelihood of getting heads twice.
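
As a quick sketch of the coin example, the snippet below uses the standard binomial probability formula, which is not stated in this overview. It assumes a fair coin, so the probability of heads on each flip is 0.5; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption: a fair coin, so P(heads) = 0.5 on every flip.
p_two_heads = binomial_pmf(2, 3, 0.5)
print(p_two_heads)  # 3 favourable sequences (HHT, HTH, THH) out of 8
```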

• Lastly, we should mention the Poisson Distribution.

• We use it when we want to test out how unusual an event frequency is for a given interval.

• For example, imagine we know that so far Lebron James averages 35 points per game during the

• regular season.

• We want to know how likely it is that he will score 12 points in the first quarter of his

• next game.

• Since the frequency changes, so should our expectations for the outcome.

• Using the Poisson distribution, we are able to determine the chance of Lebron scoring

• exactly 12 points for the adjusted time interval.
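
The adjustment for the shorter interval can be sketched as follows. This is only an illustration under stated assumptions: a game has 4 quarters of equal length and points are spread evenly across them, so the expected count for one quarter is 35 / 4. The formula is the standard Poisson probability function, and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when lam are expected."""
    return lam**k * exp(-lam) / factorial(k)


points_per_game = 35
quarters_per_game = 4  # assumption: four equal quarters per game

# Rescale the expected count from a full game to a single quarter.
lam_quarter = points_per_game / quarters_per_game

p_twelve = poisson_pmf(12, lam_quarter)
print(f"expected points per quarter: {lam_quarter}")
print(f"P(exactly 12 points in a quarter): {p_twelve:.4f}")
```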

• Great, now on to the continuous distributions!

• One thing to remember is that since we are dealing with continuous outcomes, the probability

• distribution would be a curve as opposed to unconnected individual bars.

• The first one we will talk about is the Normal Distribution.

• The outcomes of many events in nature closely resemble this distribution, hence the name

• “Normal”.

• For instance, according to numerous reports throughout the last few decades, the weight

• of an adult male polar bear is usually around 500 kilograms.

• However, there have been records of individual bears weighing anywhere between 350kg and

• 700kg.

• Extreme values, like 350 and 700, are called outliers and do not feature very frequently

• in Normal Distributions.

• Sometimes, we have limited data for events that resemble a Normal distribution.

• In those cases, we observe the Student’s-T distribution.

• It serves as a small sample approximation of a Normal distribution.

• Another difference is that the Student’s-T accommodates extreme values significantly

• better.

• Graphically, that is represented by the curve having fatter “tails”.

• Overall, this results in more values extremely far away from the mean, so the curve would

• probably more closely resemble a Student’s-T distribution than a Normal distribution.

• Now imagine only looking at the recorded weights of the last 10 sightings across Alaska and

• Canada.

• The lower number of elements would make the occurrence of any extreme value represent

• a much bigger part of the population than it should.

• Good job, everyone!

• Another continuous distribution we would like to introduce is the Chi-Squared distribution.

• It is the first asymmetric continuous distribution we are dealing with as it only consists of

• non-negative values.

• Graphically, that means that the Chi-Squared distribution always starts from 0 on the left.

• Depending on the average and maximum values within the set, the curve of the Chi-Squared

• graph is usually skewed to the right, with most values bunched near 0 on the left.

• Unlike the previous two distributions, the Chi-Squared does not often mirror real life

• events.

• However, it is often used in Hypothesis Testing to help determine goodness of fit.

• The next distribution on our list is the Exponential distribution.

• The Exponential distribution is usually present when we are dealing with events that are rapidly

• changing early on.

• An easy-to-understand example is how online news articles generate hits.

• They get most of their clicks when the topic is still fresh.

• The more time passes, the more irrelevant it becomes as interest dies off.

• The last continuous distribution we will mention is the Logistic distribution.

• We often find it useful in forecast analysis when we try to determine a cut-off point for

• a successful outcome.

• For instance, take a competitive e-sport like Dota 2. We can use a Logistic distribution

• to determine how much of an in-game advantage at the 10-minute mark is necessary to confidently

• predict victory for either team.

• Just like with other types of forecasting, our predictions would never reach true certainty

• but more on that later!

• Woah!

• Good job, folks!

• In the next video we are going to focus on discrete distributions.

• We will introduce formulas for computing Expected Values and Standard Deviations before looking

• into each distribution individually.

• Thanks for watching!

• 4.3 Discrete Distributions

• Welcome back!

• In this video we will talk about discrete distributions and their characteristics.

• Let’s get started!

• Earlier in the course we mentioned that events with discrete distributions have finitely

• many distinct outcomes.

• Therefore, we can express the entire probability distribution with either a table, a graph

• or a formula.

• To do so we need to ensure that every unique outcome has a probability assigned to it.

• Imagine you are playing darts.

• Each distinct outcome has some probability assigned to it based on how big its associated

• interval is.

• Since we have finitely many possible outcomes, we are dealing with a discrete distribution.

• Great!

• In probability, we are often more interested in the likelihood of an interval than of an

• individual value.

• With discrete distributions, we can simply add up the probabilities for all the values

• that fall within that range.

• Recall the example where we drew a card 20 times.

• Suppose we want to know the probability of drawing 3 spades or fewer.

• We would first calculate the probability of getting 0, 1, 2 or 3 spades and then add them

• up to find the probability of drawing 3 spades or fewer.

• One peculiarity of discrete events is that “the probability of Y being less than

• or equal to y equals the probability of Y being less than y plus 1”.

• In our last example, that would mean getting 3 spades or fewer is the same as getting fewer

• than 4 spades.
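
The interval calculation above can be sketched in Python. The snippet assumes each of the 20 draws is made from a full standard deck (the card is put back each time), so the chance of a spade is 0.25 on every draw and the number of spades follows the standard binomial formula. Those assumptions, and the helper name `binomial_pmf`, are not taken from the lecture.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumptions: 20 draws with replacement, P(spade) = 13 / 52 = 0.25.
n_draws, p_spade = 20, 0.25
pmf = {k: binomial_pmf(k, n_draws, p_spade) for k in range(n_draws + 1)}

# P(Y <= 3): add up the probabilities of 0, 1, 2 and 3 spades.
p_at_most_3 = sum(pmf[k] for k in (0, 1, 2, 3))

# P(Y < 4): the same set of whole-number outcomes, described differently.
p_fewer_than_4 = sum(prob for k, prob in pmf.items() if k < 4)

print(f"P(Y <= 3) = {p_at_most_3:.4f}")
print(f"P(Y < 4)  = {p_fewer_than_4:.4f}")
```

The two results agree because the count of spades can only take whole-number values, so no outcome lies strictly between 3 and 4.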

• Alright!

• Now that you have an idea about discrete distributions, we can start exploring each type in more detail.

• In the next video we are going to examine the Uniform Distribution.

• Thanks for watching!

• 4.4 Uniform Distribution

• Hey, there!

• In this lecture we are going to discuss the uniform distribution.

• For starters, we use the letter U to define a uniform distribution, followed by the range

• of the values in the dataset.

• Therefore, we read the following statement as “Variable X follows a discrete

• uniform distribution ranging from 3 to 7”.

• Events which follow the uniform distribution are ones where all outcomes have equal probability.

• One such event is rolling a single standard six-sided die.

• When we roll a standard 6-sided die, we have equal chance of getting any value from 1 to

• 6.
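
A minimal sketch of a discrete uniform distribution, covering both the die and the 3-to-7 example. It assumes the outcomes are the whole numbers in the given range; `discrete_uniform_pmf` is a hypothetical helper name.

```python
from fractions import Fraction


def discrete_uniform_pmf(low, high):
    """Give every whole number from low to high the same probability."""
    outcomes = range(low, high + 1)
    return {y: Fraction(1, len(outcomes)) for y in outcomes}


die = discrete_uniform_pmf(1, 6)  # a standard six-sided die
x = discrete_uniform_pmf(3, 7)    # X ~ U(3, 7)

print(die[4])             # chance of rolling a 4
print(x[3])               # chance that X equals 3
print(sum(die.values()))  # all six probabilities add up to 1
```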

• The graph of the probability distribution would have 6 equally tall bars, all reaching