Discrete Probability Distributions

This is the second article in a 3-part series on probability distributions.

The first part talked about Statistics, Probability, and distribution curves. In this part, I will talk about commonly used discrete probability distributions, including the Bernoulli, Binomial, Multinoulli, Multinomial, and Poisson distributions, and the discrete case of the Uniform distribution.

Discrete Probability Distributions

In the last article, we saw what a probability distribution is and how we can represent it using a density curve for all the possible outcomes.

An experiment with a finite or countable set of outcomes, such as getting a Head or a Tail on a coin toss or a number from 1 to 6 on a dice roll, is represented with a discrete probability distribution.

PMF: A probability distribution is represented by a mathematical function. For discrete outcomes, this function is called the probability mass function, or PMF. It gives the probability of each distinct outcome (1 to 6 for a dice roll). The above picture depicts the PMF for a dice-roll experiment graphically.

Please note that the sum of probabilities in PMF for all outcomes will always equal one.
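To make the PMF idea concrete, here is a minimal sketch for the fair dice roll, using only Python's standard library. The names `dice_pmf` and `pmf` are my own choices for illustration. Exact fractions are used so that the sum comes out as exactly one.

```python
from fractions import Fraction

# PMF of a fair six-sided dice: each face from 1 to 6 has probability 1/6.
dice_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def pmf(outcome):
    """Return P(X = outcome); anything outside 1..6 has probability 0."""
    return dice_pmf.get(outcome, Fraction(0))

# Summing the PMF over every possible outcome must give one.
total = sum(dice_pmf.values())

print(pmf(3))   # 1/6
print(total)    # 1
```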

CDF: In some statistical scenarios, we are interested in the probability that the outcome will be less than or equal to a specific value. For example, during a dice-roll experiment, we might want to know the probability that the outcome is 4 or less. This probability is given by the Cumulative Distribution Function, or CDF. It is a function that provides the probability that a discrete outcome will have a value less than or equal to a specific value. The below figure represents the CDF for a dice-roll experiment. Note how it reaches 1 at the largest possible outcome.
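As a rough sketch, the CDF of the dice roll can be built by taking a running sum of the PMF values. The variable names here are illustrative only. Note that "less than 4" is the same event as "less than or equal to 3" for a dice, so it is read off the CDF at 3.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf_values = [Fraction(1, 6) for _ in faces]

# The CDF at each face is the running sum of the PMF up to that face.
cdf = dict(zip(faces, accumulate(pmf_values)))

print(cdf[4])  # P(X <= 4) = 2/3
print(cdf[3])  # P(X <= 3), i.e. P(X < 4) = 1/2
print(cdf[6])  # 1, reached at the largest outcome
```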

In short,

  • Probability Mass Function (PMF): Probability for a discrete outcome
  • Cumulative Distribution Function (CDF): Probability that the outcome is less than or equal to a value

We will discuss some popular discrete probability distributions in this article:

  • Poisson distribution
  • Bernoulli and binomial distributions
  • Multinoulli and multinomial distributions
  • Discrete uniform distribution

Poisson Distribution:

Imagine you are a bank teller. In order to distribute pamphlets of a new loan scheme to the customers, you need to know how many customers will visit every hour. This number can vary a lot. It can range from zero (no visiting customers) to something very large (the overall size of the customer base). In other words, a random number of customers can visit every hour. This number is also discrete (0, 1, 2, 3, and so on) in nature. The probability of such a count occurring in a fixed amount of time is modeled by the Poisson distribution.

Every Poisson distribution is characterized by a single parameter: the average (mean) number of events per time interval, λ. In our example, this value signifies that, on average, λ customers visit the bank every hour.

In the above figure, notice how the probability mass function with λ = 10 peaks around ten customers and falls off on either side. The probability is the same for 9 and 10 customers visiting the branch in this case. This is not a coincidence: whenever λ is a whole number, the outcomes λ - 1 and λ are equally likely.
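If you would rather compute these probabilities than read them off a plot, the sketch below evaluates the standard Poisson PMF, P(X = k) = e^(-λ) λ^k / k!, with λ = 10. The function name `poisson_pmf` is my own; libraries such as SciPy offer an equivalent, but this version needs only the standard library.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 10  # average number of customers per hour
for k in (5, 9, 10, 15):
    print(k, round(poisson_pmf(k, lam), 4))

# The PMF takes the same value at 9 and at 10 customers (up to rounding).
print(math.isclose(poisson_pmf(9, lam), poisson_pmf(10, lam)))
```

Changing `lam` and rerunning the loop is a quick way to see how the peak moves with the average rate.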

You can use this tool to generate Poisson distribution curves for any lambda value.

Bernoulli & Binomial distributions

In many real-life scenarios, we need to classify data samples as either a Yes or a No.

For example, we often need to tag an email as spam or not spam. Similarly, we may need to know if a team wins or loses (Success or Failure) in a given match. These are examples of binary (meaning two-valued) outcome problems.

Now suppose an email has a 20% chance of being spam; then it has an 80% chance of not being spam. That is, any given email has a binary probabilistic outcome: it is either spam or not.

“Bernoulli distribution gives us a way to model binary outcome problems.”

The probability distribution of any binary-outcome event is described by the Bernoulli distribution. The above figure represents a PMF for a Bernoulli distribution. The probability of spamminess of an email, p, is the only parameter of this distribution.
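A minimal sketch of the Bernoulli PMF for the spam example follows. I encode "spam" as 1 and "not spam" as 0, which is a common convention but an assumption on my part, and `bernoulli_pmf` is an illustrative name.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for x in {0, 1}, where p is the probability that X = 1."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1 - p

p_spam = 0.2  # 20% chance that an email is spam

print(bernoulli_pmf(1, p_spam))  # probability of spam
print(bernoulli_pmf(0, p_spam))  # probability of not spam
```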

Now imagine someone asks how many emails are spam in an inbox of 20 emails. What distribution will this number follow? The number of spam emails can vary between 0 and 20, making the outcome a probabilistic discrete value. Let’s assume that each email is spam or not independently of the others, and that every email follows the same Bernoulli distribution (with the same probability of spamminess, p). The number of spam emails in an inbox of 20 emails will then follow a Binomial distribution.

“Binomial distribution models a bundle of independent and identical Bernoulli experiments.”

Notice that the probability of observing 4 or 5 spam emails is higher than that of finding ten or more spam emails. This is because the probability that any individual email is spam is low. Had this probability been substantial, we would have expected a larger number of emails to be spam.

The number of independent Bernoulli experiments, denoted by (n), and the probability parameter (p) of each Bernoulli distribution are the two parameters of this distribution.

For example: In the email spam case above, we had 20 independent emails to consider and the probability of spamminess of each email was 0.2. Hence, n=20 and p=0.2 in this case.
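With n = 20 and p = 0.2 fixed, the probabilities can be computed directly from the standard Binomial PMF, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is one way to do it with the standard library; `binomial_pmf` is a name I picked for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.2  # 20 emails, each spam with probability 0.2

p4 = binomial_pmf(4, n, p)
p5 = binomial_pmf(5, n, p)
# Probability of ten or more spam emails: add up the PMF from 10 to 20.
p_ge10 = sum(binomial_pmf(k, n, p) for k in range(10, n + 1))

print(round(p4, 4), round(p5, 4), round(p_ge10, 4))
```

Seeing four or five spam emails is far more likely than seeing ten or more, which matches the shape of the PMF described above.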

Once we know these two parameters, we can sketch the PMF and CDF using any statistical tool. An interesting such tool can be accessed here. You just need to enter suitable values of n and p to see how the curves change.

In short, the Bernoulli distribution is used for a single experiment with two possible outcomes. The Binomial distribution, on the other hand, is used for a bundle of independent and identical Bernoulli experiments.

Bernoulli and Binomial distributions are used extensively in the Machine Learning domain, e.g., for separating loan defaulters from others, filtering spam emails, etc. That discussion calls for a deeper treatment, so we will omit it in this article.

Discrete Uniform Distribution:

This distribution gets its name from the fact that the probabilities for all the possible outcomes are the same or uniform. No outcome has more chances of occurring than the others in this case.

We use uniform distribution in cases where we assume our outcomes to be equally probable. For example, when we roll a fair dice, we expect any of the six numbers to come up with equal chances. This probabilistic event is a case of uniform distribution for discrete outcomes.

Notice how, for each number that may appear on top while rolling a dice, the probability is the same: 1/6 (approximately 0.17). It is a case of Uniform Distribution. Similarly, tossing a fair coin is also an example of uniform distribution.
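One way to see the uniform behaviour is to simulate a large number of dice rolls and check that each face shows up roughly one-sixth of the time. This is only a sketch: the seed and the number of rolls are arbitrary choices of mine, and the observed frequencies will be close to 1/6 but not exactly equal to it.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed so that reruns give the same rolls
n_rolls = 60_000

# Roll a fair dice n_rolls times and count how often each face appears.
counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
freqs = {face: counts[face] / n_rolls for face in range(1, 7)}

for face, freq in freqs.items():
    print(face, round(freq, 3))  # each value should be near 0.167
```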

Quiz: Suppose you have a biased coin that results in Heads more often than Tails. What will its probability distribution be: Uniform or Bernoulli?

Answer: Since the coin does not result in Heads or Tails with equal probability, we cannot use the uniform distribution here. Instead, the Bernoulli distribution should be used in this case.

Multinoulli & Multinomial distributions

The Multinoulli distribution, also called the Categorical distribution, covers the case of a single experiment with more than two possible outcomes. For this reason, it is also called the Generalized Bernoulli distribution.

For example, during the transmission of a disease, there can be three possibilities for a given person. Firstly, the person may not contract the disease; secondly, she may contract and survive the disease; and thirdly, she may contract and succumb to the disease. There are three different possible discrete outcomes in this case. Hence the outcome for a single person follows a Multinoulli distribution. A typical PMF of such an experiment is depicted in the below figure:

Now suppose someone has to find out how many people in a population of 1000 individuals will fall into each of these three categories. This again forms a bundle of multiple independent and identical Multinoulli experiments. The counts of people in each category then jointly follow a Multinomial distribution. It is a generalization of the Binomial distribution to more than two discrete outcomes. We will omit further discussion of this concept in this article.
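For the curious, here is a small sketch of the standard Multinomial PMF, P(x1, ..., xk) = n! / (x1! ... xk!) * p1^x1 * ... * pk^xk. The category probabilities below (0.7, 0.25, 0.05) and the group size of 10 people are hypothetical numbers chosen purely for illustration, and `multinomial_pmf` is my own name for the helper.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of seeing exactly these per-category counts."""
    n = sum(counts)
    # Multinomial coefficient n! / (x1! * x2! * ... * xk!)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p**c for p, c in zip(probs, counts))

# Hypothetical probabilities: not infected, infected and survived, succumbed.
probs = (0.7, 0.25, 0.05)

# Probability that, out of 10 people, 7 are not infected, 2 survive, 1 succumbs.
print(multinomial_pmf((7, 2, 1), probs))
```

With only two categories, the same function reduces to the Binomial PMF, which is what "generalization of the Binomial distribution" means in practice.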

Conclusion

Although there are a few other discrete distributions that find applications in daily life, I have restricted this discussion to the commonly used ones. We have also seen how we can use these concepts to model the events around us. I hope you have learned something significant from those examples. I have also pointed to a few tools for your own experimentation. Keep playing around with those to learn more of the conceptual nuances!