Teaching statistics can be challenging, especially when it comes to probability and randomness. Programming languages such as R can make these topics more accessible and engaging, because students can generate random data themselves and see the ideas at work. In this blog post, we will explore how to use R to teach six statistical concepts: a random process generates results that are determined by chance; an outcome is the result of a trial of a random process; an event is a collection of outcomes; simulation is a way to model random events so that simulated outcomes closely match real-world outcomes; the relative frequency of an outcome or event in simulated or empirical data can be used to estimate its probability; and the law of large numbers states that simulated (empirical) probabilities tend to get closer to the true probability as the number of trials increases.

  1. A random process generates results that are determined by chance: The first step in teaching probability is to introduce the concept of randomness. Randomness means that the result of a single run of a process cannot be predicted with certainty, because it is determined by chance. To demonstrate this concept in R, we can use the sample() function, which draws a random sample of elements from a given set. For example, to simulate flipping a coin, we can use the following code:
# simulate flipping a coin
coin <- c("heads", "tails")
sample(coin, size = 1)

This code will randomly select either “heads” or “tails” as the outcome of the coin flip.
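
Because the result changes on every run, it can help in a classroom to fix the random seed so that everyone sees the same flip. The lines below are a minimal sketch of that idea using base R's set.seed(); the seed value 2024 and the names flip_a and flip_b are arbitrary choices made for illustration.

```r
# fix the seed so the "random" flip can be reproduced (2024 is an arbitrary value)
coin <- c("heads", "tails")

set.seed(2024)
flip_a <- sample(coin, size = 1)

# resetting the same seed replays the same random draw
set.seed(2024)
flip_b <- sample(coin, size = 1)

identical(flip_a, flip_b)  # TRUE: same seed, same flip
```

Removing the set.seed() calls brings back the unpredictable behavior, which makes for a useful before-and-after comparison.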

  2. An outcome is the result of a trial of a random process: Once students understand the concept of randomness, the next step is to introduce the concept of an outcome. An outcome is a possible result of a single trial of a random process. To illustrate this concept in R, we can use the same sample() function. For example, to simulate rolling a six-sided die, we can use the following code:
# simulate rolling a six-sided die
die <- c(1, 2, 3, 4, 5, 6)
sample(die, size = 1)

This code will randomly select one of the integers 1 through 6 as the outcome of the die roll.

  3. An event is a collection of outcomes: An event is a collection of outcomes that share a common property or characteristic. To demonstrate this concept in R, we can use the subset() function, which keeps the rows of a data frame that satisfy a condition. For example, to build the event "the sum of two six-sided dice is 7", we can list every possible pair of rolls and keep only the pairs that add up to 7:
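# list the outcomes of rolling two six-sided dice that sum to 7
die <- c(1, 2, 3, 4, 5, 6)
# expand.grid() enumerates all 36 (Var1, Var2) pairs; subset() keeps the event
sum_7 <- subset(expand.grid(die, die), Var1 + Var2 == 7)
# pick one outcome from the event at random (base R; sample_n() would require dplyr)
sum_7[sample(nrow(sum_7), size = 1), ]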

This code builds the event as the set of six pairs of die faces that add up to 7, and then randomly selects one of those pairs as an example outcome belonging to the event.
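
A complementary view is to run one trial and ask whether the event occurred. The sketch below rolls two dice and checks the sum; the names roll and is_sum_7 are introduced here for illustration only.

```r
die <- c(1, 2, 3, 4, 5, 6)

# one trial: roll two dice (replace = TRUE so doubles such as 3, 3 are possible)
roll <- sample(die, size = 2, replace = TRUE)

# the event "sum is 7" either occurred on this trial or it did not
is_sum_7 <- sum(roll) == 7

roll
is_sum_7
```

Running this several times shows students that many different outcomes can make the same event TRUE.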

  4. Simulation is a way to model random events, such that simulated outcomes closely match real-world outcomes. All possible outcomes are associated with a value to be determined by chance. Record the counts of simulated outcomes and the count total: Simulation is a powerful tool for teaching probability because it allows students to model random events and test hypotheses. To repeat a trial many times in R, we can use the replicate() function, which evaluates an expression a specified number of times and collects the results (here, as a vector). Alternatively, we can ask sample() for all the draws at once by setting size to the number of trials and replace = TRUE. For example, to simulate rolling a six-sided die 100 times, we can use either approach:
die <- c(1, 2, 3, 4, 5, 6)
rolls1 <- replicate(100, sample(die, size = 1, replace = TRUE))
rolls1
##   [1] 4 3 3 4 4 4 5 4 2 3 6 3 3 3 1 2 6 1 3 2 2 3 2 6 4 6 3 2 6 2 3 1 3 6 2 3 6
##  [38] 2 5 5 3 5 6 4 3 2 1 3 6 4 4 1 3 2 2 4 4 5 3 4 2 4 4 5 5 6 5 4 1 4 4 3 3 4
##  [75] 5 2 1 6 1 1 2 2 1 5 5 4 2 1 6 6 4 2 6 4 1 5 6 6 3 1

table(rolls1)
## rolls1
##  1  2  3  4  5  6 
## 13 18 20 21 12 16
rolls2 <- sample(die, size = 100, replace = TRUE)
rolls2
##   [1] 6 5 6 6 3 6 2 1 1 5 5 2 2 4 1 3 2 4 2 2 3 4 1 5 2 6 2 3 4 1 4 1 1 6 5 6 5
##  [38] 3 5 3 4 3 1 2 2 4 3 1 3 1 1 6 5 5 2 1 4 5 4 5 2 2 1 6 4 6 1 4 3 1 2 4 3 2
##  [75] 5 4 6 3 1 5 4 6 4 4 2 3 2 2 6 5 3 2 6 3 5 4 2 3 2 3
table(rolls2)
## rolls2
##  1  2  3  4  5  6 
## 16 21 17 17 15 14

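This code simulates 100 die rolls in two equivalent ways, first with replicate() wrapped around a single draw and then with one call to sample() using size = 100. In both cases, table() records how many times each outcome occurred, and the counts add up to the total of 100 rolls.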
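
To make simulated outcomes match a real-world process whose outcomes are not equally likely, sample() accepts a prob argument that gives the chance of each outcome. The sketch below assumes a hypothetical basketball player who makes 70% of free throws; the 0.7 figure and the names outcomes and shots are made up for illustration.

```r
# hypothetical player: assumed to make 70% of free throws (illustrative value)
outcomes <- c("make", "miss")

# simulate 20 free throws; prob gives the chance of each entry in outcomes
shots <- sample(outcomes, size = 20, replace = TRUE, prob = c(0.7, 0.3))

# record the counts of simulated outcomes and the count total
table(shots)
length(shots)
```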

  5. The relative frequency of an outcome or event in simulated or empirical data can be used to estimate the probability of that outcome or event: Once we have simulated data, we can use the relative frequency of an outcome or event to estimate its probability. The relative frequency is the number of times an outcome or event occurs divided by the total number of trials. To calculate relative frequencies in R, we can pass a table of counts to the prop.table() function, which divides each count by the total. For example, to estimate the probability of rolling a six on a six-sided die based on 100 rolls, we can use the following code:
die <- c(1, 2, 3, 4, 5, 6)
rolls1 <- replicate(100, sample(die, size = 1, replace = TRUE))
rolls1
##   [1] 5 4 3 6 5 3 6 5 3 5 3 6 1 4 6 5 2 6 6 5 2 3 6 2 3 1 2 3 3 4 2 4 5 4 1 1 6
##  [38] 3 2 2 6 2 2 1 5 5 5 2 3 4 6 1 6 4 1 2 6 1 3 3 5 1 2 1 1 5 5 2 1 3 5 5 6 5
##  [75] 5 6 6 2 4 5 2 6 5 6 2 5 4 6 3 2 6 5 4 1 4 5 5 5 2 1
table(rolls1)
## rolls1
##  1  2  3  4  5  6 
## 14 18 14 11 24 19

rolls2 <- sample(die, size = 100, replace = TRUE)
rolls2
##   [1] 6 6 4 2 2 2 3 6 3 2 2 1 6 6 3 2 3 1 5 1 3 1 3 3 1 2 2 2 5 2 1 2 1 5 1 2 4
##  [38] 1 6 3 4 6 1 4 3 1 1 2 5 6 5 2 6 6 2 4 3 2 1 5 2 3 5 3 5 2 2 6 4 2 2 2 3 4
##  [75] 6 1 4 4 2 3 2 3 4 2 5 3 6 1 3 5 4 1 1 1 5 2 2 1 1 5
table(rolls2)
## rolls2
##  1  2  3  4  5  6 
## 20 27 17 11 12 13

prop.table(table(rolls1))
## rolls1
##    1    2    3    4    5    6 
## 0.14 0.18 0.14 0.11 0.24 0.19

prop.table(table(rolls2))
## rolls2
##    1    2    3    4    5    6 
## 0.20 0.27 0.17 0.11 0.12 0.13

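This code simulates rolling a die 100 times (twice, using both approaches), counts how often each face was rolled, and converts the counts to relative frequencies by dividing by the total number of rolls. The estimated probability of rolling a six is the entry under 6: 0.19 for rolls1 and 0.13 for rolls2. Both are rough estimates of the true probability of 1/6 (about 0.167), and they differ from each other because 100 rolls is a fairly small sample.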
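
When only one outcome or event is of interest, the relative frequency can also be computed directly, without building the full table. This is a minimal sketch; the names p_six and p_even are introduced here for illustration.

```r
die <- c(1, 2, 3, 4, 5, 6)
rolls <- sample(die, size = 100, replace = TRUE)

# rolls == 6 is TRUE or FALSE for each roll;
# the mean of a logical vector is the proportion of TRUE values
p_six <- mean(rolls == 6)

# the same idea works for an event, here "the roll is even"
p_even <- mean(rolls %in% c(2, 4, 6))

p_six
p_even
```

The value of p_six matches the entry under 6 in prop.table(table(rolls)) whenever at least one six was rolled.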

  6. The law of large numbers states that simulated (empirical) probabilities tend to get closer to the true probability as the number of trials increases: Finally, we introduce the law of large numbers, which states that as the number of trials increases, the simulated probabilities tend to get closer to the true probability. To illustrate this concept in R, we can simulate rolling a six-sided die 10,000 times and compare the estimated probability of rolling a six to the true probability of 1/6 (about 0.1667).
die <- c(1, 2, 3, 4, 5, 6)
rolls <- sample(die, size = 10000, replace = TRUE)

table(rolls)
## rolls
##    1    2    3    4    5    6 
## 1676 1652 1678 1644 1665 1685
prop.table(table(rolls))
## rolls
##      1      2      3      4      5      6 
## 0.1676 0.1652 0.1678 0.1644 0.1665 0.1685

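This code simulates rolling a die 10,000 times and prints the relative frequency of each face. The estimated probability of rolling a six is 0.1685, which is much closer to the true probability of 1/6 (about 0.1667) than the estimates of 0.19 and 0.13 obtained from 100 rolls. The estimate does not improve with every single roll, but as the number of trials increases it tends to settle closer and closer to the true probability.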
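
To let students watch the estimate settle, one option is to track the running proportion of sixes after each roll. This is a sketch rather than part of the original example; the name running_p and the chosen checkpoints (10, 100, 1,000 and 10,000 rolls) are illustrative.

```r
die <- c(1, 2, 3, 4, 5, 6)
rolls <- sample(die, size = 10000, replace = TRUE)

# running proportion of sixes after 1, 2, ..., 10000 rolls:
# cumulative count of sixes divided by the number of rolls so far
running_p <- cumsum(rolls == 6) / seq_along(rolls)

# estimates at a few sample sizes, followed by the true value
running_p[c(10, 100, 1000, 10000)]
1 / 6
```

Early values of running_p typically jump around, while later values stay near 1/6. Drawing the vector with plot(running_p, type = "l") shows the same pattern as a line.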

In conclusion, R can be a powerful tool for making probability and randomness more accessible and engaging to statistics students. By simulating events and estimating probabilities, students can develop a better understanding of these abstract concepts and how they apply in the real world.