Compound Event Probability Calculator

Probability

Calculate probabilities for repeated independent events using the binomial model. Determine the likelihood of seeing an event occur a specific number of times across multiple trials—whether you're rolling dice, analyzing conversion rates, or evaluating quality control outcomes.

OVERVIEW & CONCEPTS

This calculator answers questions like: "If an event has probability \(p\) of occurring on each trial, and I run \(n\) independent trials, what is the probability I see the event occur exactly \(k\) times?" or "What is the probability I see it at least \(k\) times?"

Key concepts illustrated

Binomial distribution: models the number of successes in a fixed number of independent trials, each with the same probability of success. This is the foundation for understanding repeated events in marketing, gaming, quality control, and many other domains.

Independent trials: each trial's outcome does not affect others. For example, each die roll is independent, and each customer's decision to convert can be treated as approximately independent when your sample is small relative to the total population.

"At least" and "at most" calculations: these use the complement rule or cumulative probabilities. For example, \(P(X \ge k) = 1 - P(X < k) = 1 - P(X \le k-1)\), which the app computes directly from the binomial cumulative distribution function.
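The complement rule is easy to verify numerically. Here is a minimal Python sketch using only the standard library (the function names are illustrative, not the app's internals):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), summing the PMF from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# P(X >= k) via the complement rule: 1 - P(X <= k - 1)
n, p, k = 10, 1/6, 3              # e.g. at least three 1s in ten die rolls
p_at_least = 1 - binom_cdf(k - 1, n, p)   # ≈ 0.2248
```

Summing the PMF directly from \(k\) to \(n\) gives the same answer; the complement form is simply fewer terms when \(k\) is small.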

Normal and Poisson approximations: when \(n\) is large, the binomial distribution can be approximated by a normal distribution (especially when \(np\) and \(n(1-p)\) are both large) or by a Poisson distribution (when \(n\) is large and \(p\) is small). The app shows these approximations and compares them to the exact binomial probabilities.

Understanding the Binomial Model

The binomial distribution is one of the most fundamental discrete probability distributions, applicable whenever you have:

  • Fixed number of trials: You must know in advance how many times the event will be attempted (denoted \(n\)).
  • Binary outcomes: Each trial results in success or failure (yes/no, convert/don't convert, heads/tails).
  • Constant probability: The probability \(p\) remains the same for every trial.
  • Independence: The outcome of one trial doesn't affect others.

When these conditions hold, the number of successes \(X\) follows a binomial distribution with parameters \(n\) and \(p\), written as \(X \sim \text{Binomial}(n, p)\).

When to Use Approximations

Exact Binomial: Always correct but can be computationally intensive when \(n\) is very large (e.g., \(n > 1000\)). Use this when you need precise probabilities and computational cost isn't a concern.

Normal Approximation: Works well when both \(np \ge 10\) and \(n(1-p) \ge 10\). The binomial distribution becomes symmetric and bell-shaped as \(n\) increases, making it well-approximated by a normal distribution with mean \(\mu = np\) and standard deviation \(\sigma = \sqrt{np(1-p)}\). Use the continuity correction (adding or subtracting 0.5) for better accuracy.

Poisson Approximation: Best when \(n\) is large and \(p\) is small, typically when \(np < 10\) and \(n > 20\). This situation arises in rare-event scenarios (e.g., defects in manufacturing, rare conversions). The Poisson distribution with \(\lambda = np\) provides a good approximation and simplifies calculations.

Common Pitfall: Don't use the normal approximation when \(p\) is very close to 0 or 1 unless \(n\) is extremely large. The distribution will be heavily skewed, and the normal approximation will perform poorly.
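These rules of thumb can be checked numerically. Below is a stdlib sketch (illustrative, not the app's code) comparing the exact tail probability \(P(X \ge 35)\) for \(n = 100\), \(p = 0.3\) against both approximations. Note that \(np = 30\) here, so the normal approximation is appropriate while the Poisson one is not:

```python
from math import comb, erf, exp, factorial, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, k = 100, 0.3, 35            # np = 30 and n(1-p) = 70: normal rule holds

exact = sum(binom_pmf(i, n, p) for i in range(k, n + 1))   # exact P(X >= 35)

mu, sigma = n * p, sqrt(n * p * (1 - p))
normal = 1 - normal_cdf(k - 0.5, mu, sigma)                # continuity correction

lam = n * p
poisson = 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))
```

Running this, the normal value closely tracks the exact one, while the Poisson value overshoots: its variance \(\lambda = 30\) exceeds the binomial's \(np(1-p) = 21\), so it spreads too much probability into the tail.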

Relationship to Other Distributions

Bernoulli Distribution: A binomial distribution with \(n = 1\) is called a Bernoulli distribution. It represents a single trial with success probability \(p\).

Geometric Distribution: While the binomial asks "How many successes in \(n\) trials?", the geometric distribution asks "How many trials until the first success?" If you're waiting for an event to occur (e.g., time until first conversion), use the geometric distribution.

Negative Binomial: Extends the geometric concept: "How many trials until I see \(r\) successes?" This is useful when you're modeling time-to-event scenarios with multiple events.

Hypergeometric Distribution: Use this instead of binomial when sampling without replacement from a finite population. For example, drawing cards from a deck without replacing them violates the independence assumption of the binomial.
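The card-drawing example can be made concrete. The sketch below (stdlib only, for illustration) compares the hypergeometric probability of drawing exactly 2 hearts in 5 cards against the binomial answer you would get by wrongly assuming draws with replacement:

```python
from math import comb

# Drawing 5 cards from a 52-card deck: P(exactly 2 hearts).
# Without replacement -> hypergeometric; with replacement -> binomial.
N, K, n, k = 52, 13, 5, 2         # population, hearts in deck, draws, target

hyper = comb(K, k) * comb(N - K, n - k) / comb(N, n)   # ≈ 0.2743
binom = comb(n, k) * (K / N)**k * (1 - K / N)**(n - k) # ≈ 0.2637
```

The two differ because removing a heart from the deck lowers the chance of drawing another; the binomial shortcut is only reasonable when the sample is a small fraction of the population.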

MARKETING & PRACTICAL SCENARIOS

Use these presets to explore realistic probability questions: conversion rates in marketing campaigns, quality control in production, dice rolling in games, email open rates, and more. Each scenario demonstrates how the binomial model applies to real-world repeated-event problems.

EVENT & TRIAL SETTINGS

Configure the event and trials

Define a single event with a known probability (like "customer converts", "die shows 1", or "email is opened"), specify how many independent trials you're running, and choose the outcome you want to calculate the probability for (exactly \(k\) successes, at least \(k\), at most \(k\), or between two values).

A short description of the event you're tracking (e.g., "customer converts", "email opened", "die shows 1").

The probability that the event occurs on a single trial. For a six-sided die showing 1, this is \(1/6 \approx 0.1667\).

How many independent trials you're running. For dice, this is the number of rolls.

The threshold you care about—used for "exactly", "at least", or "at most" calculations.

Choose whether you want the probability of exactly \(k\) successes, at least \(k\), at most \(k\), or a range.

Use exact binomial or choose an approximation (normal works when \(n\) is large; Poisson when \(n\) is large and \(p\) is small).

How many Monte Carlo simulations to run for comparison with theoretical probabilities.

More about these settings

Event probability \(p\): the chance the event happens on a single trial. For a fair coin, \(p = 0.5\); for a six-sided die showing 1, \(p = 1/6 \approx 0.1667\); for a marketing conversion rate of 3%, \(p = 0.03\).

Number of trials \(n\): how many times you repeat the event. For dice, this is rolls; for marketing, this might be ad impressions, email sends, or website visits.

Target successes \(k\): the specific count you're interested in. If you want to know the probability of seeing exactly 7 conversions out of 100 visitors, set \(k = 7\) and choose "Exactly \(k\)".

Approximation modes: the exact binomial is always correct but can be slow for very large \(n\). The normal approximation works well when \(np \ge 10\) and \(n(1-p) \ge 10\). The Poisson approximation is useful when \(n\) is large and \(p\) is small (typically \(np < 10\)).

VISUAL OUTPUT

Probability mass function

The chart shows the probability distribution for the number of successes \(X\). Blue bars represent the theoretical probabilities from the chosen model; orange bars show the empirical distribution from simulations. The highlighted region corresponds to your selected probability mode.

Cumulative distribution function

This chart shows \(P(X \le k)\) for each value of \(k\). The cumulative distribution makes it easy to read "at most" probabilities directly and visualize how quickly probability accumulates as \(k\) increases.

How to interpret these charts

The PMF (probability mass function) shows the probability of seeing exactly each possible number of successes. The height of each bar is \(P(X = k)\). When you select "at least \(k\)" or "at most \(k\)", the app highlights the relevant bars so you can see which outcomes contribute to the probability.

The CDF (cumulative distribution function) adds up probabilities from left to right, showing \(P(X \le k)\). This is especially useful for "at most" questions: just read the height of the curve at your target \(k\). The CDF is 0 for values of \(k\) below the smallest possible count (here, 0) and approaches 1 as \(k\) increases toward \(n\).

Simulated bars (orange): the app runs Monte Carlo simulations, each consisting of \(n\) independent trials. It counts how many successes occurred in each simulation run, then plots the empirical distribution. With enough simulations, the orange bars should closely match the blue theoretical bars, illustrating the connection between probability models and observed frequencies.
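The simulation procedure described above can be sketched as follows (a stdlib illustration with a fixed seed for reproducibility; the app's actual implementation may differ):

```python
import random

def simulate(n, p, runs, seed=42):
    """Empirical distribution of success counts over many simulation runs."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        successes = sum(rng.random() < p for _ in range(n))  # one run of n trials
        counts[successes] = counts.get(successes, 0) + 1
    return {k: c / runs for k, c in sorted(counts.items())}

empirical = simulate(n=10, p=1/6, runs=20_000)
# With enough runs, empirical[k] approaches the theoretical P(X = k);
# e.g. P(X = 0) = (5/6)**10 ≈ 0.1615.
```

Each key of `empirical` is a success count; each value is the fraction of runs that produced it, which is exactly what the orange bars plot.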

Interpretation Aid: Reading PMF Charts

Shape tells a story: A symmetric, bell-shaped PMF suggests \(p\) is near 0.5 and \(n\) is reasonably large. A right-skewed PMF (tail to the right) indicates \(p\) is small—most runs end with few successes. A left-skewed PMF (tail to the left) indicates \(p\) is large—most runs end with many successes.

Peak location: The mode (highest bar) is near \(np\), the expected number of successes. If \(np = 10\) and the peak is around 10, your distribution is centered as expected.

Spread: A wide PMF means high variability in outcomes; a narrow PMF means outcomes are concentrated near the mean. Variance is \(np(1-p)\), so outcomes are most variable when \(p = 0.5\).

Highlighted regions: When you select "at least \(k\)", the app highlights all bars from \(k\) onward. The total area of highlighted bars equals your target probability. This visual cue helps you see whether you're looking at a tail probability (rare outcome) or a central probability (common outcome).

Interpretation Aid: Reading CDF Charts

Direct "at most" answers: The CDF at \(k\) gives \(P(X \le k)\) directly. If you want to know the probability of 10 or fewer successes, find \(k = 10\) on the x-axis and read the y-value.

"At least" calculations: For \(P(X \ge k)\), use \(1 - P(X \le k-1)\). Find the CDF at \(k-1\) and subtract from 1.

Range probabilities: To find \(P(k_1 \le X \le k_2)\), compute \(\text{CDF}(k_2) - \text{CDF}(k_1 - 1)\).
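As a quick check of the range formula, here is a stdlib sketch (illustrative names) computing \(P(8 \le X \le 12)\) for \(n = 20\), \(p = 0.5\):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 20, 0.5
k1, k2 = 8, 12
p_range = binom_cdf(k2, n, p) - binom_cdf(k1 - 1, n, p)  # P(8 <= X <= 12) ≈ 0.7368
```

The result matches summing the five highlighted PMF bars from \(k = 8\) through \(k = 12\) directly.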

Steepness: A steep CDF indicates probabilities are concentrated in a narrow range. A gradual CDF means outcomes are spread out. The steepest part of the CDF corresponds to the mode of the PMF.

Comparison with simulation: The orange line shows the empirical CDF from your simulations. With enough simulation runs, this should closely track the blue theoretical curve, validating the binomial model.

MATH DETAILS & WORKED EXAMPLES

Target probability (current mode):
Simulated probability (current mode):
Expected number of successes \(E[X]\):
Standard deviation \(\sigma_X\):

APA-Style Statistical Reporting

Configure event parameters and run simulation to see formal statistical reporting.

Managerial Interpretation

Business-facing interpretation will appear after running calculations.

Interpreting these metrics

Target probability: the theoretical probability for your selected mode (exactly \(k\), at least \(k\), at most \(k\), or between \(k_1\) and \(k_2\)). This is computed from the binomial distribution (or the chosen approximation).

Simulated probability: the proportion of simulated trial sets that satisfy your selected condition. With enough simulations, this should converge to the target probability.

Expected number of successes \(E[X] = np\): the average number of successes you would see if you repeated this experiment many times. For a binomial distribution, this is simply the number of trials times the probability of success on each trial.

Standard deviation \(\sigma_X = \sqrt{np(1-p)}\): measures the typical spread of the number of successes around the expected value. A larger standard deviation means more variability in outcomes.
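Plugging in the marketing example from earlier (\(n = 100\) visitors at a 3% conversion rate; values chosen for illustration):

```python
from math import sqrt

n, p = 100, 0.03              # e.g. 100 visitors, 3% conversion rate
mean = n * p                  # E[X] = np = 3.0
sd = sqrt(n * p * (1 - p))    # sigma_X = sqrt(np(1-p)) ≈ 1.71
# Most runs land roughly within 3 ± 2 * 1.71 conversions.
```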

Understanding the binomial coefficient \(\binom{n}{k}\)

The binomial coefficient \(\binom{n}{k}\), read as "\(n\) choose \(k\)", counts the number of ways to choose \(k\) successes from \(n\) trials when order doesn't matter. It's computed as: \[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \] where \(n!\) (n factorial) means \(n \times (n-1) \times \dots \times 2 \times 1\).

For example, if you roll a die 10 times and want to know how many different patterns have exactly 3 ones, \(\binom{10}{3} = 120\) tells you there are 120 distinct ways to arrange three ones among ten rolls.

In the binomial probability formula, \(\binom{n}{k}\) accounts for all the different orderings of \(k\) successes and \(n-k\) failures, which is why it appears in front of the \(p^k(1-p)^{n-k}\) term.
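Python's standard library exposes this count directly as `math.comb`. A quick sketch reproducing the dice example above:

```python
from math import comb

# "10 choose 3": the number of ways to place exactly three 1s among ten rolls.
ways = comb(10, 3)            # 120, matching 10! / (3! * 7!)

# Full binomial probability of exactly three 1s in ten fair-die rolls:
p = 1 / 6
prob = ways * p**3 * (1 - p)**7   # ≈ 0.1550
```

Every one of the 120 orderings has the same probability \(p^3(1-p)^7\), which is why the coefficient simply multiplies that term.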

General equations

Worked with your numbers

Complete probability distribution

\(k\) (successes) | Theoretical \(P(X = k)\) | Simulated frequency | Simulated \(P(X = k)\) | Cumulative \(P(X \le k)\)
How to read this table

\(k\) (successes): each row corresponds to a specific number of successes. For example, if \(k = 5\), that row shows the probability of seeing exactly 5 successes out of \(n\) trials.

Theoretical \(P(X = k)\): the exact probability from the binomial formula (or chosen approximation). In the example row for \(k = 7\), this might show 0.0523, meaning there's a 5.23% chance of seeing exactly 7 successes.

Simulated frequency: how many of your simulated trial sets resulted in exactly \(k\) successes. For example, if you ran 3,000 simulations and 157 of them had exactly 7 successes, this column shows 157.

Simulated \(P(X = k)\): the simulated frequency divided by the total number of simulations. In the example, 157/3000 = 0.0523, which should be close to the theoretical value if you ran enough simulations.

Cumulative \(P(X \le k)\): adds up all theoretical probabilities from 0 through \(k\). If this value is 0.8234 for \(k = 7\), you can interpret that as "there's an 82.34% chance of seeing 7 or fewer successes."
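The theoretical columns of this table can be reproduced with a few lines of stdlib Python (an illustrative sketch, not the app's code):

```python
from math import comb

def distribution_table(n, p):
    """Rows of (k, P(X = k), P(X <= k)), mirroring the table's columns."""
    rows, cumulative = [], 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p)**(n - k)
        cumulative += pmf
        rows.append((k, pmf, cumulative))
    return rows

table = distribution_table(n=10, p=1/6)
# The final cumulative entry is 1.0: all probability is accounted for.
```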