Mathematics · June 2026

Probability and Statistics Explained

Statistics turns raw data into understanding; probability turns uncertainty into a calculable quantity. Together, they underpin medicine, economics, science, and everyday decision-making — yet they are also easy to misread, which is why learning the fundamentals carefully matters.

Probability: The Basics

Probability measures how likely an event is to occur, on a scale from 0 (impossible) to 1 (certain). A probability of 0.5 means the event occurs in roughly half of all trials over the long run.

Theoretical probability is calculated from the structure of the situation, before any experiment is run:

P(event) = number of favourable outcomes ÷ total number of equally likely outcomes

Rolling a fair six-sided die and getting a 4: there is 1 favourable outcome out of 6 equally likely ones, so P(4) = 1/6 ≈ 0.167.
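A minimal sketch of the classical formula, assuming every outcome in the sample space is equally likely. The helper name `theoretical_probability` is hypothetical. `Fraction` keeps the answer exact instead of rounding it.

```python
from fractions import Fraction

def theoretical_probability(outcomes, is_favourable):
    """Classical probability: favourable outcomes / total equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if is_favourable(o))
    return Fraction(favourable, len(outcomes))

die = range(1, 7)  # faces of a fair six-sided die
p_four = theoretical_probability(die, lambda face: face == 4)
print(p_four, float(p_four))  # 1/6 0.16666666666666666
```

The same helper answers any question about a single roll, for example `lambda face: face % 2 == 0` for "an even number", which gives 1/2.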

Experimental (empirical) probability is measured from actual trials:

P(event) = number of times event occurred ÷ total number of trials

If you flip a coin 200 times and get heads 97 times, the experimental probability of heads is 97/200 = 0.485. In a finite experiment this need not equal the theoretical probability of 0.5, and usually it does not; but as the number of trials grows, the experimental probability tends to settle ever closer to the theoretical value. This is the law of large numbers.
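The law of large numbers can be watched directly in a simulation. This sketch assumes Python's pseudo-random generator as a stand-in for a fair coin; the seed 2026 and the helper name `experimental_probability` are arbitrary choices for illustration.

```python
import random

def experimental_probability(n_flips, rng):
    """Fraction of heads in n_flips simulated fair-coin tosses."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips

rng = random.Random(2026)  # fixed seed so the run is reproducible
for n in (10, 100, 1_000, 10_000, 100_000):
    p = experimental_probability(n, rng)
    print(f"{n:>7} flips: P(heads) ≈ {p:.4f}, error = {abs(p - 0.5):.4f}")
```

The error tends to shrink as the number of flips grows, though not in a perfectly steady way: any single short run can still drift well away from 0.5.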

Complement Rule

The probability that an event does not happen is 1 minus the probability that it does: P(not A) = 1 − P(A). If there is a 30% chance of rain, there is a 70% chance of no rain. This rule is especially useful when it is easier to calculate the complement than the event itself.
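As an illustration chosen here (it is not in the text), take the chance of at least one six in four rolls of a fair die. Counting "at least one" directly means handling one, two, three or four sixes, but the complement "no sixes at all" is a single product: 1 − (5/6)⁴. The sketch below computes it that way and checks the answer by brute-force enumeration.

```python
from fractions import Fraction
from itertools import product

# Complement rule: P(at least one six) = 1 - P(no six on any of the 4 rolls)
p_no_six_one_roll = Fraction(5, 6)
p_via_complement = 1 - p_no_six_one_roll ** 4

# Brute-force check: enumerate all 6**4 equally likely sequences of four rolls
rolls = list(product(range(1, 7), repeat=4))
at_least_one_six = sum(1 for seq in rolls if 6 in seq)
p_direct = Fraction(at_least_one_six, len(rolls))

print(p_via_complement, f"{float(p_via_complement):.4f}")  # 671/1296 0.5177
assert p_via_complement == p_direct
```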

Combined Events: And, Or, Not

When two events are independent — the outcome of one does not affect the other — the probability that both occur is found by multiplying:

P(A and B) = P(A) × P(B)

Rolling a 3 on a die and then flipping tails: P(3) = 1/6, P(tails) = 1/2, so P(both) = 1/12.
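A small sketch that checks the multiplication rule by listing the whole sample space of "roll a die, then flip a coin" and counting. The helper `prob` is a hypothetical name; it assumes all 12 (face, coin) pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space for "roll a die, then flip a coin": 6 x 2 = 12 equally likely pairs
space = list(product(range(1, 7), ["heads", "tails"]))

def prob(event):
    """Proportion of the equally likely outcomes for which event(outcome) is True."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_three = prob(lambda o: o[0] == 3)
p_tails = prob(lambda o: o[1] == "tails")
p_both = prob(lambda o: o[0] == 3 and o[1] == "tails")

print(p_three, p_tails, p_both)  # 1/6 1/2 1/12
assert p_both == p_three * p_tails  # holds because the die and coin are independent
```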

For mutually exclusive events (which cannot both happen at once), the probability that at least one occurs is found by adding:

P(A or B) = P(A) + P(B)

Rolling a 2 or a 5 on a single die: P(2) + P(5) = 1/6 + 1/6 = 1/3.

When events are not mutually exclusive (they can both occur), you must subtract the overlap to avoid double-counting:

P(A or B) = P(A) + P(B) − P(A and B)
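To see why the overlap must be subtracted, here is a sketch using an example not in the text: drawing one card from a standard 52-card deck and asking for "a heart or a king". The king of hearts belongs to both events, so simply adding P(heart) + P(king) counts it twice. The names `prob`, `is_heart` and `is_king` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def prob(event):
    """Proportion of cards in the deck for which event(card) is True."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_king(card):
    return card[0] == "K"

p_heart, p_king = prob(is_heart), prob(is_king)
p_both = prob(lambda c: is_heart(c) and is_king(c))   # only the king of hearts
p_either = prob(lambda c: is_heart(c) or is_king(c))  # counted directly

naive = p_heart + p_king              # double-counts the king of hearts
general = p_heart + p_king - p_both   # general addition rule

print(naive, general, p_either)  # 17/52 4/13 4/13
```

For mutually exclusive events `p_both` is zero, so the general rule reduces to plain addition.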

Measures of Central Tendency

Statistics begins with describing a data set. The three standard measures of centre each capture something different.

The mean (arithmetic average) is calculated by summing all values and dividing by the count. It uses every data point and is sensitive to extreme values. A single outlier — a billionaire in a room of teachers — can dramatically raise the mean income of the group without representing anyone's typical experience.

The median is the middle value when data are arranged in order. For an even number of values, it is the average of the two middle values. The median is resistant to outliers, which makes it better for describing skewed distributions. Because household incomes are right-skewed (a few very high earners pull the mean upward), median household income usually describes a typical household better than mean household income does.

The mode is the most frequently occurring value. It is the only one of the three measures that makes sense for categorical (non-numerical) data: you can find the modal favourite colour, but not the mean favourite colour. It also suits discrete data such as shoe sizes. A data set can have more than one mode (with two it is bimodal), or no mode at all if no value repeats.
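Python's standard `statistics` module computes all three measures. The salary figures below are invented purely to mirror the billionaire-in-a-room-of-teachers example, and the colour list is likewise hypothetical.

```python
import statistics

# Hypothetical salaries for five teachers, then the same room plus one billionaire
teachers = [40_000, 42_000, 45_000, 47_000, 50_000]
with_billionaire = teachers + [1_000_000_000]

for label, data in [("teachers", teachers), ("with billionaire", with_billionaire)]:
    print(f"{label}: mean = {statistics.mean(data):,.0f}, "
          f"median = {statistics.median(data):,.0f}")
# teachers: mean = 44,800, median = 45,000
# with billionaire: mean = 166,704,000, median = 46,000

# The mode also works for categorical data; multimode reports every tied value
colours = ["blue", "red", "blue", "green", "red"]
print(statistics.multimode(colours))  # ['blue', 'red'] (bimodal)
```

One outlier multiplies the mean several thousand times over, while the median moves by only 1,000. Note that `multimode` returns every value when none repeats, rather than reporting "no mode".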

Spread: Range, Variance, and Standard Deviation

Knowing the centre of a data set is not enough. Two classes might both have a mean exam score of 68%, yet one might have scores tightly clustered between 60% and 76%, while the other has scores scattered from 20% to 100%. Measures of spread quantify this difference.

The range is simply the maximum minus the minimum. It is quick to calculate but ignores everything except the two extremes, so a single outlier can make it misleading.

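The variance measures the average squared distance of each data point from the mean. For a whole population of N values with mean μ, the variance is σ² = Σ(x − μ)² ÷ N. For a sample of n values with mean x̄, the sample variance is s² = Σ(x − x̄)² ÷ (n − 1); dividing by n − 1 rather than n compensates for the fact that sample values sit slightly closer to their own mean than to the population mean, which would otherwise understate the spread. Squaring the distances ensures that values above and below the mean do not cancel out, and it penalises large deviations heavily.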

The standard deviation (σ for a population, s for a sample) is the square root of the variance. Because it is in the same units as the original data, not squared units, it is far more interpretable. A standard deviation of 8 marks on an exam means that, if scores are roughly bell-shaped, about two-thirds of them fall within 8 marks of the mean in either direction.
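A sketch of range, variance and standard deviation written out by hand, so the divide-by-N versus divide-by-(n − 1) choice is visible. The two classes of five scores are hypothetical, built so that both have a mean of 68% while one is tightly clustered (60 to 76) and the other scattered (20 to 100). The helper name `spread` is an assumption, not a standard function.

```python
import math
import statistics

def spread(data, sample=True):
    """Return (range, variance, standard deviation).

    sample=True divides by n - 1 (sample variance s²);
    sample=False divides by n (population variance σ²).
    """
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    variance = squared_deviations / (n - 1 if sample else n)
    return max(data) - min(data), variance, math.sqrt(variance)

# Two hypothetical classes, both with a mean score of 68%
tight = [60, 64, 68, 72, 76]
scattered = [20, 60, 70, 90, 100]

for name, scores in [("tight", tight), ("scattered", scattered)]:
    r, var, sd = spread(scores, sample=False)
    print(f"{name}: mean = {statistics.mean(scores)}, range = {r}, "
          f"variance = {var:.1f}, sd = {sd:.2f}")
# tight: mean = 68, range = 16, variance = 32.0, sd = 5.66
# scattered: mean = 68, range = 80, variance = 776.0, sd = 27.86
```

The results agree with the standard library's `statistics.pvariance` and `statistics.variance`, which could replace the hand-written loop in practice.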

The Normal Distribution

Many naturally occurring measurements — height, IQ scores, measurement errors, and much else — cluster around a central value with symmetric tails on either side. This bell-shaped pattern is the normal distribution. It is described entirely by two parameters: the mean (which sets where the peak is) and the standard deviation (which controls how wide or narrow the bell is).

The 68-95-99.7 rule (also called the empirical rule) gives quick benchmarks for any normal distribution:

About 68% of values fall within 1 standard deviation of the mean.
About 95% fall within 2 standard deviations.
About 99.7% fall within 3 standard deviations.

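If adult male heights are normally distributed with a mean of 175 cm and a standard deviation of 7 cm, then roughly 95% of men are between 161 cm and 189 cm tall, and a man who is 196 cm tall (3 standard deviations above the mean) is taller than about 99.85% of the population. That figure comes from the rule's 0.3% outside ±3 standard deviations, split evenly between the two tails; the exact normal-distribution value is closer to 99.87%.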
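Python's `statistics.NormalDist` can check the 68-95-99.7 rule against exact normal probabilities. This sketch reuses the height example's assumed mean of 175 cm and standard deviation of 7 cm.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=7)  # the example's assumed height distribution (cm)

# Compare the 68-95-99.7 rule with exact normal probabilities
for k in (1, 2, 3):
    lo = heights.mean - k * heights.stdev
    hi = heights.mean + k * heights.stdev
    share = heights.cdf(hi) - heights.cdf(lo)
    print(f"within {k} sd ({lo:.0f}-{hi:.0f} cm): {share:.4f}")
# within 1 sd (168-182 cm): 0.6827
# within 2 sd (161-189 cm): 0.9545
# within 3 sd (154-196 cm): 0.9973

# Share of men taller than 196 cm (3 standard deviations above the mean)
print(f"taller than 196 cm: {1 - heights.cdf(196):.5f}")  # 0.00135
```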

Common Pitfalls in Interpreting Data

Several well-known traps catch students and professionals alike.

Correlation is not causation. Two variables that move together are correlated, but that does not mean one causes the other. Ice cream sales and drowning rates both rise in summer, not because ice cream causes drowning, but because a third variable (hot weather) drives both. Establishing causation typically requires a controlled experiment, or a study design that rules out such confounding variables; correlation alone is not enough.

Sample size matters. A poll of 50 people gives a much less reliable estimate of public opinion than one of 5,000. Small samples produce wide confidence intervals, and random variation alone can push their results far from the true value.
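One way to quantify this is the margin of error of a poll proportion. This sketch uses the standard normal-approximation formula z·√(p(1 − p)/n) and assumes a simple random sample; it takes p = 0.5, the worst case, and a 95% confidence level. The function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample; p = 0.5 gives the widest (worst-case) margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 500, 5_000):
    print(f"n = {n:>5}: ± {100 * margin_of_error(n):.1f} percentage points")
# n =    50: ± 13.9 percentage points
# n =   500: ± 4.4 percentage points
# n =  5000: ± 1.4 percentage points
```

Because the margin shrinks with 1/√n, a sample 100 times larger is only 10 times more precise.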

Misleading graphs. A bar chart with a y-axis that does not start at zero can make small differences look dramatic. Always check the scale before drawing conclusions from a graph.
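The exaggeration from a truncated axis is simple arithmetic: each bar's drawn height is its value minus the axis starting point. This sketch uses hypothetical values of 50 and 52 (a 4% difference); the function name `apparent_ratio` is an assumption for illustration.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """How many times taller the larger bar is drawn than the smaller one.

    Bars are drawn upward from axis_start, so only the part above it is visible.
    """
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical values: 50 versus 52, a 4% difference
print(apparent_ratio(50, 52))                 # 1.04 (axis starts at zero)
print(apparent_ratio(50, 52, axis_start=48))  # 2.0 (bar looks twice as tall)
```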

Summary

Probability quantifies uncertainty on a scale from 0 to 1, using theoretical structure or experimental frequency. Combined events are handled with multiplication (independent events) or addition (mutually exclusive events, subtracting the overlap when both can occur). Statistics describes data through measures of centre (mean, median, mode) and spread (range, variance, standard deviation). The normal distribution models many real-world phenomena, and the 68-95-99.7 rule gives quick probability estimates for it. Critical reading of data means watching for the mean/median distinction, sample size, misleading graphs, and the trap of inferring causation from correlation.