Statistics – The Social Science Guy

Confidence intervals are an accessible way to express the amount of uncertainty in a sample estimate of a population value, such as the mean or a proportion. They allow you to draw inferences about population values from the samples you take.

Making Estimations

How much will your next pay raise be? Perhaps $3,000 or $5,000? It’s extremely difficult to estimate an exact number. You would have more success if you estimated your raise within a range of numbers.

If you were to say that your next pay raise would be between $1,000 and $10,000, you would be more confident in your estimate. This is essentially how confidence intervals work.

What is a Confidence Interval?

A Confidence Interval (CI) is a range of values, calculated from a sample, that is likely to contain the true population value (such as the mean or a proportion). The width of the range reflects the amount of uncertainty in the sample estimate.

Say you wanted to determine the average age of victims of robberies in Chicago last year. While there is a true answer, say 30 years old, the best you can do with a sample is find an interval that the true answer probably lies in, say, 20 to 40 years old.

The confidence interval is the sample mean or proportion plus or minus the margin of error (ME), the value used to calculate the upper limit (40) and lower limit (20) of the interval.

Before calculating the CI from a sample mean or proportion, choose a confidence level (CL), usually 90%, 95%, or 99%. The CL describes how reliable the sampling method is: if the same sampling method were used over and over, the true population value would fall inside 90%, 95%, or 99% of all the resulting CIs. That also means that 10%, 5%, or 1% of those intervals would not contain the true population value.
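To make the idea of "95% of all the intervals" concrete, here is a minimal simulation sketch. It assumes a made-up population that is normally distributed with a true mean of 30 and a standard deviation of 10, draws many samples of 100, and counts how often the interval built from each sample contains the true mean. The function name and all of the default values are hypothetical, and the margin of error is calculated the way the mean section of this article describes it.

```python
import math
import random


def coverage_rate(true_mean=30.0, true_sd=10.0, n=100, trials=2000, z=1.96, seed=0):
    """Share of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample from the assumed (hypothetical) population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = sum(sample) / n
        # Standard deviation: square root of the mean squared difference.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
        # Margin of error with n - 1 under the square root.
        margin = z * s / math.sqrt(n - 1)
        if mean - margin <= true_mean <= mean + margin:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95, and the remaining intervals are the roughly 5% that miss the true value.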

Calculating CI – Mean

Let’s see how to calculate a confidence interval using the mean.

Identify a population, select a representative sample, and note the size of the sample (n).
Calculate the mean by adding all of the sample values and dividing by n.
Select the CL (typically 95%) and locate the corresponding critical value (z value), which is 1.96 for a 95% CI. (NOTE: Tables of pre-calculated z values for various confidence levels are available as a resource.)
Calculate the variance by subtracting the mean from each value in your sample, squaring each result, and then calculating the mean of all of those squared differences.
Take the square root of the variance. This is the standard deviation (s).
Calculate the CI with the following formula:

CI = mean ± z × s / √(n – 1)

Note that the formula divides by the square root of the degrees of freedom, which is n – 1, rather than by the square root of n.

Finally, write out your CI as the sample mean ± the margin of error. The CI runs from the lower limit (the sample mean minus the margin of error) to the upper limit (the sample mean plus the margin of error).
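The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the list of ages are made up, the standard deviation is calculated by dividing by n as the steps describe, and the margin of error therefore puts n – 1 under the square root. The sample is kept tiny for readability; with so few values a t value would normally replace 1.96.

```python
import math


def mean_confidence_interval(values, z=1.96):
    """Return (mean, margin_of_error, lower_limit, upper_limit)."""
    n = len(values)
    mean = sum(values) / n
    # Variance: the mean of the squared differences from the sample mean.
    variance = sum((x - mean) ** 2 for x in values) / n
    # Standard deviation: the square root of the variance.
    s = math.sqrt(variance)
    # Margin of error: z times s over the square root of n - 1.
    margin = z * s / math.sqrt(n - 1)
    return mean, margin, mean - margin, mean + margin


ages = [22, 25, 31, 34, 40, 46]  # hypothetical sample values
mean, margin, lower, upper = mean_confidence_interval(ages)
print(f"CI: {mean:.2f} ± {margin:.2f} ({lower:.2f} to {upper:.2f})")
```

Dividing by n for the variance and then by √(n – 1) in the margin of error gives the same result as the more common pairing of an n – 1 variance with √n.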

Real World Mean Example

Let’s return to our earlier example. What was the average age of robbery victims in Chicago last year?

Randomly sample 100 police reports of Chicago robberies from last year, so n = 100.
Record the ages of the victims, add them all up, and divide by 100 to get the mean. Say the mean age in this case is 34.25 years.
Use a 95% CL, which has a standard z value of 1.96, and calculate the standard deviation. Say it is 10. With a standard deviation of 10 and n = 100, the CI formula gives a margin of error of 1.96 × 10 / √99, or about 1.97.
Your CI is 34.25 ± 1.97, or 32.28 to 36.22.

You can now say with 95% confidence that if the true average age of all Chicago robbery victims last year were known, it would fall between 32.28 and 36.22 years of age.
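As a check on the arithmetic, the stated inputs (mean 34.25, standard deviation 10, n = 100, z = 1.96) can be plugged into the margin-of-error formula directly. The sketch below assumes the standard deviation was calculated by dividing by n, so n – 1 goes under the square root. The function name is hypothetical.

```python
import math


def margin_of_error_from_summary(s, n, z=1.96):
    """Margin of error for a mean, given the standard deviation and sample size."""
    return z * s / math.sqrt(n - 1)


mean, s, n = 34.25, 10.0, 100
margin = margin_of_error_from_summary(s, n)
lower, upper = mean - margin, mean + margin
print(f"{mean} ± {margin:.2f} -> {lower:.2f} to {upper:.2f}")
# prints: 34.25 ± 1.97 -> 32.28 to 36.22
```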

Calculating CI – Proportion

When you’re faced with a population measured by categorical data (e.g., gender), you can calculate the CI using a proportion with the following steps:

1. Select an appropriate CL (95% is the most common, which corresponds to z* = 1.96).

2. Find the sample proportion by dividing the number of individuals in the sample who share the characteristic of interest by the total sample size.

3. Multiply the sample proportion by 1 minus the sample proportion, then divide by the sample size.

4. Take the square root of the result.

5. Multiply the answer by z* (1.96 for a 95% CL). The result is the margin of error.

6. Finally, use the sample proportion plus or minus the margin of error to give the confidence interval.
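The six steps above map onto a short function. This is a minimal sketch rather than a definitive implementation: the function name and the counts in the usage lines (30 individuals out of a sample of 120) are made up for illustration.

```python
import math


def proportion_confidence_interval(count, n, z=1.96):
    """Return (proportion, margin_of_error, lower_limit, upper_limit)."""
    # Step 2: sample proportion.
    p = count / n
    # Steps 3 and 4: p times (1 - p), divided by n, then the square root.
    standard_error = math.sqrt(p * (1 - p) / n)
    # Step 5: multiply by z* to get the margin of error.
    margin = z * standard_error
    # Step 6: proportion plus or minus the margin of error.
    return p, margin, p - margin, p + margin


p, margin, lower, upper = proportion_confidence_interval(30, 120)
print(f"CI: {p:.3f} ± {margin:.3f} ({lower:.3f} to {upper:.3f})")
```

Passing a different z value, such as 2.576 for a 99% CL, widens the interval without changing any other step.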

Real World Proportion Example

Let’s continue with the theme of robberies. This time, suppose you wanted to determine the confidence interval for the percentage of female victims of robberies in Chicago. From the same sample (n = 100), let’s say that 57 of the 100 victims were female.

Divide the number of females by 100 to get the sample proportion, 0.57.
Using a CL of 95%, which gives a z* of 1.96, first multiply 0.57 × (1 – 0.57) and divide by n = 100, which gives you about 0.0025.
Then take the square root of that number, which gives you about 0.05.
Multiply 0.05 × 1.96, which equals about 0.1.
Write out your CI with the margin of error, which in this case would be 0.57 ± 0.1.

As a result, with 95% confidence, if the actual percentage of female robbery victims in Chicago was known, it would be between .47 and .67, or between 47% and 67%.
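The worked example rounds at each step. For reference, here is the same calculation carried through without intermediate rounding, as a small sketch using the example's own numbers (57 out of 100, z* = 1.96). The variable names are only illustrative.

```python
import math

z = 1.96
females, n = 57, 100

p = females / n                            # sample proportion
variance_of_p = p * (1 - p) / n            # p(1 - p) / n
standard_error = math.sqrt(variance_of_p)  # square root of the result
margin = z * standard_error                # margin of error
lower, upper = p - margin, p + margin

print(f"p(1 - p)/n      = {variance_of_p:.6f}")   # 0.002451
print(f"square root     = {standard_error:.4f}")  # 0.0495
print(f"margin of error = {margin:.3f}")          # 0.097
print(f"interval        = {lower:.3f} to {upper:.3f}")  # 0.473 to 0.667
```

The unrounded interval, 0.473 to 0.667, agrees with the rounded 47% to 67% above.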

Lesson Summary

When studying a population, it is easier to study a sample and generalize from it than to study the entire population. A Confidence Interval (CI) is a range of values, calculated from a sample, that is likely to contain the true population value (the mean or a proportion). It’s written as the sample statistic ± the margin of error, which gives the upper and lower limits of the interval.

To calculate the CI using a mean:

Select the CL (typically 95%) and locate the corresponding critical value (z value), which is 1.96 for a 95% CI.
Calculate the variance by subtracting the mean from each value in your sample, squaring each result, and then calculating the mean of all of those squared differences.
Take the square root of the variance. This is the standard deviation (s).
Calculate the CI with the following formula:

CI = mean ± z × s / √(n – 1)

To calculate the CI using a proportion, use:

1. Select an appropriate CL (95% is the most common, which corresponds to z* = 1.96).

2. Find the sample proportion by dividing the number of individuals of interest in the sample by the total sample size.