Mean¶

Measures of central tendency describe the center of our distribution of data. There are three common statistics that are used as measures of center: mean, median, and mode. Each has its own strengths, depending on the data we are working with.

Perhaps the most common statistic for summarizing data is the average, or mean. The population mean is denoted by the Greek symbol mu (μ), and the sample mean is written as x̄ (pronounced as X-bar). The sample mean is calculated by summing all of the values and dividing by the count of values.

For example, the mean of [0, 1, 1, 2, 9] is 2.6 because (0 + 1 + 1 + 2 + 9)/5.

You can perform this calculation in Python by writing the same mathematical expression as you would by hand:

One important thing to note about the mean is it is very sensitive to outliers. Outliers are values in our data set that are created by a different generative process than the rest of our data; it is often extreme in value when compared to the rest of the values found in the data set. For example, the 9 from the above example is much larger than the other numbers. This actually made the mean higher than all of the other values found in the set. It is possible that 9 is an outlier; it is hard to say when our sample size is so small and we have no context for the data set. Again, be critical of where your data is coming from and how it is generated.

Data Science 1

Mean¶