Central Tendency in Python

Here we continue our discussion of using statistics to analyze data with several additional descriptive statistics:

• mean: the average value in a set of values.
• median: the middle value when all the values are arranged in sorted order.
• mode: the most frequently occurring value.

These are measures of central tendency. Each is a way of producing a single value that represents a “central” value in a set of values, i.e., a value that is in some sense typical of the others.

Let’s calculate the mean, median and mode of a list of integers. The following session creates a list called grades, then uses the built-in sum and len functions to calculate the mean “by hand”: sum calculates the total of the grades (397) and len returns the number of grades (5):

grades = [85, 93, 45, 89, 85]
average = sum(grades) / len(grades)  # 397 / 5 evaluates to 79.4
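The median and mode can be computed “by hand” too, directly from the definitions given earlier. The following is a minimal sketch, assuming Python 3; median_by_hand and mode_by_hand are hypothetical helper names chosen for illustration, and collections.Counter from the standard library does the counting for the mode.

```python
from collections import Counter

grades = [85, 93, 45, 89, 85]

# Mean: total of the values divided by how many there are
average = sum(grades) / len(grades)

def median_by_hand(values):
    """Hypothetical helper: middle value of the sorted values.

    With an even number of values, there is no single middle value,
    so the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode_by_hand(values):
    """Hypothetical helper: a value with the highest occurrence count."""
    counts = Counter(values)
    value, _count = counts.most_common(1)[0]
    return value

print(average)                 # 79.4
print(median_by_hand(grades))  # 85 (sorted: 45, 85, 85, 89, 93)
print(mode_by_hand(grades))    # 85 (occurs twice)
```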

The Python Standard Library’s statistics module provides functions for calculating the mean, median and mode. To use these capabilities, first import the statistics module:

import statistics

Then, you can call the module’s functions by writing “statistics.” followed by the name of the function to call. The following calculates the grades list’s mean, median and mode, using the statistics module’s mean, median and mode functions:

statistics.mean(grades)    # 79.4
statistics.median(grades)  # 85
statistics.mode(grades)    # 85
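As a quick check, the module’s results can be compared with the “by hand” mean computed earlier. This is a small sketch assuming Python 3.8 or later; the variable names mean_grade, median_grade and mode_grade are illustrative only.

```python
import statistics

grades = [85, 93, 45, 89, 85]

mean_grade = statistics.mean(grades)      # 397 / 5
median_grade = statistics.median(grades)  # middle of 45, 85, 85, 89, 93
mode_grade = statistics.mode(grades)      # 85 occurs twice

# The module's mean agrees with sum / len for this list
print(mean_grade == sum(grades) / len(grades))  # True
print(mean_grade, median_grade, mode_grade)     # 79.4 85 85
```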

Each function requires an iterable argument, in this case the list grades.
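Because the functions accept any iterable, not just lists, a tuple or a generator expression works as well. In this sketch the 5-point “curve” applied to the grades is a hypothetical example; note that a generator can be consumed only once, so it is passed directly to a single function call.

```python
import statistics

grades = [85, 93, 45, 89, 85]

# A tuple works just as well as a list
tuple_median = statistics.median(tuple(grades))

# So does a generator expression: here, each grade raised by a hypothetical 5 points
curved_mean = statistics.mean(g + 5 for g in grades)

print(tuple_median)  # 85
print(curved_mean)   # 84.4 (422 / 5)
```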

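In Python versions before 3.8, the mode function raises a StatisticsError for lists like [85, 93, 45, 89, 85, 93] in which two or more values tie for “most frequent.” A set of values with two such values is said to be bimodal; here, both 85 and 93 occur twice. Starting with Python 3.8, mode instead returns the first of the tied values it encounters (85 here), and the statistics module adds a multimode function that returns all of them. We’ll revisit this problem later this semester.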
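The following sketch, which assumes Python 3.8 or later, contrasts mode and multimode on the bimodal list. On earlier versions the mode call would raise StatisticsError and multimode does not exist.

```python
import statistics

bimodal = [85, 93, 45, 89, 85, 93]

# Python 3.8+: mode returns the first of the tied values it encounters
first_mode = statistics.mode(bimodal)

# multimode (3.8+) returns every most-frequent value, in first-seen order
all_modes = statistics.multimode(bimodal)

print(first_mode)  # 85
print(all_modes)   # [85, 93]
```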