Sampling

The data we have is called the sample, and it was observed from a larger population. Before attempting any analysis, remember one important requirement: the sample must be a random sample that is representative of the population. This means the data must be collected without bias. For example, if we are asking people whether they like a certain sports team, we can’t ask only fans of the Rockies, or only female fans. Ideally, our sample should include members of every distinct group in the population.
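To make the difference between a random and a biased sample concrete, here is a minimal sketch using only Python's standard library. The population, the team names other than the Rockies, and the proportions are all invented for illustration; they are assumptions, not real survey data.

```python
import random

# Hypothetical population: 10,000 people, each tagged with a favorite team.
# Team names and proportions are made up for this illustration.
random.seed(42)  # fixed seed so the run is reproducible
teams = ["Rockies", "Dodgers", "Cubs", "none"]
weights = [0.25, 0.30, 0.15, 0.30]
population = random.choices(teams, weights=weights, k=10_000)


def share(group, team):
    """Return the fraction of `group` whose favorite team is `team`."""
    return sum(1 for person in group if person == team) / len(group)


# Simple random sample: every member has the same chance of being selected.
random_sample = random.sample(population, k=500)

# Biased sample: imagine we only surveyed people outside the Rockies' stadium,
# modeled here by keeping only Rockies fans.
biased_sample = [p for p in population if p == "Rockies"][:500]

print(f"population:    {share(population, 'Rockies'):.2f}")
print(f"random sample: {share(random_sample, 'Rockies'):.2f}")
print(f"biased sample: {share(biased_sample, 'Rockies'):.2f}")
```

The random sample's share of Rockies fans should land close to the population's share, while the biased sample reports that everyone is a Rockies fan, which says more about how we sampled than about the population.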

In this class, we will use pre-existing data sources for many of our projects. Although you would hope that whoever collected the data went out of their way to make the sample as random as possible, it is best to be critical about what is included in a data set and what is excluded. Both inclusion and exclusion choices can introduce bias that you will need to address if your “data story” is going to withstand critique.
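One way to start that critique is to compare how often each group appears in the data set with how often you would expect it to appear in the population. The sketch below assumes you already have reference shares from some trusted source; the age groups, counts, and shares here are invented, and `representation_gaps` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Hypothetical group labels from a pre-existing data set (values are invented).
sample_groups = ["18-34"] * 70 + ["35-54"] * 30

# Assumed population shares, e.g. taken from a census table (also invented).
reference_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}


def representation_gaps(groups, reference):
    """Return sample share minus expected share for each reference group.

    Positive values mean the group is over-represented in the sample;
    negative values mean it is under-represented or missing entirely.
    """
    counts = Counter(groups)
    total = len(groups)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference.items()
    }


gaps = representation_gaps(sample_groups, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this made-up data set the "55+" group is excluded entirely and the "18-34" group is heavily over-represented, which is the kind of gap you would need to acknowledge or correct before drawing conclusions.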

