What is Data Science?

To answer this question, I think it’s important to first ask, “What is data?” In my opinion, data are descriptions of the world around us, collected through observation. These observations can be about literally anything, from the Earth’s weather to the human mind. Data can be collected on any subject that interests you!

These observations allow you to collect a bunch of raw information about the system you are observing. We call this mass of information a data set. Back in the day, data sets lived on paper. As technology advanced, data scientists realized that it would be a lot easier to store data sets on computers. Why did we make the change? First of all, the most powerful data sets are HUGE, and maintaining paper records of any decently sized data set is a pain. Imagine if Facebook printed out a copy of every picture that was posted on Instagram. That would equate to 40 billion pictures [1]. Where would all that paper live? For context, this is what it would look like to stack 1 billion pieces of paper on top of one another:

[Figure: Everest vs. Paper]


YUP, that’s Mt. Everest (cough THE TALLEST MOUNTAIN IN THE WORLD cough). Don’t forget to multiply that stack of paper by 40! That will give you a good understanding of just how big a SINGLE data set can be these days.
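If you want to check the size of that stack yourself, here is a rough back-of-the-envelope sketch in Python. It assumes a sheet of ordinary printer paper is about 0.1 mm thick and that Mt. Everest is roughly 8,849 meters tall. Both numbers, and the helper name `stack_height_m`, are assumptions made for this illustration and do not come from the figure above, which may use a different thickness.

```python
# Assumption: one sheet of ordinary printer paper is about 0.1 mm thick.
SHEET_THICKNESS_MM = 0.1
# Assumption: approximate height of Mt. Everest in meters.
EVEREST_HEIGHT_M = 8_849


def stack_height_m(sheets, thickness_mm=SHEET_THICKNESS_MM):
    """Height in meters of a stack of `sheets` pieces of paper."""
    return sheets * thickness_mm / 1000  # 1000 mm in a meter


one_billion = stack_height_m(1_000_000_000)
forty_billion = stack_height_m(40_000_000_000)

print(f"1 billion sheets:  {one_billion / 1000:,.0f} km")
print(f"40 billion sheets: {forty_billion / 1000:,.0f} km")
print(f"Everests in the 40-billion stack: {forty_billion / EVEREST_HEIGHT_M:,.0f}")
```

Change `SHEET_THICKNESS_MM` to see how much the answer depends on that one assumption.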

Did I mention modern data sets are HUGE? Without computers, it would be IMPOSSIBLE to analyze and make sense of any of this data. Imagine trying to calculate the average of 1 billion numbers by hand. That’s a hard pass for me.

[Image: Fluffy says "no"]
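For a computer, on the other hand, averaging a big pile of numbers is quick work. The sketch below is a minimal illustration, not part of the course materials: it generates one million made-up numbers (a smaller list than 1 billion, so it fits comfortably in memory on an ordinary laptop) and times how long the average takes. The variable names are hypothetical, and you do not need to understand the code yet.

```python
import random
import time

random.seed(0)  # fixed seed so the made-up numbers are the same on every run

# One million made-up numbers between 0 and 100 (this is not real data).
numbers = [random.uniform(0, 100) for _ in range(1_000_000)]

start = time.perf_counter()
average = sum(numbers) / len(numbers)  # add everything up, divide by the count
elapsed = time.perf_counter() - start

print(f"Average of {len(numbers):,} numbers: {average:.2f}")
print(f"Time taken: {elapsed:.4f} seconds")
```

The exact time depends on your machine, but it should be a small fraction of the time it would take to do the same sum by hand.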


We can’t skip the analysis component, though, because this step allows us to make robust conclusions from incomplete information; it’s how we find meaning in the world around us! For example, what if we wanted to better understand climate change? First, we would need to observe regional temperatures from across the world, over some period of time. We would do this a lot, because the more data we collect, the more confident we can be that our conclusions are correct. However, we can’t just look at the data and instantly understand what’s going on. We need statistics to help us see the important trends hidden in our data sets, and we need computers to do this at scale.
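To make the idea of a hidden trend concrete, here is a minimal sketch in Python. The temperatures are made up for illustration and are NOT real climate data: we bury a small warming of 0.02 degrees per year under random year-to-year noise, then use a standard statistical tool (the slope of the least-squares best-fit line) to recover it. The starting temperature, noise level, and function names are all assumptions chosen for this example.

```python
import random

random.seed(1)  # fixed seed so the made-up data is the same on every run

# Made-up yearly temperatures: a hidden warming of 0.02 degrees per year
# plus random year-to-year noise. This is NOT real climate data.
years = list(range(1970, 2020))
temps = [14.0 + 0.02 * (y - 1970) + random.gauss(0, 0.3) for y in years]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def slope(xs, ys):
    """Least-squares slope of the best-fit line through the points (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


trend = slope(years, temps)
print(f"Estimated trend: {trend:.3f} degrees per year")
```

Looking at any two neighboring years, the noise is much larger than the yearly warming, so the trend is hard to spot by eye. Using all 50 years at once lets the estimate land close to the 0.02 we hid in the data, which is the sense in which more data makes us more confident in a conclusion.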

Data science is all of these things wrapped up into a single field of study. In other words, data scientists ask questions about complex systems and then try to answer these questions by analyzing data sets using computational tools. They often focus on a specific domain, like music or medicine, but they all use this same thought process.

Data science is also hard to get away from these days! Every profession that you can possibly imagine is starting to realize that it needs a good data scientist. As a result, many universities are creating new data science majors and requiring most majors (including those in the humanities) to take at least one data science course. In fact, this course was modeled after UC Berkeley’s Data 8, the university’s new data science class that most majors must now take. If you feel like college isn’t for you, that’s OK! Many companies are starting to realize that they need to provide alternative routes to data science jobs [2].

Before we get started, I want to quickly mention that even though this course is modeled after a college-level class, we do not assume that you have any programming experience or any background in mathematics beyond algebra. So if you’re low-key panicking right now, take a deep breath; you will be okay. :) The world has too many unanswered questions and difficult challenges to leave this type of critical reasoning to only a few specialists. We believe that all members of society (and that means you!) can build the capacity to reason about data. The tools, techniques, and data sets are all readily available; this text aims to make them accessible to everyone.

And with that, it’s time to get started!


[1] Just how big is a billion? Check out this website to try and wrap your head around the size of 1 billion.

[2] Google is working on creating an alternative certification program in data analysis. Check out this article to learn more.