Summarizing Data
So far, we have examined the structure of the dataframe object we created from the earthquake data, but we don't yet know much about the data itself. Pandas provides several methods that make it easy to compute summary statistics and get to know our data better. A good first step is the describe() method.
Again, here is our setup for our earthquake dataframe:
import numpy as np
import pandas as pd
# Load the earthquake data from the CSV file into a dataframe
df = pd.read_csv('earthquake.csv')
Here is how we call the describe() method:
df.describe()
We get the 5-number summary along with the count, mean, and standard deviation of the numeric columns.
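Because earthquake.csv is not included here, the minimal sketch below builds a small stand-in dataframe so the output of describe() can be checked. The column names (mag, depth, magType) and all values are hypothetical, chosen only to resemble earthquake data.

```python
import pandas as pd

# Hypothetical stand-in for the earthquake data (columns and values are made up)
df = pd.DataFrame({
    "mag": [4.5, 5.1, 4.8, 6.0, 4.6],
    "depth": [10.0, 35.0, 12.5, 20.0, 8.0],
    "magType": ["mb", "mww", "mb", "mww", "mb"],
})

# By default, describe() summarizes only the numeric columns (mag and depth).
# Rows: count, mean, std, min, 25%, 50% (median), 75%, max
summary = df.describe()
print(summary)

# Calling describe() on a text column reports count, unique, top, and freq instead
text_summary = df["magType"].describe()
print(text_summary)
```

With these made-up values, the mean magnitude is 5.0 and the most frequent magType is "mb" (3 of 5 rows).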
The describe() method makes it easy to get a snapshot of our data, but sometimes we just want a particular statistic, either for a specific column or for all the columns. Pandas makes this pretty easy as well!
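As a minimal sketch, again using a hypothetical stand-in dataframe with made-up mag, depth, and magType columns, the same kind of statistic can be requested for one column, for every numeric column, or several at once with agg():

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
df = pd.DataFrame({
    "mag": [4.5, 5.1, 4.8, 6.0, 4.6],
    "depth": [10.0, 35.0, 12.5, 20.0, 8.0],
    "magType": ["mb", "mww", "mb", "mww", "mb"],
})

# One statistic for one column: df["mag"] is a Series, so max() returns a scalar
max_mag = df["mag"].max()

# One statistic for all columns: returns a Series indexed by column name.
# numeric_only=True skips the text column, which has no mean.
col_means = df.mean(numeric_only=True)

# Several statistics for selected columns: returns a DataFrame with
# one row per statistic and one column per selected column
min_max = df[["mag", "depth"]].agg(["min", "max"])

print(max_mag)
print(col_means)
print(min_max)
```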
The following table includes methods that work for both Series and DataFrames:
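As an illustration, the sketch below calls three aggregation methods that exist on both Series and DataFrame objects: count(), median(), and quantile(). On a Series each returns a single value; on a DataFrame each returns one value per column. The data are hypothetical, and one depth value is deliberately missing to show that these methods skip missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one depth value is missing (NaN) on purpose
df = pd.DataFrame({
    "mag": [4.5, 5.1, 4.8, 6.0, 4.6],
    "depth": [10.0, 35.0, np.nan, 20.0, 8.0],
})

# On a Series, each method returns a scalar
mag_median = df["mag"].median()
mag_q75 = df["mag"].quantile(0.75)  # 75th percentile, linear interpolation by default

# On a DataFrame, the same methods return a Series with one value per column
counts = df.count()    # number of non-missing values per column
medians = df.median()  # NaN is skipped, so depth uses only 4 values
q75 = df.quantile(0.75)

print(mag_median, mag_q75)
print(counts)
print(medians)
print(q75)
```

Here counts reports 5 values for mag but only 4 for depth, and the depth median is 15.0, the midpoint of the two middle non-missing values (10.0 and 20.0).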
Series objects have some additional methods we can use to describe our data:
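A minimal sketch of three methods commonly used to describe a single column, unique(), nunique(), and value_counts(), applied to a hypothetical magType column. Of these, unique() is not available on DataFrames; nunique() and value_counts() do exist on DataFrames but are most often applied to a single column.

```python
import pandas as pd

# Hypothetical magnitude-type column; values are made up for illustration
mag_type = pd.Series(["mb", "mww", "mb", "ml", "mb"], name="magType")

distinct = mag_type.unique()      # distinct values, in order of first appearance
n_distinct = mag_type.nunique()   # how many distinct values there are
counts = mag_type.value_counts()  # frequency of each value, most common first
shares = mag_type.value_counts(normalize=True)  # same, as proportions of the total

print(distinct)
print(n_distinct)
print(counts)
print(shares)
```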
Finally, Index objects also have several methods to help describe and summarize our data:
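A minimal sketch of a few Index attributes and methods, assuming a hypothetical dataframe indexed by made-up event IDs. Both df.columns and df.index are Index objects, so the same tools apply to either.

```python
import pandas as pd

# Hypothetical data indexed by made-up event IDs
df = pd.DataFrame(
    {"mag": [4.5, 5.1, 4.8], "depth": [10.0, 35.0, 12.5]},
    index=pd.Index(["us001", "us002", "us003"], name="id"),
)

# The column labels form an Index too
col_names = df.columns.tolist()

# Summaries of the row index
n_rows = len(df.index)
first_id, last_id = df.index.min(), df.index.max()  # string labels compare lexicographically
ids_unique = df.index.is_unique                # True if no label repeats
ids_sorted = df.index.is_monotonic_increasing  # True if labels are in ascending order

print(col_names, n_rows, first_id, last_id, ids_unique, ids_sorted)
```

Checking is_unique before relying on label-based lookups is a cheap safeguard, since duplicate labels make df.loc return several rows instead of one.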