Chapter 5. Getting Started with pandas
pandas will be a major tool of interest throughout much of the rest of the
book. It contains data structures and data manipulation tools designed to
make data cleaning and analysis fast and easy in Python. pandas is often
used in tandem with numerical computing tools like NumPy and SciPy,
analytical libraries like statsmodels and scikit-learn, and data
visualization libraries like matplotlib. pandas adopts significant parts of
NumPy’s idiomatic style of array-based computing, especially array-based
functions and a preference for data processing without for loops.
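As a minimal sketch of what array-based computing without for loops looks like in practice, the example below doubles a small Series twice: once element by element, and once with a single expression. The variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values, used only to illustrate the idiom.
prices = pd.Series([10.0, 20.0, 30.0])

# Loop-based approach: handle one element at a time.
doubled_loop = pd.Series([p * 2 for p in prices])

# Array-based approach: one expression applied to the whole Series.
doubled_vec = prices * 2

# NumPy functions also work element-wise on pandas objects.
roots = np.sqrt(prices)

print(doubled_vec.equals(doubled_loop))
```

Both approaches produce the same values, but the array-based form is shorter and usually much faster on large data.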
While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
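To make the contrast concrete, here is a small sketch, assuming made-up column names and values: a NumPy array carries a single dtype for all of its elements, while each column of a DataFrame can have its own.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype shared by every element.
arr = np.array([[1.5, 2.0], [3.5, 4.0]])
print(arr.dtype)

# A DataFrame can hold a different dtype in each column.
# Column names and values here are purely illustrative.
df = pd.DataFrame({
    "name": ["a", "b"],    # strings
    "count": [1, 2],       # integers
    "score": [0.5, 1.5],   # floats
})
print(df.dtypes)
```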
Since becoming an open source project in 2010, pandas has matured into a quite large library that’s applicable in a broad set of real-world use cases. The developer community has grown to over 800 distinct contributors, who’ve been helping build the project as they’ve used it to solve their day-to-day data problems.
Throughout the rest of the book, I use the following import convention for pandas:
In [1]: import pandas as pd
Thus, whenever you see pd. in code,
it’s referring to pandas. You may also find it easier to import Series and
DataFrame into the local namespace since they are so frequently used:
In [2]: from pandas import Series, DataFrame
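As a quick illustration, the two import styles refer to the same classes, so they can be mixed freely. The variable names and values below are hypothetical.

```python
import pandas as pd
from pandas import Series, DataFrame

# The directly imported names are the same objects as the pd.-prefixed ones.
print(Series is pd.Series)
print(DataFrame is pd.DataFrame)

# Either spelling can be used to construct objects.
s = Series([1, 2, 3])
frame = pd.DataFrame({"x": [1, 2, 3]})
```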