Pandas is one of the most popular Python libraries for data science, machine learning and
AI. It is one of my personal favorites, and it is used extensively to read and process data.
That is because data often comes in a tabular format, where you have rows and columns.
Of course, I can store rows and columns of data in a NumPy array too. But as we will see,
pandas has a lot of very useful functions that help wherever you have tabular data. And in
data science, machine learning and AI, we do a lot of analysis of data that is stored in
tables, which is why pandas has become so popular.

Now let's see why it is useful. I'll give you an example by comparing how we can do the same
tasks with and without pandas, so that you get a better appreciation of why pandas is so
powerful. For this task, I have a data set called the weather data set. This is what the data
set looks like: for each day, I have temperature, dew point, humidity, sea level pressure,
visibility, wind speed, and so on. This is one whole month of data, for January 2016. And by the
way, these dates are in month/day/year format. So those of you who follow the British system,
which is day/month/year, don't get confused. This is January 1, January 2, and so on. All of
this data is in a CSV file. Loading a CSV file in pandas is extremely simple: import
pandas as pd, then call pd.read_csv, which reads the CSV file for you. I call the result df,
short for data frame.
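
Here is a minimal sketch of that loading step. The real weather CSV is not reproduced here, so the snippet reads a three-row stand-in from an in-memory string. The column names (EST, Temperature, Events, WindSpeedMPH) and the values are assumptions made up for illustration. With a file on disk, you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the weather CSV; columns and values are made up for illustration.
csv_text = """EST,Temperature,Events,WindSpeedMPH
1/1/2016,38,,8
1/2/2016,36,Rain,7
1/3/2016,40,Snow,5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)        # shows the table; the empty Events cell is read as NaN
print(df.shape)  # (number of rows, number of columns)
```
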
In pandas, the data frame is the most important data structure, just as the multidimensional
array is the most important data structure in NumPy.
Intuitively, a data frame is nothing but a table. We'll come to what data frames are, and
some of their basics, in the next video. For now, let's just imagine the data frame as a
table in which everything is stored. If I want to look at a data frame, all I have to do is
type df. And this is the table I have. It has EST, which stands for Eastern Standard Time.
I have temperature, dew point, sea level pressure, visibility. Here I have events. If there
is no event that day, it is just NaN. Otherwise it could be rain, fog or snow. You also have
wind direction, cloud cover, precipitation, wind speed: all these features are weather
variables. So I have loaded the data from a CSV file in one line of code, just by using
read_csv. Extremely elegant, extremely simple. Having seen this, let's go to
answering our questions. Our first question was: what is the maximum temperature in the
month of January? All I have to do is select the temperature column of df. Remember, I have
my table where each row represents a day in the month of January, and there is a temperature
column. When I index df with the temperature column, it takes that whole column, and if I
call max on it, it picks the largest value. And the max value is 50. Very simple. By the
way, I think 50 doesn't mean 50 degrees centigrade, because even in summer you don't get 50
degrees centigrade in New York. I think it's 50 degrees Fahrenheit.
The next question is: on which dates did it rain? That's again very simple. Here you're
getting the dates by selecting the EST column of df, but which dates do you pick? You have a
boolean expression here. It says: wherever the events column equals rain. This is a boolean
expression, right?
It is true only for those rows where the events column has the value rain, and those are the
rows you want to pick up. That is how you get the answer in one line of code.
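
The two queries above can be sketched as follows. The tiny table is a made-up stand-in for the weather data, and the column names EST, Temperature and Events are assumptions about how the CSV is labelled.

```python
import pandas as pd

# Made-up stand-in for the weather table (assumed column names).
df = pd.DataFrame({
    "EST": ["1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "Temperature": [38, 50, 40, 33],
    "Events": [None, "Rain", "Snow", "Rain"],
})

# Select one column, then reduce it to its largest value.
max_temp = df["Temperature"].max()

# Boolean expression: True for rows whose Events value is "Rain".
is_rain = df["Events"] == "Rain"

# Use the boolean mask to keep only the matching dates.
rain_dates = df["EST"][is_rain]

print(max_temp)
print(rain_dates.tolist())
```

The mask is_rain has one True or False per row, so indexing with it keeps exactly the rows where the condition holds.
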
Similarly, if you want the average wind speed, you just take the wind speed column, which is
in miles per hour, and call mean on it. Indexing df with that column name gets you the whole
column from your table, and mean gives you the average. So you answered the three questions
you had in four lines of code. You loaded the data in one line, and each of the questions
needed just one more line of code. And
for those of you who have not installed pandas, it's very simple. Go to your command line.
I am hoping that you already have pip3 installed. You just type pip3 install pandas, and it
will install pandas for you. pip3 is the pip package installer for Python version 3. If you
have only Python 3, you can just type pip install pandas. I hope you have already installed
it, but in case you have not, please use this. Now comes the fun part, because we have gotten
the answers to all of these questions in four lines of code. What would happen if you did
not have pandas? You would have to write a whole file of code. See, this is the amount of
code you would have to write. It's actually a lot of code. To compute the average wind
speed, you have to write a for loop. To get the days of rain, you have to go through each
row. Similarly for getting the maximum temperature. Even loading the data itself takes a
lot of lines of code, because you have to understand what a CSV is, how to read it, how to
parse it, all of that. So what we have here is about 30 to 40 lines of code, whereas using
pandas I solved the same problem in four lines. Before pandas, we used to write a lot of
code like this, even for simple computations. Pandas made our life much simpler: from
40-plus lines of code, we have gone to four lines of code using pandas.
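
To give a feel for the comparison, here is a rough sketch of the same three questions answered with only the standard library csv module, next to the pandas version. It is not the code shown in the video. The data and the column names (EST, Temperature, Events, WindSpeedMPH) are made-up assumptions.

```python
import csv
import io
import pandas as pd

# Made-up stand-in for the weather CSV (assumed column names).
csv_text = """EST,Temperature,Events,WindSpeedMPH
1/1/2016,38,,8
1/2/2016,50,Rain,7
1/3/2016,40,Snow,6
1/4/2016,33,Rain,3
"""

# --- Without pandas: parse the rows and loop by hand ---
rows = list(csv.DictReader(io.StringIO(csv_text)))

max_temp = None
rain_dates = []
wind_total = 0.0
for row in rows:
    temp = int(row["Temperature"])       # every value arrives as a string
    if max_temp is None or temp > max_temp:
        max_temp = temp
    if row["Events"] == "Rain":
        rain_dates.append(row["EST"])
    wind_total += float(row["WindSpeedMPH"])
avg_wind = wind_total / len(rows)

# --- With pandas: one line to load, one line per question ---
df = pd.read_csv(io.StringIO(csv_text))
pd_max_temp = df["Temperature"].max()
pd_rain_dates = df["EST"][df["Events"] == "Rain"]
pd_avg_wind = df["WindSpeedMPH"].mean()

print(max_temp, rain_dates, avg_wind)
print(pd_max_temp, pd_rain_dates.tolist(), pd_avg_wind)
```

Even this hand-written version is simplified: it assumes every numeric cell is filled in, so a real file with missing values would need more checks in the loop.
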
That's the power of pandas, especially for data science, where you have to load data and
process it in multiple ways. We'll look at data frames in the next video.