Python Numpy Pandas1

Uploaded by Arittra
Copyright © All Rights Reserved

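f = open(r"d:\student.txt", "r")   # open an existing file for reading
x = f.read()                        # read the whole file into one string
print(x)
f.close()                           # release the file handle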

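# "r+" opens an existing file for reading and writing; writing starts at the beginning
f = open(r"d:\ret.txt", "r+")
d = 'hello'
f.write(d)
f.close()

f = open(r"d:\ret.txt", "r+")
x = f.read()
print(x)
f.close()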
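Because "r+" starts at position 0, write() overwrites existing characters rather than appending. A minimal sketch of this behavior, assuming a temporary file created with tempfile stands in for d:\ret.txt (the file contents are made up for illustration):

```python
import os
import tempfile

# Create a throwaway file in place of d:\ret.txt (assumption for illustration)
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("abcdefghij")

# "r+" opens for reading and writing; the position starts at 0
with open(path, "r+") as f:
    f.write("hello")          # overwrites the first 5 characters

with open(path, "r") as f:
    x = f.read()
print(x)                      # hellofghij

# "a" mode appends at the end instead of overwriting
with open(path, "a") as f:
    f.write("!")

with open(path) as f:
    y = f.read()
print(y)                      # hellofghij!

os.remove(path)
```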

with open('workfile') as f:
    read_data = f.read()

# The with block closes the file automatically when it ends
print(f.closed)   # True

f.readline()   # reads a single line, including its trailing newline

with open(r"d:\ret.txt", "r+") as f:
    for x in f:              # iterate over the file line by line
        print(x, end='')     # each line already ends with '\n'

If you want to read all the lines of a file into a list, you can also use list(f) or
f.readlines().
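A short sketch comparing readline(), readlines() and list(f), assuming a temporary three-line file created just for the example:

```python
import os
import tempfile

# Temporary three-line file (stands in for a real file such as d:\ret.txt)
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

with open(path) as f:
    first = f.readline()      # 'one\n' (keeps the trailing newline)
    rest = f.readlines()      # remaining lines: ['two\n', 'three\n']

with open(path) as f:
    all_lines = list(f)       # same result as f.readlines() on a freshly opened file

os.remove(path)
print(first, rest, all_lines)
```

Note that readlines() continues from the current position, which is why rest does not include the first line.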

NumPy is a Python library used for working with arrays.

It also has functions for working in the domain of linear algebra, Fourier transforms,
and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you
can use it freely.

NumPy stands for Numerical Python.

In Python we have lists that serve the purpose of arrays, but they are slow to
process.

NumPy aims to provide an array object that is up to 50x faster than traditional
Python lists.

The array object in NumPy is called ndarray; it provides a lot of supporting
functions that make working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very
important.

NumPy is used to work with arrays. The array object in NumPy is called ndarray.

We can create a NumPy ndarray object by using the array() function.

import numpy

arr = numpy.array([1, 2, 3, 4, 5])

print(arr)

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

print(type(arr))

Create a 0-D array with value 42:

import numpy as np

arr = np.array(42)

print(arr)

Create a 1-D array containing the values 1,2,3,4,5:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr)

Check how many dimensions the arrays have:

import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])

print(a.ndim)
print(b.ndim)
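Extending the same check to the 2-D and 3-D arrays created above shows ndim growing with each level of nesting (a small sketch reusing the example values from these notes):

```python
import numpy as np

a = np.array(42)                                                # 0-D
b = np.array([1, 2, 3, 4, 5])                                   # 1-D
c = np.array([[1, 2, 3], [4, 5, 6]])                            # 2-D
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])  # 3-D

print(a.ndim, b.ndim, c.ndim, d.ndim)   # 0 1 2 3
```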

Get third and fourth elements from the following array and add them.

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])

2-D arrays

Access the element on the first row, second column:

import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1])
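A few more index forms on the same 2-D array; the negative index and the chained arr[0][1] form are extra illustrations, not part of the original notes:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# [row, column], both counted from 0
print(arr[1, 4])    # 10 (2nd row, 5th column)
# Negative indices count from the end of each dimension
print(arr[1, -1])   # 10
# Chained indexing gives the same value but first builds an intermediate row array
print(arr[0][1])    # 2
```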

Slicing arrays
Slicing in python means taking elements from one given index to another given
index.

We pass slice instead of index like this: [start:end].

We can also define the step, like this: [start:end:step].

If we don't pass start, it's considered 0.

If we don't pass end, it's considered the length of the array in that dimension.

If we don't pass step, it's considered 1.

Slice elements from index 1 to index 5 (the element at index 5 is not included) from the following array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])

Return every other element from index 1 to index 5 (index 5 not included):

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5:2])
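The default rules for start, end and step can be checked with a few more slices on the same array. A minimal sketch; the negative-index slice is an extra illustration not covered above:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[4:])     # [5 6 7]    end omitted: runs to the last element
print(arr[:4])     # [1 2 3 4]  start omitted: begins at 0; index 4 is excluded
print(arr[::2])    # [1 3 5 7]  step 2 over the whole array
print(arr[-3:-1])  # [5 6]      negative indices count from the end
```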

By default Python has these data types:

strings - used to represent text data; the text is given inside quote marks, e.g.
"ABCD"
integer - used to represent integer numbers. e.g. -1, -2, -3
float - used to represent real numbers. e.g. 1.2, 42.42
boolean - used to represent True or False.
complex - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j

NumPy has some extra data types, and refers to data types with one character, like i
for integers, u for unsigned integers etc.

Below is a list of all data types in NumPy and the characters used to represent
them.

i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type ( void )

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr.dtype)

import numpy as np

arr = np.array(['apple', 'banana', 'cherry'])

print(arr.dtype)
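The one-character codes in the list above are exposed as dtype.kind. A small sketch that builds one example array per code (the sample values are arbitrary; the void type V is left out):

```python
import numpy as np

# dtype.kind returns the one-character code for each array's data type
examples = {
    "i": np.array([1, 2, 3]),
    "u": np.array([1, 2, 3], dtype="u1"),
    "f": np.array([1.5, 2.5]),
    "b": np.array([True, False]),
    "c": np.array([1 + 2j]),
    "U": np.array(["apple", "banana"]),
    "S": np.array([b"apple"]),
    "O": np.array([1, "a"], dtype=object),
    "M": np.array(["2024-01-01"], dtype="datetime64[D]"),
    "m": np.array([5], dtype="timedelta64[s]"),
}

for code, arr in examples.items():
    print(code, arr.dtype, arr.dtype.kind)
```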

Create an array with data type string:

import numpy as np

arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)

For i, u, f, S and U we can define size as well.

Create an array with data type 4 bytes integer:

import numpy as np

arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)
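The size in a code such as 'i4' is in bytes, which itemsize reports. A short sketch; note that for U the number counts characters, each stored in 4 bytes:

```python
import numpy as np

i4 = np.array([1, 2, 3, 4], dtype="i4")
i8 = np.array([1, 2, 3, 4], dtype="i8")
f4 = np.array([1.0, 2.0], dtype="f4")
s3 = np.array(["abc"], dtype="S3")
u3 = np.array(["abc"], dtype="U3")

# itemsize is the number of bytes used by one element
print(i4.itemsize)   # 4
print(i8.itemsize)   # 8
print(f4.itemsize)   # 4
print(s3.itemsize)   # 3  (1 byte per character)
print(u3.itemsize)   # 12 (4 bytes per character)
```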

A non-integer string like 'a' cannot be converted to an integer (this raises a
ValueError):

import numpy as np

arr = np.array(['a', '2', '3'], dtype='i')
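To see the error without stopping the program, the conversion can be wrapped in try/except. A minimal sketch; the caught flag is only there to record what happened:

```python
import numpy as np

caught = False
try:
    arr = np.array(['a', '2', '3'], dtype='i')
except ValueError as e:
    # 'a' is not a valid integer literal, so the conversion fails
    caught = True
    print("ValueError:", e)

# Strings that do look like integers convert fine
ok = np.array(['1', '2', '3'], dtype='i')
print(ok)   # [1 2 3]
```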

The Difference Between Copy and View


The main difference between a copy and a view of an array is that the copy is a new
array, and the view is just a view of the original array.

The copy owns the data and any changes made to the copy will not affect original
array, and any changes made to the original array will not affect the copy.

The view does not own the data and any changes made to the view will affect the
original array, and any changes made to the original array will affect the view.

Make a copy, change the original array, and display both arrays:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])


x = arr.copy()
arr[0] = 42

print(arr)
print(x)

Make a view, change the original array, and display both arrays:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])


x = arr.view()
arr[0] = 42

print(arr)
print(x)
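Whether an array owns its data can be checked with the base attribute: it is None for an array that owns its data and refers to the original array for a view. A short sketch, which also shows that a change made through the view reaches the original:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
y = arr.view()

# base is None when the array owns its data, otherwise it points to the owner
print(x.base)            # None
print(y.base is arr)     # True

# Changing the view changes the original too; the copy is unaffected
y[1] = 99
print(arr)               # arr is now [1, 99, 3, 4, 5]
print(x)                 # x is still [1, 2, 3, 4, 5]
```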

Shape of an Array
The shape of an array is the number of elements in each dimension.

Get the Shape of an Array


NumPy arrays have an attribute called shape that returns a tuple in which each entry
is the number of elements in the corresponding dimension.

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)
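The shape tuple ties together the number of dimensions and the total number of elements. A small sketch on the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)                    # (2, 4): 2 rows, 4 elements per row

# The length of the shape tuple equals ndim
print(len(arr.shape) == arr.ndim)   # True
# A 0-D array has an empty shape
print(np.array(42).shape)           # ()
# The product of the shape entries is the total number of elements
print(arr.size)                     # 8
```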

Reshaping arrays
Reshaping means changing the shape of an array.

The shape of an array is the number of elements in each dimension.

By reshaping we can add or remove dimensions or change the number of elements in each
dimension.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)

print(newarr)

Try converting a 1-D array with 8 elements to a 2-D array with 3 elements in each
dimension (this raises an error, because a 3 x 3 array needs 9 elements):

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(3, 3)
print(newarr)
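Any target shape works as long as the product of its dimensions equals the number of elements, and one dimension may be given as -1 so NumPy computes it. A minimal sketch that catches the error instead of crashing:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

failed = False
try:
    arr.reshape(3, 3)            # needs 9 elements, but arr has 8
except ValueError as e:
    failed = True
    print("ValueError:", e)

# Any shape whose product is 8 works
print(arr.reshape(4, 2))
print(arr.reshape(2, 2, 2))
# One dimension may be -1; NumPy computes it from the others (here 8 / 2 = 4)
newarr = arr.reshape(2, -1)
print(newarr.shape)              # (2, 4)
```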

Flattening the arrays

Flattening an array means converting a multidimensional array into a 1-D array.

We can use reshape(-1) to do this.

Convert the array into a 1D array:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)

print(newarr)
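For comparison, NumPy also has a flatten() method (not covered in these notes) that always returns a copy, while reshape(-1) returns a view when it can. A short sketch showing the difference:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_view = arr.reshape(-1)   # a view here: shares memory with arr
flat_copy = arr.flatten()     # always a new, independent copy

flat_view[0] = 100
print(arr[0, 0])              # 100: the change shows up in arr
flat_copy[1] = 200
print(arr[0, 1])              # 2: the copy does not affect arr
```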

Iterate

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)
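Iterating over a 2-D array yields its rows; reaching individual values needs a nested loop, or np.nditer, a NumPy helper not covered above that visits every element whatever the number of dimensions. A small sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# A nested loop: the outer loop yields rows, the inner loop yields scalars
values = []
for row in arr:
    for x in row:
        values.append(int(x))
print(values)                                # [1, 2, 3, 4, 5, 6]

# np.nditer visits every element in a single loop
flat = [int(x) for x in np.nditer(arr)]
print(flat)                                  # [1, 2, 3, 4, 5, 6]
```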

Joining NumPy Arrays

Joining means putting contents of two or more arrays in a single array.

In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.

We pass a sequence of arrays that we want to join to the concatenate() function,
along with the axis. If axis is not explicitly passed, it is taken as 0.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

Join two 2-D arrays along axis=1 (each row is extended with the matching row of the second array):

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)
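Comparing the default axis=0 with axis=1 on the same two arrays makes the direction of each join visible through the result shapes. A short sketch:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((arr1, arr2))          # axis=0 (default): arr2 goes below arr1
cols = np.concatenate((arr1, arr2), axis=1)  # axis=1: arr2 goes to the right of arr1

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
print(cols)         # [[1 2 5 6]
                    #  [3 4 7 8]]
```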

Joining Arrays Using Stack Functions

Stacking is the same as concatenation; the only difference is that stacking is done
along a new axis.

Stacking two 1-D arrays along the second axis (axis=1) places them side by side as
the columns of a new 2-D array; stacking them along axis=0 puts them one over the
other as rows.

We pass a sequence of arrays that we want to join to the stack() function, along
with the axis. If axis is not explicitly passed, it is taken as 0.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr)
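Comparing stack() along axis 0 and axis 1, and contrasting it with concatenate(), shows the new axis being created. A minimal sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

s0 = np.stack((arr1, arr2))          # axis=0: each input becomes a row, shape (2, 3)
s1 = np.stack((arr1, arr2), axis=1)  # axis=1: each input becomes a column, shape (3, 2)

print(s0)   # [[1 2 3]
            #  [4 5 6]]
print(s1)   # [[1 4]
            #  [2 5]
            #  [3 6]]

# concatenate() joins along the existing axis instead, giving shape (6,)
joined = np.concatenate((arr1, arr2))
print(joined.shape)
```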

Splitting NumPy Arrays


Splitting is the reverse operation of joining.

Joining merges multiple arrays into one, and splitting breaks one array into
multiple arrays.

We use array_split() for splitting arrays, we pass it the array we want to split
and the number of splits.

Split the array in 3 parts:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)

If the array cannot be divided evenly into the requested number of parts,
array_split() adjusts accordingly: the parts at the end get one element fewer.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)
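Printing the parts as lists shows which ones are shorter. For contrast, np.split() (a related NumPy function not covered above) refuses an uneven division. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])   # [[1, 2], [3, 4], [5], [6]]

# np.split() requires an equal division and raises an error otherwise
split_failed = False
try:
    np.split(arr, 4)
except ValueError as e:
    split_failed = True
    print("ValueError:", e)

print(len(np.split(arr, 3)))         # 3: six elements divide evenly into 3 parts
```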

Splitting 2-D Arrays


Use the same syntax when splitting 2-D arrays.

Use the array_split() function, passing in the array you want to split and the
number of splits you want to do.

Split the 2-D array into three 2-D arrays.

import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

newarr = np.array_split(arr, 3)

print(newarr)
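array_split() also accepts an axis argument. A minimal sketch comparing the default split along rows with a split along columns for the same 6 x 2 array:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

# axis=0 (default) splits the rows: three arrays of shape (2, 2)
by_rows = np.array_split(arr, 3)
# axis=1 splits the columns: two arrays of shape (6, 1)
by_cols = np.array_split(arr, 2, axis=1)

print([p.shape for p in by_rows])   # [(2, 2), (2, 2), (2, 2)]
print([p.shape for p in by_cols])   # [(6, 1), (6, 1)]
```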

Pandas is a Python library.

Pandas is used to analyze data.

Basic: Pandas Series, DataFrames, Read CSV, Read JSON, Analyze Data
Cleaning data: Clean data, Clean empty cells, Clean wrong format, Clean wrong
data, Remove duplicates
Advanced: Correlation, Plotting

Load a CSV file into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

What is Pandas?
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis", and the
library was created by Wes McKinney in 2008.

Why Use Pandas?


Pandas allows us to analyze big data and draw conclusions based on statistical
theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Pandas gives you answers about the data, like:

Is there a correlation between two or more columns?
What is the average value?
What is the max value?
What is the min value?

Pandas is also able to delete rows that are not relevant or that contain wrong
values, like empty or NULL values. This is called cleaning the data.
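Each of the questions above maps to a one-line Pandas call. A minimal sketch on a tiny data set; the calories and duration values follow the DataFrame example in these notes, while the fourth row with a missing value is made up to show cleaning:

```python
import pandas as pd

# Small data set; the last row is made up and has an empty calories value
df = pd.DataFrame({
    "calories": [420, 380, 390, None],
    "duration": [50, 40, 45, 60],
})

print(df["calories"].mean())   # average value (the empty cell is skipped)
print(df["calories"].max())    # 420.0
print(df["calories"].min())    # 380.0
print(df.corr())               # pairwise correlation between the columns

# Cleaning: drop rows that contain empty (NaN) values
clean = df.dropna()
print(clean)
```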

import pandas

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)

What is a Series?
A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

Labels
If nothing else is specified, the values are labeled with their index number. First
value has index 0, second value has index 1 etc.

This label can be used to access a specified value.
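For example, with the default integer labels the first value can be looked up with the label 0. A small sketch using .loc, which makes the label-based lookup explicit:

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

# With the default integer labels, .loc looks up the label 0
print(myvar.loc[0])        # 1
print(list(myvar.index))   # [0, 1, 2]
```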

Create your own labels:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

When you have created labels, you can access an item by referring to the label.

Return the value of "y":

print(myvar["y"])

Key/Value Objects as Series


You can also use a key/value object, like a dictionary, when creating a Series.

Create a simple Pandas Series from a dictionary:

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

To select only some of the items in the dictionary, use the index argument and
specify only the items you want to include in the Series.

Example

Create a Series using only data from "day1" and "day2":


import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)
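If the index argument names a key that the dictionary does not contain, Pandas still creates that entry and fills it with NaN. A short sketch; "day4" is a hypothetical label chosen for illustration:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not in the dictionary
myvar = pd.Series(calories, index=["day1", "day4"])
print(myvar)   # day1 holds 420.0, day4 holds NaN (the NaN makes the column float)
```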

DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array
or a table with rows and columns.

A Series is like a column; a DataFrame is the whole table.

Example

Create a simple Pandas DataFrame:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
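Selecting a single column of this DataFrame returns a Series, which matches the column/table picture above. A small sketch:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

col = df["calories"]          # selecting one column gives a Series
print(type(col).__name__)     # Series
print(df.shape)               # (3, 2): 3 rows, 2 columns
print(list(df.columns))       # ['calories', 'duration']
```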

Locate Row
As you can see from the result above, the DataFrame is like a table with rows and
columns.

Pandas uses the loc attribute to return one or more specified row(s):

Return row 0:

# refer to the row index:
print(df.loc[0])

Return row 0 and 1:

# use a list of indexes:
print(df.loc[[0, 1]])
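The two calls return different types: a single label gives the row as a Series, while a list of labels gives a DataFrame. A short sketch, which also shows a single cell selected by row label and column name:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one = df.loc[0]          # a single label returns the row as a Series
two = df.loc[[0, 1]]     # a list of labels returns a DataFrame
print(type(one).__name__, type(two).__name__)   # Series DataFrame
print(df.loc[0, "calories"])                    # 420
```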

Named Indexes
With the index argument, you can name your own indexes.

Add a list of names to give each row a name:

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

Locate Named Indexes


Use the named index in the loc attribute to return the specified row(s).

Return "day2":

# refer to the named index:
print(df.loc["day2"])
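Named indexes also work with lists and slices in loc. A minimal sketch; unlike Python list slicing, a label slice includes its end label. The position-based iloc shown at the end is an extra Pandas attribute not covered above:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

print(df.loc[["day1", "day3"]])   # several named rows
print(df.loc["day1":"day2"])      # label slices include the end label (day2)
print(df.iloc[1])                 # position 1 is the same row as df.loc["day2"]
```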

Load Files Into a DataFrame


If your data sets are stored in a file, Pandas can load them into a DataFrame.

Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv(r'C:\Users\Student\Desktop\diabetes.csv')

print(df)
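read_csv() accepts any file-like object as well as a path, so the call can be tried without a file on disk. A minimal sketch, assuming made-up CSV text wrapped in io.StringIO in place of diabetes.csv:

```python
import io

import pandas as pd

# In-memory text stands in for a CSV file on disk (made-up sample values)
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())      # first rows (up to 5 by default)
print(df.shape)       # (3, 3)
print(df.dtypes)      # column types inferred from the text
```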
