Foundations of Data Science
Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64
float16       Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats
4.2 THE BASICS OF NUMPY ARRAYS
Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array.
Categories of basic array manipulations here:
(a) Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.
(b) Indexing of arrays: Getting and setting the value of individual array elements.
(c) Slicing of arrays: Getting and setting smaller subarrays within a larger array.
(d) Reshaping of arrays: Changing the shape of a given array.
(e) Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many.
(a) NumPy Array Attributes
Define three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array, using NumPy's random number generator. We will seed with a set value in order to ensure that the same random arrays are generated during each execution:
import numpy as np
np.random.seed(0)  # seed for reproducibility
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
Each array has the following attributes:
ndim : number of dimensions
shape : size of each dimension
size : total size of the array
dtype : data type of the array
itemsize : lists the size (in bytes) of each array element
nbytes : lists the total size (in bytes) of the array
Array attributes of the array x3 can be printed using the following code.
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
OUTPUT
x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes
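The attributes are related to one another: size is the product of the entries of shape, and nbytes equals size times itemsize. The following is a minimal sketch of that relationship; the array name demo is illustrative and not part of the text above.

```python
import numpy as np

# Illustrative array (hypothetical name); int64 is requested explicitly
demo = np.zeros((3, 4, 5), dtype=np.int64)

# size is the product of the entries of shape
size_from_shape = int(np.prod(demo.shape))

# nbytes is the number of elements times the bytes per element
nbytes_from_parts = demo.size * demo.itemsize

print(demo.ndim, demo.shape, demo.size)
print(demo.itemsize, demo.nbytes)
```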
(b) Array Indexing: Accessing Single Elements
Array indexing is the same as accessing an array element. We can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, the second has index 1, and so on.
In[5]: x1
Out[5]: array([5, 0, 3, 3, 7, 9])
In[6]: x1[0]
Out[6]: 5
In[7]: x1[4]
Out[7]: 7
Example
Get third and fourth elements from the following array and add them.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
To index from the end of the array, you can use negative indices:
In[8]: x1[-1]
Out[8]: 9
In[9]: x1[-2]
Out[9]: 7
In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:
In[10]: x2
Out[10]: array([[3, 5, 2, 4],
                [7, 6, 8, 8],
                [1, 6, 7, 7]])
In[11]: x2[0, 0]  # access the element at (0, 0)
Out[11]: 3
In[12]: x2[2, 0]  # access the element at (2, 0)
Out[12]: 1
In[13]: x2[2, -1]  # access the element at (2, -1)
Out[13]: 7
Example
Access the third element of the second array of the first array:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Output
6
Elements can be modified using any of the above index notations:
In[14]: x2[0, 0] = 20
x2
Out[14]: array([[20, 5, 2, 4],
                [ 7, 6, 8, 8],
                [ 1, 6, 7, 7]])
NumPy arrays have a fixed type, so when a floating-point value is inserted into an integer array, the fractional part is silently truncated.
In[15]: x1[0] = 3.14159  # this will be truncated!
x1
Out[15]: array([3, 0, 3, 3, 7, 9])
(c) Array Slicing: Accessing Subarrays
Square brackets can be used to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:
x[start:stop:step]
• If the value for start is not given, it is considered as 0.
• If the value for stop is not given, it is considered as the length of the array in that dimension.
• If the value for step is not given, it is considered as 1.
Slicing in One-dimensional subarrays
>>> x = np.arange(10)
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[:5]  # first five elements
array([0, 1, 2, 3, 4])
>>> x[5:]  # elements from index 5 onward
array([5, 6, 7, 8, 9])
>>> x[4:7]  # middle sub-array
array([4, 5, 6])
>>> x[::2]  # every other element: default start and stop, step value is 2
array([0, 2, 4, 6, 8])
>>> x[1::2]  # every other element, starting at index 1
array([1, 3, 5, 7, 9])
Use the minus operator to refer to an index from the end. When the step value is negative, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:
>>> x[::-1]  # all elements, reversed
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> x[5::-2]  # reversed every other from index 5
array([5, 3, 1])
Example
Slice from the index 3 from the end to index 1 from the end:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(arr[-3:-1])  # from index -3 up to index -2; index -1 is excluded
OUTPUT
[6 7]
Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:
>>> x2
array([[20,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
>>> x2[:2, :3]  # two rows, three columns
array([[20,  5,  2],
       [ 7,  6,  8]])
>>> x2[:3, ::2]  # all rows, every other column
array([[20,  2],
       [ 7,  8],
       [ 1,  7]])
Sub-array dimensions can even be reversed together:
>>> x2[::-1, ::-1]
array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 20]])
Accessing array rows and columns
A single row or column of an array can be accessed by combining indexing and slicing, using an empty slice marked by a single colon (:):
>>> print(x2[:, 0])  # first column of x2
[20  7  1]
>>> print(x2[0, :])  # first row of x2
[20  5  2  4]
>>> print(x2[0])  # equivalent to x2[0, :]
[20  5  2  4]
Subarrays as no-copy views
Array slices return views rather than copies of the array data. This is where NumPy array slicing differs from Python list slicing: in lists, slices are copies.
Consider the two-dimensional array of the previous example:
>>> print(x2)
[[20  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
Let's extract a 2x2 sub-array from this:
>>> x2_sub = x2[:2, :2]
>>> print(x2_sub)
[[20  5]
 [ 7  6]]
Changes made in the subarray will be reflected in the original array also:
>>> x2_sub[0, 0] = 200
>>> print(x2_sub)
[[200   5]
 [  7   6]]
>>> print(x2)
[[200   5   2   4]
 [  7   6   8   8]
 [  1   6   7   7]]
This default behaviour is actually quite useful: we can access and process pieces of large datasets without the need to copy the underlying larger data buffer.
Creating copies of arrays
Despite the significant features of array views, it is sometimes useful to explicitly copy the data within an array or a subarray. This can be done with the copy() method.
>>> x2_sub_copy = x2[:2, :2].copy()
>>> print(x2_sub_copy)
[[200   5]
 [  7   6]]
Changes made in this subarray will not affect the original array.
>>> x2_sub_copy[0, 0] = 402
>>> print(x2_sub_copy)
[[402   5]
 [  7   6]]
>>> print(x2)
[[200   5   2   4]
 [  7   6   8   8]
 [  1   6   7   7]]
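Whether a given array is a view or a copy can also be checked directly. The sketch below uses np.shares_memory for that purpose; the names grid, view and copy are illustrative and independent of the x2 example.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))

view = grid[:2, :2]          # slicing returns a view on grid's data
copy = grid[:2, :2].copy()   # copy() allocates a separate buffer

view_shares = np.shares_memory(grid, view)
copy_shares = np.shares_memory(grid, copy)

view[0, 0] = 99   # written through to grid
copy[0, 1] = -1   # grid is left untouched

print(view_shares, copy_shares)
print(grid[0, :2])
```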
(d) Reshaping of Arrays
Another useful type of operation is reshaping of arrays. It is done using the reshape method. For example, to put the numbers 1 through 9 in a 3x3 grid, we can do the following:
>>> grid = np.arange(1, 10).reshape((3, 3))
>>> print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Note
• The size of the initial array must match the size of the reshaped array.
• Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
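Both points of the note can be checked in a few lines. This is a small sketch with illustrative variable names: a contiguous array is reshaped as a view, a transposed (non-contiguous) array has to be copied when flattened, and a size mismatch raises ValueError.

```python
import numpy as np

a = np.arange(6)

# -1 asks NumPy to infer that dimension from the total size
b = a.reshape((2, -1))

# A contiguous array is reshaped as a view: b shares memory with a
b_is_view = np.shares_memory(a, b)

# A transposed array is not contiguous; flattening it forces a copy
t = b.T
c = t.reshape(-1)
c_is_view = np.shares_memory(t, c)

# Sizes must match: 6 elements cannot fill a 4x2 grid
try:
    a.reshape((4, 2))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(b.shape, b_is_view, c_is_view, mismatch_raised)
```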
Another common reshaping pattern is the conversion of a one-dimensional array
into a two-dimensional row or column matrix. This can be done with the reshape
method, or by making use of the newaxis keyword within a slice operation:
>>> x = np.array([1, 2, 3])
# row vector via reshape
>>> x.reshape((1, 3))
array([[1, 2, 3]])
# row vector via newaxis
>>> x[np.newaxis, :]
array([[1, 2, 3]])
# column vector via reshape
>>> x.reshape((3, 1))
array([[1],
       [2],
       [3]])
# column vector via newaxis
>>> x[:, np.newaxis]
array([[1],
       [2],
       [3]])
(e) Array Concatenation and Splitting
All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays.
Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is done using the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as seen here:
In []: x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
Out[]: array([1, 2, 3, 3, 2, 1])
In []: z = [100, 200, 300]
print(np.concatenate([x, y, z]))  # concatenate 3 arrays
Out[]: [  1   2   3   3   2   1 100 200 300]
Concatenation of two-dimensional arrays
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
In []: # concatenate along the first axis
np.concatenate([grid, grid])
Out[]: array([[1, 2, 3],
              [4, 5, 6],
              [1, 2, 3],
              [4, 5, 6]])
In []: # concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
Out[]: array([[1, 2, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6]])
Concatenation of arrays of mixed dimensions
For working with arrays of mixed dimensions, the following functions can be used:
# np.vstack : vertical stack
# np.hstack : horizontal stack
# np.dstack : stack arrays along the third axis
In []: x = np.array([10, 20, 30])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])
Out[]: array([[10, 20, 30],
              [ 9,  8,  7],
              [ 6,  5,  4]])
In []: # horizontally stack the arrays
y = np.array([[100],
              [100]])
np.hstack([grid, y])
Out[]: array([[  9,   8,   7, 100],
              [  6,   5,   4, 100]])
Splitting of arrays
Splitting is the opposite of concatenation: it breaks one array into multiple arrays. It is done by the functions np.split(), np.hsplit() and np.vsplit(). Pass a list of indices giving the split points to these functions; N split points lead to N + 1 subarrays.
np.array_split() can also be used for splitting arrays: pass the array to split and the number of splits. The return value of array_split() is a list containing each of the splits as an array. Each split can be accessed just like any list element.
In []: x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])  # 3 and 5 are the split points
print(x1, x2, x3)
Out[]: [1 2 3] [99 99] [3 2 1]
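The difference between np.split and np.array_split shows up when the requested number of sections does not divide the array evenly. The following is a short sketch with illustrative variable names:

```python
import numpy as np

data = np.arange(8)

# np.split with an integer needs the length to divide evenly
even_parts = np.split(data, 4)

# np.array_split also accepts a count that does not divide the length
uneven_parts = np.array_split(data, 3)

# np.split raises ValueError in that situation
try:
    np.split(data, 3)
    split_failed = False
except ValueError:
    split_failed = True

print([p.tolist() for p in even_parts])
print([p.tolist() for p in uneven_parts])
```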
The numpy.vsplit() function splits an array into multiple sub-arrays vertically (row-wise). vsplit is equivalent to split with axis=0 (the default); the array is always split along the first axis regardless of the array dimension.
Syntax:
# numpy.vsplit(ary, indices_or_sections)
In []: grid = np.arange(16).reshape((4, 4))
grid
Out[]: array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15]])
In []: upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
Out[]: [[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
The hsplit() function is used to split an array into multiple sub-arrays horizontally (column-wise). hsplit is equivalent to split with axis=1; the array is always split along the second axis regardless of the array dimension.
Syntax:
# numpy.hsplit(ary, indices_or_sections)
In []: left, right = np.hsplit(grid, [2])
print(left)
print(right)
Out[]: [[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
Similarly, np.dsplit will split arrays along the third axis.
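A brief sketch of np.dsplit on a three-dimensional array follows; the names cube, front and back are illustrative. Concatenating the pieces along axis 2 recovers the original array.

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# Split at index 1 along the third axis (axis=2)
front, back = np.dsplit(cube, [1])

# Joining the pieces along the same axis restores the original
rejoined = np.concatenate([front, back], axis=2)

print(front.shape, back.shape)
```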
4.2 AGGREGATIONS
Computing aggregations gives insight into the nature of a potentially large dataset. While processing a large amount of data, the first step is to compute summary statistics for the data considered for analysis. The Python numpy module has many aggregate functions to work with a single-dimensional or multi-dimensional array. These functions are sum, min, max, mean, average, product, median, standard deviation, variance, argmin, argmax, percentile, cumprod, cumsum, and corrcoef.
(a) Sum
Built-in sum function: computing the sum of all values in an array using the built-in sum function:
>>> import numpy as np
>>> L = np.random.random(100)
>>> sum(L)
55.61209116604941
NumPy's sum function: computing the sum of all values in an array using NumPy's sum function is shown below:
>>> np.sum(L)
55.612091166049424
Python sum() vs NumPy sum()
>>> big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)
67.9 ms ± 989 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
233 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NumPy's sum() executes the operation in compiled code and is therefore much quicker. The NumPy sum function also accepts an optional argument called axis, which calculates the sum along a given axis.
Syntax:
numpy.sum(a, axis, dtype, out)
This function returns the sum of array elements over the specified axis.
a : input array.
axis : axis along which to calculate the sum value. By default the array is flattened and all elements are summed.
    axis = 0 : along the column
    axis = 1 : along the row
out : Different array in which to place the result. The array must have the same dimensions as the expected output. Default is None.
initial : [scalar, optional] Starting value of the sum.
Return : Sum of the array elements (a scalar value if axis is None) or an array with sum values along the specified axis.
# Python Program numpy.sum() method
import numpy as np
# 1D array
arr = np.array([20, 2, 0.2, 10, 4])
print("Sum of arr :", np.sum(arr))
print("Sum of arr(uint8) :", np.sum(arr, dtype=np.uint8))
print("Sum of arr(float32) :", np.sum(arr, dtype=np.float32))
Output
Sum of arr : 36.2
Sum of arr(uint8) : 36
Sum of arr(float32) : 36.2
In the following example, axis = 0 and axis = 1 are used to find the sum of each column and each row of a NumPy array.
# Python Program for numpy.sum() method
import numpy as np
# 2D array
arr = np.array([[14, 17, 12, 33, 44],
                [15, 6, 27, 8, 19],
                [23, 2, 54, 1, 4]])
print("Sum of arr :", np.sum(arr))
print("Sum of arr(axis = 0) :", np.sum(arr, axis=0))
print("Sum of arr(axis = 1) :", np.sum(arr, axis=1))
print("Sum of arr (keepdims is True):\n", np.sum(arr, axis=1, keepdims=True))
Output
Sum of arr : 279
Sum of arr(axis = 0) : [52 25 93 42 67]
Sum of arr(axis = 1) : [120  75  84]
Sum of arr (keepdims is True):
 [[120]
 [ 75]
 [ 84]]
(b) Minimum and Maximum
Similarly, Python has built-in min and max functions, used to find the minimum value and maximum value of any given array:
In []: min(big_array), max(big_array)
Out[]: (1.1717128136634614e-06, 0.9999976784968716)
NumPy's corresponding functions have similar syntax, and again operate much more quickly:
In []: np.min(big_array), np.max(big_array)
Out[]: (1.1717128136634614e-06, 0.9999976784968716)
In []: %timeit min(big_array)
%timeit np.min(big_array)
10 loops, best of 3: 82.3 ms per loop
1000 loops, best of 3: 497 µs per loop
For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:
In [8]: print(big_array.min(), big_array.max(), big_array.sum())
1.17171281366e-06 0.999997678497 499911.628197
Example program
# import numpy library
import numpy
# creating a numpy array of integers [1D array]
arr = numpy.array([10, 2, 40, 85, 32, 77])
# finding the maximum and minimum element in the array
max_element = numpy.max(arr)
min_element = numpy.min(arr)
# printing the result
print('maximum element in the array is:', max_element)
print('minimum element in the array is:', min_element)
Output
maximum element in the array is: 85
minimum element in the array is: 2
Multidimensional aggregates
• Aggregation operations can be done along a row or column in a two-dimensional array.
• By default, each NumPy aggregation function returns the aggregate over the entire array.
• An additional argument of the aggregate function specifies the axis along which the aggregate is computed.
• axis=0 : aggregate on each column; values within each column will be aggregated.
• axis=1 : aggregate on each row; values within each row will be aggregated.
Example program for 2D array aggregation
# import numpy library
import numpy
# creating a two dimensional
# numpy array of integers
a = numpy.array([[11, 22, 3], [4, 55, 16], [7, 88, 22]])
# finding the maximum and minimum element
max_element = numpy.max(a)
min_element = numpy.min(a)
# printing the result
print('maximum element:', max_element)
print('minimum element:', min_element)
Output
maximum element: 88
minimum element: 3
Example program to aggregate a 2D array by column and row
# import numpy library
import numpy as np
# creating a two dimensional numpy array of integers
arr = np.array([[11, 28, 3], [4, 55, 16], [7, 88, 22]])
# finding the maximum and minimum element in each column and row
max_element_column = np.max(arr, 0)  # column
max_element_row = np.max(arr, 1)  # row
min_element_column = np.amin(arr, 0)
min_element_row = np.amin(arr, 1)
# printing the result
print('maximum elements in each column :', max_element_column)
print('maximum elements in each row :', max_element_row)
print('minimum elements in each column :', min_element_column)
print('minimum elements in each row :', min_element_row)
Output
maximum elements in each column : [11 88 22]
maximum elements in each row : [28 55 88]
minimum elements in each column : [ 4 28  3]
minimum elements in each row : [3 4 7]
Other aggregation functions
NumPy provides many other aggregation functions and a few of them are listed below. Most aggregates have a NaN-safe version that ignores missing values when computing the result.
Function name    NaN-safe version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum element
np.max           np.nanmax           Find maximum element
np.argmin        np.nanargmin        Find index of minimum element
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
We will see these aggregates often throughout the rest of the book.
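The effect of the NaN-safe versions is easiest to see on a small array that contains a missing value. This is a minimal sketch; the array name readings and its values are made up for illustration.

```python
import numpy as np

# Illustrative data with one missing value
readings = np.array([1.0, 2.0, np.nan, 4.0])

plain_mean = np.mean(readings)        # NaN propagates into the result
safe_mean = np.nanmean(readings)      # NaN is ignored: (1 + 2 + 4) / 3
safe_sum = np.nansum(readings)        # NaN is treated as zero
safe_argmax = np.nanargmax(readings)  # index of the largest non-NaN value

print(plain_mean, safe_mean, safe_sum, safe_argmax)
```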
Example program to compute aggregation values
import numpy as np
array1 = np.array([[10, 20, 30], [40, 50, 60]])
print("Mean: ", np.mean(array1))
print("Std: ", np.std(array1))
print("Var: ", np.var(array1))
print("Sum: ", np.sum(array1))
print("Prod: ", np.prod(array1))
OUTPUT
Mean:  35.0
Std:  17.07825127659933
Var:  291.6666666666667
Sum:  210
Prod:  720000000
4.3 COMPUTATION ON NUMPY ARRAYS: UNIVERSAL FUNCTIONS
• A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion.
• It is a "vectorized" wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.
• These functions include standard trigonometric functions, functions for arithmetic operations, handling complex numbers, statistical functions, etc.
Challenges with Python Loops
Python's default implementation (known as CPython) does some operations very slowly. This is because of the dynamic and interpreted nature of the language. As data types are flexible, the sequence of operations cannot be compiled down to efficient machine code. Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.
import numpy as np
np.random.seed(0)
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)
OUTPUT
1 loop, best of 3: 2.91 s per loop
It takes several seconds to compute these million operations and to store the result.
NumPy provides an efficient interface to this kind of statically typed, compiled routine. This is achieved by simply performing an operation on the array, which is then applied to each element. This vectorized approach moves the loop into the compiled layer that underlies NumPy and makes the execution faster.
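The vectorized form of the reciprocal computation can be sketched as follows. The loop version is repeated so that the two results can be compared; a smaller array and the default_rng generator are used here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(1, 100, size=1000)

def compute_reciprocals(values):
    # Explicit Python loop: one interpreted division per element
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

looped = compute_reciprocals(values)

# Vectorized: the same division expressed as a single array operation
vectorized = 1.0 / values

print(np.allclose(looped, vectorized))
```

Timing both versions on a larger array (for example with %timeit in IPython) is a simple way to see the difference on a given machine.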
4.3.1 Characteristics of ufunc
• These functions operate on NumPy ndarrays.
• They implement fast element-wise array operations.
• They support various features like array broadcasting, type casting, etc.
• NumPy universal functions are objects of the numpy.ufunc class.
• Python functions can also be turned into universal functions using the frompyfunc library function.
• Some ufuncs are called automatically during array arithmetic operations. For example, np.add() is called internally to add two arrays using the + operator.
• Computation on NumPy arrays is very fast as they use vectorized operations implemented through NumPy's universal functions (ufuncs).
There are two types of ufuncs:
(i) Unary ufuncs which operate on a single input.
(ii) Binary ufuncs which operate on two inputs.
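Several of these characteristics can be observed directly. In the sketch below, clipped_sum is a hypothetical helper written only for this illustration; np.frompyfunc wraps it as a ufunc taking 2 inputs and producing 1 output, and the nin attribute reports how many inputs a ufunc takes.

```python
import numpy as np

x = np.arange(4)

# The + operator on arrays dispatches to the np.add ufunc
same_result = np.array_equal(x + 5, np.add(x, 5))

# np.negative is unary (one input); np.add is binary (two inputs)
unary_inputs = np.negative.nin
binary_inputs = np.add.nin

# Hypothetical helper, used only to demonstrate np.frompyfunc
def clipped_sum(a, b):
    return min(a + b, 4)

clipped_sum_ufunc = np.frompyfunc(clipped_sum, 2, 1)
result = clipped_sum_ufunc(x, x)   # element-wise; returns an object array

print(same_result, unary_inputs, binary_inputs)
print(result)
```

A ufunc created with frompyfunc still calls the Python function once per element, so it gains broadcasting and element-wise semantics but not the speed of NumPy's compiled ufuncs.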
NumPy's ufuncs make use of Python's native arithmetic operators. The standard addition, subtraction, multiplication, and division can all be used.
The following table lists the arithmetic operators implemented in NumPy:
Operator    Equivalent ufunc    Description
+           np.add              Addition (e.g., 1 + 1 = 2)
-           np.subtract         Subtraction (e.g., 3 - 2 = 1)
-           np.negative         Unary negation (e.g., -2)
*           np.multiply         Multiplication (e.g., 2 * 3 = 6)
/           np.divide           Division (e.g., 3 / 2 = 1.5)
//          np.floor_divide     Floor division (e.g., 3 // 2 = 1)
**          np.power            Exponentiation (e.g., 2 ** 3 = 8)
%           np.mod              Modulus/remainder (e.g., 9 % 4 = 1)
import numpy as np
# Array Arithmetic
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
OUTPUT
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
Unary ufunc
Unary - : Negation (np.negative, a unary ufunc)
** operator : Exponentiation (np.power, a binary ufunc)
% operator : Modulus (np.mod, a binary ufunc)
import numpy as np
# Array Arithmetic
x = np.arange(4)
print("-x =", -x)
print("x ** 2 =", x ** 2)
print("x % 2 =", x % 2)
OUTPUT
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
Trigonometric functions
These functions work on radians, so angles given in degrees need to be converted to radians by multiplying by pi/180 before the trigonometric functions are called. They take an array as input argument. They include functions like:
Function                     Description
sin, cos, tan                compute sine, cosine and tangent of angles
arcsin, arccos, arctan       calculate inverse sine, cosine and tangent
hypot                        calculate hypotenuse of given right triangle
sinh, cosh, tanh             compute hyperbolic sine, cosine and tangent
arcsinh, arccosh, arctanh    compute inverse hyperbolic sine, cosine and tangent
deg2rad                      convert degree into radians
rad2deg                      convert radians into degree
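The degree-to-radian conversion and the hypot function can be sketched briefly as follows; the input values are chosen only for illustration.

```python
import numpy as np

degrees = np.array([0.0, 30.0, 90.0, 180.0])

# deg2rad performs the multiplication by pi/180
radians = np.deg2rad(degrees)
manual = degrees * np.pi / 180

sines = np.sin(radians)       # sine of 0, 30, 90 and 180 degrees
back = np.rad2deg(radians)    # converts back to degrees

# hypot(a, b) computes sqrt(a**2 + b**2) element-wise
hyp = np.hypot(np.array([3.0, 5.0]), np.array([4.0, 12.0]))

print(radians)
print(hyp)
```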
import numpy as np
# Trigonometric functions
theta = np.linspace(0, np.pi, 3)  # 3 elements between 0 and pi
print("theta =", theta)
print("sin(theta) =", np.sin(theta))
print("cos(theta) =", np.cos(theta))
print("tan(theta) =", np.tan(theta))
OUTPUT
theta = [0.         1.57079633 3.14159265]
sin(theta) = [0.00000000e+00 1.00000000e+00 1.22464680e-16]
cos(theta) = [ 1.00000000e+00  6.12323400e-17 -1.00000000e+00]
tan(theta) = [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
Absolute value
The NumPy ufunc np.absolute and its alias np.abs can be used for finding the absolute values.
# Absolute value
import numpy as np
x=np.array([-2,-1,0,1,2])
print(np.absolute(x))
print(np.abs(x))
OUTPUT
[2 1 0 1 2]
[2 1 0 1 2]
This ufunc can also handle complex data, in which case the absolute value returns the magnitude:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
Output: array([5., 5., 2., 1.])
Exponents and logarithms
import numpy as np
x = [10, 20, 30]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))
OUTPUT
x = [10, 20, 30]
e^x = [2.20264658e+04 4.85165195e+08 1.06864746e+13]
2^x = [1.02400000e+03 1.04857600e+06 1.07374182e+09]
3^x = [          59049      3486784401 205891132094649]
The inverses of the exponentials, the logarithms, are also available. The basic np.log is used for the natural logarithm; the base-2 logarithm or the base-10 logarithm can also be computed using the respective ufunc.
import numpy as np
x = [1, 2, 4, 100]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))
OUTPUT
x = [1, 2, 4, 100]
ln(x) = [0.         0.69314718 1.38629436 4.60517019]
log2(x) = [0.         1.         2.         6.64385619]
log10(x) = [0.         0.30103    0.60205999 2.        ]
There are also specialized versions for maintaining precision with very small input. When x is very small, these functions give more precise values than if the raw np.log or np.exp were used.
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
Output
exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]
Specialized ufuncs
NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, etc.
Another source of more specialized and obscure ufuncs is the submodule scipy.special. If you want to compute some obscure mathematical function on the data, chances are it is implemented in scipy.special.
import numpy as np
from scipy import special
# Gamma functions (generalized factorials) and related functions:
x = [1, 10, 100]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))
OUTPUT:
gamma(x) = [1.00000000e+000 3.62880000e+005 9.33262154e+155]
ln|gamma(x)| = [  0.          12.80182748 359.13420537]
beta(x, 2) = [5.00000000e-01 9.09090909e-03 9.90099010e-05]
# Error function (integral of Gaussian), its complement, and its inverse
x = np.array([0, 0.4, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))
OUTPUT
erf(x) = [0.         0.42839236 0.67780119 0.84270079]
erfc(x) = [1.         0.57160764 0.32219881 0.15729921]
erfinv(x) = [0.         0.37080716 0.73286908        inf]
Advanced Ufunc Features
Specifying output: out argument
The out argument is used to write computation results directly to the memory location where they should be stored. For all ufuncs, this can be done using the out argument of the function:
# out argument to store the output
import numpy as np
x = np.arange(10)
y = np.empty(10)
np.multiply(x, 2, out=y)
print(y)
OUTPUT
[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]
This can also be used with array views. For example, the results of a computation can be stored in every other element of a specified array. For larger arrays, the memory savings from the out argument can be significant.
In []: x = np.arange(5)
y = np.zeros(10)
np.power(2, x, out=y[::2])
print("x: ", x)
print("y: ", y)
Out[]: x:  [0 1 2 3 4]
y:  [ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
Aggregates
reduce method: A reduce repeatedly applies a given operation to the elements
of an array until only a single result remains.
#reduce on the add ufunc returns the sum of all elements in the array:
x = np.arange(1, 6)
np.add.reduce(x)
OUTPUT: 15
# reduce on the multiply ufunc results in the product of all array elements:
x = np.arange(1, 6)
np.multiply.reduce(x)
OUTPUT: 120
accumulate method: stores all the intermediate results of the computation.
x = np.arange(1, 6)
np.add.accumulate(x)
OUTPUT: array([ 1,  3,  6, 10, 15])
Outer products
outer method: a ufunc can compute the output of all pairs of two different inputs using the outer method. This creates output similar to a multiplication table.
x = np.arange(1, 5)
np.add.outer(x, x)
OUTPUT: array([[2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8]])
4.3.2 Computations on arrays: Broadcasting
Arithmetic operations on arrays are usually done on corresponding elements. If two arrays are of exactly the same shape, then these operations are performed as element-by-element arithmetic.
• The smaller array is "broadcast" across the larger array so that they have compatible shapes.
• Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
• It is done without making copies of data and leads to efficient algorithm implementations.
• Broadcasting sometimes causes inefficient use of memory that slows computation.
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:
## arithmetic operation between arrays with same dimensions
>>> x = np.array([10.0, 20.0, 30.0])
>>> y = np.array([2.0, 2.0, 2.0])
>>> x + y
array([12., 22., 32.])
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
• Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
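The three rules can be traced on concrete shapes. The sketch below uses np.broadcast_shapes, which is assumed to be available (it was added in NumPy 1.20), to compute the resulting shape without building the arrays; the variable names are illustrative.

```python
import numpy as np

M = np.ones((2, 3))
a = np.arange(3)

# Rule 1: a has shape (3,), padded to (1, 3)
# Rule 2: the 1 is stretched to 2, giving (2, 3)
shape_ok = np.broadcast_shapes(M.shape, a.shape)

# (3, 1) and (3,) -> (3, 1) and (1, 3) -> both stretched to (3, 3)
shape_both = np.broadcast_shapes((3, 1), (3,))

# Rule 3: (3, 2) and (3,) -> (3, 2) and (1, 3); 2 != 3 and neither is 1
try:
    np.ones((3, 2)) + a
    rule3_error = False
except ValueError:
    rule3_error = True

print(shape_ok, shape_both, rule3_error)
```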
Array with a scalar
The scalar value is stretched to match the shape of the other array, and the element-wise computation is then performed. The advantage of broadcasting is that this duplication of values does not actually take place, but it is a useful mental model about broadcasting.
import numpy as np
a = np.array([0, 1, 2])
a + 5  # array and a scalar arithmetic
Output:
array([5, 6, 7])
Arrays of different dimensions
# Array broadcasting
import numpy as np
a = np.array([0, 1, 2])  # one dimensional array
M = np.ones((3, 3))  # 3x3 matrix of all 1s
print(M + a)
OUTPUT
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
Here the one-dimensional array a is stretched, or broadcast, across the second dimension in order to match the shape of M.
Arrays can differ in their number of dimensions. For example, in a 256 x 256 x 3 array of RGB values, each color in the image can be scaled by a different value by multiplying the image by a one-dimensional array with 3 values.
Image  (3D array): 256 x 256 x 3
Scale  (1D array):             3
Result (3D array): 256 x 256 x 3
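The image-scaling case can be sketched with a stand-in array. The pixel data and scale factors below are made up for illustration; only the shapes matter.

```python
import numpy as np

# Stand-in for an RGB image (hypothetical data: every value is 1.0)
image = np.ones((256, 256, 3))

# One scale factor per color channel
scale = np.array([0.5, 1.0, 2.0])

# Shapes (256, 256, 3) and (3,) broadcast to (256, 256, 3)
result = image * scale

print(result.shape)
print(result[10, 20])
```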
Broadcasting of both arrays
[0 1 2] + [[0]    =  [[0 1 2]
           [1]        [1 2 3]
           [2]]       [2 3 4]]
Both arrays are stretched to match a common shape.
import numpy as np
a = np.arange(3)  # 3 elements, shape (3,), broadcast as 1x3
b = np.arange(3)[:, np.newaxis]  # 3x1 array with 3 elements
print(a + b)
OUTPUT
[[0 1 2]
 [1 2 3]
 [2 3 4]]
Broadcasting in ufunc of Numpy
Consider an array of 10 observations with 3 values each, such as the BP, sugar, and temperature of a patient taken at 10 different times of the day. This is stored in a 10 x 3 array.
# Broadcasting
X = np.random.random((10, 3))
Xmean = X.mean(0)  # Column-wise mean value
print("Mean of X(col):", Xmean)
X_centered = X - Xmean  # center the array
X_centered_mean = X_centered.mean(0)  # Mean of the centered array
print("\nMean of Xcentered:", X_centered_mean)
Output
Mean of X(col): [0.53514715 0.66567217 0.44385899]
Mean of Xcentered: [ 2.22044605e-17 -7.77156117e-17 -1.66533454e-17]
Practical Example of broadcasting: Vector Quantization
• The vector quantization (VQ) algorithm is used in information theory, classification, and other related areas.
• The basic operation in VQ finds the closest point in a set of points, called codes, to a given point, called the observation.
Arrays used in this example
sample: describes the weight and height of an athlete to be classified (the observation).
classes: represent different classes of athletes (the codes).
• Finding the closest point requires calculating the distance between the observation and each of the codes.
• The shortest distance provides the best match. In this example, classes[0] is the nearest class, indicating that the athlete is likely a basketball player.
import numpy as np
from numpy import array, argmin, sqrt, sum
sample = array([107.0, 198.0])
classes = array([[102.0, 203.0], [132.0, 193.0], [45.0, 155.0], [57.0, 173.0]])
diff = classes - sample  # the broadcast happens here
dist = np.sqrt(sum(diff**2, axis=-1))
print("Sample belongs to Class:", np.argmin(dist))
OUTPUT
Sample belongs to Class: 0
Limitations of Broadcasting
1. Broadcasting does not work for all cases, and imposes strict rules that must be satisfied for broadcasting to be performed.
2. Arithmetic, including broadcasting, can only be performed when the sizes of each dimension in the arrays are equal or one of them has a dimension size of 1.
4.4 COMPARISONS, MASKS, BOOLEAN LOGIC
Masking means to extract, modify, count, or otherwise manipulate values in an array based on some criterion. Boolean masking is typically the most efficient way to quantify a sub-collection in a collection. The criterion is represented as a true or false Boolean value.
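The extract, count and modify uses of a mask can be sketched on a small array. The data and the threshold of 4 are chosen only for illustration.

```python
import numpy as np

x = np.array([1, 5, 2, 8, 3, 9])

# The comparison yields a Boolean array: one True/False per element
mask = x > 4

selected = x[mask]               # extract the values that satisfy the criterion
count = np.count_nonzero(mask)   # count how many satisfy it

y = x.copy()
y[mask] = 0                      # modify only the selected positions

print(mask)
print(selected, count)
print(y)
```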