Introduction to NumPy
Pruthvish Rajput, Venus Patel
February 23, 2023
1 Introduction to NumPy
• NumPy is used for effectively loading, storing, and manipulating in-memory data in Python.
• Datasets can come from a wide range of sources and in a wide range of formats:
– documents
– images
– sound clips
– numerical measurements
• Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays
of numbers.
• For example, images (particularly digital images) can be thought of as simply two-dimensional
arrays of numbers representing pixel brightness across the area.
• Sound clips can be thought of as one-dimensional arrays of intensity versus time.
• Text can be converted in various ways into numerical representations, perhaps binary digits
representing the frequency of certain words or pairs of words.
• For this reason, efficient storage and manipulation of numerical arrays is absolutely
fundamental to the process of doing data science.
• This chapter will cover NumPy in detail. NumPy (short for Numerical Python) provides an
efficient interface to store and operate on dense data buffers.
• In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide
much more efficient storage and data operations as the arrays grow larger in size.
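As a rough illustration of the "all data as arrays of numbers" view described above, the sketch below builds a tiny grayscale image and a short sound clip by hand. The pixel values, the 440 Hz tone, and the 8000 Hz sample rate are made up for this example and do not come from any real dataset.

```python
import numpy as np

# A hypothetical 3x4 grayscale "image": one brightness value (0-255) per pixel.
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [ 16,  80, 144, 208]], dtype=np.uint8)

# A hypothetical sound clip: 80 samples of a 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(80) / sample_rate        # time axis in seconds
clip = np.sin(2 * np.pi * 440 * t)     # intensity versus time

print(image.ndim, image.shape)   # two-dimensional: rows x columns
print(clip.ndim, clip.shape)     # one-dimensional: one value per sample
```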
2 Understanding Data Types in Python
Dynamically typed language
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
# Python code
result = 0
for i in range(100):
    result += i
# Python code
x = 4
x = "four"
/* C code */
int x = 4;
x = "four"; // FAILS
• This sort of flexibility is one piece that makes Python and other dynamically typed
languages convenient and easy to use.
• Understanding how this works is an important piece of learning to analyze data efficiently
and effectively with Python.
• But this type flexibility also means that Python variables are more than just their values;
they also contain extra information about the type of the value. We’ll explore this
more in the sections that follow.
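The type information that travels with a value can be inspected at run time. A minimal sketch using only the built-in type function (the names type_before and type_after are introduced here for illustration):

```python
x = 4
type_before = type(x)    # the name x currently refers to an int object

x = "four"
type_after = type(x)     # the same name now refers to a str object

print(type_before, type_after)
```

The name x carries no type of its own; the type belongs to whichever object x currently points to.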
2.1 A Python Integer Is More Than Just an Integer
• The standard Python implementation is written in C.
• When we define an integer in Python, such as x = 10000, x is not just a “raw” integer. It’s
actually a pointer to a compound C structure, which contains several values.
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
A single integer in Python 3.4 actually contains four pieces:
• ob_refcnt, a reference count that helps Python silently handle memory allocation and
deallocation
• ob_type, which encodes the type of the variable
• ob_size, which specifies the size of the following data members
• ob_digit, which contains the actual integer value that we expect the Python variable to
represent.
• A C integer is essentially a label for a position in memory whose bytes encode an integer
value.
• A Python integer is a pointer to a position in memory containing all the Python object
information, including the bytes that contain the integer value.
• This extra information in the Python integer structure is what allows Python to be coded so
freely and dynamically.
• All this additional information in Python types comes at a cost, however, which becomes
especially apparent in structures that combine many of these objects.
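The cost of this extra information can be measured roughly. The sketch below assumes the standard CPython interpreter; the exact size reported by sys.getsizeof varies with the Python version and platform, so no specific number is claimed for it here.

```python
import sys
import numpy as np

x = 10000
python_int_bytes = sys.getsizeof(x)           # whole Python object, header included
raw_int_bytes = np.dtype(np.int64).itemsize   # 8 bytes of raw 64-bit integer data

print(python_int_bytes, raw_int_bytes)
```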
2.2 A Python List Is More Than Just a List
[1]: L = list(range(10))
L
[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2]: type(L[0])
[2]: int
[3]: L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]
[3]: [bool, str, float, int]
• At the implementation level, a fixed-type (NumPy-style) array essentially contains a single
pointer to one contiguous block of data.
• The Python list, on the other hand, contains a pointer to a block of pointers, each of which
in turn points to a full Python object like the Python integer we saw earlier.
• Again, the advantage of the list is flexibility: because each list element is a full structure
containing both data and type information, the list can be filled with data of any desired
type.
• Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing
and manipulating data.
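The trade-off between flexibility and efficiency described above can be sketched as follows. The memory comparison is a rough, CPython-specific measurement, and the sample values are arbitrary.

```python
import sys
import numpy as np

# A list can mix types; an array converts everything to one common type.
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)          # the integers are upcast to floating point
print([type(item).__name__ for item in mixed_list], arr.dtype)

# Writing a float into an integer array silently truncates it.
ints = np.array([1, 2, 3])
ints[0] = 3.99
print(ints)

# Rough memory comparison for 1000 integers.
L = list(range(1000))
list_bytes = sys.getsizeof(L) + sum(sys.getsizeof(item) for item in L)
array_bytes = np.arange(1000, dtype=np.int64).nbytes
print(list_bytes, array_bytes)
```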
2.3 Fixed-Type Arrays in Python
• Python offers several different options for storing data in efficient, fixed-type data buffers.
• The built-in array module can be used to create dense arrays of a uniform type.
[18]: import array
L = list(range(10))
# 'i' is the type code for signed C integers
A = array.array('i', L)
A
• While Python’s array object provides efficient storage of array-based data, NumPy adds to
this efficient operations on that data.
• We will explore these operations in later sections; here we’ll demonstrate several ways of
creating a NumPy array.
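To make the distinction between efficient storage and efficient operations concrete, the sketch below applies the same expression to an array.array and to a NumPy array. The sample values are arbitrary.

```python
import array
import numpy as np

A = array.array('i', [1, 2, 3])
N = np.array([1, 2, 3])

# array.array behaves like a sequence: * repeats the contents.
repeated = A * 2
# NumPy arrays apply arithmetic element by element.
doubled = N * 2
print(repeated)
print(doubled)

# The 'i' array is fixed-type: a non-integer value is rejected.
try:
    A.append("four")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```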
2.4 Creating Arrays from Python Lists
First, we can use np.array to create arrays from Python lists:
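[1]: import numpy as np
a = np.linspace(0, 10, 100)  # 100 evenly spaced values from 0 to 10
# Element-by-element version, kept for comparison:
# b = np.zeros(100)
# for index, element in enumerate(a):
#     b[index] = 1 / element
# Vectorized version: computes every reciprocal in one expression.
# a[0] is 0, so this triggers a divide-by-zero RuntimeWarning.
b = 1 / a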
/tmp/ipykernel_38445/1177857886.py:15: RuntimeWarning: divide by zero
encountered in true_divide
b=1/a
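For the list-based construction named at the start of this section, a minimal sketch with arbitrary sample values might look like this:

```python
import numpy as np

# A flat list becomes a one-dimensional array.
ints = np.array([1, 4, 2, 5, 3])

# If the types differ, NumPy upcasts where it can (here, integers to floats).
mixed = np.array([3.14, 4, 2, 3])

# A list of equal-length lists becomes a two-dimensional array.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(ints.dtype, mixed.dtype, grid.shape)
```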
If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:
[ ]:
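The original cell is empty, so the following is only a sketch of how the dtype keyword could be used here, with arbitrary sample values:

```python
import numpy as np

# Without dtype, a list of Python ints gives an integer array.
default = np.array([1, 2, 3, 4])

# With dtype, the same list is stored as 32-bit floats.
floats = np.array([1, 2, 3, 4], dtype='float32')

print(default.dtype, floats.dtype)
print(floats)
```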
2.5 Creating Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines built
into NumPy. Here are several examples: zeros, ones, full
[1]: import numpy as np
a = np.zeros((3,4))
b = np.ones((4,3))
c = np.full((5,5),fill_value = 4)
print(a)
print(b)
print(c)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]]
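The printed results differ in more than shape: zeros and ones produce floats (shown as 0. and 1.), while full takes its type from the fill value. A short sketch that inspects this, with one dtype override added as an illustration:

```python
import numpy as np

a = np.zeros((3, 4))                 # floats by default
b = np.ones((4, 3), dtype=int)       # the default can be overridden with dtype
c = np.full((5, 5), fill_value=4)    # dtype inferred from the fill value

for name, arr in [("a", a), ("b", b), ("c", c)]:
    print(name, arr.shape, arr.dtype)
```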
3 Create an array filled with a linear sequence
[2]: # Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
d = np.arange(0,20,2)
print(d)
[ 0 2 4 6 8 10 12 14 16 18]
4 Evenly spaced array
[3]: # Create an array of five values evenly spaced between 0 and 1
e = np.linspace(0,1,5)
e
[3]: array([0. , 0.25, 0.5 , 0.75, 1. ])
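The two routines above differ in what is fixed: arange fixes the step and excludes the stop value, while linspace fixes the number of points and includes the stop value by default. A small sketch of that difference:

```python
import numpy as np

# arange: fixed step, stop value excluded (like range).
d = np.arange(0, 20, 2)
same_as_range = d.tolist() == list(range(0, 20, 2))

# linspace: fixed number of points, stop value included by default.
e = np.linspace(0, 1, 5)
step = (1 - 0) / (5 - 1)      # spacing is (stop - start) / (num - 1)

print(same_as_range, d[-1])
print(e, step)
```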
5 Create a 3x3 array of uniformly distributed random values
[7]: # Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3,3))
# IPython only: a trailing ? displays the documentation for np.random.normal
np.random.normal?
6 Create a 3x3 array of normally distributed random values
[9]: # Create a 3x3 array of normally distributed random values
# with mean 5 and standard deviation 1
np.random.normal(5,1,(3,3))
[9]: array([[4.9312537 , 4.43851156, 5.57260415],
            [4.33924454, 6.2291077 , 5.18805873],
            [6.32576801, 6.00020179, 4.86648286]])
7 Create a 3x3 array of random integers in the interval [0, 10)
[10]: np.random.randint(0,10,(3,3))
[10]: array([[2, 4, 2],
             [0, 6, 8],
             [2, 6, 2]])
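The random arrays above change on every run. When repeatable results are needed, one option is NumPy's seeded Generator interface, sketched below. This is an alternative to the np.random functions used in these notes, not a replacement the notes themselves prescribe, and the seed value 0 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # seed chosen arbitrarily

uniform = rng.random((3, 3))             # uniform on [0, 1)
normal = rng.normal(5, 1, (3, 3))        # mean 5, standard deviation 1
integers = rng.integers(0, 10, (3, 3))   # integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same numbers.
again = np.random.default_rng(seed=0).random((3, 3))
print(np.array_equal(uniform, again))
```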
8 Create a 3x3 identity matrix
[11]: np.eye(3)
[11]: array([[1., 0., 0.],
             [0., 1., 0.],
             [0., 0., 1.]])
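The defining property of the identity matrix is that multiplying by it changes nothing. A quick check, using an arbitrary 3x3 matrix M introduced for this example:

```python
import numpy as np

I = np.eye(3)
M = np.arange(9.0).reshape(3, 3)    # arbitrary 3x3 test matrix

# Multiplying by the identity matrix on either side leaves M unchanged.
left = np.array_equal(I @ M, M)
right = np.array_equal(M @ I, M)
print(left, right)
```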
9 Create an uninitialized array of three values
[12]: # The values will be whatever happens to already exist at that memory location
np.empty(3)
[12]: array([1., 1., 1.])
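Because the contents of an np.empty array are arbitrary, the usual pattern is to fill every element before reading any of them. A minimal sketch, with arbitrary fill values, that also shows how to request integers explicitly (the default is floating point, as the output above shows):

```python
import numpy as np

buf = np.empty(3)                   # float64 by default; contents are arbitrary
int_buf = np.empty(3, dtype=int)    # integers must be requested explicitly

# Overwrite every element before using the arrays.
buf[:] = [0.5, 1.5, 2.5]
int_buf[:] = 7

print(buf.dtype, buf)
print(int_buf.dtype, int_buf)
```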
9.1 NumPy Standard Data Types
• NumPy arrays contain values of a single type, so it is important to have detailed knowledge
of those types and their limitations.
• Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other
related languages.
The standard NumPy data types are listed in the following table. Note that when constructing an
array, they can be specified using a string:
np.zeros(10, dtype='int16')
Or using the associated NumPy object:
np.zeros(10, dtype=np.int16)
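The two spellings above should produce the same array type. A short sketch that checks this and reports the storage cost (the names by_string and by_object are introduced for this example):

```python
import numpy as np

by_string = np.zeros(10, dtype='int16')
by_object = np.zeros(10, dtype=np.int16)

same_dtype = by_string.dtype == by_object.dtype
print(same_dtype)                              # both spellings give the same dtype
print(by_string.itemsize, by_string.nbytes)    # bytes per element, bytes in total
```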
Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64
float16       Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats
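The limits listed in the table can be queried from NumPy itself with np.iinfo and np.finfo. The sketch below also shows one practical consequence of fixed-width integers: array arithmetic wraps around silently when a result does not fit. The sample values are chosen to hit the int8 boundary.

```python
import numpy as np

# Integer ranges, as listed in the table.
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Mantissa widths (in bits) of the floating-point types.
for t in (np.float16, np.float32, np.float64):
    print(t.__name__, np.finfo(t).nmant)

# 127 is the largest int8 value; adding 1 wraps around to the smallest.
big = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)
wrapped = big + one
print(wrapped)
```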