NumPy
NumPy is a foundational package for scientific computing with Python, and especially for data analysis. It
is the basis of a large number of mathematical and scientific Python packages, among them the pandas
library, as you will see later in the book. The pandas library, which specializes in data analysis, is built
entirely on the concepts introduced by NumPy. The built-in tools provided by the standard Python library
are often too simple or inadequate for most of the calculations required in data analysis.
Knowledge of the NumPy library is important for using all scientific Python packages, and particularly
for using and understanding the pandas library, which is the main subject of the following chapters.
If you are already familiar with this library, you can proceed directly to the next chapter; otherwise, you
can use this chapter to review the basic concepts or to regain familiarity with them by running the
examples.
Figure 3-1. NumPy releases in the last five years, with the new NumPy logo [CC BY-SA 4.0 Isabela
Presedo-Floyd]
If instead you want to work without the support of this distribution, use pip from the command line
to install the NumPy library, for example with pip install numpy (see https://pypi.org/project/numpy/).
Once NumPy is installed, import the module in your Python session with the statement
import numpy as np.
If, on the other hand, you are writing a script, you have to insert this statement at the beginning of the
Python code in order to access NumPy and its functions.
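As a minimal sketch of the import and of the kind of array inspected in the next lines, the following assumes the conventional alias np and an illustrative three-element array named a. The values are an assumption, chosen only to agree with the size and shape reported below.

```python
import numpy as np  # conventional alias used throughout this chapter

# Illustrative array (assumed values): a one-dimensional ndarray of three integers
a = np.array([1, 2, 3])
print(a)  # prints [1 2 3]
```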
Chapter 3 ■ the Numpy Library
You can easily check that a newly created object is an ndarray by passing the new variable to the type()
function.
>>> type(a)
<class 'numpy.ndarray'>
To find the dtype associated with the newly created ndarray, use the dtype attribute.
■ Note: The results of dtype, shape, and other attributes can vary among different operating systems and
Python distributions.
>>> a.dtype
dtype('int32')
The just-created array has one axis, so its rank is 1, and its shape is (3,). To obtain these values from
the array, use the ndim attribute to get the number of axes, the size attribute to get the number of
elements, and the shape attribute to get its shape.
>>> a.ndim
1
>>> a.size
3
>>> a.shape
(3,)
What you have just seen is the simplest case of a one-dimensional array. But the use of arrays can be
easily extended to several dimensions. For example, if you define a two-dimensional array 2x2:
This array has rank 2, since it has two axes, each of length 2.
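The two-dimensional case can be sketched as follows. The name b matches the array inspected below, but the values are assumed; any 2x2 nested list of floats would serve equally well.

```python
import numpy as np

# Assumed values for a 2x2 array of floats
b = np.array([[1.3, 2.4], [0.3, 4.1]])

print(b.ndim)   # prints 2
print(b.size)   # prints 4
print(b.shape)  # prints (2, 2)
```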
Another important attribute of ndarray objects is itemsize, which gives the size in bytes of each item
in the array. The data attribute is the buffer containing the actual elements of the array. This second
attribute is seldom used directly, because you access the data in the array through the indexing
mechanism, which you will see in the next sections.
>>> b.itemsize
8
>>> b.data
<memory at 0x000001A8AD526A80>
Create an Array
To create a new array, you can follow different paths. The most common path is the one you saw in the
previous section: passing a list or a sequence of lists as the argument to the array() function.
The array() function, in addition to lists, can accept tuples and sequences of tuples.
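A short sketch of these paths, with illustrative names and values that are not taken from the original listings, shows that lists, tuples, and a mixture of the two produce the same array.

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]])   # sequence of lists
d = np.array(((1, 2, 3), (4, 5, 6)))   # sequence of tuples
e = np.array([(1, 2, 3), [4, 5, 6]])   # tuples and lists mixed

print(np.array_equal(c, d) and np.array_equal(d, e))  # prints True
```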
Types of Data
So far you have seen only simple integer and float numeric values, but NumPy arrays are designed to contain
a wide variety of data types (see Table 3-1). For example, you can use the data type string:
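As a small sketch of the string case (the name g and the values are assumptions), NumPy infers a fixed-width Unicode string type from the elements.

```python
import numpy as np

# A 2x2 array of one-character strings
g = np.array([['a', 'b'], ['c', 'd']])

print(g.dtype.kind)  # prints U (Unicode string)
print(g.shape)       # prints (2, 2)
```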
The ones() function creates an array full of ones in a very similar way.
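The following sketch shows ones() next to zeros(). Treating zeros() as the companion function meant here is an assumption, since the preceding listing is not available; both calls take the desired shape as their argument.

```python
import numpy as np

z = np.zeros((3, 3))  # assumed companion function: a 3x3 array of zeros
o = np.ones((3, 3))   # a 3x3 array of ones

print(o.dtype)  # prints float64
```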
By default, the two functions create arrays with the float64 data type. A particularly useful function
is arange(). It generates NumPy arrays with numerical sequences that follow particular rules
depending on the passed arguments. For example, if you want to generate the sequence of values from 0
up to, but not including, 10, you pass only one argument to the function: the value at which you
want the sequence to end.
If instead of starting from 0 you want to start from another value, you simply specify two arguments: the
first is the starting value and the second is the end value, which is itself excluded from the sequence.
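The one-argument and two-argument forms can be sketched as follows; the variable names are illustrative only.

```python
import numpy as np

r1 = np.arange(10)      # starts at 0, stops before 10
r2 = np.arange(4, 10)   # starts at 4, stops before 10

print(r1)  # prints [0 1 2 3 4 5 6 7 8 9]
print(r2)  # prints [4 5 6 7 8 9]
```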
It is also possible to generate a sequence of values with precise intervals between them. If the third
argument of the arange() function is specified, this will represent the gap between one value and the next
one in the sequence of values.
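A brief sketch of the three-argument form, with illustrative values, shows that the step can be an integer or a float.

```python
import numpy as np

s1 = np.arange(0, 12, 3)    # integer step of 3
s2 = np.arange(0, 2, 0.5)   # float step of 0.5

print(s1)  # prints [0 3 6 9]
print(s2)  # prints [0.  0.5 1.  1.5]
```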
So far you have only created one-dimensional arrays. To generate two-dimensional arrays, you can still
use the arange() function, but combined with the reshape() function. This function divides a
linear array into different parts in the manner specified by the shape argument.
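For instance, a sequence of 12 values can be arranged into 3 rows of 4 columns; the name m and the sizes are chosen only for illustration.

```python
import numpy as np

m = np.arange(0, 12).reshape(3, 4)
print(m)
# prints
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
```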
Another function very similar to arange() is linspace(). This function still takes as its first two
arguments the initial and end values of the sequence, but the third argument, instead of specifying the
distance between one element and the next, defines the number of elements into which you want the
interval to be split.
>>> np.linspace(0,10,5)
array([ 0. , 2.5, 5. , 7.5, 10. ])
Finally, another method to obtain arrays already containing values is to fill them with random values.
This is possible using the random() function of the numpy.random module. This function will generate an
array with as many elements as specified in the argument.
>>> np.random.random(3)
array([ 0.78610272, 0.90630642, 0.80007102])
The numbers obtained will vary with every run. To create a multidimensional array, you simply pass the
size of the array as an argument.
>>> np.random.random((3,3))
array([[ 0.07878569, 0.7176506 , 0.05662501],
[ 0.82919021, 0.80349121, 0.30254079],
[ 0.93347404, 0.65868278, 0.37379618]])
Basic Operations
So far you have seen how to create a new NumPy array and how items are defined in it. Now it is time to
see how to apply various operations to these arrays.
Arithmetic Operators
The first operations that you will perform on arrays are the arithmetic operators. The most obvious are
adding and multiplying an array by a scalar.
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> a+4
array([4, 5, 6, 7])
>>> a*2
array([0, 2, 4, 6])
These operators can also be used between two arrays. In NumPy, these operations are element-wise;
that is, the operators are applied only between corresponding elements, namely the elements that occupy
the same position. The end result is a new array containing the results in the same positions as the
operands (see Figure 3-2).
>>> b = np.arange(4,8)
>>> b
array([4, 5, 6, 7])
>>> a + b
array([ 4, 6, 8, 10])
>>> a - b
array([-4, -4, -4, -4])
>>> a * b
array([ 0, 5, 12, 21])
Moreover, these operators are also available for functions, provided that the value returned is a NumPy
array. For example, you can multiply the array by the sine or the square root of the elements of array b.
>>> a * np.sin(b)
array([-0. , -0.95892427, -0.558831 , 1.9709598 ])
>>> a * np.sqrt(b)
array([ 0. , 2.23606798, 4.89897949, 7.93725393])
Moving on to the multidimensional case, even here the arithmetic operators continue to operate
element-wise.
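The definitions of A and B are not shown at this point, so the sketch below assumes A = np.arange(0, 9).reshape(3, 3) and B = np.ones((3, 3)), which agree with the matrix products printed in the following listings. It contrasts the element-wise * operator with the matrix product computed by dot().

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)  # assumed definition
B = np.ones((3, 3))                # assumed definition

# Element-wise product: each A[i, j] is multiplied by B[i, j]
print(A * B)
# prints
# [[0. 1. 2.]
#  [3. 4. 5.]
#  [6. 7. 8.]]

# Matrix product, for contrast: rows of A combined with columns of B
print(np.dot(A, B))
# prints
# [[ 3.  3.  3.]
#  [12. 12. 12.]
#  [21. 21. 21.]]
```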
>>> np.dot(A,B)
array([[ 3., 3., 3.],
[ 12., 12., 12.],
[ 21., 21., 21.]])
The result at each position is the sum of the products of each element of the corresponding row of the
first matrix with the corresponding element of the corresponding column of the second matrix. Figure 3-3
illustrates the process carried out during the matrix product (run for two elements).
An alternative way to write the matrix product is to call dot() as a method of one of the two
matrices.
>>> A.dot(B)
array([[ 3., 3., 3.],
[ 12., 12., 12.],
[ 21., 21., 21.]])
Note that because the matrix product is not a commutative operation, the order of the operands is
important. Indeed, the matrix product of A and B is not equal to the matrix product of B and A.
>>> np.dot(B,A)
array([[ 9., 12., 15.],
[ 9., 12., 15.],
[ 9., 12., 15.]])
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> a += 1
>>> a
array([1, 2, 3, 4])
>>> a -= 1
>>> a
array([0, 1, 2, 3])
Therefore, these operators are far more general than simple incremental operators that increase the
values by one unit, and they can be applied in many cases. For instance, you need them every
time you want to change the values in an array without generating a new array.
>>> a += 4
>>> a
array([4, 5, 6, 7])
>>> a *= 2
>>> a
array([ 8, 10, 12, 14])
Many mathematical and trigonometric operations are universal functions (ufuncs) that act on an array
element by element; for example, calculating the square root with sqrt(), the logarithm with log(), or
the sine with sin().
>>> a = np.arange(1, 5)
>>> a
array([1, 2, 3, 4])
>>> np.sqrt(a)
array([ 1. , 1.41421356, 1.73205081, 2. ])
>>> np.log(a)
array([ 0. , 0.69314718, 1.09861229, 1.38629436])
>>> np.sin(a)
array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
Many common math functions are already implemented in the NumPy library.
Aggregate Functions
Aggregate functions perform an operation on a set of values, an array for example, and produce a single
result. Therefore, the sum of all the elements in an array is an aggregate function. Many functions of this kind
are implemented in the ndarray class and so can be invoked directly from the array on which you want to
perform the calculation.
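A few of these aggregate methods can be sketched as follows, using an array of illustrative values that are not taken from the original listing.

```python
import numpy as np

a = np.array([3.3, 4.5, 1.2, 5.7, 0.3])  # illustrative values

total = a.sum()      # sum of all the elements
smallest = a.min()   # minimum value
largest = a.max()    # maximum value
average = a.mean()   # arithmetic mean
spread = a.std()     # standard deviation

print(smallest)  # prints 0.3
print(largest)   # prints 5.7
```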
Indexing
Array indexing always uses square brackets ([ ]) to index the elements of the array so that the elements can
then be referred to individually for various uses, such as extracting a value, selecting items, or even
assigning a new value.
When you create a new array, an appropriate index scale is also created automatically (see Figure 3-4).
To access a single element of an array, you refer to its index.
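The array used in the indexing examples is not defined at this point. The sketch below assumes a = np.arange(10, 16), which agrees with the outputs shown in the rest of this section.

```python
import numpy as np

a = np.arange(10, 16)  # assumed definition

print(a)     # prints [10 11 12 13 14 15]
print(a[4])  # prints 14
```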
NumPy arrays also accept negative indexes. These indexes run from -1, which refers to the final
element, down through -2, -3, and so on toward the initial element, which is the one with the most
negative index value.
>>> a[-1]
15
>>> a[-6]
10
To select multiple items at once, you can pass an array of indexes in square brackets.
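As a sketch, and again assuming a = np.arange(10, 16), a list of indexes inside the square brackets returns the corresponding elements as a new array.

```python
import numpy as np

a = np.arange(10, 16)  # assumed definition

picked = a[[1, 3, 4]]  # elements at indexes 1, 3, and 4
print(picked)  # prints [11 13 14]
```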
Moving on to the two-dimensional case, namely the matrices, they are represented as rectangular
arrays consisting of rows and columns, defined by two axes, where axis 0 is represented by the rows and axis
1 is represented by the columns. Thus, indexing in this case is represented by a pair of values: the first value
is the index of the row and the second is the index of the column. Therefore, if you want to access the values
or select elements in the matrix, you still use square brackets, but this time there are two values [row index,
column index] (see Figure 3-5).
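The matrix A used in the two-dimensional examples is not defined at this point. The sketch below assumes a 3x3 matrix built with np.arange(10, 19), which agrees with the values extracted in the listings that follow.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition
print(A)
# prints
# [[10 11 12]
#  [13 14 15]
#  [16 17 18]]

print(A[0, 0])  # first row, first column: prints 10
print(A[2, 1])  # third row, second column: prints 17
```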
If you want to access the element of the third column in the second row, you have to insert the
pair [1, 2].
>>> A[1, 2]
15
Slicing
Slicing allows you to extract portions of an array to generate new arrays. When you slice Python lists,
the resulting lists are copies, but in NumPy, the resulting arrays are views of the same underlying buffer.
To select the portion of the array that you want to extract (or view), you use the slice syntax;
that is, a sequence of numbers separated by colons (:) within square brackets.
If you want to extract a portion of the array, for example one that goes from the second to the fifth
element, you have to insert the index of the starting element, that is 1, and the index at which the slice
stops, that is 5 (the element at the stop index is not included), separated by a colon (:).
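The slice with start index 1 and stop index 5 can be sketched as follows, assuming a = np.arange(10, 16) as in the other examples of this section. The second half of the sketch illustrates that the slice is a view of the original array rather than a copy.

```python
import numpy as np

a = np.arange(10, 16)  # assumed definition

part = a[1:5]  # elements at indexes 1, 2, 3, and 4
print(part)    # prints [11 12 13 14]

# The slice is a view: writing through it changes the original array
part[0] = 99
print(a[1])    # prints 99
```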
Now if you want to extract an item from the previous portion and skip a specific number of following
items, then extract the next and skip again, you can use a third number that defines the gap in the sequence
of the elements. For example, with a value of 2, the array will take the elements in an alternating fashion.
>>> a[1:5:2]
array([11, 13])
To better understand the slice syntax, you should also look at cases where you do not use explicit
numerical values. If you omit the first number, NumPy implicitly interprets it as 0 (i.e., the initial
element of the array). If you omit the second number, the slice extends to the end of the array; and
if you omit the last number, it is interpreted as 1, so all the elements in the range are taken
without gaps.
>>> a[::2]
array([10, 12, 14])
>>> a[:5:2]
array([10, 12, 14])
>>> a[:5:]
array([10, 11, 12, 13, 14])
In the case of a two-dimensional array, the slicing syntax still applies, but it is separately defined for the
rows and columns. For example, if you want to extract only the first row:
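Extracting the first row can be sketched as follows, assuming the same 3x3 matrix A built with np.arange(10, 19) that agrees with the other outputs in this section.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition

first_row = A[0, :]  # row index 0, every column
print(first_row)     # prints [10 11 12]
```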
As you can see in the second index, if you leave only the colon without defining a number, you will
select all the columns. Instead, if you want to extract all the values of the first column, you have to write the
inverse.
>>> A[:,0]
array([10, 13, 16])
Instead, if you want to extract a smaller matrix, you need to explicitly define the intervals of both the
rows and the columns with their indexes.
If the indexes of the rows or columns to be extracted are not contiguous, you can specify an array of
indexes.
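Both cases can be sketched as follows, again assuming the 3x3 matrix A built with np.arange(10, 19); the particular intervals and indexes are illustrative.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition

# Contiguous intervals: first two rows and first two columns
sub = A[0:2, 0:2]
print(sub)
# prints
# [[10 11]
#  [13 14]]

# Non-contiguous rows 0 and 2, selected with a list of indexes
rows = A[[0, 2], 0:2]
print(rows)
# prints
# [[10 11]
#  [16 17]]
```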
Iterating an Array
In Python, iterating the items in an array is really very simple; you just need to use the for construct.
>>> for i in a:
... print(i)
...
10
11
12
13
14
15
Of course, even here, moving to the two-dimensional case, you could think of applying the solution of
two nested loops with the for construct. The first loop will scan the rows of the array, and the second loop
will scan the columns. Actually, if you apply the for loop to a matrix, it always scans along the first
axis, so each iteration yields one row.
If you want to make an iteration element by element, you can use the following construct, using the for
loop on A.flat.
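The two kinds of iteration described above can be sketched as follows, assuming the 3x3 matrix A built with np.arange(10, 19).

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition

# Iterating over a two-dimensional array yields one row per step (first axis)
for row in A:
    print(row)

# A.flat iterates over every element, row by row
flat_items = [int(item) for item in A.flat]
print(flat_items)  # prints [10, 11, 12, 13, 14, 15, 16, 17, 18]
```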
However, despite all this, NumPy offers an alternative and more elegant solution than the for loop.
Generally, you need to apply an iteration to apply a function on the rows, on the columns, or on an
individual item. If you want to launch an aggregate function that returns a value calculated for every single
column or for every single row, there is an optimal way that leaves it to NumPy to manage the iteration: the
apply_along_axis() function.
This function takes three arguments: the aggregate function, the axis on which to apply the iteration,
and the array. If the axis option equals 0, then the iteration evaluates the elements column by column,
whereas if axis equals 1 then the iteration evaluates the elements row by row. For example, you can
calculate the average values first by column and then by row.
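The column-wise and row-wise averages can be sketched as follows, assuming the 3x3 matrix A built with np.arange(10, 19).

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition

col_means = np.apply_along_axis(np.mean, axis=0, arr=A)  # one value per column
row_means = np.apply_along_axis(np.mean, axis=1, arr=A)  # one value per row

print(col_means)  # prints [13. 14. 15.]
print(row_means)  # prints [11. 14. 17.]
```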
The previous case uses a function already defined in the NumPy library, but nothing prevents you
from defining your own functions. The previous case also used an aggregate function, but nothing forbids
you from using a ufunc. In this case, iterating by column and by row produces the same result, because a
ufunc performs one iteration element by element.
A ufunc that halves the value of each element of the input array, for example, gives the same result
regardless of whether the iteration is performed by row or by column.
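A sketch of this behavior follows. The helper halve() is a hypothetical name, and it is an ordinary Python function that acts element by element rather than a registered NumPy ufunc; the matrix A is assumed to be the 3x3 matrix built with np.arange(10, 19).

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))  # assumed definition

def halve(x):
    """Hypothetical helper: halve every element of the array it receives."""
    return x / 2

by_col = np.apply_along_axis(halve, axis=0, arr=A)  # applied column by column
by_row = np.apply_along_axis(halve, axis=1, arr=A)  # applied row by row

print(np.array_equal(by_col, by_row))  # prints True
```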
Once a matrix of random numbers is defined, if you apply a conditional operator to it, you receive as a
return value a Boolean array containing True in the positions in which the condition is satisfied. In
this example, those are all the positions in which the values are less than 0.5.
Actually, the Boolean arrays are used implicitly for making selections of parts of arrays. In fact, by
inserting the previous condition directly inside the square brackets, you can extract all elements smaller
than 0.5, so as to obtain a new array.
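Both steps, building the Boolean array and using it for selection, can be sketched as follows. The 4x4 size is an assumption, and because the values are random, no output is shown.

```python
import numpy as np

A = np.random.random((4, 4))  # assumed size; values differ on every run

mask = A < 0.5      # Boolean array with the same shape as A
selected = A[mask]  # one-dimensional array of the elements smaller than 0.5

print(mask.dtype)      # prints bool
print(selected.ndim)   # prints 1
```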
Shape Manipulation
You already saw, when creating a two-dimensional array, that it is possible to convert a one-dimensional
array into a matrix, thanks to the reshape() function.
>>> a = np.random.random(12)
>>> a
array([ 0.77841574, 0.39654203, 0.38188665, 0.26704305, 0.27519705,
0.78115866, 0.96019214, 0.59328414, 0.52008642, 0.10862692,
0.41894881, 0.73581471])
>>> A = a.reshape(3, 4)
>>> A
array([[ 0.77841574, 0.39654203, 0.38188665, 0.26704305],
[ 0.27519705, 0.78115866, 0.96019214, 0.59328414],
[ 0.52008642, 0.10862692, 0.41894881, 0.73581471]])
The reshape() function returns a new array object and leaves the shape of the original array
unchanged. However, if you want to modify the original object by changing its shape, you have to assign
a tuple containing the new dimensions directly to its shape attribute.
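A minimal sketch of the in-place change follows, assuming a one-dimensional array of 12 random values as in the previous listing. Note that the NumPy documentation describes assigning to shape as discouraged, so reshape() is usually the safer choice in new code.

```python
import numpy as np

a = np.random.random(12)  # one-dimensional array of 12 values

a.shape = (3, 4)  # changes a itself; the assignment returns nothing
print(a.shape)    # prints (3, 4)
```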
As you can see, this time it is the starting array that changes shape and no object is returned. The
inverse operation is also possible; that is, you can convert a two-dimensional array into a one-dimensional
array. You do this by using the ravel() function.
>>> a = a.ravel()
>>> a
array([ 0.77841574, 0.39654203, 0.38188665, 0.26704305, 0.27519705,
0.78115866, 0.96019214, 0.59328414, 0.52008642, 0.10862692,
0.41894881, 0.73581471])
Or you can even act directly on the shape attribute of the array itself.
Another important operation is transposing a matrix, which swaps the rows with the columns.
NumPy provides this feature with the transpose() function.
>>> A.transpose()
array([[ 0.77841574, 0.27519705, 0.52008642],
[ 0.39654203, 0.78115866, 0.10862692],
[ 0.38188665, 0.96019214, 0.41894881],
[ 0.26704305, 0.59328414, 0.73581471]])
Array Manipulation
Often you need to create an array using already created arrays. In this section, you see how to create new
arrays by joining or splitting arrays that are already defined.
Joining Arrays
You can merge multiple arrays to form a new one that contains all of the arrays. NumPy uses the concept of
stacking, providing a number of functions in this regard. For example, you can perform vertical stacking with
the vstack() function, which combines the second array as new rows of the first array. In this case, the array
grows in the vertical direction. By contrast, the hstack() function performs horizontal stacking; that is, the
second array is added to the columns of the first array.
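Vertical and horizontal stacking can be sketched as follows; the two 3x3 arrays are illustrative and are not taken from the original listing.

```python
import numpy as np

A = np.ones((3, 3))   # illustrative first array
B = np.zeros((3, 3))  # illustrative second array

V = np.vstack((A, B))  # B becomes new rows below A
H = np.hstack((A, B))  # B becomes new columns to the right of A

print(V.shape)  # prints (6, 3)
print(H.shape)  # prints (3, 6)
```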
Two other functions performing stacking between multiple arrays are column_stack() and
row_stack(). These functions operate differently than the two previous functions. Generally these
functions are used with one-dimensional arrays, which are stacked as columns or rows in order to form a
new two-dimensional array.
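The sketch below stacks three illustrative one-dimensional arrays as columns and as rows. For the row case it calls vstack(), on the assumption that row_stack() behaves as an alias of it; newer NumPy releases recommend vstack() for this purpose.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])

cols = np.column_stack((a, b, c))  # each array becomes a column
print(cols)
# prints
# [[0 3 6]
#  [1 4 7]
#  [2 5 8]]

rows = np.vstack((a, b, c))  # each array becomes a row
print(rows)
# prints
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
```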
Splitting Arrays
In the previous section, you saw how to assemble multiple arrays through stacking. Now you see how to
divide an array into several parts. In NumPy, you use splitting to do this. Here too, you have a set of functions
that work both horizontally with the hsplit() function and vertically with the vsplit() function.
Thus, if you want to split the array horizontally, meaning the width of the array is divided into two parts,
the 4x4 matrix A will be split into two 4x2 matrices (four rows by two columns).
Instead, if you want to split the array vertically, meaning the height of the array is divided into two parts,
the 4x4 matrix A will be split into two 2x4 matrices (two rows by four columns).
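Both kinds of split can be sketched as follows, assuming a 4x4 matrix built with np.arange(16). The shapes are printed so that the result of each split can be checked directly.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))  # assumed 4x4 matrix

left, right = np.hsplit(A, 2)   # horizontal split: the columns are divided
top, bottom = np.vsplit(A, 2)   # vertical split: the rows are divided

print(left.shape, right.shape)  # prints (4, 2) (4, 2)
print(top.shape, bottom.shape)  # prints (2, 4) (2, 4)
```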
A more complex command is the split() function, which allows you to split the array into
nonsymmetrical parts. In addition to passing the array as an argument, you also have to specify the
indexes at which the array is to be divided. If you use the axis = 1 option, then the indexes are column
indexes; if instead the option is axis = 0, then they are row indexes.
For example, if you want to divide the matrix into three parts, the first of which includes the first
column, the second the second and third columns, and the third the last column, you must specify the
two indexes at which the cuts fall, that is, 1 and 3, together with axis = 1.
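The three-part column split can be sketched as follows, assuming a 4x4 matrix built with np.arange(16); the names of the parts are illustrative.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))  # assumed 4x4 matrix

# Cut before column 1 and before column 3: columns [0], [1, 2], and [3]
A1, A2, A3 = np.split(A, [1, 3], axis=1)

print(A1.shape, A2.shape, A3.shape)  # prints (4, 1) (4, 2) (4, 1)
```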
The split() function therefore also covers the functionality of the vsplit() and hsplit() functions.
General Concepts
This section describes the general concepts underlying the NumPy library. In particular, it explains the
difference between copies and views and when each is returned. The mechanism of broadcasting, which
occurs implicitly in many NumPy functions, is also covered in this section.
If you assign one array a to another variable b, you are not copying it; b is just another name for
array a. In fact, by changing the value of the third element of a, you change the third value of b too.
When you slice an array, the object returned is a view of the original array.
>>> c = a[0:2]
>>> c
array([1, 2])
>>> a[0] = 0
>>> c
array([0, 2])
As you can see, even when slicing, you are actually pointing to the same object. If you want to generate a
complete and distinct array, use the copy() function.
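The assignment case and the copy() case can be sketched together as follows; the starting values are an assumption chosen to agree with the slicing listing above.

```python
import numpy as np

a = np.array([1, 2, 3, 4])  # assumed starting values

b = a      # no copy is made: b is another name for the same object
a[2] = 0
print(b)   # prints [1 2 0 4]

c = a[0:2].copy()  # copy() produces a complete and distinct array
a[0] = 0
print(c)   # prints [1 2]
```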
In this case, even when you change the items in array a, array c remains unchanged.
Vectorization
Vectorization, along with broadcasting, is the basis of the internal implementation of NumPy. Vectorization
is the absence of explicit loops in the code you write. The loops cannot actually be eliminated; they are
implemented internally by NumPy, and in your code they are replaced by other constructs. Vectorization
leads to more concise and readable code, which you could say looks more “Pythonic.” In fact, thanks to
vectorization, many operations take on a more mathematical form. For example, NumPy allows you to
express the multiplication of two arrays
as shown:
a * b
A * B
In other languages, such operations would be expressed with many nested loops and the for construct.
For example, the first operation would be expressed in the following way:
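The original listing for the loop is not available here. As a stand-in, the sketch below writes the explicit loop in Python, on illustrative arrays, and checks it against the vectorized expression a * b.

```python
import numpy as np

a = np.arange(4)      # illustrative operands
b = np.arange(4, 8)

# Explicit loop: multiply the elements one position at a time
c = np.empty_like(a)
for i in range(a.size):
    c[i] = a[i] * b[i]

# The vectorized form computes the same result in a single expression
print(np.array_equal(c, a * b))  # prints True
```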
You can see that using NumPy makes the code more readable and more mathematical.
Broadcasting
Broadcasting allows an operator or a function to act on two or more arrays even if these arrays do not
have the same shape. That said, not all the dimensions can be subjected to broadcasting; they must meet
certain rules.
You saw that in NumPy, multidimensional arrays are described by a shape, a tuple giving the length
of each dimension.
Two arrays can be subjected to broadcasting when all their dimensions are compatible, i.e., the lengths
of each pair of corresponding dimensions must be equal or one of them must be equal to 1. If neither of
these conditions is met, you get an exception stating that the two arrays are not compatible.
>>> A = np.arange(16).reshape(4, 4)
>>> b = np.arange(4)
>>> A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> b
array([0, 1, 2, 3])
4 x 4
4
There are two rules of broadcasting. First, you must add a 1 for each missing dimension, on the left-hand
side of the shape. If the compatibility rules are now satisfied, you can apply broadcasting and move to the
second rule. For example:
4 x 4
1 x 4
The rule of compatibility is met. Then you can move to the second rule of broadcasting. This rule
explains how to extend the smaller array to the size of the larger one, so that the element-wise
function or operator is applicable.
The second rule assumes that each dimension of length 1 is extended by filling it with replicas of the
values it already contains (see Figure 3-6).
Now that the two arrays have the same dimensions, the values inside may be added together.
>>> A + b
array([[ 0, 2, 4, 6],
[ 4, 6, 8, 10],
[ 8, 10, 12, 14],
[12, 14, 16, 18]])
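The two rules can be made explicit with a short sketch. It uses reshape() to add the missing dimension by hand and np.broadcast_to() to replicate the values, then checks that the result matches what NumPy does implicitly; the intermediate names are illustrative.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
b = np.arange(4)

# Rule 1: a dimension of length 1 is added on the left, giving shape (1, 4)
b_row = b.reshape(1, 4)

# Rule 2: the dimension of length 1 is extended by replicating its values
b_ext = np.broadcast_to(b_row, A.shape)
print(b_ext.shape)  # prints (4, 4)

# The explicit version gives the same result as the implicit broadcasting
print(np.array_equal(A + b, A + b_ext))  # prints True
```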
This is a simple case in which one of the two arrays is smaller than the other. There may be more
complex cases in which the two arrays have different shapes and each is smaller than the other only in
certain dimensions.
>>> m = np.arange(6).reshape(3, 1, 2)
>>> n = np.arange(6).reshape(3, 2, 1)
>>> m
array([[[0, 1]],
[[2, 3]],
[[4, 5]]])
>>> n
array([[[0],
[1]],
[[2],
[3]],
[[4],
[5]]])
Even in this case, by analyzing the shapes of the two arrays, you can see that they are compatible and
therefore the rules of broadcasting can be applied.
3 x 1 x 2
3 x 2 x 1
m* = [[[0,1], n* = [[[0,0],
[0,1]], [1,1]],
[[2,3], [[2,2],
[2,3]], [3,3]],
[[4,5], [[4,4],
[4,5]]] [5,5]]]
Then you can apply, for example, the addition operator between the two arrays, operating
element-wise.
>>> m + n
array([[[ 0, 1],
[ 1, 2]],
[[ 4, 5],
[ 5, 6]],
[[ 8, 9],
[ 9, 10]]])
Structured Arrays
So far in the various examples in the previous sections, you saw one-dimensional and two-dimensional
arrays. NumPy allows you to create arrays that are much more complex not only in size, but also in
structure; these are called structured arrays. This type of array contains structs or records instead of
individual items.
For example, you can create a simple array of structs as items. Thanks to the dtype option, you can
specify a list of comma-separated specifiers to indicate the elements that will constitute the struct, along
with their data type and order.
Data type              Specifiers
bytes                  b1
int                    i1, i2, i4, i8
unsigned ints          u1, u2, u4, u8
floats                 f2, f4, f8
complex                c8, c16
fixed length strings   a<n>
For example, if you want to specify a struct consisting of an integer, a character string of length 6,
a float, and a complex value, you specify the four types of data in the dtype option in the right order
using the corresponding specifiers.
■ Note: The results of dtype and other format attributes can vary among different operating systems and
Python distributions.
>>> structured = np.array([(1, 'First', 0.5, 1+2j), (2, 'Second', 1.3, 2-2j),
...                        (3, 'Third', 0.8, 1+3j)], dtype='i2, a6, f4, c8')
>>> structured
array([(1, b'First', 0.5, 1+2.j),
You can also use the data type explicitly by specifying int8, uint8, float16, complex64, and so forth.
>>> structured = np.array([(1, 'First', 0.5, 1+2j), (2, 'Second', 1.3, 2-2j),
...                        (3, 'Third', 0.8, 1+3j)],
...                       dtype='int16, a6, float32, complex64')
>>> structured
array([(1, b'First', 0.5, 1.+2.j),
(2, b'Second', 1.3, 2.-2.j),
(3, b'Third', 0.8, 1.+3.j)],
dtype=[('f0', '<i2'), ('f1', 'S6'), ('f2', '<f4'), ('f3', '<c8')])
Both cases have the same result. Inside the array, you see a dtype sequence containing the name of
each item of the struct with the corresponding data type.
Writing the appropriate reference index, you obtain the corresponding row, which contains the struct.
>>> structured[1]
(2, b'Second', 1.3, 2.-2.j)
The names that are assigned automatically to each item of the struct can be considered the names of the
columns of the array. Using them as a structured index, you can refer to all the elements of the same type, or
of the same column.
>>> structured['f1']
array([b'First', b'Second', b'Third'], dtype='|S6')
As you have just seen, the names are assigned automatically with an f (which stands for field) and a
progressive integer that indicates the position in the sequence. In fact, it would be more useful to specify the
names with something more meaningful. This is possible and you can do it at the time of array declaration:
Or you can do it at a later time, by assigning a new tuple of names to the names attribute of the dtype
of the structured array.
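Both approaches can be sketched as follows. The field names id, position, value, and complex are assumptions, with order chosen as the renamed field to agree with the listing below; the sketch also uses the S6 specifier for the six-byte string.

```python
import numpy as np

# Field names given at declaration time (the names are assumed for illustration)
structured = np.array(
    [(1, 'First', 0.5, 1+2j), (2, 'Second', 1.3, 2-2j), (3, 'Third', 0.8, 1+3j)],
    dtype=[('id', 'i2'), ('position', 'S6'), ('value', 'f4'), ('complex', 'c8')])
print(structured.dtype.names)  # prints ('id', 'position', 'value', 'complex')

# Field names redefined later through the names attribute of the dtype
structured.dtype.names = ('id', 'order', 'value', 'complex')
print(structured['order'])  # prints [b'First' b'Second' b'Third']
```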
Now you can use meaningful names for the various field types:
>>> structured['order']
array([b'First', b'Second', b'Third'], dtype='|S6')
When you need to recover the data stored in a .npy file, you use the load() function by specifying the
file name as the argument, this time adding the .npy extension.
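A round trip through a .npy file can be sketched as follows. The file name saved_data and the use of a temporary directory are assumptions made so that the example leaves no files behind.

```python
import os
import tempfile

import numpy as np

data = np.arange(12).reshape(3, 4)  # illustrative array to store

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'saved_data')  # hypothetical file name
    np.save(path, data)                     # save() appends the .npy extension
    loaded = np.load(path + '.npy')         # load() needs the full file name

print(np.array_equal(data, loaded))  # prints True
```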
id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19
To read data from a text file and insert the values into an array, NumPy provides a function
called genfromtxt(). Normally, this function takes three arguments: the name of the file containing the
data, the character that separates the values from each other (in this case, a comma), and whether the data
contain column headers.
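A minimal sketch of the call follows. Instead of a file on disk, it reads the same comma-separated content shown above from an in-memory StringIO object, which is a substitution made so that the example is self-contained.

```python
from io import StringIO

import numpy as np

# Same content as the CSV listing above, held in memory instead of a file
csv_text = """id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19
"""

data = np.genfromtxt(StringIO(csv_text), delimiter=',', names=True)

print(data.dtype.names)  # prints ('id', 'value1', 'value2', 'value3')
print(data['value1'])    # prints [123. 110. 164.]
```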
As you can see from the result, you get a structured array in which the column headings have become
the field names.
This function implicitly performs two loops: the first loop reads one line at a time, and the second loop
separates and converts the values contained in it, inserting them as consecutive elements of the array.
One positive aspect of this feature is that if some data are missing, the function can handle them.
Take for example the previous file (see Listing 3-2) with some items removed. Save it as data2.csv.
id,value1,value2,value3
1,123,1.4,23
2,110,,18
3,,2.1,19
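The handling of missing values can be sketched as follows. As a substitution for the file data2.csv, the sketch reads the same content from an in-memory StringIO object.

```python
from io import StringIO

import numpy as np

# Same content as data2.csv above, held in memory instead of a file
csv_text = """id,value1,value2,value3
1,123,1.4,23
2,110,,18
3,,2.1,19
"""

data2 = np.genfromtxt(StringIO(csv_text), delimiter=',', names=True)

# The two blank entries are read as nan
print(np.isnan(data2['value2'][1]))  # prints True
print(np.isnan(data2['value1'][2]))  # prints True
```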
Launching these commands, you can see how the genfromtxt() function replaces the blanks in the file
with nan values.
At the bottom of the array, you can find the column headings contained in the file. These headers can
be considered labels that act as indexes to extract the values by column.
>>> data2['id']
array([ 1., 2., 3.])
Instead, by using the numerical indexes in the classic way, you extract data corresponding to the rows.
>>> data2[0]
(1.0, 123.0, 1.4, 23.0)
Conclusions
In this chapter, you learned about the main aspects of the NumPy library and became familiar with a
range of features that form the basis of many other aspects you’ll face in the course of the book. In fact, many
of these concepts are adopted by other, more specialized scientific and computing libraries that have
been structured and developed on the basis of this library.
You saw how, thanks to ndarray, you can extend the functionalities of Python, making it a suitable
language for scientific computing and data analysis.
Knowledge of NumPy is therefore crucial for anyone who wants to take on the world of data analysis.
The next chapter introduces a new library, called pandas, which is structured on NumPy and so
encompasses all the basic concepts illustrated in this chapter. However, pandas extends these concepts so
they are more suitable to data analysis.