INTRODUCTION TO NUMPY
BY-RAVI SHANKAR
NumPy, which stands for Numerical Python, is a Python package consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed.
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package, Numarray, was also developed, offering some additional functionality. In 2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray into the Numeric package. There are many contributors to this open-source project.
Why use NumPy Arrays?
NumPy is one of the most commonly used packages for scientific computing in Python. It provides a multidimensional array object, as well as variations such as masked arrays and matrices, which can be used for various math operations. NumPy is compatible with, and used by, many other popular Python packages, including pandas and matplotlib.
Why is NumPy so popular? Quite simply, because its array operations are faster than the equivalent operations on regular Python lists: NumPy hands the heavy lifting to optimized, pre-compiled C code. Another reason is that NumPy operations are vectorized, which means the code contains no explicit looping or indexing. This makes the code not only more readable, but also more similar to standard mathematical notation.
The following example illustrates the vectorization difference between standard Python and numpy.
For two sequences a and b of the same size, an element-wise multiplication in plain Python requires an explicit loop:
c = []
for i in range(len(a)):
    c.append(a[i]*b[i])
In numpy, this can simply be done with the following line of code:
c = a*b
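The two snippets above leave a and b undefined. A minimal runnable sketch of the same comparison follows; the sample values and the names a_list and b_list are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, not taken from the text above.
a_list = [1, 2, 3, 4]
b_list = [10, 20, 30, 40]

# Plain Python: loop over the indices and multiply one pair at a time.
c_list = []
for i in range(len(a_list)):
    c_list.append(a_list[i] * b_list[i])

# NumPy: convert to arrays once, then multiply in a single expression.
a = np.array(a_list)
b = np.array(b_list)
c = a * b

print(c_list)      # [10, 40, 90, 160]
print(c.tolist())  # [10, 40, 90, 160]
```

Both versions compute the same element-wise product; only the NumPy version avoids the explicit loop.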
Numpy makes many mathematical operations used widely in scientific computing fast and easy to use, such as:
- Vector-Vector multiplication
- Matrix-Matrix and Matrix-Vector multiplication
- Element-wise operations on vectors and matrices (e.g., adding, subtracting, multiplying, and dividing by a number)
- Element-wise or array-wise comparisons
- Applying functions element-wise to a vector/matrix (like pow, log, and exp)
- A wide range of linear algebra operations, found in numpy.linalg
- Reduction, statistics, and much more
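The operations listed above can be sketched in a few lines. The arrays v, w and M below are hypothetical sample values chosen so the results are easy to check by hand.

```python
import numpy as np

# Hypothetical sample data for illustration.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
M = np.array([[2.0, 0.0], [0.0, 4.0]])

dot = v @ w                         # vector-vector product: 4 + 10 + 18
mat_vec = M @ np.array([1.0, 1.0])  # matrix-vector product
mat_mat = M @ M                     # matrix-matrix product
scaled = v * 2 + 1                  # element-wise arithmetic with a number
mask = v > 1.5                      # element-wise comparison
round_trip = np.log(np.exp(v))      # element-wise functions
inv = np.linalg.inv(M)              # linear algebra from numpy.linalg
total, mean = v.sum(), v.mean()     # reductions and statistics

print(dot)               # 32.0
print(mat_vec.tolist())  # [2.0, 4.0]
print(mask.tolist())     # [False, True, True]
print(total, mean)       # 6.0 2.0
```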
Limitations
Inserting or appending entries to an array is not as trivially possible as it is with Python’s lists. The np.pad(...) routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence. Reshaping the dimensionality of an array with np.reshape(...) is only possible as long as the number of elements in the array does not change. These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers. A replacement package called Blaze attempts to overcome this limitation.
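The behaviour described above can be checked directly. This is a small sketch with made-up arrays a1 and a2; it shows that np.concatenate and np.pad return new arrays and that reshaping fails when the element count would change.

```python
import numpy as np

# Hypothetical sample arrays.
a1 = np.array([1, 2, 3])
a2 = np.array([4, 5, 6])

# concatenate builds a new array; it does not link a1 and a2.
joined = np.concatenate([a1, a2])
joined[0] = 99
print(a1.tolist())                   # [1, 2, 3]  (a1 is untouched)
print(np.shares_memory(joined, a1))  # False

# pad also returns a new, larger array (one zero before, two after).
padded = np.pad(a1, (1, 2), mode="constant")
print(padded.tolist())               # [0, 1, 2, 3, 0, 0]

# reshape works only if the number of elements stays the same (6 here).
reshaped = joined.reshape(2, 3)
try:
    joined.reshape(4, 2)             # 8 elements requested, 6 available
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                # True
```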
Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in “pure Python”, while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include scipy.weave, numexpr and Numba. Cython and Pythran are static-compiling alternatives to these.
Many modern large-scale scientific computing applications have requirements that exceed the capabilities of NumPy arrays. For example, NumPy arrays are usually loaded into a computer’s memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or on specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on. As a result, several alternative array implementations have arisen in the scientific Python ecosystem in recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of NumPy’s popularity, these often implement a subset of NumPy’s API or mimic it, so that users can change their array implementation with minimal changes to their code. A more recent library named CuPy, accelerated by Nvidia’s CUDA framework, has also shown potential for faster computing, being a ‘drop-in replacement’ for NumPy.
Building and installing NumPy
Prerequisites
Building NumPy requires the following software installed:
- Python 2.6.x, 2.7.x, 3.2.x or newer
- On Debian and derivatives (Ubuntu): python, python-dev (or python3-dev)
- On Windows: the official python installer at www.python.org is enough
- Make sure that the Python package distutils is installed before continuing. For example, in Debian GNU/Linux, installing python-dev also installs distutils.
- Python must also be compiled with the zlib module enabled. This is practically always the case with pre-packaged Pythons.
- Compilers
- To build any extension modules for Python, you’ll need a C compiler. Various NumPy modules use FORTRAN 77 libraries, so you’ll also need a FORTRAN 77 compiler installed.
- Note that NumPy is developed mainly using GNU compilers. Compilers from other vendors such as Intel, Absoft, Sun, NAG, Compaq, Vast, Portland, Lahey, HP, IBM, Microsoft are only supported in the form of community feedback, and may not work out of the box. GCC 4.x (and later) compilers are recommended.
- Linear Algebra libraries
- NumPy does not require any external linear algebra libraries to be installed. However, if these are available, NumPy’s setup script can detect them and use them for building. A number of different LAPACK library setups can be used, including optimized LAPACK libraries such as ATLAS, MKL or the Accelerate/vecLib framework on OS X.
Basic Installation
To install NumPy run:
python setup.py install
To perform an in-place build that can be run from the source folder run:
python setup.py build_ext --inplace
The NumPy build system uses distutils and numpy.distutils. setuptools is only used when building via pip or with python setupegg.py. Using virtualenv should work as expected.
Note: for build instructions to do development work on NumPy itself, see the development environment section of the NumPy documentation.
Parallel builds
From NumPy 1.10.0 on it’s also possible to do a parallel build with:
python setup.py build -j 4 install --prefix $HOME/.local
This will compile numpy on 4 CPUs and install it into the specified prefix.
numpy.ndarray
class numpy.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)
An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)
Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.
For more information, refer to the numpy module and examine the methods and attributes of an array.
Parameters (for the __new__ method; see Notes below)
shape : tuple of ints
    Shape of created array.
dtype : data-type, optional
    Any object that can be interpreted as a numpy data type.
buffer : object exposing buffer interface, optional
    Used to fill the array with data.
offset : int, optional
    Offset of array data in buffer.
strides : tuple of ints, optional
    Strides of data in memory.
order : {‘C’, ‘F’}, optional
    Row-major (C-style) or column-major (Fortran-style) order.
See also
array : Construct an array.
zeros : Create an array, each element of which is zero.
empty : Create an array, but leave its allocated memory unchanged (i.e., it contains “garbage”).
dtype : Create a data-type.
A generic version of ndarray.
Notes
There are two modes of creating an array using __new__:
- If buffer is None, then only shape, dtype, and order are used.
- If buffer is an object exposing the buffer interface, then all keywords are interpreted.
No __init__ method is needed because the array is fully initialized after the __new__ method.
Examples
These examples illustrate the low-level ndarray constructor. Refer to the See Also section above for easier ways of constructing an ndarray.
First mode, buffer is None:
>>> np.ndarray(shape=(2,2), dtype=float, order='F')
array([[0.0e+000, 0.0e+000], # random
[ nan, 2.5e-323]])
Second mode:
>>> np.ndarray((2,), buffer=np.array([1,2,3]),
... offset=np.int_().itemsize,
... dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])
Attributes
T : ndarray
    The transposed array.
data : buffer
    Python buffer object pointing to the start of the array’s data.
dtype : dtype object
    Data-type of the array’s elements.
flags : dict
    Information about the memory layout of the array.
flat : numpy.flatiter object
    A 1-D iterator over the array.
imag : ndarray
    The imaginary part of the array.
real : ndarray
    The real part of the array.
size : int
    Number of elements in the array.
itemsize : int
    Length of one array element in bytes.
nbytes : int
    Total bytes consumed by the elements of the array.
ndim : int
    Number of array dimensions.
shape : tuple of ints
    Tuple of array dimensions.
strides : tuple of ints
    Tuple of bytes to step in each dimension when traversing an array.
ctypes : ctypes object
    An object to simplify the interaction of the array with the ctypes module.
base : ndarray
    Base object if memory is from some other object.
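To make the attribute list concrete, here is a small sketch that inspects a 2 x 3 array of 32-bit integers. The array itself is a made-up example; the dtype is fixed explicitly so that the byte counts do not depend on the platform.

```python
import numpy as np

# Hypothetical example array: 6 int32 values arranged as 2 rows x 3 columns.
arr = np.arange(6, dtype=np.int32).reshape(2, 3)

print(arr.shape)     # (2, 3)
print(arr.ndim)      # 2
print(arr.size)      # 6
print(arr.itemsize)  # 4  (bytes per int32 element)
print(arr.nbytes)    # 24 (size * itemsize)
print(arr.strides)   # (12, 4): 12 bytes to the next row, 4 to the next column

# The transpose swaps shape and strides without copying the data.
print(arr.T.shape)    # (3, 2)
print(arr.T.strides)  # (4, 12)

# reshape returned a view here, so base points at the array that owns the data.
print(arr.base is not None)        # True
print(arr.flags["C_CONTIGUOUS"])   # True
```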
Basic Slicing and Advanced Indexing in NumPy Python
NumPy, or Numerical Python, is a package for computation on homogeneous n-dimensional arrays. In NumPy, dimensions are called axes.
Indexing using index arrays
Indexing can be done in NumPy by using an array as an index. A slice returns a view of the array (no data is copied), whereas indexing with an index array returns a copy of the selected elements. NumPy arrays can be indexed with other arrays or with other sequences such as lists; tuples are the exception, because a tuple is interpreted as one index per dimension rather than as a list of positions. Negative indices count from the end: the last element has index -1, the second-to-last -2, and so on.
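A short sketch of the difference between slices and index arrays follows. The arrays arr and grid and their values are hypothetical; np.shares_memory is used to check whether a result is a view or a copy.

```python
import numpy as np

# Hypothetical sample data.
arr = np.array([10, 20, 30, 40, 50])

idx = np.array([0, 2, -1])   # index array; -1 selects the last element
picked = arr[idx]            # copy of the selected elements
sliced = arr[1:4]            # view onto arr

picked[0] = -1               # does not touch arr
sliced[0] = -2               # writes through to arr[1]

print(picked.tolist())       # [-1, 30, 50]
print(arr.tolist())          # [10, -2, 30, 40, 50]
print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, sliced))  # True

# A tuple is read as one index per dimension, a list as a set of rows.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid[(1, 2)])          # 6, same as grid[1, 2]
print(grid[[1, 0]].tolist()) # [[4, 5, 6], [1, 2, 3]]
```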
NumPy Array Indexing
Access Array Elements
Array indexing is the same as accessing an array element.
You can access an array element by referring to its index number.
The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
Example
Get the first element from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Example
Get the second element from the following array.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[1])
Example
Get third and fourth elements from the following array and add them.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Access 2-D Arrays
To access elements from 2-D arrays we can use comma separated integers representing the dimension and the index of the element.
Example
Access the 2nd element on 1st dim:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st dim: ', arr[0, 1])
Example
Access the 5th element on 2nd dim:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('5th element on 2nd dim: ', arr[1, 4])
Access 3-D Arrays
To access elements from 3-D arrays we can use comma separated integers representing the dimensions and the index of the element.
Example
Access the third element of the second array of the first array:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Example Explained
arr[0, 1, 2] prints the value 6.
And this is why:
The first number represents the first dimension, which contains two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]
The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
Since we selected 1, we are left with the second array:
[4, 5, 6]
The third number represents the third dimension, which contains three values:
4
5
6
Since we selected 2, we end up with the third value:
6
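The step-by-step explanation above can be reproduced in code by applying one index at a time. The intermediate names first, second and third are introduced here only for illustration.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

first = arr[0]      # first dimension: [[1, 2, 3], [4, 5, 6]]
second = first[1]   # second dimension: [4, 5, 6]
third = second[2]   # third dimension: 6

print(first.tolist())   # [[1, 2, 3], [4, 5, 6]]
print(second.tolist())  # [4, 5, 6]
print(third)            # 6
print(third == arr[0, 1, 2])  # True
```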
Negative Indexing
Use negative indexing to access an array from the end.
Example
Print the last element from the 2nd dim:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element from 2nd dim: ', arr[1, -1])
NumPy Array Slicing
Slicing arrays
Slicing in python means taking elements from one given index to another given index.
We pass a slice instead of an index, like this: [start:end].
We can also define the step, like this: [start:end:step].
If we don't pass start, it is considered 0.
If we don't pass end, it is considered the length of the array in that dimension.
If we don't pass step, it is considered 1.
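The default values can be verified by comparing a slice that omits a part with one that spells it out. This is a minimal sketch on a made-up array; n is a helper name for the array length.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
n = len(arr)

# Omitted start defaults to 0.
same_start = np.array_equal(arr[:4], arr[0:4])
# Omitted end defaults to the length of the array.
same_end = np.array_equal(arr[4:], arr[4:n])
# Omitted step defaults to 1.
same_step = np.array_equal(arr[1:5], arr[1:5:1])
# All three defaults at once, with an explicit step of 2.
same_all = np.array_equal(arr[::2], arr[0:n:2])

print(same_start, same_end, same_step, same_all)  # True True True True
```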
Example
Slice elements from index 1 to index 5 (not included) from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Example
Slice elements from index 4 to the end of the array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Example
Slice elements from the beginning to index 4 (not included):
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
Negative Slicing
Use the minus operator to refer to an index from the end:
Example
Slice from index 3 from the end to index 1 from the end (not included):
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
STEP
Use the step value to determine the step of the slicing:
Example
Return every other element from index 1 to index 5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
Example
Return every other element from the entire array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])
Slicing 2-D Arrays
Example
From the second element, slice elements from index 1 to index 4 (not included):
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
Note: Remember that second element has index 1.
Example
From both elements, return index 2:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])
Example
From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
Internal memory layout of an ndarray
NumPy Copies and Views
ndarray.view() method
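The ndarray.view() method returns a new array object that looks at the same data as the original array. Because it is a separate array object, changing its shape does not change the shape of the original array. The data buffer, however, is shared, so changes to the elements of the view are reflected in the original array.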
Consider the following example.
Example
import numpy as np
a = np.array([[1,2,3,4],[9,0,2,3],[1,2,3,19]])
print("Original Array:\n",a)
print("\nID of array a:",id(a))
b = a.view()
print("\nID of b:",id(b))
print("\nprinting the view b")
print(b)
b.shape = 4,3
print("\nChanging the shape of view b does not change the shape of a")
print("\nOriginal array \n",a)
print("\nview\n",b)
Output:
Original Array:
 [[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

ID of array a: 140280414447456

ID of b: 140280287000656

printing the view b
[[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

Changing the shape of view b does not change the shape of a

Original array 
 [[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

view
 [[ 1  2  3]
 [ 4  9  0]
 [ 2  3  1]
 [ 2  3 19]]
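The example above only changes the shape of the view. A short sketch of what happens when an element of the view is modified follows, using the same array a. It uses reshape on the view instead of assigning to shape, and the value 100 is arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [9, 0, 2, 3], [1, 2, 3, 19]])

# A view with a different shape: same data buffer, separate array object.
b = a.view().reshape(4, 3)

# Writing through the view changes the data that a sees as well.
b[0, 0] = 100

print(a.shape)                 # (3, 4)  (shape of a is unchanged)
print(b.shape)                 # (4, 3)
print(a[0, 0])                 # 100     (element change is shared)
print(np.shares_memory(a, b))  # True
```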
ndarray.copy() method
The ndarray.copy() method returns a deep copy of the original array, which does not share any memory with the original. Modifications made to the copy are not reflected in the original array.
Consider the following example.
Example
import numpy as np
a = np.array([[1,2,3,4],[9,0,2,3],[1,2,3,19]])
print("Original Array:\n",a)
print("\nID of array a:",id(a))
b = a.copy()
print("\nID of b:",id(b))
print("\nprinting the deep copy b")
print(b)
b.shape = 4,3
print("\nChanges made to the copy b do not reflect a")
print("\nOriginal array \n",a)
print("\nCopy\n",b)
Output:
Original Array:
 [[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

ID of array a: 139895697586176

ID of b: 139895570139296

printing the deep copy b
[[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

Changes made to the copy b do not reflect a

Original array 
 [[ 1  2  3  4]
 [ 9  0  2  3]
 [ 1  2  3 19]]

Copy
 [[ 1  2  3]
 [ 4  9  0]
 [ 2  3  1]
 [ 2  3 19]]
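To confirm that a copy is independent of the original, the following sketch modifies an element of the copy and checks the original. The value 100 is arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [9, 0, 2, 3], [1, 2, 3, 19]])
b = a.copy()

# Writing to the copy leaves the original untouched.
b[0, 0] = 100

print(a[0, 0])                 # 1
print(b[0, 0])                 # 100
print(np.shares_memory(a, b))  # False
print(b.base is None)          # True: the copy owns its own data
```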
NumPy Creating Arrays
NumPy is used to work with arrays. The array object in NumPy is called ndarray.
We can create a NumPy ndarray object by using the array() function.
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
type(): This built-in Python function tells us the type of the object passed to it. In the code above, it shows that arr is of type numpy.ndarray.
To create an ndarray, we can pass a list, tuple or any array-like object into the array() function, and it will be converted into an ndarray:
Example
Use a tuple to create a NumPy array:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
Nested arrays are arrays that have arrays as their elements.
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.
Example
Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd-order tensors.
NumPy also provides a dedicated matrix class, numpy.matrix, although regular 2-D arrays are generally preferred for matrix operations.
Example
Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
These are often used to represent a 3rd order tensor.
Example
Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
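The number of dimensions of each array created above can be checked with the ndim attribute, and the size along each dimension with shape. The names a0 to a3 are introduced here only for illustration.

```python
import numpy as np

a0 = np.array(42)
a1 = np.array([1, 2, 3, 4, 5])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# ndim counts the nesting depth; shape gives the length along each axis.
print(a0.ndim, a0.shape)  # 0 ()
print(a1.ndim, a1.shape)  # 1 (5,)
print(a2.ndim, a2.shape)  # 2 (2, 3)
print(a3.ndim, a3.shape)  # 3 (2, 2, 3)
```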