6.2. Array Computation#

NumPy is so important in the Pythonic data science world because it optimizes computation with arrays. Computational efficiency on NumPy arrays lies in the use of vectorized operations, generally implemented through NumPy’s universal array functions (ufuncs).

  1. Vectorized Operations

  2. Array Arithmetic

  3. Aggregation

6.2.1. Vectorized vs. Iterative Operation#

Vectorized operations are more efficient, and here we will compare NumPy’s vectorized operation with the general iteration of Python lists.

6.2.1.1. General Iteration#

Let’s take a look at a general iteration (loop 5 times) example (note the code is meant to represent a computation for your Python programming learning, not a business context):

import numpy as np
rng = np.random.default_rng(seed=1701)      ### default_rng() is a Random Number Generator
                                            ### it's a function in numpy's random module
                                            ### A random seed (or seed state, or just seed) 
                                            ### is a number (or vector) used to initialize 
                                            ### a pseudorandom number generator. 
                                            ### A pseudorandom number generator's number sequence 
                                            ### is completely determined by the seed: thus, if 
                                            ### a pseudorandom number generator is later reinitialized 
                                            ### with the same seed, it will produce the same sequence of numbers.
                                            ### https://en.wikipedia.org/wiki/Random_seed
def compute_reciprocals(values):
    output = np.empty(len(values))          ### numpy.empty() creates an array by taking whatever "garbage" values present in the allocated memory at the time of creation.
    for i in range(len(values)):            ### for loop update the elements of output array
        output[i] = 1.0 / values[i]
    return output                           ### return the output array
        
values = rng.integers(1, 10, size=5)
compute_reciprocals(values)
                                            ### when returned, Jupyter evaluates and outputs
array([0.11111111, 0.25      , 1.        , 0.33333333, 0.125     ])
### explains the code here
### rng function with seed
rng = np.random.default_rng(seed=1701)   
### the values
values = rng.integers(1, 10, size=5)
values
array([9, 4, 1, 3, 8])
### the output 
np.empty(len(values))
array([4.4e-323, 2.0e-323, 4.9e-324, 1.5e-323, 4.0e-323])
### output 
output = np.empty(len(values))
### loop through to update the output array
for i in range(len(values)):
   output[i] = 1.0 / values[i]
print( output)
[0.11111111 0.25       1.         0.33333333 0.125     ]

Now calling the function with a huge number: 1000000

big_array = rng.integers(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

### μs is one-millionth of a second (10⁻⁶ s); a ns is one billionth of a second (10⁻⁹ s)
648 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

6.2.1.2. List vs Array Performance#

An example comparing list iteration and vectorized operation.

(The following sample code is adapted from geeksforgeeks .)

import numpy as np
import time

lst = list(range(1000000))   ### create list
start_time = time.time()     ### time starts
# iterative_sum = sum(range(15000))          ###geeksforgeeks got it wrong here.
iterative_sum = sum(lst)     ### sum list
print("\nIterative sum:", iterative_sum)
print("Time taken by iterative sum:", time.time() - start_time)
iterative_time = time.time() - start_time

arr = np.arange(1000000)     ### create array
start_time = time.time()     ### time starts
vectorized_sum = np.sum(arr) ### sum array
print("Vectorized sum:", vectorized_sum)
print("Time taken by vectorized sum:", time.time() - start_time)
vectoized_time = time.time() - start_time

print()

print(f"NumPy array sum is {iterative_time / vectoized_time:.2f} times faster of list sum.")
Iterative sum: 499999500000
Time taken by iterative sum: 0.002774953842163086
Vectorized sum: 499999500000
Time taken by vectorized sum: 0.00019979476928710938

NumPy array sum is 13.01 times faster of list sum.

6.2.1.3. List vs. Array Basic Operations#

As an observation, take a look at additon and multiplcation in lists and arrays.

### this is a range
range(5)

### type of range
type(range(5))

### use the list function/constructor to cast the range into a list
list(range(5))

### create a list variable to reference the object
lst = list(range(5))

### evaluate
lst
[0, 1, 2, 3, 4]

A list multiplication (for addition we use append() and extend()):

print(lst * 2)
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

NumPy arrays does vectorized operations

arr = np.array([1, 2, 3, 4, 5])
print(arr * 2)              # multiply each element by 2
print(arr + 5)              # add 5 to each element
[ 2  4  6  8 10]
[ 6  7  8  9 10]
### array stays the same so far
arr
array([1, 2, 3, 4, 5])

6.2.2. Array Arithmetic ufuncs#

Ufuncs come in two types: unary ufuncs, which work with a single input, and binary ufuncs, which work with two inputs. We’ll explore examples of both types in this section.

NumPy implements arithmetic operators with ufuncs as follows. However, NumPy ufuncs also utilize Python’s native arithmetic operators, which can be more intuitive to use. For element-wise operations like addition, you can simply use Python’s arithmetic operators directly on NumPy’s array object.

This vectorized method shifts the looping process to NumPy’s compiled layer, resulting in significantly faster execution.

+  np.add          Addition (e.g., 1 + 1 = 2)
-  np.subtract     Subtraction (e.g., 3 - 2 = 1)
-  np.negative     Unary negation (e.g., -2)
*  np.multiply     Multiplication (e.g., 2 * 3 = 6)
/  np.divide       Division (e.g., 3 / 2 = 1.5)
// np.floor_divide Floor division (e.g., 3 // 2 = 1)
** np.power        Exponentiation (e.g., 2 ** 3 = 8)
%  np.mod          Modulus/remainder (e.g., 9 % 4 = 1)
arr1 = np.arange(5)           ### 0...4
arr2 = np.arange(5, 10, 1)    ### 5...9

print(np.add(arr1, arr2))
print(np.subtract(arr1, arr2))
print(np.negative(arr1))
print(np.multiply(arr1, arr2))
print(np.divide(arr1, arr2))
print(np.floor_divide(arr1, 3))
print(np.ceil(np.floor_divide(arr1, 2)))
print(np.power(arr1, 3))
print(np.mod(arr1, 2))
[ 5  7  9 11 13]
[-5 -5 -5 -5 -5]
[ 0 -1 -2 -3 -4]
[ 0  6 14 24 36]
[0.         0.16666667 0.28571429 0.375      0.44444444]
[0 0 0 1 1]
[0 0 1 1 2]
[ 0  1  8 27 64]
[0 1 0 1 0]

Similar vectorized operations can be achieved using the Python arithmetic operators.

### create array using np.arange() function
arr = np.arange(0,10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
### addition
arr + arr
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
### multiplication
arr * arr
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
### substration
arr - arr
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
### check array: unchanged after operations because we didn't change the reference
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
### division by zero!!! 
### However, unlike Python, you do not receive an error.
### Just replaced with nan
arr / arr
/var/folders/g4/v24tl8t172g5d7rzsd63y51w0000gp/T/ipykernel_10065/1831715423.py:4: RuntimeWarning: invalid value encountered in divide
  arr / arr
array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In NumPy, np.nan represents “Not a Number,” a special floating-point value defined by the IEEE 754 standard. It signifies missing or undefined numerical data, such as the result of an indeterminate mathematical operation (e.g., dividing zero by zero).

### also warning, but not an error, instead infinity
1 / arr
/var/folders/g4/v24tl8t172g5d7rzsd63y51w0000gp/T/ipykernel_10065/2426921330.py:2: RuntimeWarning: divide by zero encountered in divide
  1 / arr
array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])

Seeing the operation in the terminal, you will see the inf and nan as output.

>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> arr[0]/1
np.float64(0.0)
>>> print(arr[0]/1)
0.0
>>> print(arr[1]/0)
<python-input-13>:1: RuntimeWarning: divide by zero encountered in scalar divide
inf
>>> 
>>> print(arr[0]/0)
<python-input-14>:1: RuntimeWarning: invalid value encountered in scalar divide
nan
### exponents
arr**3
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
### summary: ufunc arithmetic operators

x = np.arange(5)
print("x =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2) # floor division
x = [0 1 2 3 4]
x + 5  = [5 6 7 8 9]
x - 5  = [-5 -4 -3 -2 -1]
x * 2  = [0 2 4 6 8]
x / 2  = [0.  0.5 1.  1.5 2. ]
x // 2 = [0 0 1 1 2]
### Python sum() vs np.sum()
arr_a = np.arange(5)
arr_b = np.arange(5, 10)
print(arr_a)
print(arr_b)
# # print(np.sum(arr1, arr2))
# print(np.sum(arr_a))  ### np.sum(), much faster
# print(sum(arr_a))     ### Python sum, the same as np.sum() with simple cases
# print(arr_a + arr_b)  ### ufunc
### if you Python sum(arr), you get an error: can't iterate
print(sum(arr_a, arr_b))     ### python sum
print(np.sum([arr_a, arr_b], axis=0))
[0 1 2 3 4]
[5 6 7 8 9]
[15 16 17 18 19]
[ 5  7  9 11 13]

6.2.2.1. More math ufuncs#

Numpy comes with many mathematical ufuncs, which are essentially just mathematical operations you can use to operate on the arrays.

  • sqrt()

  • exp()

  • sin()

  • log()

### make a 2-D array
arr = np.arange(12)
arr = arr.reshape(3, 4)
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
### taking square roots
np.sqrt(arr)
array([[0.        , 1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974, 2.64575131],
       [2.82842712, 3.        , 3.16227766, 3.31662479]])

np.exp(): The numpy exponential function, \(e^x\), returns Euler’s number \(e\) (\(\approx 2.718\)) raised to the power of \(x\).

### exponent 1/3: np.exp()
np.exp(arr)
array([[1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01],
       [5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03],
       [2.98095799e+03, 8.10308393e+03, 2.20264658e+04, 5.98741417e+04]])
print(np.exp(3))
import math
print((math.e)**3)
print(math.e * math.e * math.e)
20.085536923187668
20.085536923187664
20.085536923187664
### exponent 2/3: you can also do 
np.power(2, 3)
np.int64(8)
### exponent 2/3: or use the power function
np.power(arr, 3)
array([[   0,    1,    8,   27],
       [  64,  125,  216,  343],
       [ 512,  729, 1000, 1331]])
### exponential 3/3: or the ** operator
arr**3
array([[   0,    1,    8,   27],
       [  64,  125,  216,  343],
       [ 512,  729, 1000, 1331]])
np.sin(arr)
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021]])
np.log(arr)
/var/folders/g4/v24tl8t172g5d7rzsd63y51w0000gp/T/ipykernel_10065/3120950136.py:1: RuntimeWarning: divide by zero encountered in log
  np.log(arr)
array([[      -inf, 0.        , 0.69314718, 1.09861229],
       [1.38629436, 1.60943791, 1.79175947, 1.94591015],
       [2.07944154, 2.19722458, 2.30258509, 2.39789527]])

6.2.3. Aggregation#

Function      NaN-safe version  Description
np.sum        np.nansum         Compute sum of elements
np.prod       np.nanprod        Compute product of elements
np.mean       np.nanmean        Compute mean of elements
np.std        np.nanstd         Compute standard deviation
np.var        np.nanvar         Compute variance
np.min        np.nanmin         Find minimum value
np.max        np.nanmax         Find maximum value
np.argmin     np.nanargmin      Find index of minimum value
np.argmax     np.nanargmax      Find index of maximum value
np.median     np.nanmedian      Compute median of elements
np.percentile np.nanpercentile  Compute rank-based statistics of elements
np.any        N/A Evaluate whether any elements are true
np.all        N/A Evaluate whether all elements are true
arr = np.arange(1, 6)
arr
array([1, 2, 3, 4, 5])
### np.max

print(np.sum(arr))
15
### np.max: same as arr.max() method

print(np.max(arr))
5
### np.min

print(np.min(arr))
1
### numpy.argmax() returns the indices of the maximum values along a specified axis of an array.

print(arr.argmax())
4
### numpy.argmin() returns the indices of the maximum values along a specified axis of an array.

print(arr.argmin())
0

6.2.4. Advanced Features#

6.2.4.1. Broadcasting#

Broadcasting means NumPy can combine arrays of different shapes by “stretching” one to match the other. NumPy arrays differ from normal Python lists due to their ability to broadcast. In broadcasting, the data is not copied; it’s a view of the original array. This avoids memory problems, but changes the original array with operations.

### broadcasting can be surprising

x = np.arange(0, 5)     ### 0, 1, ,2 ,3, 4 
y = np.arange(5, 10)    ### 5, 6, ,7, 8, 9
x + y
array([ 5,  7,  9, 11, 13])
### observe what happens when we do addition with arrays; the operation is by vector:

a = np.array([[1, 2, 3],
              [4, 5, 6]])

b = np.array([10, 20, 30])

print(a + b)     ### stretch b into the two vectors in a
[[11 22 33]
 [14 25 36]]

Observe how the original array is changed during broadcasting:

### setting a value with index range 

arr = np.arange(0, 10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
### udpate array (broadcasting)
### operation not saved but array changed

arr[0:5]=10

### show array again:
arr
array([10, 10, 10, 10, 10,  5,  6,  7,  8,  9])

Note that when we update the array using broadcasting, the original array changes without us saving the array to another variable.

### keep moving:
### slicing

slice_of_arr = arr[0:3]
slice_of_arr
array([10, 10, 10])
### change the slice

slice_of_arr[:]=99
slice_of_arr
array([99, 99, 99])

Now note the changes also occur in our original array!

arr
array([99, 99, 99, 10, 10,  5,  6,  7,  8,  9])

Make a copy of the original array if you need to:

### to get a copy of an array, you need to copy it explicitly:
arr_copy = arr.copy()

### You need to do this from the beginning if you want to keep the original array

6.2.4.2. Update Array Elements#

Updating elements in a NumPy array can be achieved through various indexing methods:

  1. Basic Indexing (Single Element Update): You can directly access an element using its index and assign a new value.

  2. Advanced Indexing (Multiple Element Update): You can use arrays of indices to update multiple elements simultaneously.

  3. Boolean Indexing (Conditional Update): You can create a Boolean mask based on a condition and use it to update elements that satisfy the condition.

6.2.4.2.1. Basic Indexing#

import numpy as np

### 1D array
arr_1d = np.array([1, 2, 3, 4])
arr_1d[2] = 10                  ### Update the third element (index 2)
print(arr_1d, "\n")

### 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_2d[1, 1] = 20               ### Update element at row 1, column 1
print(arr_2d)
[ 1  2 10  4] 

[[ 1  2  3]
 [ 4 20  6]]
arr_2d
array([[ 1,  2,  3],
       [ 4, 20,  6]])

6.2.4.2.2. Advanced Indexing#

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d, "\n")

arr_2d[[0, 1], [1, 1]] = [10, 20] ### update elements at (0,1) and (1,1)
print(arr_2d)
[[1 2 3]
 [4 5 6]] 

[[ 1 10  3]
 [ 4 20  6]]

6.2.4.2.3. Boolean Indexing#

arr = np.array([1, 5, 8, 3, 9])
arr[arr > 5] = 0    ### replace all elements greater than 5 with 0
print(arr)
[1 5 0 3 0]

6.2.4.3. Comparisons and Boolean Logic#

Given an array, there are a host of useful operations you can do with Boolean.

import numpy as np

arr1 = np.array([True, False, True, False])
arr2 = np.array([True, True, False, False])
arr3 = np.array([1, 5, 2, 8])
arr4 = np.array([3, 2, 2, 7])

# Logical operations
print(f"logical_and: {np.logical_and(arr1, arr2)}")
print(f"logical_or:  {np.logical_or(arr1, arr2)}")
print(f"logical_not: {np.logical_not(arr1)}")
print(f"logical_xor: {np.logical_xor(arr1, arr2)}")     ### XOR (eXclusive OR) is a logical operation 
                                                        ### that outputs true if and only if its inputs 
                                                        ### are different. 

print()

# Comparison operations with f-string
print(f"equal: {np.equal(arr3, arr4)}")
print(f"greater: {np.greater(arr3, arr4)}")
logical_and: [ True False False False]
logical_or:  [ True  True  True False]
logical_not: [False  True False  True]
logical_xor: [False  True  True False]

equal: [False False  True False]
greater: [False  True False  True]

In the terminal, go to your project folder, activate the virtual environment, and go into Python to do the following:

(.venv) [user]@[host]ː~/workspace/thinkdsm$ python
Python 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4, 5])
>>> x > 3
array([False, False, False,  True,  True])
>>> x < 3
array([ True,  True, False, False, False])
>>> x >= 3
array([False, False,  True,  True,  True])
>>> x <=3
array([ True,  True,  True, False, False])
>>> x != 3 # not equal
array([ True,  True, False,  True,  True])
>>> x == 3 # equal
array([False, False,  True, False, False])
>>>

6.2.4.3.1. Boolean Selection#

Let’s briefly go over how to use brackets for selection based on comparison operators.

arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr > 4
array([False, False, False, False, False,  True,  True,  True,  True,
        True])
bool_arr = arr > 4
bool_arr
array([False, False, False, False, False,  True,  True,  True,  True,
        True])
arr[bool_arr]
array([5, 6, 7, 8, 9])
arr[arr > 2]
array([3, 4, 5, 6, 7, 8, 9])
x = 2
arr[arr > x]
array([3, 4, 5, 6, 7, 8, 9])

6.2.4.4. Fancy Indexing#

Fancy indexing allows you to select entire rows or columns out of order.

### set up matrix
arr2d = np.zeros((10,10))
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
#Length of array

arr_length = arr2d.shape[1]
arr_length
10
### set up array

for i in range(arr_length):
    arr2d[i] = i
    
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
### allows in any order
arr2d[[2,4,6,8]]
array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])
### allows in any order
arr2d[[6,4,2,7]]
array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])

6.2.4.5. Sorting Arrays#

6.2.4.5.1. Sorting lists#

  • Python sorted( ) function; returns a copy of sorted list

  • Python list sort( ) method (work on the object, change in place)

### sorted function on a list
lst = [3, 1, 4, 1, 5, 9, 2, 6]
sorted(lst)      ### returns a sorted copy
[1, 1, 2, 3, 4, 5, 6, 9]
lst              ### original object unchanged
[3, 1, 4, 1, 5, 9, 2, 6]
state = 'MISSOURI' 
sorted(state)
['I', 'I', 'M', 'O', 'R', 'S', 'S', 'U']
state            ### original object not changed
'MISSOURI'
### list.sort method: sort in place
lst.sort()       ### no return value, sorts in place
lst              ### object changed
[1, 1, 2, 3, 4, 5, 6, 9]

6.2.4.5.2. Sorting NumPy arrays#

Both np.sort() and Python sorted( ) functions return a sorted copy of an array; original object unchanged.

arr = np.array([1, 5, 4, 2, 3])
### pythoon sorted( ) function

sorted(arr)
[np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5)]
### arr unchanged

arr
array([1, 5, 4, 2, 3])
### sort array in place

np.sort(arr)     ### returns sorted 
array([1, 2, 3, 4, 5])
arr              ### original object not changed
array([1, 5, 4, 2, 3])