3.2. Array Computation#

import sys
from pathlib import Path

current = Path.cwd()
for parent in [current, *current.parents]:
    if (parent / '_config.yml').exists():
        project_root = parent  # ← Add project root, not chapters
        break
else:
    project_root = Path.cwd().parent.parent

sys.path.insert(0, str(project_root))

from shared import thinkpython, diagram, jupyturtle

NumPy is central to data science in Python because it makes computation on arrays fast. That efficiency comes from vectorized operations, generally implemented through NumPy’s universal functions (ufuncs), which work on entire arrays at once rather than looping through individual elements in Python. This approach is dramatically faster than traditional Python loops and is fundamental to efficient data science.

In this section, we’ll explore how NumPy’s universal functions (ufuncs) enable fast, efficient computation on arrays. You’ll learn why vectorization matters, how to use ufuncs effectively, and advanced techniques like broadcasting and boolean indexing.

What You’ll Learn

  1. Vectorized vs. Iterative Operations: Compare performance and see why vectorization matters

  2. Universal Functions (ufuncs): Explore unary and binary operations for arrays

  3. Array Arithmetic: Perform mathematical operations on arrays

  4. Aggregation: Compute summary statistics efficiently

  5. Advanced Features: Broadcasting, indexing, and boolean logic

3.2.1. Iterative vs Vectorized Operations#

Vectorized operations in NumPy refer to performing operations on entire arrays at once, rather than iterating through individual elements using Python loops.

Vectorized operations are more efficient, and we will compare NumPy’s vectorized operations with Python list iteration.

3.2.1.1. Contender 1: List Iteration#

Let’s start with a function built around an ordinary Python loop. (The code illustrates a computation for learning purposes; it is not drawn from a business context.)

import numpy as np

Use the new numpy random number generator:

rng = np.random.default_rng(seed=42)      ### default_rng() is a Random Number Generator
                                            ### it's a function in numpy's random module
                                            ### A random seed (or seed state, or just seed) 
                                            ### is a number (or vector) used to initialize 
                                            ### a pseudorandom number generator. 
                                            ### A pseudorandom number generator's number sequence 
                                            ### is completely determined by the seed: thus, if 
                                            ### a pseudorandom number generator is later reinitialized 
                                            ### with the same seed, it will produce the same sequence of numbers.
                                            ### https://en.wikipedia.org/wiki/Random_seed

Define a Python function that loops over a sequence of values and computes the reciprocal of each one.

def compute_reciprocals_iteration(values):
    ### np.empty() allocates an array without initializing it, so it holds
    ### whatever "garbage" values are in memory until we overwrite them
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output                           ### return the output array
### test the function with a small array of values
values = rng.integers(1, 10, size=5)        ### 5 integers from 1 to 9 (10 is exclusive)
compute_reciprocals_iteration(values)       ### here values is array([1, 7, 6, 4, 4])
                                            ### Jupyter displays the returned array
array([1.        , 0.14285714, 0.16666667, 0.25      , 0.25      ])

Note

np.random.default_rng() is the recommended function for creating a modern, isolated random-number generator object in NumPy.

The seed 42 is a popular arbitrary choice, not a NumPy default. Any fixed integer makes the results reproducible; calling default_rng() without a seed draws fresh entropy from the operating system each time.

3.2.1.2. Contender 2: Vectorized Operation (NO LOOP!)#

def compute_reciprocals_numpy(values):
    return 1.0 / values             ### no loop!
### test the function with the same array of values
compute_reciprocals_numpy(values)   ### compare with compute_reciprocals_iteration(values)
array([1.        , 0.14285714, 0.16666667, 0.25      , 0.25      ])

3.2.1.3. Who Wins?#

Now call both functions on a large array of 1,000,000 values.

n_values = 1000000                  ### number of elements (a size, not an array)
values = rng.integers(1, 100, size=n_values)

%timeit compute_reciprocals_iteration(values)
%timeit compute_reciprocals_numpy(values)
1.73 s ± 318 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
849 μs ± 18.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Note

  • ms (millisecond) is one thousandth of a second (10⁻³ s)

  • μs (microsecond) is one millionth of a second (10⁻⁶ s)

  • ns (nanosecond) is one billionth of a second (10⁻⁹ s)

  • %timeit -n N sets the number of loops per run

  • %timeit -r R sets the number of runs, that is, how many times the timing experiment is repeated
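Outside Jupyter, the standard-library timeit module offers the same two controls. The following is a minimal sketch: the timed statement, the input array, and the small counts are illustrative choices, not the benchmark used above.

```python
import timeit

import numpy as np

values = np.arange(1, 1001)          # illustrative input: 1..1000

# number plays the role of -n (loops per run);
# repeat plays the role of -r (how many runs are made)
run_times = timeit.repeat(
    stmt="1.0 / values",
    globals={"values": values},
    number=100,
    repeat=5,
)

# each entry is the total time for `number` loops, so divide for per-loop time
per_loop = [t / 100 for t in run_times]
print(f"best of {len(per_loop)} runs: {min(per_loop) * 1e6:.2f} μs per loop")
```

Reporting the best run is one common convention; %timeit itself reports the mean and standard deviation across runs.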

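The vectorized version is roughly 2,000 times faster here (1.73 s versus 849 μs). Why?

  • A Python loop handles one element at a time, and every step pays for interpreter overhead such as type checks and object creation. A list adds to this cost because it stores references to separate Python objects.

  • NumPy arrays store raw numbers of a single type in one contiguous block of memory, so a vectorized operation can run the whole loop in compiled code.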
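The storage difference can be inspected directly. This is a rough sketch: the size of 1,000 elements is an arbitrary choice, and the byte counts for the list are approximate and specific to CPython.

```python
import sys

import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The list holds n references; each one points to a separate Python int object.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)

# The array holds n raw 8-byte integers in one contiguous buffer.
print("array dtype:   ", arr.dtype)
print("bytes per item:", arr.itemsize)
print("array buffer:  ", arr.nbytes, "bytes")
print("list + objects:", list_bytes, "bytes (approximate)")
```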

3.2.1.4. Round 2: sum()#

We have seen how a vectorized operation outperforms an element-by-element Python loop. Now we’ll compare summing a Python list with summing a NumPy array, using sum() and np.sum(). (The following sample code is adapted from GeeksforGeeks.)

import numpy as np
import time

#################### list sum() ####################
lst = list(range(1000000))   ### create list
start_time = time.time()     ### time starts
# iterative_sum = sum(range(15000))          ###geeksforgeeks got it wrong here.
iterative_sum = sum(lst)     ### sum list
print("\nIterative sum:", iterative_sum)
print("Time taken by iterative sum:", time.time() - start_time)
iterative_time = time.time() - start_time

#################### numpy sum() ####################
arr = np.arange(1000000)     ### create array
start_time = time.time()     ### time starts
vectorized_sum = np.sum(arr) ### sum array
print("Vectorized sum:", vectorized_sum)
print("Time taken by vectorized sum:", time.time() - start_time)
vectorized_time = time.time() - start_time

print()

print(f"NumPy array sum is {iterative_time / vectorized_time:.2f} times faster than list sum.")
Iterative sum: 499999500000
Time taken by iterative sum: 0.0049169063568115234
Vectorized sum: 499999500000
Time taken by vectorized sum: 0.0005390644073486328

NumPy array sum is 8.56 times faster than list sum.

3.2.2. Universal Functions#

NumPy operations written in a vectorized way compute on the entire array (“vector”) at once, without Python needing to loop over individual elements. The functions that carry out these element-wise operations are called universal functions (ufuncs). A ufunc is a convenient interface to a statically typed, compiled loop.

As a very simple and practical example (as opposed to the conceptual experimentation earlier in this chapter), let’s say we would like to add the elements of two lists, which is similar to the tasks in the opening cases, only simpler. With Python, we would use:

  • a for loop

  • the zip() function

  • the append() method

x = list(range(1, 5)) 
y = list(range(5, 9))
z = []

for i, j in zip(x, y):
  z.append(i + j)

print(z)
[6, 8, 10, 12]

With NumPy, vectorized element-wise addition is much more efficient. The + operator below calls the ufunc np.add behind the scenes.

x = np.arange(1, 5) 
y = np.arange(5, 9)

z = x + y

print(z)              
[ 6  8 10 12]

3.2.2.1. Array Arithmetic#

NumPy implements arithmetic through ufuncs, as shown in the table below. You rarely need to call them by name, because NumPy arrays overload Python’s native arithmetic operators, which are often more intuitive: writing a + b on two arrays calls np.add(a, b). Either way, the looping moves into NumPy’s compiled layer, resulting in significantly faster execution than a Python loop.

Arithmetic ufunc implementation:

| Operation | Operator | ufunc (NumPy) | Example | Vectorized Meaning |
|---|---|---|---|---|
| Addition | + | np.add | a + b | Element-wise addition |
| Subtraction | - | np.subtract | a - b | Element-wise subtraction |
| Multiplication | * | np.multiply | a * b | Element-wise multiply (not matrix multiply) |
| Division | / | np.true_divide (np.divide) | a / b | Element-wise division |
| Floor Division | // | np.floor_divide | a // b | Element-wise floor (quotient rounded down) |
| Modulo | % | np.mod (np.remainder) | a % b | Element-wise remainder |
| Power | ** | np.power | a ** 2 | Element-wise power (e.g., square each element) |

Examples of using the NumPy arithmetic ufuncs are as follows, and you may use the Python arithmetic operators in place of ufuncs for these vectorized operations, as listed above.

### summary: arithmetic operators with a scalar operand

x = np.arange(5)
print("x ==", x)
print("x + 5  ==", x + 5)
print("x - 5  ==", x - 5)
print("x * 2  ==", x * 2)
print("x / 2  ==", x / 2)
print("x // 2 ==", x // 2) # floor division
x == [0 1 2 3 4]
x + 5  == [5 6 7 8 9]
x - 5  == [-5 -4 -3 -2 -1]
x * 2  == [0 2 4 6 8]
x / 2  == [0.  0.5 1.  1.5 2. ]
x // 2 == [0 0 1 1 2]

The same operators work array-on-array. For example, np.add(arr1, arr2) is the same as arr1 + arr2:

arr1 = np.arange(5)
arr2 = np.arange(5, 10)

print(np.add(arr1, arr2))     ### ufunc
print(arr1 + arr2)            ### the + operator calls the same ufunc
[ 5  7  9 11 13]
[ 5  7  9 11 13]

Now one example for each of the arithmetic ufuncs:

import numpy as np

arr1 = np.arange(5)           ### 0...4
arr2 = np.arange(5, 10)       ### 5...9

print(np.add(arr1, arr2))
print(np.subtract(arr1, arr2))
print(np.negative(arr1))
print(np.multiply(arr1, arr2))
print(np.divide(arr1, arr2))
print(np.floor_divide(arr1, 3))
print(np.ceil(np.floor_divide(arr1, 2)))
print(np.power(arr1, 3))
print(np.mod(arr1, 2))
[ 5  7  9 11 13]
[-5 -5 -5 -5 -5]
[ 0 -1 -2 -3 -4]
[ 0  6 14 24 36]
[0.         0.16666667 0.28571429 0.375      0.44444444]
[0 0 0 1 1]
[0 0 1 1 2]
[ 0  1  8 27 64]
[0 1 0 1 0]

3.2.2.1.1. Divided by Zero#

Unlike plain Python, which raises ZeroDivisionError: division by zero, NumPy does not raise an error when you divide by zero. Instead, it returns a special floating-point value and issues a warning:

  • inf: dividing a nonzero number by zero gives inf (or -inf) and a RuntimeWarning: divide by zero encountered in divide.

  • nan: dividing zero by zero gives nan and a RuntimeWarning: invalid value encountered in divide.

In NumPy, np.nan represents “Not a Number,” a special floating-point value defined by the IEEE 754 standard. It signifies missing or undefined numerical data, such as the result of an indeterminate mathematical operation (e.g., dividing zero by zero). For NumPy values:

| Expression | Result | Why |
|---|---|---|
| 5 / 0 | inf | Nonzero divided by zero; IEEE 754 defines the result as infinity |
| 0 / 0 | nan | Indeterminate form |

arr1 = np.arange(5)
arr2 = np.arange(5)

print(arr1[0] / arr2[0])
print(arr1 / arr2)
nan
[nan  1.  1.  1.  1.]
/tmp/ipykernel_1107684/1681753110.py:4: RuntimeWarning: invalid value encountered in scalar divide
  print(arr1[0] / arr2[0])
/tmp/ipykernel_1107684/1681753110.py:5: RuntimeWarning: invalid value encountered in divide
  print(arr1 / arr2)
arr1 = np.arange(5)
arr2 = np.arange(5, 10)

print(arr2[0] / arr1[0])
print(arr2 / arr1)
inf
[       inf 6.         3.5        2.66666667 2.25      ]
/tmp/ipykernel_1107684/1933371002.py:4: RuntimeWarning: divide by zero encountered in scalar divide
  print(arr2[0] / arr1[0])
/tmp/ipykernel_1107684/1933371002.py:5: RuntimeWarning: divide by zero encountered in divide
  print(arr2 / arr1)
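When zeros in the denominator are expected, the warnings can be silenced locally and the special values detected afterwards. This is a minimal sketch using np.errstate, np.isnan, np.isinf, and np.isfinite; the two small arrays are made up for illustration.

```python
import numpy as np

numerator = np.array([0.0, 5.0, 6.0])
denominator = np.array([0.0, 0.0, 3.0])

# Silence the two warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = numerator / denominator      # 0/0 -> nan, 5/0 -> inf, 6/3 -> 2.0

print(ratio)
print(np.isnan(ratio))                   # marks the 0/0 position
print(np.isinf(ratio))                   # marks the 5/0 position
print(np.isfinite(ratio))                # True only for ordinary numbers
```

Silencing the warning does not change the result; nan and inf are still in the array and should be handled before further computation.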

3.2.2.2. More math ufuncs#

NumPy comes with many mathematical ufuncs, each of which applies a mathematical function to every element of an array. For example:

  • abs()

  • sqrt()

  • exp()

  • sin()

  • log()

For abs(), let us start with Python’s built-in abs() applied to a list, which fails:

%%expect TypeError

y = [-2, -1, 0, 1, 2]
abs(y)
TypeError: bad operand type for abs(): 'list'

With a NumPy array, both the built-in abs() and the ufunc np.absolute work element-wise, as expected.

x = np.array([-2, -1, 0, 1, 2])

print(abs(x))
print(np.absolute(x))     ### ufunc
print(np.abs(x))          ### alias to ufunc
[2 1 0 1 2]
[2 1 0 1 2]
[2 1 0 1 2]

For sqrt(),

### make a 2-D array using reshape() method
arr = np.arange(12)
arr = arr.reshape(3, 4)
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
### taking square roots
np.sqrt(arr)
array([[0.        , 1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974, 2.64575131],
       [2.82842712, 3.        , 3.16227766, 3.31662479]])
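The remaining functions in the list (exp(), sin(), log()) follow the same element-wise pattern. A brief sketch with made-up input values:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])

print(np.exp(x))                 # e raised to each element
print(np.log(np.exp(x)))         # natural log undoes exp (up to rounding)
print(np.log2(x))                # base-2 logarithm of each element

angles = np.array([0.0, np.pi / 2, np.pi])
print(np.sin(angles))            # sin(pi) prints as a tiny number, not exactly 0
```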

3.2.3. Aggregations#

NumPy aggregation functions allow us to summarize many values into fewer, often a single value. Instead of looping through an array manually, functions like np.sum(), np.mean(), np.min(), and np.max() efficiently compute totals, averages, and other summary statistics. These operations are vectorized and optimized in compiled code, making them much faster than pure Python loops. When working with multi-dimensional arrays, we can also use the axis parameter to control the direction of the aggregation.

Aggregation = “Reduce many values into fewer values.”

| Function | What It Computes | Example | Axis Support | Notes |
|---|---|---|---|---|
| np.sum() | Total sum | np.sum(a) | Yes | Adds all elements |
| np.mean() | Average | np.mean(a) | Yes | Arithmetic mean |
| np.median() | Median | np.median(a) | Yes | Middle value |
| np.min() | Minimum | np.min(a) | Yes | Smallest value |
| np.max() | Maximum | np.max(a) | Yes | Largest value |
| np.std() | Standard deviation | np.std(a) | Yes | Spread (default population) |
| np.var() | Variance | np.var(a) | Yes | Spread squared |
| np.prod() | Product | np.prod(a) | Yes | Multiply all elements |
| np.argmin() | Index of min | np.argmin(a) | Yes | Position of smallest value |
| np.argmax() | Index of max | np.argmax(a) | Yes | Position of largest value |
| np.any() | Any True? | np.any(a) | Yes | Logical OR across elements |
| np.all() | All True? | np.all(a) | Yes | Logical AND across elements |
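Several of these aggregations can be tried on one small array. The values below are chosen for illustration so that the mean and the population standard deviation come out as round numbers.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.mean(a))            # 5.0
print(np.median(a))          # 4.5
print(np.std(a))             # 2.0 (population standard deviation, ddof=0)
print(np.std(a, ddof=1))     # sample standard deviation, slightly larger
print(np.prod(a))            # 201600
print(np.any(a > 8))         # True: at least one element exceeds 8
print(np.all(a > 1))         # True: every element exceeds 1
```

The ddof argument switches np.std() and np.var() from the population formula (divide by n) to the sample formula (divide by n - ddof).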

3.2.3.1. Axis Support#

Most NumPy aggregation functions accept an axis parameter. The axis you specify is the axis that gets eliminated (collapsed) by the aggregation.

  • axis=0 → remove row dimension → result keeps columns (one value per column)

  • axis=1 → remove column dimension → result keeps rows (one value per row)

import numpy as np

arr = np.arange(1, 7).reshape(2, 3)
arr
array([[1, 2, 3],
       [4, 5, 6]])

| Call | Result | Meaning |
|---|---|---|
| np.sum(arr) | 21 | Sum of all elements |
| np.sum(arr, axis=0) | [5 7 9] | Column sums |
| np.sum(arr, axis=1) | [ 6 15] | Row sums |

total = np.sum(arr)
ax_0 = np.sum(arr, axis=0)
ax_1 = np.sum(arr, axis=1)

print(total)
print(ax_0)
print(ax_1)
21
[5 7 9]
[ 6 15]
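Checking the shape of each result confirms which axis was eliminated. This short sketch also uses the keepdims argument, which is not covered above, to keep the collapsed axis with length 1.

```python
import numpy as np

arr = np.arange(1, 7).reshape(2, 3)      # shape (2, 3): 2 rows, 3 columns

print(arr.shape)                         # (2, 3)
print(np.sum(arr, axis=0).shape)         # (3,)  axis 0 (rows) is gone
print(np.sum(arr, axis=1).shape)         # (2,)  axis 1 (columns) is gone

# keepdims=True keeps the reduced axis with length 1
print(np.sum(arr, axis=1, keepdims=True).shape)   # (2, 1)

# the same axis argument works for other aggregations
print(np.mean(arr, axis=0))              # column means: [2.5 3.5 4.5]
print(np.max(arr, axis=1))               # row maxima:   [3 6]
```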

3.2.3.2. sum()#

There are three ways to sum a 1-D NumPy array:

  • Python’s built-in sum() function

  • NumPy’s np.sum() function

  • The array’s own sum() method

arr = np.arange(10)
print(arr)

print(f"{sum(arr)}")
print(f"{np.sum(arr)}")
print(f"{arr.sum()}")
[0 1 2 3 4 5 6 7 8 9]
45
45
45

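Now let’s sum 1,000 random numbers from NumPy’s random number generator. The two results agree to about 12 decimal places; they differ only in the last digits, because the two functions add the floating-point numbers in a different order: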

import numpy as np
rng = np.random.default_rng(seed=42)    ### ref: import random
                                        ### random.random()
rnd_arr = rng.random(1000)
print(f"Python sum: {sum(rnd_arr)}")
print(f"NumPy  sum: {np.sum(rnd_arr)}")
Python sum: 497.17783852843127
NumPy  sum: 497.17783852843195
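The tiny discrepancy is a floating-point rounding effect, not a bug in either function. A small sketch shows that the grouping of additions changes the last digit, and that np.isclose is a safer comparison than ==. The use of math.fsum as a reference is an addition for illustration.

```python
import math

import numpy as np

# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)        # False

rng = np.random.default_rng(seed=42)
rnd_arr = rng.random(1000)

python_sum = sum(rnd_arr)                    # adds strictly left to right
numpy_sum = np.sum(rnd_arr)                  # may group the additions differently

print(np.isclose(python_sum, numpy_sum))     # True: equal within tolerance
print(math.fsum(rnd_arr))                    # correctly rounded reference sum
```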

In terms of performance, NumPy’s compiled implementation is much faster, by a factor of about 300 in the run below:

big_array = rng.random(1000000)

%timeit sum(big_array)
%timeit np.sum(big_array)
80 ms ± 3.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
253 μs ± 16.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

3.2.3.3. min() & max()#

Python has built-in min and max functions, and NumPy has corresponding functions for finding the minimum and maximum values of an array.

import numpy as np

rng = np.random.default_rng(seed=42)    ### create random number generator (rng)

big_array = rng.random(1000000)         ### use the rng to generate random numbers

Python min/max:

min(big_array), max(big_array)          ### note this returns a tuple
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))

NumPy min/max:

np.min(big_array), np.max(big_array)
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))

There is a different syntax, though, which uses the methods of the array objects:

big_array.min(), big_array.max()
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))

Compare performance:

%timeit min(big_array)
%timeit np.min(big_array)
%timeit big_array.min()
55.1 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
169 μs ± 6.24 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
174 μs ± 2.78 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Based on the performance observations above, prefer NumPy’s aggregation functions (or the equivalent array methods) over Python’s built-ins whenever the data is in a NumPy array.

3.2.3.4. argmin() and argmax()#

import numpy as np

### 1D array
arr_1d = np.array([1, 2, 3, 4])
arr_1d[2] = 10                  ### Update the third element (index 2)
print(arr_1d, "\n")

### 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_2d[1, 1] = 20               ### Update element at row 1, column 1
print(arr_2d)
[ 1  2 10  4] 

[[ 1  2  3]
 [ 4 20  6]]
### numpy.argmax() returns the index of the maximum value (optionally along a specified axis).

print(arr_1d.argmax())
2
### numpy.argmin() returns the index of the minimum value (optionally along a specified axis).

print(arr_1d.argmin())
0
arr_2d
array([[ 1,  2,  3],
       [ 4, 20,  6]])
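On a 2-D array, argmax() without an axis returns a position in the flattened array. The sketch below converts that position back to a (row, column) pair with np.unravel_index and shows the per-axis variants.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 20, 6]])

flat_index = arr_2d.argmax()                       # position in the flattened array
print(flat_index)                                  # 4
print(np.unravel_index(flat_index, arr_2d.shape))  # row 1, column 1

print(arr_2d.argmax(axis=0))     # row index of each column's maximum: [1 1 1]
print(arr_2d.argmin(axis=1))     # column index of each row's minimum: [0 0]
```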

3.2.4. Broadcasting#

Broadcasting means NumPy can combine arrays of different shapes by “stretching” the smaller one to match the larger. Ordinary Python lists cannot do this. The stretching is virtual: NumPy does not physically copy the smaller array in memory, which keeps broadcasting cheap. An arithmetic expression such as x + 100 returns a new array and leaves x unchanged; an array changes only when you assign into it, as in arr[0:5] = 10.

### same shape addition: default behavior
x = np.arange(0, 5)     ### 0, 1, 2, 3, 4
y = np.arange(5, 10)    ### 5, 6, 7, 8, 9
x + y
array([ 5,  7,  9, 11, 13])

3.2.4.1. Broadcasting Rules#

Broadcasting follows specific rules to determine compatibility:

  • Shapes are compared dimension by dimension, starting from the last (rightmost) one. Two dimensions are compatible if they are equal or if one of them is 1.

  • If one array has fewer dimensions, its shape is padded with 1s on the left. Each dimension of size 1 is then “stretched” to match the other array.

Simple broadcasting: the value 100 below is stretched, or broadcast, across the array [0, 1, 2, 3, 4], and the results are added.

### different shape: add a scalar to a 1-D array
### broadcasting

x + 100
array([100, 101, 102, 103, 104])
### x stays the same
x
array([0, 1, 2, 3, 4])
### observe what happens when we do addition with arrays; the operation is by vector:

a = np.array([[1, 2, 3],
              [4, 5, 6]])

b = np.array([10, 20, 30])

print(a + b)     ### stretch b into the two vectors in a
[[11 22 33]
 [14 25 36]]

In the example above, b has shape (3,) and a has shape (2, 3). We say that b is broadcast along the first dimension (axis 0): it is added to each row of a.
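Shape compatibility can be checked without doing any arithmetic. This is a minimal sketch using np.broadcast_shapes (available in NumPy 1.20 and later); the column and row arrays are made up for illustration.

```python
import numpy as np

col = np.array([[0], [10], [20]])    # shape (3, 1)
row = np.array([1, 2, 3, 4])         # shape (4,), treated as (1, 4)

print(np.broadcast_shapes(col.shape, row.shape))   # (3, 4)
print(col + row)                     # every column value paired with every row value

# Shapes (2, 3) and (2,) are incompatible: the trailing sizes 3 and 2 differ
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("ValueError:", err)
```

Because both col and row have a dimension of size 1 (after padding), each is stretched along a different axis, and the result has shape (3, 4).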

### setting a value with index range 

arr = np.arange(0, 10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
### update array: the scalar 10 is broadcast to all five selected positions
### assignment modifies arr in place

arr[0:5]=10

### show array again:
arr
array([10, 10, 10, 10, 10,  5,  6,  7,  8,  9])
### Exercise: Basic Broadcasting with Scalar
### Given the Series:
import pandas as pd

sales = pd.Series([100, 150, 200, 120], index=['A', 'B', 'C', 'D'])

### Apply a 10% discount to all sales values.
### Your code starts here:



### Your code ends here.


sales * 90/100
A     90.0
B    135.0
C    180.0
D    108.0
dtype: float64

3.2.5. Masking#

NumPy masking is a technique for selecting, modifying, or ignoring array elements based on a condition. Instead of using indices, you create a boolean mask (an array of True/False values) that marks which elements you want to keep, modify, or analyze.

Boolean masks are built with two families of operators:

  • Comparison operators

  • Logical operators

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])   ### create array

### Create a mask (boolean array)
mask = arr > 3             ### Result: [False, False, False, True, True, True]
print(mask)

### Apply the mask to select elements
result = arr[mask]         # Result: [4, 5, 6]
print(result)
[False False False  True  True  True]
[4 5 6]

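Logical operations on arrays use &, |, and ~ rather than Python’s and, or, and not. The difference is that and and or evaluate the truth of each object as a whole, while & and | operate on the elements within the arrays. Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators.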

### example of logical operations

import numpy as np

a = np.array([1, 5, 10, 15])
print((a > 3) & (a < 12))           ### array([False,  True,  True, False])
print((a < 3) | (a > 12))           ### array([ True, False, False,  True])
print(~(a > 5))                     ### array([ True,  True, False, False])
[False  True  True False]
[ True False False  True]
[ True  True False False]
a = np.array([1, 5, 10, 15]).reshape(2,2)
print(a)
print()

print((a > 3) & (a < 12))           ### each mask has the same 2x2 shape as a
print()
print((a < 3) | (a > 12))
print()
print(~(a > 5))
[[ 1  5]
 [10 15]]

[[False  True]
 [ True False]]

[[ True False]
 [False  True]]

[[ True  True]
 [False False]]
arr = np.array([10, 20, 30])
res = arr > 15
print(res)
[False  True  True]
import numpy as np

arr1 = np.array([True, False, True, False])
arr2 = np.array([True, True, False, False])
arr3 = np.array([1, 5, 2, 8])
arr4 = np.array([3, 2, 2, 7])

# Logical operations
print(f"logical_and: {np.logical_and(arr1, arr2)}")
print(f"logical_or:  {np.logical_or(arr1, arr2)}")
print(f"logical_not: {np.logical_not(arr1)}")
print(f"logical_xor: {np.logical_xor(arr1, arr2)}")     ### XOR (eXclusive OR) is a logical operation 
                                                        ### that outputs true if and only if its inputs 
                                                        ### are different. 

print()

# Comparison operations with f-string
print(f"equal: {np.equal(arr3, arr4)}")
print(f"greater: {np.greater(arr3, arr4)}")
logical_and: [ True False False False]
logical_or:  [ True  True  True False]
logical_not: [False  True False  True]
logical_xor: [False  True  True False]

equal: [False False  True False]
greater: [False  True False  True]
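The difference between and and & is easy to see by trying both on the same array. The sketch below reuses the small array from the earlier example.

```python
import numpy as np

a = np.array([1, 5, 10, 15])

# `and` asks for a single truth value of the whole array, which is ambiguous
try:
    (a > 3) and (a < 12)
except ValueError as err:
    print("ValueError:", err)

# `&` works element by element; the parentheses are required because
# & binds more tightly than the comparison operators
in_range = (a > 3) & (a < 12)
print(in_range)                  # [False  True  True False]
print(a[in_range])               # [ 5 10]
```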

In the terminal, go to your project folder, activate the virtual environment, and go into Python to do the following:

(.venv) [user]@[host]:~/workspace/thinkdsm$ python
Python 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4, 5])
>>> x > 3
array([False, False, False,  True,  True])
>>> x < 3
array([ True,  True, False, False, False])
>>> x >= 3
array([False, False,  True,  True,  True])
>>> x <=3
array([ True,  True,  True, False, False])
>>> x != 3 # not equal
array([ True,  True, False,  True,  True])
>>> x == 3 # equal
array([False, False,  True, False, False])
>>>

3.2.5.1. Filtering Values#

temperatures = np.array([68, 72, 95, 58, 88, 102])

hot_days = temperatures[temperatures > 85]  # [95, 88, 102]
hot_days
array([ 95,  88, 102])

3.2.5.2. Modifying elements#

arr = np.array([1, 2, 3, 4, 5])

arr[arr > 3] = 0               ### Sets elements > 3 to 0
arr
array([1, 2, 3, 0, 0])

3.2.5.3. Counting elements#

arr = np.array([5, 10, 15, 20, 25])

count = np.sum(arr > 12)  # Counts elements > 12 (returns 3)
print(count)
3
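Two related tools are worth knowing here; both are additions to the examples above. np.count_nonzero counts the True values in a mask, and np.where builds a modified copy instead of changing the array in place.

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])
mask = arr > 12

print(np.count_nonzero(mask))        # 3, same as np.sum(mask)
print(mask.mean())                   # 0.6, the fraction of elements > 12

# np.where picks 12 where the mask is True and the original value elsewhere;
# it returns a new array and leaves arr untouched
capped = np.where(mask, 12, arr)
print(capped)                        # [ 5 10 12 12 12]
print(arr)                           # [ 5 10 15 20 25]
```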

3.2.6. Fancy Indexing#

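Fancy indexing, or vectorized indexing, means indexing with a list or array of indices instead of a single integer or slice. It lets you select several elements, rows, or columns at once, in any order.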

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])
indices = np.array([0, 3, 5])    ### indices to select: 0th, 3rd, and 5th elements

selected_elements = arr[indices]
print(selected_elements)         ### Output: [10 40 60]
[10 40 60]
import numpy as np

arr_2d = np.arange(12).reshape(3, 4)     ### the 2-D array
print("Original 2D array:\n", arr_2d)

print()

### indexing
row_indices = np.array([0, 1, 2])
col_indices = np.array([1, 3, 0])

### Select elements at (0, 1), (1, 3), and (2, 0)
selected_elements = arr_2d[row_indices, col_indices]     ### pay attention to the syntax here
print("Selected elements:", selected_elements)           ### selected elements: [ 1  7  8]
Original 2D array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Selected elements: [1 7 8]

Another example of fancy indexing.

### set up matrix
arr2d = np.zeros((10,10))
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
### Number of rows (shape[0]); the loop below assigns one value per row

arr_length = arr2d.shape[0]    
arr_length
10
### set up array

for i in range(arr_length):
    arr2d[i] = i              ### Row 0 gets all 0s, row 1 gets all 1s, etc.
    
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])

Now we will do fancy indexing with a list of indices — you can select rows in any order you want, and they’ll be returned in that exact order:

  1. The double brackets [[ ]] mean “use a list of indices.”

  2. [2,4,6,8] specifies which rows to select: row 2, row 4, row 6, row 8.

  3. Result: A new 4×10 array with those 4 rows stacked together in that order

### allows in any order
arr2d[[2,4,6,8]]
array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])

Now, another fancy indexing:

  1. Same idea, but notice the rows are in a different order: 6, 4, 2, 7.

  2. NumPy doesn’t care about the order — it just grabs the rows in the sequence you specify.

  3. Result: Row 6, then row 4, then row 2, then row 7 (in that exact sequence).

### allows in any order
arr2d[[6,4,2,7]]
array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
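One practical consequence: fancy indexing returns a new array with its own data, whereas basic slicing returns a view of the original. The following sketch checks this with np.shares_memory on a small made-up array.

```python
import numpy as np

arr = np.arange(10)

picked = arr[[6, 4, 2]]        # fancy indexing: new array with its own data
sliced = arr[2:5]              # basic slicing: a view onto arr's data

print(np.shares_memory(arr, picked))   # False
print(np.shares_memory(arr, sliced))   # True

picked[0] = -1                 # does not affect arr
sliced[0] = 99                 # writes through to arr[2]
print(arr)                     # [ 0  1 99  3  4  5  6  7  8  9]
```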

3.2.7. Sorting Arrays#

3.2.7.1. Sorting lists#

  • Python sorted() function: returns a new sorted list and leaves the original unchanged

  • Python list sort() method: sorts the list in place and returns None

### sorted function on a list

lst = [3, 1, 4, 1, 5, 9, 2, 6]

sorted(lst)      ### returns a sorted copy
[1, 1, 2, 3, 4, 5, 6, 9]
lst              ### original object unchanged
[3, 1, 4, 1, 5, 9, 2, 6]
### list.sort method: sort in place

lst.sort()       ### no return value, sorts in place
lst              ### object changed
[1, 1, 2, 3, 4, 5, 6, 9]
### sorting a string 

state = 'MISSOURI' 
sorted(state)
['I', 'I', 'M', 'O', 'R', 'S', 'S', 'U']
state            ### original object not changed
'MISSOURI'

3.2.7.2. Sorting NumPy arrays#

  • np.sort(): returns a sorted copy of the array and leaves the original unchanged. Python’s sorted() also leaves the original unchanged, but it returns a list of NumPy scalars rather than an array.

  • argsort(): returns the indices that would sort the array.

arr = np.array([1, 5, 4, 2, 3])
### python sorted( ) function

sorted(arr)
[np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5)]
### arr unchanged

arr
array([1, 5, 4, 2, 3])
### np.sort() does not sort in place

np.sort(arr)     ### returns a sorted copy
array([1, 2, 3, 4, 5])
arr              ### original object not changed
array([1, 5, 4, 2, 3])

argsort returns the indices of the sorted elements:

np.argsort(arr)
array([0, 3, 4, 2, 1])

Read the result from left to right: the first index, 0, says that arr[0] is the smallest element, and the last index, 1, says that arr[1] is the largest.
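Indexing an array with its own argsort() result reproduces the sorted array, and the same index order can reorder a second, parallel array. A short sketch follows; the labels array is hypothetical and added only for illustration.

```python
import numpy as np

arr = np.array([1, 5, 4, 2, 3])
order = np.argsort(arr)        # array([0, 3, 4, 2, 1])

print(arr[order])              # [1 2 3 4 5], same as np.sort(arr)
print(arr[order[::-1]])        # [5 4 3 2 1], descending order

# reorder a parallel array by the values in arr
labels = np.array(["a", "b", "c", "d", "e"])   # hypothetical labels
print(labels[order])           # ['a' 'd' 'e' 'c' 'b']

arr.sort()                     # the array method sorts in place
print(arr)                     # [1 2 3 4 5]
```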