3.2. Array Computation#
NumPy is central to data science in Python because it optimizes computation on arrays. Its efficiency comes from vectorized operations, generally implemented through NumPy’s universal functions (ufuncs), which work on entire arrays at once rather than looping through individual elements in Python. This approach is dramatically faster than traditional Python loops and is fundamental to efficient data science.
In this section, we’ll explore how NumPy’s universal functions (ufuncs) enable fast, efficient computation on arrays. You’ll learn why vectorization matters, how to use ufuncs effectively, and advanced techniques like broadcasting and boolean indexing.
What You’ll Learn
Vectorized vs. Iterative Operations: Compare performance and see why vectorization matters
Universal Functions (ufuncs): Explore unary and binary operations for arrays
Array Arithmetic: Perform mathematical operations on arrays
Aggregation: Compute summary statistics efficiently
Advanced Features: Broadcasting, indexing, and boolean logic
3.2.1. Iterative vs Vectorized Operations#
Vectorized operations in NumPy refer to performing operations on entire arrays at once, rather than iterating through individual elements using Python loops.
Vectorized operations are more efficient, and we will compare NumPy’s vectorized operations with Python list iteration.
3.2.1.1. Contender 1: List Iteration#
Let’s look at a plain iterative (loop-based) function first. Note that the code is meant to illustrate a computation for learning Python, not a business use case:
import numpy as np
Use the new numpy random number generator:
rng = np.random.default_rng(seed=42) ### default_rng() is a Random Number Generator
### it's a function in numpy's random module
### A random seed (or seed state, or just seed)
### is a number (or vector) used to initialize
### a pseudorandom number generator.
### A pseudorandom number generator's number sequence
### is completely determined by the seed: thus, if
### a pseudorandom number generator is later reinitialized
### with the same seed, it will produce the same sequence of numbers.
### https://en.wikipedia.org/wiki/Random_seed
Define a Python function that loops over a sequence and computes the reciprocal of each element.
def compute_reciprocals_iteration(values):
    output = np.empty(len(values))         ### np.empty() allocates an array without initializing it,
                                           ### so it starts out holding whatever "garbage" values are in memory
    for i in range(len(values)):           ### loop over every index in Python
        output[i] = 1.0 / values[i]        ### overwrite each slot with the reciprocal
    return output                          ### return the output array
### test the function with a small array of values
values = rng.integers(1, 10, size=5)       ### 5 integers from 1 to 9 (10 is exclusive)
compute_reciprocals_iteration(values)      ### here values is array([1, 7, 6, 4, 4])
                                           ### when returned, Jupyter evaluates and outputs
array([1. , 0.14285714, 0.16666667, 0.25 , 0.25 ])
Note
np.random.default_rng() is the recommended function for creating a modern, isolated random-number generator object in NumPy.
42 is a popular but arbitrary choice of seed; any fixed integer makes the results reproducible. NumPy has no default seed: calling default_rng() without one draws fresh entropy from the operating system.
3.2.1.2. Contender 2: Vectorized Operation (NO LOOP!)#
def compute_reciprocals_numpy(values):
    return 1.0 / values                    ### no loop!
### test the function with the same array of values
compute_reciprocals_numpy(values)          ### compare with compute_reciprocals_iteration(values)
array([1. , 0.14285714, 0.16666667, 0.25 , 0.25 ])
3.2.1.3. Who Wins?#
Now call both functions on a large array of 1,000,000 elements.
big_array = 1000000
values = rng.integers(1, 100, size=big_array)
%timeit compute_reciprocals_iteration(values)
%timeit compute_reciprocals_numpy(values)
1.73 s ± 318 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
849 μs ± 18.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Note
ms (millisecond) is one thousandth of a second (10⁻³ s)
μs (microsecond) is one millionth of a second (10⁻⁶ s)
ns (nanosecond) is one billionth of a second (10⁻⁹ s)
%timeit options:
-n: Specifies the number of loops per run
-r: Specifies the number of times the entire timing experiment is repeated
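Outside Jupyter, where the %timeit magic is not available, a similar comparison can be sketched with the standard-library timeit module. The array size, the repeat counts, and the helper names loop_reciprocals and vectorized_reciprocals below are illustrative assumptions, not part of the original benchmark:

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=42)
values = rng.integers(1, 100, size=100_000)   # smaller than the array above, to keep the sketch quick


def loop_reciprocals(values):
    """Compute reciprocals one element at a time in a Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


def vectorized_reciprocals(values):
    """Compute reciprocals with a single vectorized division."""
    return 1.0 / values


# number plays the role of %timeit -n (loops per run),
# repeat plays the role of %timeit -r (number of runs).
loop_times = timeit.repeat(lambda: loop_reciprocals(values), number=1, repeat=3)
vec_times = timeit.repeat(lambda: vectorized_reciprocals(values), number=10, repeat=3)

best_loop = min(loop_times) / 1     # seconds per call
best_vec = min(vec_times) / 10      # seconds per call
print(f"loop:       {best_loop:.6f} s per call")
print(f"vectorized: {best_vec:.6f} s per call")
```

Taking the minimum over the runs is one common convention for reducing noise; the absolute numbers depend on the machine.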
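The vectorized NumPy operation is clearly faster than the Python loop (roughly 2,000 times faster in the run above). Why?
A Python loop handles one element at a time, and the interpreter must check types and dispatch the operation on every iteration.
Lists store references to objects that can be scattered in memory.
NumPy arrays store raw numbers of a single type in a contiguous block of memory, so a ufunc can run the whole loop in compiled code.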
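The difference in storage can be inspected directly. The sketch below uses illustrative variable names and assumes a 64-bit integer dtype; it compares a list of Python ints with a NumPy array holding the same numbers:

```python
import sys

import numpy as np

py_list = list(range(1000))
np_array = np.arange(1000, dtype=np.int64)   # fix the dtype so the sizes are predictable

# A list holds references; each element is a separate Python int object.
print(type(py_list[0]))             # <class 'int'>
print(sys.getsizeof(py_list[0]))    # bytes used by one int object (platform dependent)

# An array holds raw, fixed-size numbers of a single dtype.
print(np_array.dtype)               # int64
print(np_array.itemsize)            # 8 bytes per element
print(np_array.nbytes)              # 8000 bytes for the whole data buffer
```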
3.2.1.4. Round 2: sum()#
We have seen how vectorized operations on NumPy arrays outperform an explicit Python loop. Now we’ll compare Python’s built-in sum() on a list with NumPy’s np.sum() on an array. (The following sample code is adapted from geeksforgeeks.)
import numpy as np
import time
#################### list sum() ####################
lst = list(range(1000000)) ### create list
start_time = time.time() ### time starts
# iterative_sum = sum(range(15000)) ###geeksforgeeks got it wrong here.
iterative_sum = sum(lst) ### sum list
print("\nIterative sum:", iterative_sum)
print("Time taken by iterative sum:", time.time() - start_time)
iterative_time = time.time() - start_time
#################### numpy sum() ####################
arr = np.arange(1000000) ### create array
start_time = time.time() ### time starts
vectorized_sum = np.sum(arr) ### sum array
print("Vectorized sum:", vectorized_sum)
print("Time taken by vectorized sum:", time.time() - start_time)
vectorized_time = time.time() - start_time
print()
print(f"NumPy array sum is {iterative_time / vectorized_time:.2f} times faster than list sum.")
Iterative sum: 499999500000
Time taken by iterative sum: 0.0049169063568115234
Vectorized sum: 499999500000
Time taken by vectorized sum: 0.0005390644073486328
NumPy array sum is 8.56 times faster than list sum.
3.2.2. Universal Functions#
NumPy operations written in a vectorized way process the entire array (‘vector’) at once, without Python needing to loop over individual elements. The functions that implement them are called universal functions (ufuncs). Ufuncs are a convenient interface to these statically typed, compiled, vectorized operations.
As a very simple and practical example (as opposed to the conceptual experimentation earlier in this chapter), let’s say we would like to add the elements of two lists, which is similar to the tasks in the opening cases, only simpler. With Python, we would use:
a for loop
the zip() function
the append() method
x = list(range(1, 5))
y = list(range(5, 9))
z = []
for i, j in zip(x, y):
z.append(i + j)
print(z)
[6, 8, 10, 12]
With NumPy, vectorized element-wise addition is much more efficient, and the + operator below calls a ufunc (np.add) under the hood.
x = np.arange(1, 5)
y = np.arange(5, 9)
z = x + y
print(z)
[ 6 8 10 12]
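To see that the + operator and the ufunc are the same operation, one can check the type of np.add and compare the two results. This is a small illustrative check, not part of the original example:

```python
import numpy as np

x = np.arange(1, 5)
y = np.arange(5, 9)

print(type(np.add))                  # <class 'numpy.ufunc'>
print(isinstance(np.add, np.ufunc))  # True

via_operator = x + y                 # operator syntax
via_ufunc = np.add(x, y)             # explicit ufunc call
print(np.array_equal(via_operator, via_ufunc))  # True
```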
3.2.2.1. Array Arithmetic#
NumPy implements arithmetic operators using ufuncs, as shown in the table below. Each ufunc can be called by name, but Python’s native arithmetic operators are overloaded on NumPy arrays to call the same ufuncs, which is often more intuitive. For example, for element-wise addition you can simply use + directly on NumPy array objects. This vectorized approach shifts the looping to NumPy’s compiled layer, resulting in significantly faster execution.
Arithmetic ufunc implementation:
Operation | Operator | ufunc (NumPy) | Example | Vectorized Meaning |
|---|---|---|---|---|
Addition | + | np.add | x + 5 | Element-wise addition |
Subtraction | - | np.subtract | x - 5 | Element-wise subtraction |
Multiplication | * | np.multiply | x * 2 | Element-wise multiply (not matrix multiply) |
Division | / | np.divide | x / 2 | Element-wise division |
Floor Division | // | np.floor_divide | x // 2 | Element-wise floor (quotient rounded down) |
Modulo | % | np.mod | x % 2 | Element-wise remainder |
Power | ** | np.power | x ** 2 | Element-wise power (e.g., square each element) |
Examples of using the NumPy arithmetic ufuncs are as follows, and you may use the Python arithmetic operators in place of ufuncs for these vectorized operations, as listed above.
### summary: ufunc arithmetic operators with a scalar
x = np.arange(5)
print("x ==", x)
print("x + 5 ==", x + 5)
print("x - 5 ==", x - 5)
print("x * 2 ==", x * 2)
print("x / 2 ==", x / 2)
print("x // 2 ==", x // 2) # floor division
x == [0 1 2 3 4]
x + 5 == [5 6 7 8 9]
x - 5 == [-5 -4 -3 -2 -1]
x * 2 == [0 2 4 6 8]
x / 2 == [0. 0.5 1. 1.5 2. ]
x // 2 == [0 0 1 1 2]
Array-on-array operations work the same way with the Python arithmetic operators. For example, np.add(arr1, arr2) is the same as arr1 + arr2:
arr1 = np.arange(5)
arr2 = np.arange(5, 10)
print(np.add(arr1, arr2)) ### ufunc
print(arr1 + arr2) ### ufunc honors Python arithmetic operators
[ 5 7 9 11 13]
[ 5 7 9 11 13]
Now one example for each of the arithmetic ufuncs:
import numpy as np
arr1 = np.arange(5) ### 0...4
arr2 = np.arange(5, 10) ### 5...9
print(np.add(arr1, arr2))
print(np.subtract(arr1, arr2))
print(np.negative(arr1))
print(np.multiply(arr1, arr2))
print(np.divide(arr1, arr2))
print(np.floor_divide(arr1, 3))
print(np.ceil(np.floor_divide(arr1, 2)))
print(np.power(arr1, 3))
print(np.mod(arr1, 2))
[ 5 7 9 11 13]
[-5 -5 -5 -5 -5]
[ 0 -1 -2 -3 -4]
[ 0 6 14 24 36]
[0. 0.16666667 0.28571429 0.375 0.44444444]
[0 0 0 1 1]
[0 0 1 1 2]
[ 0 1 8 27 64]
[0 1 0 1 0]
3.2.2.1.1. Divided by Zero#
Unlike regular Python, which raises an error when you divide a number by zero (ZeroDivisionError: division by zero), NumPy returns a special value and issues a warning instead:
inf: Dividing a nonzero number by zero returns inf and a RuntimeWarning: divide by zero encountered in divide.
nan: Dividing zero by zero returns nan and a RuntimeWarning: invalid value encountered in divide.
In NumPy, np.nan represents “Not a Number,” a special floating-point value defined by the IEEE 754 standard. It signifies missing or undefined numerical data, such as the result of an indeterminate mathematical operation (e.g., dividing zero by zero). So,
Expression | Result | Why |
|---|---|---|
5 / 0 | inf | Overflow toward infinity |
0 / 0 | nan | Indeterminate form |
arr1 = np.arange(5)
arr2 = np.arange(5)
print(arr1[0] / arr2[0])
print(arr1 / arr2)
nan
[nan 1. 1. 1. 1.]
/tmp/ipykernel_1107684/1681753110.py:4: RuntimeWarning: invalid value encountered in scalar divide
print(arr1[0] / arr2[0])
/tmp/ipykernel_1107684/1681753110.py:5: RuntimeWarning: invalid value encountered in divide
print(arr1 / arr2)
arr1 = np.arange(5)
arr2 = np.arange(5, 10)
print(arr2[0] / arr1[0])
print(arr2 / arr1)
inf
[ inf 6. 3.5 2.66666667 2.25 ]
/tmp/ipykernel_1107684/1933371002.py:4: RuntimeWarning: divide by zero encountered in scalar divide
print(arr2[0] / arr1[0])
/tmp/ipykernel_1107684/1933371002.py:5: RuntimeWarning: divide by zero encountered in divide
print(arr2 / arr1)
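When division by zero is expected, the warnings can be silenced locally and the special values handled afterwards. The following is a minimal sketch using np.errstate, np.isnan, np.isinf, and np.isfinite; the sample numbers are made up, and replacing the non-finite results with 0.0 is an arbitrary choice made for illustration:

```python
import numpy as np

numerator = np.array([0.0, 5.0, 6.0, 7.0])
denominator = np.array([0.0, 0.0, 2.0, 4.0])

# Suppress the two RuntimeWarnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = numerator / denominator   # nan, inf, 3.0, 1.75

print(np.isnan(ratio))    # [ True False False False]
print(np.isinf(ratio))    # [False  True False False]

# Replace every non-finite entry (nan or inf) with 0.0.
cleaned = np.where(np.isfinite(ratio), ratio, 0.0)
print(cleaned.tolist())   # [0.0, 0.0, 3.0, 1.75]
```

Outside the with block, NumPy's default warning behavior applies again.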
3.2.2.2. More math ufuncs#
NumPy comes with many mathematical ufuncs, which apply a mathematical function to every element of an array. Examples include:
abs()
sqrt()
exp()
sin()
log()
For abs(), let us start with Python abs().
%%expect TypeError
y = [-2, -1, 0, 1, 2]
abs(y)
TypeError: bad operand type for abs(): 'list'
Now, with NumPy, as long as it is a NumPy array, ufunc works as expected.
x = np.array([-2, -1, 0, 1, 2])
print(abs(x))
print(np.absolute(x)) ### ufunc
print(np.abs(x)) ### alias to ufunc
[2 1 0 1 2]
[2 1 0 1 2]
[2 1 0 1 2]
For sqrt(),
### make a 2-D array using reshape() method
arr = np.arange(12)
arr = arr.reshape(3, 4)
arr
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
### taking square roots
np.sqrt(arr)
array([[0. , 1. , 1.41421356, 1.73205081],
[2. , 2.23606798, 2.44948974, 2.64575131],
[2.82842712, 3. , 3.16227766, 3.31662479]])
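The remaining functions in the list (exp(), sin(), and log()) follow the same element-wise pattern. A brief sketch with made-up input values:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

exp_x = np.exp(x)                 # e**0, e**1, e**2
log_exp_x = np.log(exp_x)         # the natural log undoes exp, giving back x
print(exp_x)
print(log_exp_x)

angles = np.array([0.0, np.pi / 2, np.pi])
sin_angles = np.sin(angles)       # approximately 0, 1, 0 (angles are in radians)
print(sin_angles)
```

Because of floating-point rounding, np.sin(np.pi) is a tiny number close to zero rather than exactly 0.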
3.2.3. Aggregations#
NumPy aggregation functions allow us to summarize many values into fewer, often a single value. Instead of looping through an array manually, functions like np.sum(), np.mean(), np.min(), and np.max() efficiently compute totals, averages, and other summary statistics. These operations are vectorized and optimized in compiled code, making them much faster than pure Python loops. When working with multi-dimensional arrays, we can also use the axis parameter to control the direction of the aggregation.
Aggregation = “Reduce many values into fewer values.”
Function | What It Computes | Example | Axis Support | Notes |
|---|---|---|---|---|
np.sum() | Total sum | np.sum(arr) | Yes | Adds all elements |
np.mean() | Average | np.mean(arr) | Yes | Arithmetic mean |
np.median() | Median | np.median(arr) | Yes | Middle value |
np.min() | Minimum | np.min(arr) | Yes | Smallest value |
np.max() | Maximum | np.max(arr) | Yes | Largest value |
np.std() | Standard deviation | np.std(arr) | Yes | Spread (default population) |
np.var() | Variance | np.var(arr) | Yes | Spread squared |
np.prod() | Product | np.prod(arr) | Yes | Multiply all elements |
np.argmin() | Index of min | np.argmin(arr) | Yes | Position of smallest value |
np.argmax() | Index of max | np.argmax(arr) | Yes | Position of largest value |
np.any() | Any True? | np.any(arr > 0) | Yes | Logical OR across elements |
np.all() | All True? | np.all(arr > 0) | Yes | Logical AND across elements |
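As a quick tour of several of these aggregations, here is a sketch on a small made-up array (the name scores and its values are purely illustrative):

```python
import numpy as np

scores = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # made-up values

print(np.sum(scores))                          # 31
print(np.mean(scores))                         # 3.875
print(np.median(scores))                       # 3.5
print(np.min(scores), np.max(scores))          # 1 9
print(np.argmin(scores), np.argmax(scores))    # 1 5
print(np.any(scores > 8))                      # True  (at least one element is > 8)
print(np.all(scores > 0))                      # True  (every element is > 0)

# np.std() uses the population formula by default (ddof=0);
# pass ddof=1 for the sample standard deviation.
population_std = np.std(scores)
sample_std = np.std(scores, ddof=1)
print(population_std < sample_std)             # True
```

When the minimum occurs more than once, np.argmin() reports the first position.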
3.2.3.1. Axis Support#
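The aggregation functions listed above accept an axis= parameter. The axis you specify is the one that gets collapsed (eliminated) in the result. For a 2-D array:
axis=0 → collapse the row dimension (aggregate down each column) → result keeps columns
axis=1 → collapse the column dimension (aggregate across each row) → result keeps rows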
import numpy as np
arr = np.arange(1, 7).reshape(2, 3)
arr
array([[1, 2, 3],
[4, 5, 6]])
Call | Result | Meaning |
|---|---|---|
np.sum(arr) | 21 | Sum of all elements |
np.sum(arr, axis=0) | [5 7 9] | Column sums |
np.sum(arr, axis=1) | [ 6 15] | Row sums |
total = np.sum(arr)
ax_0 = np.sum(arr, axis=0)
ax_1 = np.sum(arr, axis=1)
print(ax_0)
print(ax_1)
[5 7 9]
[ 6 15]
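The same axis logic applies to the other aggregations. The sketch below reuses the 2×3 array and also shows the keepdims option, which keeps the collapsed axis with length 1 instead of dropping it; the variable names are illustrative:

```python
import numpy as np

arr = np.arange(1, 7).reshape(2, 3)     # [[1 2 3], [4 5 6]]

col_means = np.mean(arr, axis=0)        # collapse rows -> one value per column
row_maxes = np.max(arr, axis=1)         # collapse columns -> one value per row
print(col_means, col_means.shape)       # [2.5 3.5 4.5] (3,)
print(row_maxes, row_maxes.shape)       # [3 6] (2,)

# keepdims=True keeps the collapsed axis with length 1.
row_sums = np.sum(arr, axis=1, keepdims=True)
print(row_sums.shape)                   # (2, 1)
```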
3.2.3.2. sum()#
There are different ways to sum a 1-D NumPy array (a vector):
Python built-in function: sum(arr)
NumPy function: np.sum(arr)
Array object method: arr.sum()
arr = np.arange(10)
print(arr)
print(f"{sum(arr)}")
print(f"{np.sum(arr)}")
print(f"{arr.sum()}")
[0 1 2 3 4 5 6 7 8 9]
45
45
45
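Now let’s use the NumPy random number generator. In such a simple test case the results are almost the same; the tiny difference in the last digits comes from floating-point rounding, because the two functions add the numbers in a different order: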
import numpy as np
rng = np.random.default_rng(seed=42) ### ref: import random
### random.random()
rnd_arr = rng.random(1000)
print(f"Python sum: {sum(rnd_arr)}")
print(f"NumPy sum: {np.sum(rnd_arr)}")
Python sum: 497.17783852843127
NumPy sum: 497.17783852843195
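One way to judge which digits can be trusted is to compare both results against math.fsum() from the standard library, which tracks intermediate rounding errors and returns an accurately rounded sum. This comparison is an illustrative addition; the exact error values printed may vary by platform and NumPy version:

```python
import math

import numpy as np

rng = np.random.default_rng(seed=42)
rnd_arr = rng.random(1000)

python_sum = sum(rnd_arr)          # adds left to right, one element at a time
numpy_sum = np.sum(rnd_arr)        # adds in compiled code, in a different order
reference = math.fsum(rnd_arr)     # accurately rounded floating-point sum

print(f"Python sum error: {abs(python_sum - reference):.3e}")
print(f"NumPy sum error:  {abs(numpy_sum - reference):.3e}")
print(math.isclose(python_sum, numpy_sum, rel_tol=1e-12))   # True
```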
In terms of performance, NumPy’s compiled implementation of the operation is much faster:
big_array = rng.random(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)
80 ms ± 3.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
253 μs ± 16.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
3.2.3.3. min() & max()#
Python has built-in min and max functions, and NumPy has corresponding functions for finding the minimum and maximum values of an array.
import numpy as np
rng = np.random.default_rng(seed=42) ### create random number generator (rng)
big_array = rng.random(1000000) ### use the rng to generate random numbers
Python min/max:
min(big_array), max(big_array) ### note this returns a tuple
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))
NumPy min/max:
np.min(big_array), np.max(big_array)
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))
There is a different syntax, though, which uses the methods of the array objects:
big_array.min(), big_array.max()
(np.float64(1.2500323287589765e-07), np.float64(0.9999997172035572))
Compare performance:
%timeit min(big_array)
%timeit np.min(big_array)
%timeit big_array.min()
55.1 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
169 μs ± 6.24 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
174 μs ± 2.78 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Based on the performance observations above, prefer NumPy’s functions or array methods over Python’s built-ins whenever you work with NumPy arrays.
3.2.3.4. argmin() and argmax()#
import numpy as np
### 1D array
arr_1d = np.array([1, 2, 3, 4])
arr_1d[2] = 10 ### Update the third element (index 2)
print(arr_1d, "\n")
### 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_2d[1, 1] = 20 ### Update element at row 1, column 1
print(arr_2d)
[ 1 2 10 4]
[[ 1 2 3]
[ 4 20 6]]
### numpy.argmax() returns the index of the maximum value (along an axis, if one is specified).
### note: arr here is still np.arange(10) from the sum() example above
print(arr.argmax())
9
### numpy.argmin() returns the index of the minimum value (along an axis, if one is specified).
print(arr.argmin())
0
arr_2d
array([[ 1, 2, 3],
[ 4, 20, 6]])
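For a 2-D array such as arr_2d, argmax() without an axis returns a position in the flattened array. The sketch below shows one way to turn that position back into a (row, column) pair with np.unravel_index, and how the axis parameter changes the result:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 20, 6]])

flat_index = arr_2d.argmax()                 # index into the flattened array
print(flat_index)                            # 4

row, col = np.unravel_index(flat_index, arr_2d.shape)
print(row, col)                              # 1 1

print(arr_2d.argmax(axis=0))                 # [1 1 1]  row index of the max in each column
print(arr_2d.argmax(axis=1))                 # [2 1]    column index of the max in each row
```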
3.2.4. Broadcasting#
Broadcasting means NumPy can combine arrays of different shapes by “stretching” the smaller one to match the larger one. This ability is one way NumPy arrays differ from ordinary Python lists. The stretching is only conceptual: NumPy does not actually copy the smaller array in memory, which keeps broadcasting memory-efficient. A broadcast operation such as x + 100 returns a new array and leaves the original unchanged; the original changes only when you assign into it (for example, arr[0:5] = 10).
### same shape addition: default behavior
x = np.arange(0, 5) ### 0, 1, 2, 3, 4
y = np.arange(5, 10) ### 5, 6, 7, 8, 9
x + y
array([ 5, 7, 9, 11, 13])
3.2.4.1. Broadcasting Rules#
Broadcasting follows specific rules to determine compatibility:
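Shapes are compared dimension by dimension, starting from the last (trailing) dimension. Two dimensions are compatible for broadcasting if they are equal or if one of them is 1.
If one array has fewer dimensions, its shape is padded with 1s on the left; every dimension of size 1 is then “stretched” to match the other array. If any pair of dimensions is incompatible, NumPy raises a ValueError.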
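The compatibility rules can be checked without doing any arithmetic. The following sketch assumes NumPy 1.20 or later, where np.broadcast_shapes() is available; the example shapes are arbitrary:

```python
import numpy as np

# np.broadcast_shapes() reports the result shape without building any arrays.
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# A column of shape (3, 1) plus a row of shape (4,) gives a (3, 4) grid.
column = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = column + row
print(grid.shape)                            # (3, 4)

# Shapes (2, 3) and (2,) are incompatible: the trailing dimensions are 3 and 2.
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("ValueError:", err)
```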
Simple broadcasting: value 100 below is stretched or broadcasted into the array [0, 1, 2, 3, 4] and the results are added.
### different shape: add a scalar to a 1-D array
### broadcasting
x + 100
array([100, 101, 102, 103, 104])
### x stays the same
x
array([0, 1, 2, 3, 4])
### observe what happens when we do addition with arrays; the operation is by vector:
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([10, 20, 30])
print(a + b) ### stretch b into the two vectors in a
[[11 22 33]
[14 25 36]]
In the example above, b has shape (3,) and is treated as shape (1, 3); it is then broadcast along the first dimension (the rows) to match the shape (2, 3) of a.
### setting a value with index range
arr = np.arange(0, 10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
### update array (broadcasting): the scalar 10 is stretched across the slice
### the assignment returns nothing, but the array is changed in place
arr[0:5]=10
### show array again:
arr
array([10, 10, 10, 10, 10, 5, 6, 7, 8, 9])
### Exercise: Basic Broadcasting with Scalar
### Given the Series:
import pandas as pd
sales = pd.Series([100, 150, 200, 120], index=['A', 'B', 'C', 'D'])
### Apply a 10% discount to all sales values.
### Your code starts here:
### Your code ends here.
A 90.0
B 135.0
C 180.0
D 108.0
dtype: float64
3.2.5. Masking#
NumPy masking is a technique for selecting, modifying, or ignoring array elements based on a boolean condition. Instead of using indices, you create a boolean mask (an array of True/False values) that specifies which elements you want to keep, modify, or analyze.
Masks are built from an array with two kinds of operators:
Comparison operators
Logical operators
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6]) ### create array
### Create a mask (boolean array)
mask = arr > 3 ### Result: [False, False, False, True, True, True]
print(mask)
### Apply the mask to select elements
result = arr[mask] # Result: [4, 5, 6]
print(result)
[False False False True True True]
[4 5 6]
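Logical operations on arrays. Note that we use &, |, and ~ rather than Python’s and, or, and not. The difference is that and and or evaluate the truth of the object as a whole (which is ambiguous for an array and raises a ValueError), while & and | operate on the elements within the object. Because & and | bind more tightly than comparison operators, wrap each comparison in parentheses.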
### example of logical operations
import numpy as np
a = np.array([1, 5, 10, 15])
print((a > 3) & (a < 12)) ### array([False, True, True, False])
print((a < 3) | (a > 12)) ### array([ True, False, False, True])
print(~(a > 5)) ### array([ True, True, False, False])
[False True True False]
[ True False False True]
[ True True False False]
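The difference between the element-wise operators and Python's and can be seen by trying both. This is a minimal sketch; the exact wording of the error message may differ between NumPy versions:

```python
import numpy as np

a = np.array([1, 5, 10, 15])

# Element-wise AND: works, returns one boolean per element.
print((a > 3) & (a < 12))        # [False  True  True False]

# Python's `and` asks for a single truth value of the whole array, which is ambiguous.
try:
    result = (a > 3) and (a < 12)
except ValueError as err:
    print("ValueError:", err)

# The chained comparison 3 < a < 12 expands to an `and` and fails the same way;
# an element-wise alternative combines two masks with &.
in_range = (3 < a) & (a < 12)
print(a[in_range])               # [ 5 10]
```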
a = np.array([1, 5, 10, 15]).reshape(2,2)
print(a)
print()
print((a > 3) & (a < 12)) ### the result keeps the 2-D shape of a
print()
print((a < 3) | (a > 12)) ### True where either condition holds
print()
print(~(a > 5)) ### ~ inverts every element of the mask
[[ 1 5]
[10 15]]
[[False True]
[ True False]]
[[ True False]
[False True]]
[[ True True]
[False False]]
arr = np.array([10, 20, 30])
res = arr > 15
print(res)
[False True True]
import numpy as np
arr1 = np.array([True, False, True, False])
arr2 = np.array([True, True, False, False])
arr3 = np.array([1, 5, 2, 8])
arr4 = np.array([3, 2, 2, 7])
# Logical operations
print(f"logical_and: {np.logical_and(arr1, arr2)}")
print(f"logical_or: {np.logical_or(arr1, arr2)}")
print(f"logical_not: {np.logical_not(arr1)}")
print(f"logical_xor: {np.logical_xor(arr1, arr2)}") ### XOR (eXclusive OR) is a logical operation
### that outputs true if and only if its inputs
### are different.
print()
# Comparison operations with f-string
print(f"equal: {np.equal(arr3, arr4)}")
print(f"greater: {np.greater(arr3, arr4)}")
logical_and: [ True False False False]
logical_or: [ True True True False]
logical_not: [False True False True]
logical_xor: [False True True False]
equal: [False False True False]
greater: [False True False True]
In the terminal, go to your project folder, activate the virtual environment, and go into Python to do the following:
(.venv) [user]@[host]ː~/workspace/thinkdsm$ python
Python 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4, 5])
>>> x > 3
array([False, False, False, True, True])
>>> x < 3
array([ True, True, False, False, False])
>>> x >= 3
array([False, False, True, True, True])
>>> x <=3
array([ True, True, True, False, False])
>>> x != 3 # not equal
array([ True, True, False, True, True])
>>> x == 3 # equal
array([False, False, True, False, False])
>>>
3.2.5.1. Filtering Values#
temperatures = np.array([68, 72, 95, 58, 88, 102])
hot_days = temperatures[temperatures > 85] # [95, 88, 102]
hot_days
array([ 95, 88, 102])
3.2.5.2. Modifying elements#
arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3] = 0 ### Sets elements > 3 to 0
arr
array([1, 2, 3, 0, 0])
3.2.5.3. Counting elements#
arr = np.array([5, 10, 15, 20, 25])
count = np.sum(arr > 12) # Counts elements > 12 (returns 3)
print(count)
3
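Beyond np.sum(), a few related functions are handy once a mask exists. The sketch below reuses the same array; the choice of functions shown is illustrative rather than exhaustive:

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])
mask = arr > 12

print(np.count_nonzero(mask))     # 3    same count as np.sum(mask)
print(mask.mean())                # 0.6  fraction of elements that satisfy the condition
print(np.flatnonzero(mask))       # [2 3 4]  positions where the condition holds

# np.where(condition, a, b) picks from a where the condition is True, else from b.
print(np.where(mask, arr, 0))     # [ 0  0 15 20 25]
```

Counting works because True is treated as 1 and False as 0 when a boolean array is summed.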
3.2.6. Fancy Indexing#
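Fancy indexing, or vectorized indexing, means indexing with a list or array of indices instead of a single index or a slice. It lets you select several elements, rows, or columns at once, in any order.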
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 60, 70])
indices = np.array([0, 3, 5]) ### indices to select: 0th, 3rd, and 5th elements
selected_elements = arr[indices]
print(selected_elements) ### Output: [10 40 60]
[10 40 60]
import numpy as np
arr_2d = np.arange(12).reshape(3, 4) ### the 2-D array
print("Original 2D array:\n", arr_2d)
print()
### indexing
row_indices = np.array([0, 1, 2])
col_indices = np.array([1, 3, 0])
### Select elements at (0, 1), (1, 3), and (2, 0)
selected_elements = arr_2d[row_indices, col_indices] ### pay attention to the syntax here
print("Selected elements:", selected_elements) ### selected elements: [ 1 7 8]
Original 2D array:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Selected elements: [1 7 8]
Another example of fancy indexing.
### set up matrix
arr2d = np.zeros((10,10))
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
### Number of rows in the array
arr_length = arr2d.shape[0]
arr_length
10
### set up array
for i in range(arr_length):
arr2d[i] = i ### Row 0 gets all 0s, row 1 gets all 1s, etc.
arr2d
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
[4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
[8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
[9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
Now we will do fancy indexing with a list of indices — you can select rows in any order you want, and they’ll be returned in that exact order:
The double brackets [[ ]] mean “use a list of indices.”
[2,4,6,8] specifies which rows to select: row 2, row 4, row 6, row 8.
Result: A new 4×10 array with those 4 rows stacked together in that order.
### allows in any order
arr2d[[2,4,6,8]]
array([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])
Now, another fancy indexing:
Same idea, but notice the rows are in a different order: 6, 4, 2, 7.
NumPy doesn’t care about the order — it just grabs the rows in the sequence you specify.
Result: Row 6, then row 4, then row 2, then row 7 (in that exact sequence).
### allows in any order
arr2d[[6,4,2,7]]
array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
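One practical consequence worth knowing: selecting with fancy indexing returns a new array (a copy), whereas slicing returns a view on the original data. A small sketch with an illustrative 1-D array:

```python
import numpy as np

arr = np.arange(10)

picked = arr[[2, 4, 6]]      # fancy indexing -> new array (copy)
picked[0] = 99
print(arr[2])                # 2   original is unchanged

window = arr[2:5]            # slicing -> view on the same data
window[0] = 99
print(arr[2])                # 99  original is changed through the view

# Assigning through a fancy index does modify the original array.
arr[[0, 1]] = -1
print(arr[:3])               # [-1 -1 99]
```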
3.2.7. Sorting Arrays#
3.2.7.1. Sorting lists#
Python sorted() function: returns a new sorted list and leaves the original unchanged
Python list sort() method: sorts the list in place (changes the object) and returns None
### sorted function on a list
lst = [3, 1, 4, 1, 5, 9, 2, 6]
sorted(lst) ### returns a sorted copy
[1, 1, 2, 3, 4, 5, 6, 9]
lst ### original object unchanged
[3, 1, 4, 1, 5, 9, 2, 6]
### list.sort method: sort in place
lst.sort() ### no return value, sorts in place
lst ### object changed
[1, 1, 2, 3, 4, 5, 6, 9]
### sorting a string
state = 'MISSOURI'
sorted(state)
['I', 'I', 'M', 'O', 'R', 'S', 'S', 'U']
state ### original object not changed
'MISSOURI'
3.2.7.2. Sorting NumPy arrays#
np.sort(): Like Python’s sorted() function, np.sort() returns a sorted copy and leaves the original object unchanged. The difference is that np.sort() returns a NumPy array, while sorted() returns a Python list.
argsort(): argsort returns the indices of the sorted elements.
arr = np.array([1, 5, 4, 2, 3])
### python sorted( ) function
sorted(arr)
[np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5)]
### arr unchanged
arr
array([1, 5, 4, 2, 3])
### np.sort() returns a sorted copy (use arr.sort() to sort in place)
np.sort(arr) ### returns sorted copy
array([1, 2, 3, 4, 5])
arr ### original object not changed
array([1, 5, 4, 2, 3])
argsort returns the indices of the sorted elements:
np.argsort(arr)
array([0, 3, 4, 2, 1])
Reading the result from left to right: arr[0] is the smallest element, followed by arr[3], arr[4], and arr[2]; arr[1] is the largest element.
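The indices returned by argsort are useful for reordering the array itself or a second, related array. In the sketch below, the labels array is a hypothetical companion array introduced only for illustration:

```python
import numpy as np

arr = np.array([1, 5, 4, 2, 3])
labels = np.array(["a", "b", "c", "d", "e"])   # hypothetical labels, one per value

order = np.argsort(arr)
print(arr[order])            # [1 2 3 4 5]  same result as np.sort(arr)
print(labels[order])         # ['a' 'd' 'e' 'c' 'b']  labels reordered to match

descending = order[::-1]     # reverse the index order for largest-first
print(arr[descending])       # [5 4 3 2 1]

arr.sort()                   # the array method sorts in place and returns None
print(arr)                   # [1 2 3 4 5]
```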