2.3. Lists#
We will learn the following about lists:
features
basic operations
functions
methods
In Python, a list is a sequence type, meaning it is an ordered collection of items that can store multiple values in a single variable. Also, a list is a mutable sequence.
Lists are extremely flexible because they can hold elements of different data types, such as numbers, strings, or even other lists, and can be changed (mutable) after creation by adding, removing, or modifying items. Like a string, a list is a sequence of values. In a string, the values are characters; in a list, they can be any type. The values in a list are called elements.
Some basic properties of Python lists are as follows:
| Property | Description |
|---|---|
| Ordered | Elements maintain the order in which they were added and can be accessed by index (starting at 0). |
| Mutable | You can change, add, or remove elements after the list is created. |
| Allows Duplicates | Lists can contain repeated values without restriction. |
| Dynamic Size | The length of a list can grow or shrink as elements are added or removed. |
| Heterogeneous | A single list can hold items of different data types (e.g., integers, strings, objects). |
2.3.1. Basic Operations#
Basic Python list operations include:
Creating lists: assignment with [ ]
Indexing (mylist[0], mylist[-1])
Slicing (mylist[1:4], mylist[::-1])
Updating (mylist[0] = 1000)
List functions (e.g., len())
Iterating through lists (for loops, while loops)
Because list is the name of a built-in type (and of the function that creates lists), you should avoid using it as a variable name.
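To see why this matters, here is a minimal sketch of what can go wrong when the name is reused. The function shadowing_demo is hypothetical; it only exists to keep the reused name inside a local scope so the rest of your session is not affected.

```python
def shadowing_demo():
    """Show what happens when the name `list` is reused as a variable."""
    list = [1, 2, 3]          # the local name now hides the built-in list
    try:
        list('spam')          # tries to call [1, 2, 3] as if it were a function
    except TypeError as err:
        return str(err)
    return None

message = shadowing_demo()
print(message)                # a TypeError message saying the list is not callable
```

Choosing a descriptive name such as nums or word_list avoids the problem entirely.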
2.3.1.1. Creating Lists#
The simplest way to create a list is to enclose the elements in square brackets ([ and ]), and we usually assign the created list to a variable name.
numbers = [ 1, 2, 3, 4, 5 ] ### int list
fruits = [ 'apple', 'banana', 'cherry'] ### string list
The elements of a list don’t have to be the same type. The following list contains a string, a float, an integer, and even another list. This list is therefore heterogeneous:
t = ['spam', 2.0, 5, [10, 20]]
A list within another list is called a nested list. Nested lists are important because they let you represent more complex data structures, such as tables and matrices, and they provide a stepping stone to understanding the multi-dimensional arrays used in advanced data processing.
nums = [ 1, 2, 3, [4, 5 ] ]
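As a small illustration of the table idea, a nested list can be read with two bracket operators in a row (indexing itself is covered in the next section). The variable name matrix and its values are assumptions made up for this sketch.

```python
# A 2 x 3 table stored as a list of rows (hypothetical values)
matrix = [[1, 2, 3],
          [4, 5, 6]]

first_row = matrix[0]      # the whole inner list [1, 2, 3]
value = matrix[1][2]       # row index 1, then column index 2 -> 6

print(first_row)
print(value)
print(len(matrix))         # number of rows: 2
print(len(matrix[0]))      # number of columns in the first row: 3
```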
A list with no elements is called an empty list and can be created with empty brackets, [].
empty = []
2.3.1.2. Indexing#
In Python, list indexing is the process of accessing individual elements within a list using their position. To access an element of a list, we can use the bracket operator. The index of the first element is 0.
fruits = [ "apple", "banana", "cherry" ]
fruits[0] ### 'apple'
'apple'
List indices work as follows:
Python uses zero-based indexing, which means the first element has index 0, the second has index 1, and so on.
Any integer expression can be used as an index.
If you try to read or write an element that does not exist, you get an IndexError.
If an index has a negative value, it counts backward from the end of the list, beginning with -1.
With zero-based and negative indexing, list indexing looks like Fig. 2.4 below.
Fig. 2.4 Python List Indexing: Accessing elements by position (positive and negative indices)
Although a list can contain another list, the nested list still counts as a single element – so in the following list, there are only 4 elements.
t = ['spam', 2.0, 5, [10, 20]]
print(len(t))
print(t[3])
4
[10, 20]
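The indexing rules in this section can be tried out with a short sketch like the one below. The variable names i and error_message are assumptions introduced only for this illustration.

```python
fruits = ["apple", "banana", "cherry"]

i = 1
print(fruits[i + 1])      # any integer expression works as an index: 'cherry'
print(fruits[-1])         # last element: 'cherry'
print(fruits[-3])         # same element as fruits[0]: 'apple'

try:
    fruits[3]             # valid indices are 0..2 and -3..-1
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)
```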
2.3.1.3. List Slicing#
Things to know about list slicing:
Slicing returns a new list from a portion of the elements of the original list.
The list slicing syntax uses colons (:) and index parameters inside the square brackets following the list.
There are 3 index parameters in the square brackets, and they are all optional: start, stop, and step. We mostly use only the start and stop parameters.
The start index is inclusive, but the stop index is exclusive; for example, nums[1:3] would return a new list containing the elements nums[1] and nums[2], but not nums[3].
The following example selects the second and third elements from a list of four letters.
letters = ['a', 'b', 'c', 'd']
letters[1:3] ### ['b', 'c']
['b', 'c']
If you omit the first index, the slice starts at the beginning.
letters[:2] ### ['a', 'b']
['a', 'b']
If you omit the second, the slice goes to the end.
letters[2:] ### ['c', 'd']
['c', 'd']
So if you omit both, the slice is a copy of the whole list.
letters[:] ### ['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
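The step parameter is not used in the examples above, so here is a minimal sketch of it, together with a check that a slice really is a new list. The names every_other, backwards, and whole are assumptions chosen for this illustration.

```python
letters = ['a', 'b', 'c', 'd']

every_other = letters[::2]     # start and stop omitted, step 2 -> ['a', 'c']
backwards = letters[::-1]      # a negative step walks from the end -> ['d', 'c', 'b', 'a']
whole = letters[:]             # a new list with the same elements

whole[0] = 'z'                 # changing the copy ...
print(every_other)
print(backwards)
print(whole)                   # ['z', 'b', 'c', 'd']
print(letters)                 # ... leaves the original untouched: ['a', 'b', 'c', 'd']
```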
2.3.1.4. Modification#
Unlike strings, lists are mutable. When the bracket operator appears on the left side of an assignment, it identifies the element of the list that will be assigned.
fruits = [ "apple", "banana", "cherry" ]
fruits[0] = "avocado"
fruits ### fruits[0] is now 'avocado.'
['avocado', 'banana', 'cherry']
2.3.1.5. List operations#
Operators like in, +, and * work on lists as well.
The in operator works on lists; it checks whether a given element appears in the list.
fruits = [ "apple", "banana", "cherry" ]
'banana' in fruits ### True
True
However, 10 is not considered to be an element of t in the next example, because it is an element of a nested list, not of t itself.
t = ['spam', 2.0, 5, [10, 20]]
10 in t ### False
False
The + operator concatenates lists.
t1 = [1, 2]
t2 = [3, 4]
t1 + t2 ### [1, 2, 3, 4]
[1, 2, 3, 4]
The * operator repeats a list a given number of times.
spams = ['spam'] * 4 ### ['spam', 'spam', 'spam', 'spam']
nums = [ 1, 2, 3 ] * 3 ### [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(spams)
print(nums)
['spam', 'spam', 'spam', 'spam']
[1, 2, 3, 1, 2, 3, 1, 2, 3]
2.3.2. List Functions#
Some Python built-in functions are useful when working with lists:
len()
min()
max()
sum()
sorted()
The len function returns the length of a list. The length of an empty list is 0.
fruits = ['apple', 'banana', 'cherry']
empty = []
print(len(fruits))
print(len(empty))
3
0
No other mathematical operators work with lists, but the built-in function sum adds up the elements. And min and max find the smallest and largest elements.
nums = [1, 2, 3, 4 ,5 ]
print(sum(nums))
print(min(nums)) ### 1
print(max(nums)) ### 5
15
1
5
The sorted function returns a new list with the elements in ascending order; the original list is not modified.
nums = [ 1, 5, 3, 4, 2 ]
nums_sort = sorted(nums) ### save to nums_sort
print(nums_sort)
print(nums) ### original list unchanged
[1, 2, 3, 4, 5]
[1, 5, 3, 4, 2]
2.3.3. List Methods#
Python also has a set of built-in methods that you can use on list objects. Commonly used list methods include:
append()
extend()
pop()
remove()
sort()
reverse()
index()
count()
insert()
clear()
copy()
Use the dot operator (.) to access the methods a list object has. Just type the name of the list, followed by ., and hit the Tab key:
>>> nums = [ 1, 2, 3 ]
>>> nums.
nums.append( nums.copy() nums.extend( nums.insert( nums.remove( nums.sort(
nums.clear() nums.count( nums.index( nums.pop( nums.reverse()
2.3.3.1. append()#
Python provides methods that operate on lists. For example, append adds a new element to the end of a list:
letters = ['a', 'b', 'c', 'd']
print(letters) ### ['a', 'b', 'c', 'd']
letters.append("e")
print(letters) ## ['a', 'b', 'c', 'd', 'e']
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd', 'e']
2.3.3.2. extend()#
extend takes a single list (or other iterable) as its argument, not individual values, and appends all of its elements to the original list:
two_letters = ['f', 'g']
letters.extend(two_letters)
print(letters) ### ['a', 'b', 'c', 'd', 'e', 'f', 'g']
['a', 'b', 'c', 'd', 'e', 'f', 'g']
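The difference between append and extend is easiest to see side by side. The sketch below uses made-up variable names (base, more, appended, extended) and works on copies so that each method starts from the same list.

```python
base = ['a', 'b', 'c']
more = ['d', 'e']

appended = base.copy()
appended.append(more)      # the whole list becomes ONE new element (a nested list)

extended = base.copy()
extended.extend(more)      # each element of `more` is added separately

print(appended)            # ['a', 'b', 'c', ['d', 'e']]
print(len(appended))       # 4
print(extended)            # ['a', 'b', 'c', 'd', 'e']
print(len(extended))       # 5
```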
2.3.3.3. pop()#
There are two methods that remove elements from a list: pop() and remove().
If you know the index of the element you want to remove, you can use pop:
If you provide an index as the argument, pop() removes and returns the element at that index.
If you do not provide an index as an argument, pop() removes and returns the last element in the list.
nums = [1, 2, 3, 4 ,5]
nums.pop(1) ### pop(1) removes and returns the 2nd element in list.
nums
nums.pop() ### pop() removes & returns the last element.
nums
[1, 3, 4]
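Because pop() returns the element it removes, the value can be saved to a variable. A minimal sketch, with second and last as assumed variable names:

```python
nums = [1, 2, 3, 4, 5]

second = nums.pop(1)    # removes the element at index 1 and returns it
last = nums.pop()       # no index: removes and returns the last element

print(second)           # 2
print(last)             # 5
print(nums)             # [1, 3, 4]
```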
2.3.3.4. remove()#
On the other hand, if you know the element you want to remove (but not the index), you can use remove:
nums = [1, 2, 3, 4 ,5]
nums.remove(3)
nums
[1, 2, 4, 5]
If the element you ask for is not in the list, Python raises a ValueError.
%%expect ValueError
nums = [1, 2, 4, 5]
nums.remove(6) ### This will raise a ValueError because 6 is not in the list.
ValueError: list.remove(x): x not in list
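Two details of remove() are worth a quick sketch: it removes only the first matching element, and the ValueError can be avoided or handled. The list values below are made up for illustration, and the try/except form is just one possible way to handle the error.

```python
nums = [1, 2, 4, 2, 5]

nums.remove(2)            # only the first 2 is removed
print(nums)               # [1, 4, 2, 5]

# Option 1: check membership before removing
if 6 in nums:
    nums.remove(6)

# Option 2: attempt the removal and handle the error
try:
    nums.remove(6)
except ValueError:
    print("6 is not in the list, nothing removed")

print(nums)               # still [1, 4, 2, 5]
```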
2.3.3.5. sort()#
The sort() method:
sorts the original list in-place
modifies the original list permanently
returns None (so there is nothing useful to save)
nums = [ 1, 5, 3, 4, 2 ]
nums_sort = nums.sort() ### sort() returns None, so nums_sort is None
print(nums_sort)
print(nums) ### original list changed
None
[1, 2, 3, 4, 5]
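The remaining methods from the list at the start of this section (insert, index, count, reverse, copy, and clear) are not demonstrated above. The following is a brief sketch of each, using a made-up list and the assumed names position, how_many, and backup.

```python
letters = ['b', 'a', 'c', 'a']

letters.insert(1, 'z')          # insert 'z' before index 1 -> ['b', 'z', 'a', 'c', 'a']
position = letters.index('a')   # index of the first 'a' -> 2
how_many = letters.count('a')   # number of times 'a' appears -> 2

letters.reverse()               # reverses in place -> ['a', 'c', 'a', 'z', 'b']
backup = letters.copy()         # a new list with the same elements
letters.clear()                 # removes every element; letters is now []

print(position, how_many)
print(backup)
print(letters)
```

Like sort(), the methods insert, reverse, and clear change the list in place and return None.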
### Exercise: Append
### 1. Create an empty list, call it nums
### 2. Use a for loop (with range) to append numbers 0 to 9 to nums
### 3. Print the list
### Your code starts here
### Your code ends here
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2.3.4. Lists and strings#
| function/method | syntax | example |
|---|---|---|
| list( ) | list(iterable) | list('spam') |
| split( ) | string.split(delimiter) | 'ex-parrot'.split('-') |
| join( ) | string.join(iterable) | ' '.join(t) |
A string is a sequence of characters, and a list is a sequence of values, but a list of characters is not the same as a string. To convert from a string to a list of characters, you can use the list function/constructor.
s = 'spam'
t = list(s) ### make string sequence 'spam' into a list of its characters
print(t) ### new list ['s', 'p', 'a', 'm']
print(s)
['s', 'p', 'a', 'm']
spam
Note that s is unchanged. list(s) returns a new list, which we save to t. Strings are immutable, and functions such as list() build a new object instead of modifying their argument, so you don’t accidentally destroy your original data when you just want to see it in a different form.
Another example:
s = 'M S&T'
t = list(s)
print(t) ### ['M', ' ', 'S', '&', 'T']
print(s) ### unchanged
['M', ' ', 'S', '&', 'T']
M S&T
If you want to break a string into individual words, you can use the split method:
s = 'apple banana cherry'
t = s.split() ### Split the string into a list of words
t ### ['apple', 'banana', 'cherry']
['apple', 'banana', 'cherry']
An optional argument called a delimiter specifies which characters to use as word boundaries. The following example uses a hyphen as a delimiter.
s = 'ex-parrot'
t = s.split('-') ### note that this returns a list
print(t) ### ['ex', 'parrot']
print(s)
['ex', 'parrot']
ex-parrot
If you have a list of strings, you can concatenate them into a single string using join. join is a string method, so you have to invoke it on the delimiter and pass the list as an argument.
delimiter = ' '
t = ['BIT,', 'Kummer', 'College,', 'S&T'] ### list
s = delimiter.join(t) ### string
print(t)
print(s)
print(type(s))
['BIT,', 'Kummer', 'College,', 'S&T']
BIT, Kummer College, S&T
<class 'str'>
In this case the delimiter is a space character, so join puts a space between words. To join strings without spaces, you can use the empty string, '', as a delimiter.
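Since split and join are inverse operations in the simple case, a round trip makes a handy check. This is a minimal sketch; the names rejoined, squeezed, and csv_line are assumptions chosen for illustration.

```python
sentence = 'apple banana cherry'

words = sentence.split()          # ['apple', 'banana', 'cherry']
rejoined = ' '.join(words)        # back to 'apple banana cherry'
squeezed = ''.join(words)         # empty delimiter: 'applebananacherry'
csv_line = ','.join(words)        # comma delimiter: 'apple,banana,cherry'

print(rejoined == sentence)       # True
print(squeezed)
print(csv_line)
```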
2.3.4.1. Sorting lists#
sorted( )
Python provides a built-in function, sorted, that sorts the elements of a list. sorted works with any kind of sequence, not just lists. So we can sort the letters in a string like this.
sorted('letters')
['e', 'e', 'l', 'r', 's', 't', 't']
The result is a list. To convert the list to a string, we can use join. With an empty string as the delimiter, the elements of the list are joined with nothing between them.
''.join(sorted('letters')) ### empty delimiter
'eelrstt'
'-'.join(sorted('letters')) ### '-' as delimiter
'e-e-l-r-s-t-t'
Another example of the sorted function:
scramble = ['c', 'a', 'b', 'd']
scrambled = sorted(scramble)
print(scrambled)
print(scramble)
['a', 'b', 'c', 'd']
['c', 'a', 'b', 'd']
The original list scramble is unchanged.
By contrast, the sort() method changes the original list:
scramble = ['c', 'a', 'b', 'd']
scramble.sort() ### sorts in place and returns None
print(scramble)
['a', 'b', 'c', 'd']
2.3.5. Objects and Values#
Suppose we run the assignment statements below to create a and b. We know that a and b both refer to a string object (a literal value), but we don’t know whether they refer to the same string. We can use equivalency comparison (==), object identity comparison (is), and id() to test this relationship.
a = 'banana'
b = 'banana'
print(a == b)
print(a is b)
print(id(a))
print(id(b))
True
True
139883638215728
139883638215728
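In the example above, Python created only one string object, and both a and b refer to it. Reusing identical string literals is an optimization made by the interpreter, so the id values you see will differ from run to run, and the result of is should not be relied on for strings.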
Now let us create two lists and perform the same tests.
c = [ 1, 2, 3 ]
d = [ 1, 2 ,3 ]
print(c == d)
print(c is d)
print(id(c))
print(id(d))
True
False
139883634835904
139883634808704
We see that c and d are lists and they are two different objects.
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
2.3.6. Aliasing#
If nums_a refers to an object and you assign nums_b = nums_a, then both variables refer to the same object.
The association of a variable with an object is called a reference. When a variable refers to an object, it simply points to that object in memory; it does not store a copy of the object.
If an object has more than one reference, we say the object is aliased. This means multiple variable names point to the same exact object. If that object is mutable, changing it through one reference will affect what the other reference sees. In other words, different names, same object. This is critical in data science: modifying a DataFrame through one variable also changes what any aliased variable sees. We’ll revisit this with NumPy and Pandas.
nums_a = [ 1, 2, 3, 4, 5 ]
nums_b = nums_a
print(nums_a == nums_b)
print(nums_a is nums_b)
print(id(nums_a))
print(id(nums_b))
True
True
139883634818880
139883634818880
Now let’s modify the list through nums_a:
nums_a = [1, 2, 3, 4, 5] # nums_a refers to a list object
nums_b = nums_a # nums_b refers to the *same* list (alias)
nums_a[0] = 1000 # modify through nums_a
print(nums_a) # [1000, 2, 3, 4, 5]
print(nums_b) # [1000, 2, 3, 4, 5] → also changed, because it's the same object
[1000, 2, 3, 4, 5]
[1000, 2, 3, 4, 5]
So we would say that nums_b “sees” this update to 1000. Although this behavior can be useful, it is error-prone. In general, it is safer to avoid aliasing when you are working with mutable objects.
Aliasing happens when you assign a list to a new variable using the assignment = operator (e.g., list_b = list_a). In this case, both variables point to the same object in memory. If you change one, you change both because they are essentially just two names for the same thing.
However, you can save a list under a different variable name without compromising the original; you just need to create a copy rather than an alias.
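Copying instead of aliasing can be sketched as follows. It shows three common ways to copy a list; the names alias, copy_1, copy_2, and copy_3 are assumptions introduced for this illustration.

```python
nums_a = [1, 2, 3, 4, 5]

alias = nums_a            # same object, just another name
copy_1 = nums_a.copy()    # new list via the copy() method
copy_2 = nums_a[:]        # new list via a full slice
copy_3 = list(nums_a)     # new list via the list constructor

nums_a[0] = 1000          # modify the original

print(alias)              # [1000, 2, 3, 4, 5]  (the alias sees the change)
print(copy_1)             # [1, 2, 3, 4, 5]     (the copies do not)
print(copy_2 is nums_a)   # False: a different object
print(copy_3 == copy_1)   # True: the copies are equivalent to each other
```

All three techniques make shallow copies: if the list contains nested lists, the inner lists are still shared between the original and the copy.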
For immutable objects like strings, aliasing is less of a problem. In this example:
string_a = 'banana'
string_b = 'banana'
It almost never makes a difference whether string_a and string_b refer to the same string or not.
2.3.7. Making a word list#
With the following code, we can download the file words.txt. After the code runs, the file appears in the directory where you ran it.
from os.path import basename, exists

def download(url):
    filename = basename(url)    # file name is the last part of the URL
    if not exists(filename):    # download only if the file is not already here
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print("Downloaded " + str(local))
    return filename

download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/words.txt');
To read a file in Python, we have three common ways:
Read the whole file
Read line by line
Read all lines into a list
2.3.7.1. Read whole file#
To read the whole file, the common syntax is to use the with keyword to create a file object called f, as seen below. Note that:
The advantage of using with is that the file closes automatically when the block ends, so there is no need to call f.close().
The file is represented as one string, and you can see the \n characters after each word, which represent new lines.
# with open("../../data/words.txt", "r") as f:
with open("words.txt", "r") as f:
    content = f.read(100) ### read only the first 100 characters because the whole file is too big to display here
content
'aa\naah\naahed\naahing\naahs\naal\naalii\naaliis\naals\naardvark\naardvarks\naardwolf\naardwolves\naas\naasvogel\na'
With the print() function, the \n newline characters are honored. Note that this is the content variable, which contains only the first 100 characters of the whole file (from ‘aa’ to the ‘a’ that begins the next word), with \n shown as new lines.
print(content)
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
aardvarks
aardwolf
aardwolves
aas
aasvogel
a
### Exercise:
# 1. Show the whole content of the file (use print)
# 2. Count how many words are in the file (use len)
# Your code begins here:
# Your code ends here.
The length of the file words.txt is 1016511 characters.
2.3.7.2. Read by append#
For a data file, instead of reading the entire file into one big string, it is often better to go through the file once, line by line, and put the words in a list object. The following code shows how this is done by adding (append) one line at a time to build the list word_list.
Before the loop, the word_list is initialized with an empty list.
Each time through the loop, the append method adds a word to the end.
When the loop is done, there are more than 113,000 words in the list.
word_list = []
for line in open('words.txt'):
    word = line.strip()
    word_list.append(word)
We now have a list named word_list. Let us take a look at the list (first 10 elements):
print(word_list[:10])
['aa', 'aah', 'aahed', 'aahing', 'aahs', 'aal', 'aalii', 'aaliis', 'aals', 'aardvark']
We know that we have a lot more than 10 words in this list. The number of elements of this list object can be seen using the len() function:
len(word_list)
113783
2.3.7.3. Read by readlines#
The same thing can be achieved by using f.readlines(), which reads the entire file into a list of lines at once.
with open("words.txt", "r") as f:
    lines = f.readlines()
lines[:10]
['aa\n',
'aah\n',
'aahed\n',
'aahing\n',
'aahs\n',
'aal\n',
'aalii\n',
'aaliis\n',
'aals\n',
'aardvark\n']
When printing, we see each string element has a new line character \n as shown below:
print(lines[:10])
['aa\n', 'aah\n', 'aahed\n', 'aahing\n', 'aahs\n', 'aal\n', 'aalii\n', 'aaliis\n', 'aals\n', 'aardvark\n']
Or, if you do not want to see the \n character:
for line in lines[:10]:
    print(line.strip())
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
Another way to do the same thing is to use read to read the entire file into a string.
string = open('words.txt').read()
len(string)
1016511
string[:55]
'aa\naah\naahed\naahing\naahs\naal\naalii\naaliis\naals\naardvark'
The result is a single string with more than a million characters.
We can use the split method to split it into a list of words.
word_list = string.split()
len(word_list)
113783
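To confirm that the three approaches produce the same word list, the sketch below runs all of them on a tiny stand-in file so it works without downloading anything. The file name sample_words.txt, its three words, and the variable names are assumptions made up for this illustration.

```python
import os
import tempfile

# Write a tiny stand-in for words.txt (hypothetical sample data)
folder = tempfile.mkdtemp()
sample_path = os.path.join(folder, "sample_words.txt")
with open(sample_path, "w") as f:
    f.write("aa\naah\naahed\n")

# 1. Loop over the lines and append each stripped word
by_append = []
with open(sample_path) as f:
    for line in f:
        by_append.append(line.strip())

# 2. readlines(), then strip the newline from each line
with open(sample_path) as f:
    lines = f.readlines()
by_readlines = []
for line in lines:
    by_readlines.append(line.strip())

# 3. read() the whole file, then split on whitespace
with open(sample_path) as f:
    by_split = f.read().split()

print(by_append)
print(by_append == by_readlines == by_split)

os.remove(sample_path)   # clean up the sample file
os.rmdir(folder)
```

Opening the file with with in every case means it is closed as soon as each block ends.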
2.3.7.4. Search#
Now, to check whether a string appears in the list, we can use the in operator.
For example, 'demotic' is in the list.
'demotic' in word_list
True
But 'contrafibularities' is not.
'contrafibularities' in word_list
False