Dictionaries

4.2. Dictionaries

This section presents a built-in type called a dictionary. It is one of Python’s best features – and the building block of many efficient and elegant algorithms.

A good example of their use is computing the number of unique words in a book and the number of times each one appears.

4.2.1. A dictionary is a mapping

The curly braces, {}, represent an empty dictionary.
A dictionary is like a list, but more general:
- it gives you more flexibility in how you access and organize data, because values are stored and retrieved through key-value pairs;
- keys are unique and can be descriptive, meaningful labels.

In a list, the indices have to be integers; in a dictionary they can be (almost) any type. For example, suppose we make a list of numbers, like this:

>>> num_lst = [1, 2, 3, 4, 5]

We can use an integer as an index to get the corresponding element.

>>> num_lst[1]
2

Now suppose we want to look up a number word, like 'two', and get the corresponding integer. We can’t do that with a list, because list indices have to be integers, but we can with a dictionary. We’ll start by creating an empty dictionary and assigning it to numbers.

>>> numbers = {}
>>> numbers                         
{}

To add items to the dictionary, we’ll use square brackets.

>>> numbers['zero'] = 0

This assignment adds to the dictionary an item, which represents the association of a key and a value. In this example, the key is the string 'zero' and the value is the integer 0. If we display the dictionary, we see that it contains one item, which contains a key and a value separated by a colon, :.

>>> numbers
{'zero': 0}

We can add more items like this.

>>> numbers['one'] = 1
>>> numbers['two'] = 2
>>> numbers
{'zero': 0, 'one': 1, 'two': 2}

Now the dictionary contains three items.

To look up a key and get the corresponding value, we use the bracket operator.

>>> numbers['two']
2

If the key isn’t in the dictionary, we get a KeyError.

>>> numbers['three']
Traceback (most recent call last):
  File "<python-input-132>", line 1, in <module>
    numbers['three']
    ~~~~~~~^^^^^^^^^
KeyError: 'three'
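If a missing key is an expected situation rather than a bug, one option is to catch the KeyError and fall back to a default value. The following is a minimal sketch; the helper name lookup and its default parameter are assumptions made for illustration and do not appear elsewhere in this chapter.

```python
numbers = {'zero': 0, 'one': 1, 'two': 2}

def lookup(d, key, default=None):
    """Return d[key], or default if key is not in d (hypothetical helper)."""
    try:
        return d[key]
    except KeyError:
        # The key is missing, so fall back to the default value.
        return default

print(lookup(numbers, 'two'))     # the key exists, so its value is returned
print(lookup(numbers, 'three'))   # the key is missing, so the default is returned
```

Python dictionaries also provide a built-in get method that behaves much like this helper: numbers.get('three') returns None instead of raising a KeyError.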

The len function works on dictionaries; it returns the number of items.

>>> len(numbers)
3

In mathematical language, a dictionary represents a mapping from keys to values, so you can also say that each key “maps to” a value. In this example, each number word maps to the corresponding integer. In a state diagram, a dictionary is represented by a box with the word “dict” outside and the items inside. Each item is represented by a key and an arrow pointing to a value. The quotation marks indicate that the keys here are strings, not variable names.

4.2.2. Creating dictionaries

In the previous section we created an empty dictionary and added items one at a time using the bracket operator. Instead, we could have created the dictionary all at once like this.

numbers = {'zero': 0, 'one': 1, 'two': 2}

Each item consists of a key and a value separated by a colon. The items are separated by commas and enclosed in curly braces.

Another way to create a dictionary is to use the dict function. We can make an empty dictionary like this.

empty = dict()
empty  # {}

And we can make a copy of a dictionary like this.

numbers_copy = dict(numbers)
numbers_copy

It is often useful to make a copy before performing operations that modify dictionaries.
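As a small sketch of why the copy matters, the following example modifies the copy and checks that the original is unaffected. The key 'three' is added only for illustration.

```python
numbers = {'zero': 0, 'one': 1, 'two': 2}

# dict(numbers) builds a new dictionary containing the same items.
numbers_copy = dict(numbers)

# Adding an item to the copy does not change the original.
numbers_copy['three'] = 3

print(len(numbers))        # 3
print(len(numbers_copy))   # 4
```

Note that this is a shallow copy: the new dictionary has its own set of keys, but the values themselves are not duplicated. With immutable values like the integers here, that distinction makes no difference.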

4.2.3. The in operator

The in operator works on dictionaries, too; it tells you whether something appears as a key in the dictionary.

'one' in numbers

The in operator does not check whether something appears as a value.

1 in numbers

To see whether something appears as a value in a dictionary, you can use the values method, which returns the values in the dictionary, and then apply the in operator to the result.

1 in numbers.values()

The items in a Python dictionary are stored in a hash table, which is a way of organizing data that has a remarkable property: the in operator takes about the same amount of time no matter how many items are in the dictionary. That makes it possible to write some remarkably efficient algorithms.

To demonstrate, we’ll compare two algorithms for finding pairs of words where one is the reverse of another – like stressed and desserts. We’ll start by downloading and reading the word list.

download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/words.txt');

word_list = open('words.txt').read().split()
len(word_list)

And here’s reverse_word from the previous chapter.

def reverse_word(word):
    return ''.join(reversed(word))

The following function loops through the words in the list. For each one, it reverses the letters and then checks whether the reversed word is in the word list.

def too_slow():
    count = 0
    for word in word_list:
        if reverse_word(word) in word_list:
            count += 1
    return count

To measure how long a function takes, we can use %time, which is one of Jupyter’s “built-in magic commands”. These commands are not part of the Python language, so they might not work in other development environments.

# %time too_slow()

This function takes more than a minute to run. The problem is that the in operator checks the words in the list one at a time, starting at the beginning. If it doesn’t find what it’s looking for – which happens most of the time – it has to search all the way to the end.

And the in operator is inside the loop, so it runs once for each word. Since there are more than 100,000 words in the list, and for each one we check more than 100,000 words, the total number of comparisons is the number of words squared – roughly – which is almost 13 billion.

len(word_list)**2

We can make this function much faster with a dictionary. The following loop creates a dictionary that contains the words as keys.

word_dict = {}
for word in word_list:
    word_dict[word] = 1

The values in word_dict are all 1, but they could be anything, because we won’t ever look them up – we will only use this dictionary to check whether a key exists.
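Because only the keys matter here, a dictionary like this can also be built in a single call. The sketch below compares the explicit loop with the built-in dict.fromkeys method; small_list is a made-up stand-in for word_list, used so that the example runs without the downloaded file.

```python
# A tiny, made-up stand-in for word_list.
small_list = ['stressed', 'desserts', 'noon', 'python']

# Build the dictionary with an explicit loop, as in the text.
loop_dict = {}
for word in small_list:
    loop_dict[word] = 1

# Build an equivalent dictionary in one call; every key maps to 1.
fromkeys_dict = dict.fromkeys(small_list, 1)

print(loop_dict == fromkeys_dict)    # True
print('desserts' in fromkeys_dict)   # True
```

Either way, the result is used only for key lookups, so the choice between the two is a matter of style.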

Now here’s a version of the previous function that replaces word_list with word_dict.

def much_faster():
    count = 0
    for word in word_dict:
        if reverse_word(word) in word_dict:
            count += 1
    return count

%time much_faster()

This function takes less than one hundredth of a second, so it’s about 10,000 times faster than the previous version.

In general, the time it takes to find an element in a list is proportional to the length of the list. The time it takes to find a key in a dictionary is almost constant – regardless of the number of items.
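Outside Jupyter, where %time is not available, the difference between list and dictionary lookups can be observed with the standard timeit module. The following is a rough sketch using synthetic data rather than the word list; the names n, as_list, as_dict and target are assumptions for illustration, and the measured times depend on the machine.

```python
import timeit

# Synthetic data: n made-up "words" (not the book's word list).
n = 20_000
as_list = []
as_dict = {}
for i in range(n):
    word = 'w' + str(i)
    as_list.append(word)
    as_dict[word] = 1

# The target is absent, so the list search has to scan every element.
target = 'not-a-word'

# Run each membership test 200 times and report the total elapsed time.
list_time = timeit.timeit(lambda: target in as_list, number=200)
dict_time = timeit.timeit(lambda: target in as_dict, number=200)

print(f'list: {list_time:.6f} s')
print(f'dict: {dict_time:.6f} s')
```

Increasing n should make the list time grow roughly in proportion, while the dictionary time should stay about the same.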

4.2.4. A collection of counters

Suppose you are given a string and you want to count how many times each letter appears. A dictionary is a good tool for this job. We’ll start with an empty dictionary.

counter = {}

As we loop through the letters in the string, suppose we see the letter 'a' for the first time. We can add it to the dictionary like this.

counter['a'] = 1

The value 1 indicates that we have seen the letter once. Later, if we see the same letter again, we can increment the counter like this.

counter['a'] += 1

Now the value associated with 'a' is 2, because we’ve seen the letter twice.

counter

The following function uses these features to count the number of times each letter appears in a string.

def value_counts(string):
    counter = {}
    for letter in string:
        if letter not in counter:
            counter[letter] = 1
        else:
            counter[letter] += 1
    return counter

Each time through the loop, if letter is not in the dictionary, we create a new item with key letter and value 1. If letter is already in the dictionary we increment the value associated with letter.

Here’s an example.

counter = value_counts('brontosaurus')
counter

The items in counter show that the letter 'b' appears once, 'r' appears twice, and so on.
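The if/else in value_counts can also be written with the dictionary method get, which returns a default when the key is missing. This is an alternative sketch, not the version used in the rest of the chapter; the name value_counts_get is made up for this example.

```python
def value_counts_get(string):
    """Count letters using dict.get instead of an if/else (a variant sketch)."""
    counter = {}
    for letter in string:
        # get returns the current count, or 0 if letter is not yet a key.
        counter[letter] = counter.get(letter, 0) + 1
    return counter

counter = value_counts_get('brontosaurus')
print(counter['b'], counter['r'])   # 1 2
```

Both versions produce the same dictionary; the explicit if/else makes the two cases easier to see, while get is more compact.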

4.2.5. Looping and dictionaries

If you use a dictionary in a for statement, it traverses the keys of the dictionary. To demonstrate, let’s make a dictionary that counts the letters in 'banana'.

counter = value_counts('banana')
counter

The following loop prints the keys, which are the letters.

for key in counter:
    print(key)

To print the values, we can use the values method.

for value in counter.values():
    print(value)

To print the keys and values, we can loop through the keys and look up the corresponding values.

for key in counter:
    value = counter[key]
    print(key, value)

In the next chapter, we’ll see a more concise way to do the same thing.
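To make the loops in this section concrete, here is a self-contained sketch that writes out the letter counts for 'banana' by hand, then collects the keys into a list and adds up the values. The names keys and total are chosen for illustration. It relies on the fact that, in Python 3.7 and later, a dictionary traverses its keys in the order they were inserted.

```python
# Letter counts for 'banana', written out by hand.
counter = {'b': 1, 'a': 3, 'n': 2}

# Looping over the dictionary itself traverses the keys.
keys = []
for key in counter:
    keys.append(key)

# Looping over counter.values() traverses the values.
total = 0
for value in counter.values():
    total += value

print(keys)    # ['b', 'a', 'n']
print(total)   # 6
```

The total of the values equals the length of the original string, which is a quick sanity check on any counter built this way.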

4.2.6. Lists and dictionaries

You can put a list in a dictionary as a value. For example, here’s a dictionary that maps from the number 4 to a list of four letters.

d = {4: ['r', 'o', 'u', 's']}
d

But you can’t put a list in a dictionary as a key. Here’s what happens if we try.

%%expect TypeError
letters = list('abcd')
d[letters] = 4

I mentioned earlier that dictionaries use hash tables, and that means that the keys have to be hashable.

A hash is a function that takes a value (of any kind) and returns an integer. Dictionaries use these integers, called hash values, to store and look up keys.

This system only works if a key is immutable, so its hash value is always the same. But if a key is mutable, its hash value could change, and the dictionary would not work. That’s why keys have to be hashable, and why mutable types like lists aren’t.

Since dictionaries are mutable, they can’t be used as keys, either. But they can be used as values.
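The following sketch illustrates these rules outside of Jupyter, using try and except in place of the %%expect magic. The variable list_key_ok is an assumption introduced only to record what happened.

```python
d = {}

# Strings and integers are immutable and hashable, so they work as keys.
d['four'] = 4
d[4] = ['r', 'o', 'u', 's']   # a list is fine as a value

# hash returns an integer; for a given object it stays the same within one run.
print(hash('four') == hash('four'))   # True

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    d[['a', 'b']] = 4
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)   # False
```

The failed assignment leaves the dictionary unchanged, so d still contains only the two items added at the start.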

4.2.7. Accumulating a list

For many programming tasks, it is useful to loop through one list or dictionary while building another. As an example, we’ll loop through the words in word_dict and make a list of palindromes – that is, words that are spelled the same backward and forward, like “noon” and “rotator”.

In the previous chapter, one of the exercises asked you to write a function that checks whether a word is a palindrome. Here’s a solution that uses reverse_word.

def is_palindrome(word):
    """Check if a word is a palindrome."""
    return reverse_word(word) == word

If we loop through the words in word_dict, we can count the number of palindromes like this.

count = 0

for word in word_dict:
    if is_palindrome(word):
        count += 1
        
count

By now, this pattern is familiar.

- Before the loop, count is initialized to 0.
- Inside the loop, if word is a palindrome, we increment count.
- When the loop ends, count contains the total number of palindromes.

We can use a similar pattern to make a list of palindromes.

palindromes = []

for word in word_dict:
    if is_palindrome(word):
        palindromes.append(word)

palindromes[:10]

Here’s how it works:

- Before the loop, palindromes is initialized with an empty list.
- Inside the loop, if word is a palindrome, we append it to the end of palindromes.
- When the loop ends, palindromes is a list of palindromes.

In this loop, palindromes is used as an accumulator, which is a variable that collects or accumulates data during a computation.

Now suppose we want to select only palindromes with seven or more letters. We can loop through palindromes and make a new list that contains only long palindromes.

long_palindromes = []

for word in palindromes:
    if len(word) >= 7:
        long_palindromes.append(word)
        
long_palindromes

Looping through a list like this, selecting some elements and omitting others, is called filtering.
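The accumulate-then-filter pattern can be tried without the downloaded word list. The sketch below uses a small, made-up list called words in place of word_dict, and defines is_palindrome directly so the example is self-contained.

```python
def is_palindrome(word):
    """Check whether word reads the same backward and forward."""
    return ''.join(reversed(word)) == word

# A small, made-up word list standing in for word_dict.
words = ['noon', 'banana', 'rotator', 'level', 'python', 'deified']

# Accumulate: collect every palindrome.
palindromes = []
for word in words:
    if is_palindrome(word):
        palindromes.append(word)

# Filter: keep only palindromes with seven or more letters.
long_palindromes = []
for word in palindromes:
    if len(word) >= 7:
        long_palindromes.append(word)

print(palindromes)        # ['noon', 'rotator', 'level', 'deified']
print(long_palindromes)   # ['rotator', 'deified']
```

Both loops have the same shape: start with an empty list, test each element, and append the ones that pass.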

4.2.8. Glossary

dictionary: An object that contains key-value pairs, also called items.

item: In a dictionary, another name for a key-value pair.

key: An object that appears in a dictionary as the first part of a key-value pair.

value: An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.

mapping: A relationship in which each element of one set corresponds to an element of another set.

hash table: A collection of key-value pairs organized so that we can look up a key and find its value efficiently.

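hashable: Having a hash value that does not change, which is what allows an object to be used as a dictionary key. Immutable types like integers, floats and strings are hashable. Mutable types like lists and dictionaries are not.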

hash function: A function that takes an object and computes an integer that is used to locate a key in a hash table.

accumulator: A variable used in a loop to add up or accumulate a result.

filtering: Looping through a sequence and selecting or omitting elements.

call graph: A diagram that shows every frame created during the execution of a program, with an arrow from each caller to each callee.

memo: A computed value stored to avoid unnecessary future computation.